A trading environment is a reinforcement learning environment that follows OpenAI’s gym.Env specification. This allows us to leverage many of the existing reinforcement learning models in our trading agent, if we’d like.

TradingEnv steps through the various interfaces from the tensortrade library in a consistent way, and will likely not change too often as all other parts of tensortrade changes. We’re going to go through an overview of the Trading environment below.

Trading environments are fully configurable gym environments with highly composable components:

  • The ActionScheme interprets and applies the agent’s actions to the environment.
  • The RewardScheme computes the reward for each time step based on the agent’s performance.
  • The Observer generates the next observation for the agent.
  • The Stopper determines whether or not the episode is over.
  • The Informer generates useful monitoring information at each time step.
  • The Renderer renders a view of the environment and interactions.

That’s all there is to it, now it’s just a matter of composing each of these components into a complete environment.

When the reset method of a TradingEnv is called, all of the child components will also be reset. The internal state of each action scheme, reward scheme, observer, stopper, and informer will be set back to their default values, ready for the next episode.

What if I can’t make a particular environment?

If none of the environments available in codebase serve your needs let us know! We would love to hear about so we can keep improving the quality of our framework as well as keeping up with the needs of the people using it.