tensortrade.env.generic.environment module

class tensortrade.env.generic.environment.TradingEnv(action_scheme: tensortrade.env.generic.components.action_scheme.ActionScheme, reward_scheme: tensortrade.env.generic.components.reward_scheme.RewardScheme, observer: tensortrade.env.generic.components.observer.Observer, stopper: tensortrade.env.generic.components.stopper.Stopper, informer: tensortrade.env.generic.components.informer.Informer, renderer: tensortrade.env.generic.components.renderer.Renderer, min_periods: int = None, max_episode_steps: int = None, random_start_pct: float = 0.0, **kwargs)

Bases: gym.core.Env, tensortrade.core.base.TimeIndexed

A trading environment made for use with Gym-compatible reinforcement learning algorithms.

Parameters:

action_scheme (ActionScheme) – A component for generating an action to perform at each step of the environment.
reward_scheme (RewardScheme) – A component for computing reward after each step of the environment.
observer (Observer) – A component for generating observations after each step of the environment.
stopper (Stopper) – A component for determining whether the episode should stop after each step of the environment.
informer (Informer) – A component for providing information after each step of the environment.
renderer (Renderer) – A component for rendering the environment.
kwargs (keyword arguments) – Additional keyword arguments needed to create the environment.
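
The component roles in the constructor can be illustrated with a small, library-free sketch. Everything in it is hypothetical: `ToyEnv`, `set_position`, the state dictionary, and the order in which the components are consulted are assumptions made for illustration, not TensorTrade's implementation. Each component is reduced to a plain callable so the example runs without the library installed.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, int]


class ToyEnv:
    """Hypothetical stand-in for the component wiring; not TensorTrade code."""

    def __init__(
        self,
        perform: Callable[[State, Any], None],   # role of action_scheme
        reward: Callable[[State], float],        # role of reward_scheme
        observe: Callable[[State], List[int]],   # role of observer
        should_stop: Callable[[State], bool],    # role of stopper
        inform: Callable[[State], dict],         # role of informer
    ) -> None:
        self.perform = perform
        self.reward = reward
        self.observe = observe
        self.should_stop = should_stop
        self.inform = inform
        self.state: State = {"step": 0, "position": 0}

    def step(self, action: Any) -> Tuple[List[int], float, bool, dict]:
        # Apply the action, advance the clock, then ask each component in turn.
        # The ordering here is an assumption chosen for this sketch.
        self.perform(self.state, action)
        self.state["step"] += 1
        return (
            self.observe(self.state),
            self.reward(self.state),
            self.should_stop(self.state),
            self.inform(self.state),
        )


def set_position(state: State, action: Any) -> None:
    # Assumed action semantics for this toy: the action is the new position.
    state["position"] = action


env = ToyEnv(
    perform=set_position,
    reward=lambda s: float(s["position"]),
    observe=lambda s: [s["step"], s["position"]],
    should_stop=lambda s: s["step"] >= 3,
    inform=lambda s: {"step": s["step"]},
)
obs, reward, done, info = env.step(1)
```

Keeping each responsibility behind its own component means a reward function or stopping rule can be swapped without touching the rest of the environment, which appears to be the motivation for the constructor taking one object per role.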

agent_id = None

close() → None

Closes the environment.

components

The components of the environment (Dict[str, Component], read-only).

episode_id = None

render(*args, **kwargs) → Union[RenderFrame, List[RenderFrame], None]

reset() → numpy.array

Resets the environment.

Returns:	obs (np.array) – The first observation of the environment.

save() → None

Saves the rendered view of the environment.

step(action: Any) → Tuple[numpy.array, float, bool, dict]

Makes one step through the environment.

Parameters:	action (Any) – An action to perform on the environment.
Returns:

np.array – The observation of the environment after the action is performed.
float – The computed reward for performing the action.
bool – Whether or not the episode is complete.
dict – The information gathered after completing the step.
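
The reset() and step() signatures above imply the usual Gym-style episode loop: call reset() once, then call step() until the returned bool is true. The sketch below shows that loop against a hypothetical `StubEnv`, so it runs without TensorTrade installed. `StubEnv`, its price list, the 0/1 action meaning, and the reward rule are all assumptions for illustration, and observations are plain lists rather than NumPy arrays.

```python
from typing import Callable, List, Tuple


class StubEnv:
    """Hypothetical environment exposing the reset()/step() contract above."""

    def __init__(self, prices: List[float]) -> None:
        self.prices = prices
        self.t = 0

    def reset(self) -> List[float]:
        # Rewind to the first price and return it as the first observation.
        self.t = 0
        return [self.prices[0]]

    def step(self, action: int) -> Tuple[List[float], float, bool, dict]:
        # Assumed reward: price change times the position (0 = flat, 1 = long).
        change = self.prices[self.t + 1] - self.prices[self.t]
        self.t += 1
        done = self.t == len(self.prices) - 1
        return [self.prices[self.t]], action * change, done, {"t": self.t}


def run_episode(env: StubEnv, policy: Callable[[List[float]], int]) -> float:
    """Run one episode and return the summed reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        # step() returns (observation, reward, done, info), as documented.
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total


# A policy that is always long collects every price change: +1, -2, +3.
total = run_episode(StubEnv([10.0, 11.0, 9.0, 12.0]), lambda obs: 1)
```

`run_episode` relies only on the two documented methods and the four-element return tuple, so the same loop shape should carry over to a real `TradingEnv` instance, provided the policy produces actions that the configured action scheme accepts.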