ElegantRL Demo: Stock Trading Using DDPG (Part II)

5 min read · Apr 19, 2021

Tutorial for Deep Deterministic Policy Gradient Algorithm (DDPG)

This article by Steven Li, Xiao-Yang Liu, and Yiyan Zeng describes a stock trading application using Deep Deterministic Policy Gradient (DDPG) in ElegantRL. It has two main parts: a stock trading environment and a training-backtesting process. For the environment, we explain the gym-style stock trading environment and its easy-to-customize features. For the training-backtesting process, we describe how a trading agent is trained in ElegantRL, the tricks involved, and how backtesting evaluates its performance.

Please check the introductory article for an overview of the ElegantRL library and the ElegantRL Demo: Stock Trading Using DDPG (Part I) for the stock trading application and the DDPG algorithm.

Stock Trading Environment

The environment follows the OpenAI Gym style, which is widely treated as the standard interface for reinforcement learning environments. The environment is divided into three parts, and each part is one function:

Initialization: the stock data from Yahoo Finance is pre-processed, and the variables related to the stock trading task are initialized. The initialization function creates a new environment, which then interacts with the agent during training.

def __init__(...):
    # Download and preprocess data
    train_df, eval_df = self.load_stock_trading_data(start_date,
                        start_eval_date, end_eval_date)

    # df is not defined in this excerpt; presumably train_df or eval_df
    self.price_ary, self.tech_ary = self.convert_df_to_ary(df,
                                    tech_indicator_list)

    # Initialize parameters
    stock_dim = self.price_ary.shape[1]
    self.gamma = gamma
    self.max_stock = max_stock
    ...
    self.initial_stocks = (np.zeros(stock_dim, dtype=np.float32)
                           if initial_stocks is None else initial_stocks)

    # Variables that are set in reset()
    self.day = None
    self.stocks = None
    ...
    self.total_asset = None

    # Environment information
    self.env_name = 'StockTradingEnv-v1'
    self.state_dim = 1 + 2 * stock_dim + self.tech_ary.shape[1]
    self.action_dim = stock_dim
    ...
    self.target_return = 3.5  # 4.3
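
To see where the state dimension comes from, the sketch below builds one state vector from toy arrays. The sizes (3 stocks, 8 indicator columns in total, 5 days) and all values are made up for illustration; only the formula `1 + 2 * stock_dim + self.tech_ary.shape[1]` is taken from the code above.

```python
import numpy as np

stock_dim = 3    # hypothetical number of tickers
num_days = 5     # hypothetical number of trading days
tech_dim = 8     # hypothetical total number of indicator columns

price_ary = np.full((num_days, stock_dim), 100.0, dtype=np.float32)
tech_ary = np.zeros((num_days, tech_dim), dtype=np.float32)

amount = 1e6                                     # cash
stocks = np.zeros(stock_dim, dtype=np.float32)   # shares held per stock

# One cash entry + one price per stock + one holding per stock + indicators
state_dim = 1 + 2 * stock_dim + tech_ary.shape[1]

# Unscaled state for day 0, concatenated in the same order as in the environment
state = np.hstack((amount, price_ary[0], stocks, tech_ary[0]))
print(state_dim, state.shape)
```

The length of the concatenated vector matches `state_dim`, which is what the agent's networks need as their input size.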

Reset: the state and variables of the environment are reset to the initial setting. This function is called once the simulation episode stops and needs to restart.

def reset(self):
    self.day = 0
    price = self.price_ary[self.day]
    # Randomize the initial holdings and cash
    self.stocks = self.initial_stocks + rd.randint(0, 64,
                  size=self.initial_stocks.shape)
    self.amount = (self.initial_amount * rd.uniform(0.95, 1.05) -
                   (self.stocks * price).sum())
    self.total_asset = self.amount + (self.stocks * price).sum()
    self.initial_total_asset = self.total_asset
    self.gamma_reward = 0.0
    # State: scaled cash, prices, holdings, and technical indicators
    state = np.hstack((self.amount * 2 ** -13,
                       price,
                       self.stocks,
                       self.tech_ary[self.day],)
                      ).astype(np.float32) * 2 ** -5
    return state
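
The randomization in the reset function can be checked in isolation. The sketch below is a standalone illustration, assuming `rd` refers to `numpy.random` (the import is not shown in the article) and using toy prices and capital. Because the value of the random shares is subtracted from the cash, the total asset always stays within roughly 5% of the initial amount.

```python
import numpy as np
import numpy.random as rd   # assumption: this is what `rd` refers to in the article

rd.seed(0)                  # fixed seed so the sketch is repeatable

initial_amount = 1e6                       # toy starting capital
price = np.array([50.0, 120.0, 10.0])      # toy prices on day 0
initial_stocks = np.zeros(3)

# Same randomization as in reset(): up to 63 extra shares per stock,
# and cash perturbed by up to 5% before paying for those shares
stocks = initial_stocks + rd.randint(0, 64, size=initial_stocks.shape)
amount = initial_amount * rd.uniform(0.95, 1.05) - (stocks * price).sum()
total_asset = amount + (stocks * price).sum()

# The stock value cancels out, so this ratio is just the random factor
print(total_asset / initial_amount)
```

Starting each episode from a slightly different portfolio is a simple way to expose the agent to more varied initial states.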

Step: the environment takes an action from the agent and returns four items: the next state, the reward, a flag indicating whether the current episode is done, and an empty info dictionary. The environment computes the next state and the reward following the state-action transition described in the previous blog. The agent calls the step function to collect transitions.

def step(self, actions):
    # Get actions and price of the day
    actions = (actions * self.max_stock).astype(int)
    self.day += 1
    price = self.price_ary[self.day]

    # Execute the action: sell first, then buy
    # Sell
    for index in np.where(actions < 0)[0]:
        if price[index] > 0:  # Sell only if the current price is > 0
            sell_num_shares = min(self.stocks[index],
                                  -actions[index])
            self.stocks[index] -= sell_num_shares
            self.amount += (price[index] * sell_num_shares *
                            (1 - self.sell_cost_pct))
    # Buy
    for index in np.where(actions > 0)[0]:
        ...

    state = np.hstack((self.amount * 2 ** -13,
                       price,
                       self.stocks,
                       self.tech_ary[self.day],)
                      ).astype(np.float32) * 2 ** -5

    # Calculate reward
    total_asset = self.amount + (self.stocks * price).sum()
    reward = (total_asset - self.total_asset) * 2 ** -14  # reward scaling
    self.total_asset = total_asset
    self.gamma_reward = self.gamma_reward * self.gamma + reward

    # Check if the episode is done
    done = self.day == self.max_step
    if done:
        reward = self.gamma_reward
        self.episode_return = total_asset / self.initial_total_asset
    return state, reward, done, dict()
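
The reward bookkeeping at the end of the step function can be traced by hand. The sketch below replays only that part on a made-up path of total asset values; the path and the value of `gamma` are illustrative, while the scaling factor `2 ** -14` and the update rule for `gamma_reward` are copied from the code above.

```python
gamma = 0.99   # illustrative discount factor

# Toy total-asset values on four consecutive days
total_assets = [1_000_000.0, 1_016_384.0, 1_000_000.0, 1_032_768.0]

gamma_reward = 0.0
prev_asset = total_assets[0]
rewards = []
for total_asset in total_assets[1:]:
    reward = (total_asset - prev_asset) * 2 ** -14     # scaled change in total asset
    prev_asset = total_asset
    gamma_reward = gamma_reward * gamma + reward       # running accumulated reward
    rewards.append(reward)

print(rewards)        # per-step rewards
print(gamma_reward)   # what step() returns as the reward on the final day
```

With this path the per-step rewards are 1.0, -1.0, and 2.0. Note that the update rule multiplies the running sum by `gamma` at every step, so older rewards carry less weight than recent ones in the final value.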

Easy-to-customize Features of the Environment

A key feature of the stock trading environment in ElegantRL is that it is easy to customize. By passing parameters into the environment, users can define their own stock trading environments for their specific needs. In general, the user can customize the following features:

initial_capital: the initial capital that the user wants to invest.

# The unit is dollars
initial_capital = 1e6

tickers: the stocks that the user wants to trade.

# finrl.config.NAS_74_TICKER
tickers = ['AAPL', 'ADBE', 'ADI', 'ADP', 'ADSK', 'ALGN', 'ALXN', 'AMAT', 'AMD', 'AMGN', 'AMZN', 'ASML', 'ATVI', 'BIIB', 'BKNG',...,'SNPS', 'SWKS', 'TTWO', 'TXN', 'VRSN', 'VRTX', 'WBA', 'WDC', 'WLTW', 'XEL', 'XLNX']

initial_stocks: the initial number of shares held for each stock; the default is zero for every stock.

initial_stocks = np.zeros(len(tickers), dtype=np.float32)

buy_cost_pct, sell_cost_pct: the transaction fee, as a fraction of the trade value, charged on each buy or sell.

buy_cost_pct = 1e-3
sell_cost_pct = 1e-3
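
As a worked example of what these rates mean in cash terms, the sketch below prices a round trip of 100 shares at a toy price. The sell-side formula matches the sell branch of the step function shown earlier; the buy-side formula is an assumption, since the buy branch is elided in the article.

```python
buy_cost_pct = 1e-3
sell_cost_pct = 1e-3

price = 50.0    # toy share price
shares = 100    # toy order size

# Assumed buy-side formula: pay the trade value plus the fee
cash_paid = price * shares * (1 + buy_cost_pct)
# Sell-side formula as in step(): receive the trade value minus the fee
cash_received = price * shares * (1 - sell_cost_pct)

round_trip_cost = cash_paid - cash_received
print(cash_paid, cash_received, round_trip_cost)
```

At a 0.1% rate, buying and immediately selling a 5,000 dollar position costs about 10 dollars, which is the kind of friction that discourages the agent from trading too often.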

max_stock: the maximum number of shares of each stock that may be traded in one transaction. Users can also set the maximum percentage of capital to invest in each stock.

max_stock = 1e2
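
The sketch below shows how `max_stock` turns the agent's output into share counts, using the same scaling line as the step function. It assumes the actions lie in [-1, 1], which is typical for a DDPG actor but is not stated in the article; the action values are made up.

```python
import numpy as np

max_stock = 1e2
actions = np.array([0.5, -0.25, 0.004, -1.0])   # toy agent output, assumed in [-1, 1]

# Same scaling as in step(): positive means buy, negative means sell
shares = (actions * max_stock).astype(int)
print(shares)
```

Because `astype(int)` truncates toward zero, an action as small as 0.004 becomes a trade of zero shares, so very weak signals result in no transaction.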

tech_indicator_list: the list of technical indicators that are included in the state.

#finrl.config.TECHNICAL_INDICATORS_LIST
tech_indicator_list = ['macd', 'boll_ub', 'boll_lb', 'rsi_30', 'cci_30', 'dx_30', 'close_30_sma', 'close_60_sma']
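
The indicator names follow the stockstats naming pattern of column, window, and statistic, so `close_30_sma` is the 30-day simple moving average of the closing price. In the demo these values are computed by stockstats; the hand-rolled sketch below only illustrates the idea on toy prices with a window of 3.

```python
import numpy as np

close = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # toy closing prices
window = 3                                          # the list above uses 30 and 60

# Simple moving average: mean of the last `window` closes,
# with one output value per fully covered window
sma = np.convolve(close, np.ones(window) / window, mode='valid')
print(sma)
```

Each indicator adds columns to `tech_ary`, and therefore entries to the state, so a longer list gives the agent more context at the cost of a larger input.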

start_date, start_eval_date, end_eval_date: the training and backtesting time intervals. Three dates (or timestamps) are used: the period from start_date to start_eval_date is for training, and the rest, up to end_eval_date, is for backtesting.

start_date = '2008-03-19'
start_eval_date = '2016-01-01'
end_eval_date = '2021-01-01'
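
One way to picture the split is a helper that labels any calendar day. The function `which_split` below is hypothetical and not part of ElegantRL, and whether each boundary date is inclusive is an assumption made for this sketch.

```python
from datetime import date

start_date = date.fromisoformat('2008-03-19')
start_eval_date = date.fromisoformat('2016-01-01')
end_eval_date = date.fromisoformat('2021-01-01')

def which_split(day):
    """Hypothetical helper: label a day as training, backtesting, or unused."""
    if start_date <= day < start_eval_date:
        return 'train'
    if start_eval_date <= day < end_eval_date:
        return 'backtest'
    return 'unused'

print(which_split(date(2015, 12, 31)))
print(which_split(date(2016, 1, 4)))
```

Keeping the backtesting interval strictly after the training interval ensures the agent is evaluated on data it has never seen.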

Once all customizable features are defined, the user is ready to create a trading environment accordingly.

env = StockTradingEnv('./envs/FinRL', gamma, max_stock, initial_capital, buy_cost_pct, sell_cost_pct, start_date, start_eval_date, end_eval_date, tickers, tech_indicator_list, initial_stocks, if_eval=False)

Training and Backtesting Processes in ElegantRL

Preparation:

Step 1: Install ElegantRL and related packages

ElegantRL
yfinance: offers a threaded and Pythonic way to download historical market data from Yahoo! Finance.
stockstats: inherits and extends pandas.DataFrame to support stock statistics and stock indicators.

!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git
!pip install yfinance stockstats

Step 2: Import Packages

from elegantrl.run import *
from elegantrl.agent import AgentDDPG
from elegantrl.envs.FinRL.FinRL import StockTradingEnv
import yfinance as yf
from stockstats import StockDataFrame as Sdf

Training Pipeline:

Step 1: Specify Agent and Environment

# Agent
args = Arguments(if_on_policy=False)  # DDPG is an off-policy algorithm
args.agent = AgentDDPG()

# Environment
...
args.env = StockTradingEnv('./envs/FinRL', gamma, max_stock, initial_capital, buy_cost_pct, sell_cost_pct, start_date, start_eval_date, end_eval_date, tickers, tech_indicator_list, initial_stocks, if_eval=False)

Step 2: Initialize Hyper-parameters

args.gamma = 0.995
args.break_step = int(2e5)
args.net_dim = 2 ** 9
...
args.if_allow_break = False
args.rollout_num = 2 # the number of rollout workers (larger is not always faster)
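
To get a feel for the discount factor chosen here, the arithmetic below converts `gamma = 0.995` into a rough planning horizon. The 1 / (1 - gamma) rule of thumb is a standard heuristic, not something specific to ElegantRL.

```python
gamma = 0.995

# Rule of thumb: rewards within about 1 / (1 - gamma) steps dominate the return
effective_horizon = 1 / (1 - gamma)

# Weight given to a reward that arrives 200 steps in the future
weight_after_200 = gamma ** 200

print(effective_horizon, weight_after_200)
```

With one step per trading day, a horizon of about 200 steps corresponds to a little under one year of trading, and a reward that far ahead still keeps roughly a third of its weight.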

Step 3: Train and Evaluate the Agent

# With if_allow_break = True, training would stop once it reaches the target return;
# with if_allow_break = False (as set in Step 2), it runs until break_step.
train_and_evaluate_mp(args)

Backtesting and Evaluation:

import torch

# Backtesting
args = Arguments(if_on_policy=False)  # DDPG is an off-policy algorithm
args.agent = AgentDDPG()
args.env = StockTradingEnv(cwd='./', if_eval=True)
args.if_remove = False
args.cwd = './AgentDDPG/StockTradingEnv-v1_0'
args.init_before_training()

# Draw the graph using the evaluation environment
args.env.draw_cumulative_return(args, torch)
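
The quantity plotted in backtesting is a cumulative return, i.e. the total asset divided by its starting value, the same ratio the step function stores as `episode_return`. The sketch below computes it on a made-up asset path; it does not reproduce how `draw_cumulative_return` is implemented.

```python
import numpy as np

# Toy total-asset values over five backtesting days
asset_path = np.array([1.00e6, 1.02e6, 0.99e6, 1.10e6, 1.25e6])

# Cumulative return relative to the first day
cumulative_return = asset_path / asset_path[0]

# Day-over-day returns, useful for risk statistics
daily_return = asset_path[1:] / asset_path[:-1] - 1

print(cumulative_return[-1])
print(daily_return)
```

A final value of 1.25 means the portfolio ended the backtest 25% above its starting value; the `target_return = 3.5` in the environment is expressed on the same scale.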

Fig 1. The cumulative return from the Stock Trading agent. [Image by authors].

Fig 2. The Episode return and the learning curve. [Image by authors].

Check out the Colab code for this Stock Trading demo.

Written by XiaoYang-ElegantRL