In our previous posts (part 1, part 2) we showed how to get historical stock data from the Alpha Vantage API, use Pickle to cache it, and how to prep it in Pandas. Now we are ready to throw it into Prophet!
So, after loading our main.py file, we get ticker data by passing the stock symbol to our get_symbol function, which will check the cache and fetch daily data going back as far as Alpha Vantage provides.
>>> symbol = "ARKK"
>>> ticker = get_symbol(symbol)
./cache/ARKK_2019_10_19.pickle not found
{'1. Information': 'Daily Prices (open, high, low, close) and Volumes', '2. Symbol': 'ARKK', '3. Last Refreshed': '2019-10-18', '4. Output Size': 'Full size', '5. Time Zone': 'US/Eastern'}
{'1: Symbol': 'ARKK', '2: Indicator': 'Simple Moving Average (SMA)', '3: Last Refreshed': '2019-10-18', '4: Interval': 'daily', '5: Time Period': 60, '6: Series Type': 'close', '7: Time Zone': 'US/Eastern'}
{'1: Symbol': 'ARKK', '2: Indicator': 'Relative Strength Index (RSI)', '3: Last Refreshed': '2019-10-18', '4: Interval': 'daily', '5: Time Period': 60, '6: Series Type': 'close', '7: Time Zone': 'US/Eastern Time'}
./cache/ARKK_2019_10_19.pickle saved
Running Prophet
Now we’re not going to do anything here with the original code other than wrap it in a function that we can call again later. Our alpha_df_to_prophet_df() function renames our datetime index and close price columns to the ‘ds’ and ‘y’ columns that Prophet expects. You can follow the original Medium post for an explanation of what’s going on; we just want the fitted history and forecast dataframes in our return statement.
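For reference, a minimal sketch of what alpha_df_to_prophet_df() boils down to, assuming the ‘date’-named datetime index and the ‘4. close’ column from our Alpha Vantage frames (the actual helper may differ slightly):

def alpha_df_to_prophet_df(ticker):
    # Prophet expects a 'ds' datetime column and a 'y' value column
    df = ticker.reset_index().rename(columns={'date': 'ds', '4. close': 'y'})
    return df[['ds', 'y']]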
from fbprophet import Prophet

def prophet(ticker, fcast_time=360):
    # Convert to Prophet's ds/y format, fit, then forecast roughly a year out
    ticker = alpha_df_to_prophet_df(ticker)
    df_prophet = Prophet(changepoint_prior_scale=0.15, daily_seasonality=True)
    df_prophet.fit(ticker)
    df_forecast = df_prophet.make_future_dataframe(periods=fcast_time, freq='D')
    df_forecast = df_prophet.predict(df_forecast)
    return df_prophet, df_forecast
>>> df_prophet, df_forecast = prophet(ticker)
Initial log joint probability = -11.1039
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 3671.96 0.11449 1846.88 1 1 120
...
3510 3840.64 3.79916e-06 20.3995 7.815e-08 0.001 4818 LS failed, Hessian reset
3534 3840.64 1.38592e-06 16.2122 1 1 4851
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
The whole process runs within a minute (the table above is the log from Stan’s optimizer, which Prophet uses under the hood). Even twenty years of Google daily data can be processed quickly.
The last thing we want to do is concat the forecast data back onto the original ticker data and Pickle it back to our file system. We rename the forecast’s ‘ds’ column back to ‘date’, as the index was named before we modified it, then join it to the original Alpha Vantage data.
import pandas as pd

def concat(ticker, df_forecast):
    # Keep only the forecast columns we need, re-keyed to the original date index
    df = df_forecast.rename(columns={'ds': 'date'}).set_index('date')[['trend', 'yhat_lower', 'yhat_upper', 'yhat']]
    frames = [ticker, df]
    result = pd.concat(frames, axis=1)
    return result
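Back in the REPL, one call gets us the combined dataframe (we’ll reuse this result object in the discount calculation further down):

>>> result = concat(ticker, df_forecast)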
Seeing the results
Since these are Pandas dataframes, we can use matplotlib to see the results, and Prophet also includes Plotly support. But as someone who looks at live charts in TradingView throughout the day, I’d like something more responsive. So we loaded the Bokeh library and created the following function to match.


from bokeh.plotting import figure, output_file, save

def prophet_bokeh(df_prophet, df_forecast):
    p = figure(x_axis_type='datetime')
    # Shaded band for the yhat_lower/yhat_upper uncertainty interval
    p.varea(y1='yhat_lower', y2='yhat_upper', x='ds', color='#0072B2', source=df_forecast, fill_alpha=0.2)
    p.line(df_prophet.history['ds'].dt.to_pydatetime(), df_prophet.history['y'], legend="History", line_color="black")
    p.line(df_forecast.ds, df_forecast.yhat, legend="Forecast", line_color='#0072B2')
    save(p)
>>> output_file("./charts/{}.html".format(symbol), title=symbol)
>>> prophet_bokeh(df_prophet, df_forecast)

Putting it all together
Our ultimate goal here is to be able to process large batches of stocks, downloading the data from Alpha Vantage and processing it in Prophet in one go. For our initial run, we decided to start with the bundle of stocks in the ARK Innovation ETF. So we copied the holdings into a Python list and created a couple of functions: one to process an individual stock, and another to process the list. Everything in the first function should be familiar except for two things. One, we added a check for the ‘yhat’ column to make sure that we didn’t inadvertently reprocess any individual stocks while we were debugging. Two, we refactored out get_filename, which just adds the stock ticker plus today’s date to a string. It’s used in get_symbol during the Alpha Vantage call, as well as here when we save the Prophet-ized data back to the cache.
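For reference, a minimal sketch of get_filename, matching the ‘./cache/ARKK_2019_10_19’ pattern from the output earlier; the real implementation may differ:

import datetime

def get_filename(symbol, directory):
    # e.g. './cache/ARKK_2019_10_19'; the caller appends '.pickle'
    today = datetime.date.today().strftime('%Y_%m_%d')
    return '{}/{}_{}'.format(directory, symbol, today)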
def process(symbol):
    ticker = get_symbol(symbol)
    if 'yhat' in ticker:
        print("DF exists, exiting")
        return
    df_prophet, df_forecast = prophet(ticker)
    output_file("./charts/{}.html".format(symbol), title=symbol)
    prophet_bokeh(df_prophet, df_forecast)
    result = concat(ticker, df_forecast)
    file = get_filename(symbol, CACHE_DIR) + '.pickle'
    pickle.dump(result, open(file, "wb"))
    return
Finally, our process_list function. We had a bit of a wrinkle at first: since we’re using the free Alpha Vantage API, we’re limited to 5 API calls per minute, and since we’re making three in each get_symbol() call, we get an exception if we run the loop more than once in sixty seconds. I could have just gotten rid of the SMA and RSI calls, but I ultimately decided to calculate the duration of each loop and sleep until the minute was up. Obviously not the most elegant solution, but it works.
import time

def process_list(symbol_list):
    for symbol in symbol_list:
        start = time.time()
        process(symbol)
        end = time.time()
        elapsed = end - start
        print("Finished processing {} in {}".format(symbol, elapsed))
        if elapsed > 60:
            continue
        elif elapsed < 1:
            # Cache hit: no API calls were made, so no need to throttle
            continue
        else:
            print('Waiting...')
            time.sleep(60 - elapsed)
            continue
So from there we just pass our list of ARKK stocks, go for a bio-break, and when we come back we’ve got a cache of Pickled Pandas data and Bokeh plots for about thirty stocks.
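For illustration, the kickoff is a single call; the list here is a hypothetical stand-in, not the actual ARKK holdings:

>>> symbol_list = ['ARKK', 'TSLA', 'SQ']  # hypothetical subset
>>> process_list(symbol_list)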
Where do we go now
Now, I’m not putting too much faith in the results of the Prophet data; we didn’t do any customizations, and we just wanted to see what we could do with it. In the days since I started writing up this series, I’ve been thinking about ways to pick the winners from these plots via a function call. So far I’ve come up with this discount function, which determines the discount of the current price of an asset relative to Prophet’s yhat prediction band.
Continuing with ARKK:
def calculate_discount(current, minimum, maximum):
    return (current - minimum) * 100 / (maximum - minimum)
>>> result['discount'] = calculate_discount(result['4. close'], result['yhat_lower'], result['yhat_upper'])
>>> result.loc['20191016']
1. open 42.990000
2. high 43.080000
3. low 42.694000
4. close 42.800000
5. volume 188400.000000
SMA 44.409800
RSI 47.424600
trend 41.344573
yhat_lower 40.632873
yhat_upper 43.647911
yhat 42.122355
discount 71.877276
Name: 2019-10-16 00:00:00, dtype: float64
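As a sanity check, plugging that row into the function by hand gives (42.80 − 40.632873) × 100 / (43.647911 − 40.632873) ≈ 71.88, which matches the discount value above.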
A negative number for the discount indicates that the current price is below the prediction band, and may be a buy. Likewise, anything over 100 is above the prediction range and is overpriced, according to the model. We did ultimately pick two out of the ARKK holdings that were well below the prediction range and had favorable long-term forecasts, and we’ve started scaling in modestly while we see how things play out.
If we were more cautious, we’d do more backtesting, running limited time slices through Prophet and comparing forecast accuracy against the historical data. Additionally, we’d like to figure out a way to weigh our discount calculation against the accuracy projections.
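If we go down that road, Prophet ships with diagnostics helpers that do most of the heavy lifting. A minimal sketch, assuming the fbprophet package we’ve been using, with hypothetical window sizes:

from fbprophet.diagnostics import cross_validation, performance_metrics

# Refit on rolling windows: train on two years, then score 180-day forecasts every 90 days
df_cv = cross_validation(df_prophet, initial='730 days', period='90 days', horizon='180 days')
df_metrics = performance_metrics(df_cv)  # MSE, RMSE, MAE, MAPE, coverage per horizon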
There’s much more to explore off of the original Medium post. We haven’t even gotten into integrating Alpha Vantage’s cryptoasset calls, nor have we done any of the validation and performance metrics that are part of the tutorial. It’s likely that parts 4 and 5 of this series will follow. Ultimately, though, our interest is to get into actual machine learning frameworks such as TensorFlow and see what we can come up with there. While we understand the danger of placing too much weight on trained models, I do think there may be value in using these frameworks as screeners. Coupled with the value averaging algorithm that we discussed here previously, we may have a good strategy for long-term investing. And anything that lets me quantify decisions and remove the emotional factor is good as well.
I’ve learned so much doing this small project. I’m not sure how much more we’ll do with Prophet per se, but the Alpha Vantage API is very useful, and I’m guessing that I’ll be doing a lot more with Bokeh in the future. During the last week I’ve also discovered a new Python project that aims to provide a unified framework for coupling various equity and crypto exchange APIs with pluggable ML components, and use them to execute various trading strategies. Watch this space for discussion on that soon.