Stock price forecasting using FB's Prophet: Part 3

In our previous posts (part 1, part 2) we showed how to get historical stock data from the Alpha Vantage API, cache it with Pickle, and prep it in Pandas. Now we're ready to throw it at Prophet!

So, after loading our main.py file, we get ticker data by passing the stock symbol to our get_symbol function, which checks the cache and pulls daily data going back as far as Alpha Vantage provides.

>>> symbol = "ARKK"
>>> ticker = get_symbol(symbol)
./cache/ARKK_2019_10_19.pickle not found
{'1. Information': 'Daily Prices (open, high, low, close) and Volumes', '2. Symbol': 'ARKK', '3. Last Refreshed': '2019-10-18', '4. Output Size': 'Full size', '5. Time Zone': 'US/Eastern'}
{'1: Symbol': 'ARKK', '2: Indicator': 'Simple Moving Average (SMA)', '3: Last Refreshed': '2019-10-18', '4: Interval': 'daily', '5: Time Period': 60, '6: Series Type': 'close', '7: Time Zone': 'US/Eastern'}
{'1: Symbol': 'ARKK', '2: Indicator': 'Relative Strength Index (RSI)', '3: Last Refreshed': '2019-10-18', '4: Interval': 'daily', '5: Time Period': 60, '6: Series Type': 'close', '7: Time Zone': 'US/Eastern Time'}
./cache/ARKK_2019_10_19.pickle saved

Running Prophet

Now, we're not going to do anything here with the original code other than wrap it in a function that we can call again later. Our alpha_df_to_prophet_df() function renames our datetime index and close price columns to the columns that Prophet expects. You can follow the original Medium post for an explanation of what's going on; we just want the fitted model (with its history) and the forecast dataframe from our return statement.
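
For reference, here's a minimal sketch of what alpha_df_to_prophet_df() does, assuming the Alpha Vantage frame has a 'date' index and a '4. close' column as in the output above (the real helper may handle more cases):

def alpha_df_to_prophet_df(ticker):
    # Prophet wants a 'ds' datetime column and a 'y' value column
    df = ticker.reset_index().rename(columns={'date': 'ds', '4. close': 'y'})
    return df[['ds', 'y']]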

from fbprophet import Prophet

def prophet(ticker, fcast_time=360):
    ticker = alpha_df_to_prophet_df(ticker)
    df_prophet = Prophet(changepoint_prior_scale=0.15, daily_seasonality=True)
    df_prophet.fit(ticker)
    # Extend the frame fcast_time days past the end of the history
    df_forecast = df_prophet.make_future_dataframe(periods=fcast_time, freq='D')
    df_forecast = df_prophet.predict(df_forecast)
    return df_prophet, df_forecast

>>> df_prophet, df_forecast = prophet(ticker)
Initial log joint probability = -11.1039
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      99       3671.96       0.11449       1846.88           1           1      120   
...
    3510       3840.64   3.79916e-06       20.3995   7.815e-08       0.001     4818  LS failed, Hessian reset 
    3534       3840.64   1.38592e-06       16.2122           1           1     4851   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance

The whole process runs within a minute. Even twenty years of Google daily data can be processed quickly.

The last thing we want to do is concat the forecast data back onto the original ticker data and Pickle it back to our file system. We rename Prophet's 'ds' column back to 'date' and set it as the index, as it was before we modified it, then join it to the original Alpha Vantage data.

def concat(ticker, df_forecast):
    df = df_forecast.rename(columns={'ds': 'date'}).set_index('date')[['trend', 'yhat_lower', 'yhat_upper', 'yhat']]
    frames = [ticker, df]
    result = pd.concat(frames, axis=1)
    return result
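
Picking up the objects from earlier in the session, this gives us the result frame we'll use below:

>>> result = concat(ticker, df_forecast)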

Seeing the results

Since these are Pandas dataframes, we can use matplotlib to see the results, and Prophet also includes Plotly support. But as someone who looks at live charts in TradingView throughout the day, I wanted something more responsive, so we loaded the Bokeh library and created the following function to match.

ARKK plot using matplotlib. Static only.
ARKK plot in Plotly. Not great; the UI is clunky and doesn't work well in my dev VM browser.

from bokeh.plotting import figure, output_file, save

def prophet_bokeh(df_prophet, df_forecast):
    p = figure(x_axis_type='datetime')
    # Shaded band for the yhat_lower/yhat_upper uncertainty interval
    p.varea(y1='yhat_lower', y2='yhat_upper', x='ds', color='#0072B2', source=df_forecast, fill_alpha=0.2)
    # Fitted history in black, forecast line in Prophet blue
    p.line(df_prophet.history['ds'].dt.to_pydatetime(), df_prophet.history['y'], legend="History", line_color="black")
    p.line(df_forecast.ds, df_forecast.yhat, legend="Forecast", line_color='#0072B2')
    save(p)

>>> output_file("./charts/{}.html".format(symbol), title=symbol)
>>> prophet_bokeh(df_prophet, df_forecast)
ARKK plot in Bokeh. Can easily zoom and pan. Lovely.

Putting it all together

Our ultimate goal here is to be able to process large batches of stocks, downloading the data from AV and processing it in Prophet in one go. For our initial run, we decided to start with the bundle of stocks in the ARK Innovation ETF. So we copied the holdings into a Python list and created a couple of functions: one to process an individual stock, and another to process the list. Everything in the first function should be familiar except for two things. First, we added a check for the 'yhat' column to make sure we didn't inadvertently reprocess any individual stocks while we were debugging. Second, we refactored get_filename, which just adds the stock ticker plus today's date to a string; it's used in get_symbol during the Alpha Vantage call, as well as here when we save the Prophet-ized data back to the cache.
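
Judging by the cache filenames in the output above, get_filename looks something like this (a hypothetical reconstruction; the real helper may differ):

from datetime import datetime

def get_filename(symbol, directory):
    # e.g. ./cache/ARKK_2019_10_19, to which callers append an extension
    return "{}/{}_{}".format(directory, symbol, datetime.now().strftime("%Y_%m_%d"))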

def process(symbol):
    ticker = get_symbol(symbol)
    if 'yhat' in ticker:
        print("DF exists, exiting")
        return
    df_prophet, df_forecast = prophet(ticker)
    output_file("./charts/{}.html".format(symbol), title=symbol)
    prophet_bokeh(df_prophet, df_forecast)
    result = concat(ticker, df_forecast)
    file = get_filename(symbol, CACHE_DIR) + '.pickle'
    with open(file, "wb") as f:
        pickle.dump(result, f)
    return

Finally, our process_list function. We hit a bit of a wrinkle at first: since we're using the free Alpha Vantage API, we're limited to five API calls per minute, and since we make three in each get_symbol() call, we get an exception if we run the loop more than once in sixty seconds. I could have just gotten rid of the SMA and RSI calls, but ultimately decided to calculate the duration of each loop and sleep until the minute was up. Obviously not the most elegant solution, but it works.

def process_list(symbol_list):
    for symbol in symbol_list:
        start = time.time()
        process(symbol)
        elapsed = time.time() - start
        print("Finished processing {} in {}".format(symbol, elapsed))
        # Skip the wait if we bailed out early on cached data (elapsed < 1)
        # or if processing already took longer than the rate-limit window
        if 1 <= elapsed <= 60:
            print('Waiting...')
            time.sleep(60 - elapsed)

So from there we just pass our list of ARKK holdings to process_list, go for a bio-break, and when we come back we've got a cache of Pickled Pandas data and Bokeh plots for about thirty stocks.

Where do we go now

Now, I'm not putting too much faith in the Prophet results; we didn't do any customization, and we just wanted to see what we could do with it. In the days since I started writing up this series, I've been thinking about ways to rank the winners programmatically rather than eyeballing the plots. So far I've come up with this discount function, which measures where the current price of an asset sits relative to Prophet's yhat prediction band.

Continuing with ARKK:

def calculate_discount(current, minimum, maximum):
    # Position of the current price within the band, as a percentage
    return (current - minimum) * 100 / (maximum - minimum)

>>> result['discount'] = calculate_discount(result['4. close'], result['yhat_lower'], result['yhat_upper'])
>>> result.loc['20191016']
1. open           42.990000
2. high           43.080000
3. low            42.694000
4. close          42.800000
5. volume     188400.000000
SMA               44.409800
RSI               47.424600
trend             41.344573
yhat_lower        40.632873
yhat_upper        43.647911
yhat              42.122355
discount          71.877276
Name: 2019-10-16 00:00:00, dtype: float64

A negative discount indicates that the current price is below the prediction band and may be a buy; likewise, anything over 100 is above the prediction range and is overpriced, according to the model. We did ultimately pick two of the ARKK holdings that were well below the prediction range and whose long-term forecasts looked favorable, and we've started scaling in modestly while we see how things play out.

If we were more cautious, we’d do more backtesting, running limited time slices through Prophet and comparing forecast accuracy against the historical data. Additionally, we’d like to figure out a way to weigh our discount calculation against the accuracy projections.
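
When we get there, Prophet's built-in diagnostics should cover most of that backtesting; a minimal sketch, with arbitrary windows:

from fbprophet.diagnostics import cross_validation, performance_metrics

# Fit on the first two years, then forecast 180 days ahead every 90 days
df_cv = cross_validation(df_prophet, initial='730 days', period='90 days', horizon='180 days')
df_metrics = performance_metrics(df_cv)  # MSE, MAE, MAPE, coverage, etc.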

There's much more to explore off of the original Medium post. We haven't even gotten into integrating Alpha Vantage's cryptoasset calls, nor have we done any of the validation and performance metrics that are part of the tutorial, so parts 4 and 5 of this series may well follow. Ultimately, though, our interest is in getting into actual machine learning frameworks such as TensorFlow and seeing what we can come up with there. While we understand the danger of placing too much weight on trained models, I do think there may be value in using these frameworks as screeners. Coupled with the value averaging algorithm that we discussed here previously, we may have a good strategy for long-term investing. And anything that lets me quantify decisions and remove the emotional factor is good as well.


I've learned so much doing this small project. I'm not sure how much more we'll do with Prophet per se, but the Alpha Vantage API is very useful, and I'm guessing I'll be doing a lot more with Bokeh in the future. During the last week I've also discovered a new Python project that aims to provide a unified framework for coupling various equity and crypto exchange APIs with pluggable ML components, using them to execute various trading strategies. Watch this space for discussion on that soon.

Choosing when and how to sell

Knowing when to sell is one of the problems I struggle with as a trader, both in equities and crypto markets. In the past, I’ve relied on what the Motley Fool has referred to as a ‘buy and hold forever’ strategy, and it’s worked out well for me with some of the bigger tech stocks. As someone who’s been focusing more on shorter time frames lately, I’ve been having less success. And after watching my crypto portfolio break six figures in the winter of 2017 before crashing almost ninety percent, I’ve been trying to find a way to ensure that I’m able to actually take some profits when my positions start taking those parabolic runs.

One of the interesting metrics around the USD price of Bitcoin is the Mayer Multiple: the multiple of the current BTC price over the 200-day moving average. Trace Mayer determined that, historically speaking, the best long-term strategy was to accumulate BTC when the Mayer Multiple was under 2.4. For comparison, the last time we hit that level was late June of this year, when BTC hit $14K. I have traditionally dollar-cost averaged into BTC weekly, though I had to stop my contributions for a variety of market and personal reasons. One plan I have been considering is to sell a large share of my position when the MM hits 2.88, a number I came up with just by looking at the charts, and which is currently about $26,200.
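
The multiple itself is trivial to compute from a daily close series; a one-line sketch in Pandas, where btc is a hypothetical dataframe with a 'close' column:

btc['mm'] = btc['close'] / btc['close'].rolling(200).mean()  # Mayer Multiple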

So I was really interested in this strategy put together by former BlackRock portfolio manager Vishal Karir on how to take profits before the next bitcoin recession. I'm not going to rehash the entire piece here; suffice it to say it sets a static accumulation target, buys when the asset value is below this target, and sells when it's above it. I wanted to do some backtesting against some of my own assets and see what I came up with, so I wrote up the following code in a Google Colab doc so I could start playing around with it.

Instead of looking at BTC directly, I wanted to start by looking at the Grayscale Bitcoin Trust, GBTC, so for this block we're using Pandas' wonderful datareader to pull quotes from Alpha Vantage.

import os
from dateutil.parser import parse  # used below to parse the date strings
import pandas_datareader.data as web

!pip install ipywidgets
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

os.environ['ALPHAVANTAGE_API_KEY'] = 'my_key'

f = ""

time_series = [
    ("Intraday Time Series", "av-intraday"),
    ("Daily Time Series", "av-daily"),
    ("Daily Time Series (Adjusted)", "av-daily-adjusted"), 
    ("Weekly Time Series", "av-weekly"), 
    ("Weekly Time Series (Adjusted)", "av-weekly-adjusted"),
    ("Monthly Time Series", "av-monthly"),
    ("Monthly Time Series (Adjusted)", "av-monthly-adjusted")
]  
     

def get_asset_data(ticker, time_frame="av-weekly", start_date="2017-01-01", end_date="2019-09-30"):
  global f
  f = web.DataReader(ticker,
                     time_frame, 
                     start = parse(start_date), 
                     end = parse(end_date))
  return f

interact_manual(get_asset_data, ticker="", time_frame=time_series)

We're using f as a global for our dataframe, so the widget callback can store the stock data where later cells can reach it. We're also using ipywidgets to let us easily change parameters for the data we want to run against.


The next cell passes this dataframe to our backtest function. I wanted to play with various parameters, such as the amount of capital available, the max contribution, and the ratio of max contribution to max sell amount.

def run_karir_target(price_data, capital, contribute=100, sell_factor=2):

  def get_asset_value(price):
    return (shares * price)
  
  def buy(amount, price):
    nonlocal shares, cash
    if amount > 0:
      # Cap purchases at the weekly contribution limit
      if amount > contribute:
        amount = contribute
      action = "BUY"
    else:
      # Cap sales at the weekly sell limit (sell_factor * contribute)
      if amount < -sell:
        amount = -sell
      action = "SELL"


    num_shares = amount/price
    print("{} {} shares for {}".format(action, num_shares, amount))

    cash -= amount
    shares += num_shares

    
  f = price_data
  cash = capital
  shares = 0
  sell = sell_factor * contribute  # max amount to sell in a single week
  week = 1

  for i, j in f.iterrows(): 
    if cash < 0:
      print ("Out of cash!")
      break
    price = j['open']
    print("Week {}: {}".format(week,i))

    target = week * contribute
    value = get_asset_value(price)
    print("Target: {} Asset value = {}".format(target,value))

    # Positive difference: we're under target, so buy; negative: over target, sell
    difference = target - value

    buy(difference, price)
    new_total = get_asset_value(price)
    total = cash + new_total

    print ("Total: {}, Cash: {}, Shares: {}, Asset Value: {}".format(total, cash, shares, new_total))  
    print()
    week += 1

Now, there's a lot I'll do later to clean up this code, like making the params available via a widget as I did for the first function, but for now it works just fine. Here's a test run with $100 contributed weekly from just after the beginning of the BTC bear market till now.
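
The call looks something like this; the starting capital here is a hypothetical bankroll, not necessarily what the run below used:

run_karir_target(f, capital=10000, contribute=100, sell_factor=2)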

Week 1 and 2 of our backtest
The ‘bottom’.
We’re not including the entire run, but here’s where we end up.

So, not bad on our hypothetical run. That's a gain of almost $4700 off of $3769 invested, or a 24% return. Now we're cherry-picking, of course: at the bottom of the market, we would have had over $7K deployed at break even. But Karir's strategy opens up a whole slew of possibilities and may offer a good rule of thumb for scaling out of positions. Since I've been able to get this package working, I've been looking at other investments I've made in equities markets, and am starting to form some hypotheses that may help guide best practices for both entries and exits.

My next step, after cleaning up this code a bit, is to figure out a way to run some regression tests on the sell multiple. I think a dynamic multiple may actually be more helpful in instances where the asset goes on a parabolic run. But the point is that I now have a framework I can test my assumptions against.

It also shouldn't be too hard to integrate Karir's strategy for accumulating BTC itself, as well as altcoin pairings. It's an exciting strategy for investment, and one that can be automated via exchange and broker APIs. There may also be some variations we can deploy, using this kind of targeted portfolio value as a way to layer limit orders. For this simulation we simply used the open price of each period, but we could calculate the price the asset would have to reach for us to sell next week and place orders in the present period; if the high for that period hits that level, we could simulate the trade and see how it affects our gains.
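
A quick sketch of that trigger price: the backtest sells whenever shares * price exceeds the week's target, so next week's limit order would sit where that first becomes true (a hypothetical helper reusing the backtest's names):

def next_week_sell_trigger(week, contribute, shares):
    # value > target once price > target / shares
    return ((week + 1) * contribute) / shares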

I can’t wait to model this and put it to use.

Saying ‘no’

Probably one of the most important things I’ve learned recently is the power of saying ‘no’. I’ve usually been gung-ho and enthusiastic when it comes to work, and I guess you could say I’ve been eager to please in a lot of respects. Part of that may be because of self-esteem issues from when I was younger, maybe the need for validation or acceptance, or the need to be liked or loved or whatever. But now, I’m at the point in my life that I don’t feel the need to please everyone, and have started being a lot more discriminating in what I take on.

I've mentioned before that, as a technical person, I was always the first one people came to when they had problems with their computers. Now, don't get me wrong, I made a great career out of fixing people's stuff, but that was mainly because I was always fixing my own and got so good at it. After twenty years, though, I've gotten tired of the support calls and of spending my time working on someone's five-year-old workstation that can't get Outlook 2016 to work right on Windows 7, or whatever. Or someone who wants to spend hours of my time hunting for the straight-up cheapest laptop they can find, because they'd rather spend the extra two hundred on lotto tickets. (I'm looking at you, dad.)

As my skills have advanced to deal with larger networks, business problems and software development, I’ve come to recognize where my most unique skills are and where I can have the greatest impact. Everything else has got to go.

I picked up David Allen's Getting Things Done a few weeks back and started rifling through it. He was on Tim Ferriss's podcast more recently, and hearing the two of them talk was a great motivation. And then Craig Groeschel had a segment last week on 'cutting the slack' that mentioned the two of them by name, along with his own tips. I've definitely been building my own 'no' list: things that I just won't do anymore. And I've been very clear with my boss that we should not do them any more. To quote Groeschel, you "grow with your 'nos'".

Now that my political candidate 'career' is over (for the foreseeable future), I've been able to focus on a lot of things I had put on hold for several months during the campaign. I've spent more time with my family, caught up on house projects, and I can focus on finishing my degree. But I've been asked about filling a leadership position in two of my local parties. The idea appeals to me for several reasons, but I told the first one that I had to consider it, and turned down the second offer outright. My first thought was what it would mean to have a democratic socialist as the chair of the local Democratic party. It seems to align with what I want to accomplish, but I'm still on the fence about the effectiveness of traditional electoral politics at this point. I'll have to save that discussion for another post, but the entire state party will be reorganizing this winter, and it seems like a big opportunity for DSA types to start gaining influence.

I've also been working with a blockchain project that I've been asked to take over. It's not really that flattering, as the sole developer and originator of the project quit and I'm the only other person who's looked at the code. When I was asked to take over formally, I had to say no, for a variety of reasons related to governance and technical debt; another post coming on that one as well, I'm sure. But even as I was saying no to the person asking, we were exploring the possibility of a new project built on the ashes of the old one. This new one would start fresh, with a proper governance model, and follow a more formal design and test-driven development process than the original, which is now in a crippled state.

In all, this is part of a broader process I'm engaging in with my wife to streamline our lives, reduce our clutter, and focus on what's really important. We've decided that we're no longer buying into the American dream, and are finding ways to exit our salaried jobs, sell our big house, get rid of the mortgage and debt, and do what we do while we see the world.

Our goal is FIRE: financial independence, retire early. Saying 'no' is how I'm going to get there.

Trade wars and crypto gainz

The last couple days have been pretty interesting in the cryptocurrency and equities markets. The Fed lowered interest rates for the first time since 2008, back before Bitcoin had been created, and CryptoTwitter has been abuzz about what it means. Most see it as a devaluing of the US dollar, and ultimately good for Bitcoin. The Trump tariff chaos is also being seen as ultimately good for Bitcoin, assuming that its ‘store of value’ thesis holds up and it’s used as a hedge against US markets tanking. That’s how I’ve been acting, anyways.

The US economy (at least as measured by GDP and the S&P; we'll leave broader economic issues for another day) has been in a bull market for much longer than is typical, and prominent investors believe that we are overdue for a pullback. There are claims that the Administration fears a downturn before the next US election, and that the fight between Trump and the Federal Reserve over interest rates stems from the President wanting to give the economy a boost, in the hopes that any impact won't be felt until after his re-election. With last week's rate cut, it looks like he won that fight, but we'll see if things play out like he wants.

The lower interest rate will have a devaluing effect on the dollar, and seems to have pissed off the Chinese, who have since decided to devalue the Yuan in response. This sent Bitcoin prices flying as the Asian markets opened, and Bitcoin continued upward after the US markets opened and proceeded to tank about two percent. Whether the Bitcoin price increase is due to capital flight, Asian investors trying to hedge their capital, or just speculators anticipating such a move remains to be seen.