Morning pages

Yesterday I spent most of my time trying to migrate a production WordPress site to my development environment. Normally I use Infinite WordPress’s site migration tools for this, which handle moving all the files and updating the database references to the site URL, but I don’t think they work when the site’s not public. I’m doing a lot of hacks with my Docker setup, importing databases and messing with file permissions, and I’m duplicating a lot of my work since I like to sit downstairs at my desk during the day and upstairs at night. So part of my challenge is finding a setup that works well for me.

I might have to make some sacrifices. JetBrains IDEs don’t like to work over the network, so I’ve got to add a directory sync if I want to keep the files on my network server and work from both workstations. At least I can run Docker from the remote machine, but it’s not supported by the IDE, so I’ll have to figure out how to fit that into my workflow.

The girls were good. Elder did everything I asked her, and got her extra screen time. She did typing, piano, and two sessions of math on Khan Academy, and we managed to keep the house tidy. So that’s a big parenting win. She’s already up and on her laptop right now, ostensibly doing typing, but I don’t hear much of it going on over there. Maybe she’s doing math.

We had a bit of excitement yesterday when bitcoin went on a little bit of a tear. I noticed it shot up to touch ten thousand and got excited. Elder came over and said “it went UP,” excitedly. We watched the fight for a few minutes, before it dumped, and then went on a bike ride.

I moved my entire Ethereum stash over to BlockFi. There’s always a moment of horror after publishing a large transaction to the blockchain when the doubt sets in: did I copy the address correctly, or did my opsec fail and some hacker swap the receiving address while it was in my clipboard? I usually do a small test transaction before sending over the big one, but it still makes me nervous, especially after reading about the mining firm that said someone sent a $144 ETH transaction with $131 million in gas fees.

So now I have a fair chunk of my assets up on BlockFi. I haven’t touched but a fraction of my BTC; it’s just too much risk for me to do that. I’ve got a roughly even split on there between BTC, ETH, and USD stablecoins, and I’m considering whether to put more USD there. I’ve still got the girls’ BTC accounts, but I don’t want to mix them with mine, and I’m not yet sure if I can open an account in their names or if I’ll have to do like I did for Lending Club and make multiple accounts in my own name.

Due to the coronavirus, the IRS is allowing 2019 IRA contributions up until July 1. I’m considering whether I want to do this, or throw some more cash into BlockFi. My IRA is on fire right now; I calculated 70% realized gains off of this market rally, and my unrealized gains for the year are much higher than when I calculated them a couple of weeks ago. I’ve still got active value averaging positions in play, and I’m probably going to be short on cash before they complete, so I need some powder. I just don’t know whether I should sell some of my other positions, or put more cash into play. All of my current plays are under risk-adjusted position sizes, but my long-term holdings are just sitting without any stops on them. With everyone going crazy on Robinhood these days, I should probably put some protections in place in case there’s another lockdown-related pullback.

Yesterday, a client’s laptop failed, and I’m waiting on a vendor to go out there and swap a motherboard or something. The drive is encrypted, and while I’m certain I have the keys, I felt a shot of adrenaline course through my body when I remembered that I neglected to reinstall a backup program on her machine after replacing it. So I know what I’m doing today. What I don’t know is what I’m posting tomorrow for my newsletter. This post has been the type of rambling morning pages post that’s of no use to anyone but myself, and which is not the type of quality content that I want to be sending out to my LinkedIn network, or to the email list which I just salvaged from an old CSV file.

I’m going to let that one stew in my head today.

Genius dad

So this is a late post for me today. I woke up at the same time as the kids, forgot to turn my phone on DND before I started meditating, and got a text in the middle of it about an outage at one of Zombie, Inc.’s cornerstone clients. I felt obliged to take it, and the morning was just shot from there. The day actually improved after that, even though I wasn’t as productive as I wanted to be. So here I am, trying to finish what is for me one of the most important parts of my day. I finished meditating after I put the kids to bed, and I want to put down some thoughts before I get to work on coding. No TV today.

I had a good day with the kids. One of their friends came and knocked on the door. They hadn’t seen her in several weeks, so I let them out for some socially distanced bike riding, and I chaperoned. Then after lunch Younger and I took a ride to the pier nearby. She’s totally comfortable on her new pedal bike; it only took her four or five days. I’m so proud: she’s not even four yet and riding a pedal bike, and she didn’t even need training wheels. I feel like genius dad.

Elder had a good day also. She had a nine-thirty call with the gifted teacher, which probably broke up our routine for the better. She has this idea that she’s been bringing up for the past couple of days about turning the house into a hair salon. I’m trying to humor her while explaining the reality of what that would really mean. We also discussed writing a book. She came up with an idea for a story called “Cave of Gold” or “Treasure of Gold”. Her description of it sounds like The Goonies, which we watched over the weekend. I told her the most important thing about making it happen was getting it out of her head and into the real world. We discussed typing it, writing it by hand, and I even showed her some voice dictation options, both on my iPhone and an electronic voice recorder that I have. She wound up writing a scene just before bed. It was a dialog between a mom and an older sibling being asked to take care of their little sibling. Sounded like something right out of our house. Seems like she’s already learned the rule, “write what you know”.

Getting work done during the day is hard, though. The distractions from the kids make most deep work impossible, and by the time I actually have the time in the afternoon, my energy is dead. I moved the needle on a few small tasks: ongoing domain migration woes from a crappy reseller, plus getting a copy of my resume added to my CV site and making a few edits. I’ve got no excuses not to start applying now.

I’ve started refactoring my value averaging code. The main function is a hundred lines long and there are no tests, so I’m going to spend some more time on it today. I run it every day when the market opens, and I’m having some problems with it. I give it a list of positions to process, and for each one it goes through several steps of calculations before sending a buy or sell order to the exchange. Some of the positions are failing and I’m not sure why, so I’ve got to decouple several of the functions so that I can debug it better. After that I need to pull it out of the package that it’s in and make it a separate library. Right now it lives in my trade plan library, which has turned into a bit of a junk drawer over the past year or so. It’s also tightly coupled to the TDAmeritrade brokerage, and that needs to be abstracted out at some point. I’m getting ahead of myself, though.
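
The shape I’m aiming for is something like the sketch below. The step functions are placeholders rather than the actual code in my trade plan library; the point is that each step becomes small enough to test and debug on its own, and the orchestrator takes them as arguments so they can be swapped out or mocked.

# Hypothetical sketch of the decoupled loop -- the step functions are placeholders,
# not the real ones in my trade plan library.
import logging

logger = logging.getLogger(__name__)

def process_positions(positions, calculate_target, get_position_value, build_order, place_order):
    """Thin orchestrator: each step is passed in, so it can be swapped or tested alone."""
    for position in positions:
        try:
            target = calculate_target(position)          # ideal position value for today
            current = get_position_value(position)       # what the account actually holds
            order = build_order(position, target, current)
            if order is not None:
                place_order(order)
        except Exception:
            # with the steps split out, a failing position points at one small function
            logger.exception("Failed to process %s", position)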

Tomorrow, I want to get up by six so I can get my meditation and writing done. After the kids are settled in and I’ve done all my morning checks for Zombie, I’m going to focus on the software design pilot project I’m working on there. Then in the afternoon, I want to find the best job posted on LinkedIn and apply for it. We’re going to make this happen.

Templates, makefiles, and YAML, oh my!

When I first started programming, it was simple to just fire up an editor and start typing away. Scripts usually wound up as large procedural monstrosities, and if I managed to get anything working it was usually such a mess that it quickly became unmanageable. Nowadays, there’s so much setup to do before I can even get to work: creating a git repo and a Python virtual environment, pulling in external repos, standing up databases, configuring my IDE. I suppose it must be indicative of the progress I’ve made as a programmer.

One of my final classes is a multi-semester group project. We spent last semester building out the design docs, and are spending the first few weeks of this one refining those docs individually before coming back together and deploying a prototype. I’m the old man on the team, about twice as old as the rest, and I’ve been doing this long enough to have very strong opinions about a lot of things, so I’ve been trying to guide the team toward my preferred standards.

I’m not going to get into the use case for our app yet, but I convinced the team to use Django for the backend. Now, while we could use it for the front end as well, I figured that since Django was giving us most of what we needed for the core functionality, we could spend some resources trying out cutting-edge tech that would give the team experience with GraphQL and React Native. I’ve got no idea whether that will make it into the final product. Even though we’ve got a team of six people and I’m handling all of the infrastructure work, I’m starting to wonder whether the others will be able to implement those new features in time.

I’ve got a few more passes to make through my individual paper, then I’ll start focusing on the prototype presentation. My professor made a comment during last week’s recitation that these applications don’t have ‘cookie cutter’ approaches, and I almost laughed out loud because we’re literally using Cookiecutter Django as the basis of our project. I’m debating whether I want to try a live demo of deploying one, or do it offline and record screenshots or something.

Being able to use something like Cookiecutter to set up a Python package, with unit testing, CI, and documentation all wired up via make commands out of the box, is amazing once you understand what all of that stuff actually does. It can lead to a bit of choice paralysis at first, trying to figure out testing frameworks, code coverage tools, linters, and all that; I’m still getting there. But once you’ve settled on a stack, it makes rapid prototyping easy.
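
For anyone who hasn’t tried it, here’s a minimal sketch of bootstrapping a project with Cookiecutter’s Python API (the CLI does the same job). The template URL and context keys depend on the template’s cookiecutter.json, so treat the specifics as approximate.

# Minimal sketch -- the template URL and context keys are approximate; check the
# template's cookiecutter.json for the real prompt names.
from cookiecutter.main import cookiecutter

cookiecutter(
    "https://github.com/audreyr/cookiecutter-pypackage",  # or cookiecutter-django, etc.
    no_input=True,                     # take defaults instead of prompting
    extra_context={
        "project_name": "Trade Plan",
        "project_slug": "trade_plan",
    },
)

# The generated project ships with a Makefile, so targets like `make test`,
# `make lint`, and `make docs` work out of the box (exact targets vary by template).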

It’s almost maddening thinking about how many different ways there are to set up your workflow. I’m currently using Pipenv as my tool of choice, but I recently read about Poetry, which seems to be a step up in many ways. For now, though, I’m not chasing it down. Instead, I’m going to focus on delivering something using the tools I already have, instead of getting caught up in what’s new. It’s a lesson that continues to become more and more relevant as I mature in my abilities.

Gaussian Elimination with TDD C++, Part 1

I’m pretty pleased with myself. I managed to pull off an epic coding session Friday night and met the deadline on a school assignment. I almost used the word finished there, but I merely got it working. As Uncle Bob says, getting it working is the first step to completion; refactoring and cleaning it up is the next.

The purpose of the assignment was to implement a Gaussian Elimination function in C++. The professor, an old Fortran/C++ veteran who had done a lot of scientific matrix work back in the day, wanted us to use pointers to pointers for the matrix rows, to make swapping rows faster. They gave us the following specification of how the matrix would be represented in a data file:

3       // int N representing the size of the matrix A
1 1 1   // values of A[row i]
0 1 1   // A[row i+1]
0 0 1   // A[row N-1]
1 1 1   // right hand side

The professor then went through the algorithm for solving such a matrix on the board. Later they showed us how to generate datafiles with solvable problems for testing, but we’ll skip over that for now.

The example that the professor did in class was a bit of a mess, so I went looking for better ones. Rosetta Code has examples of Gaussian Elimination in many different programming languages. The C version is pretty close to what we need, but even looking at the gauss_eliminate function below, we can see that it’s doing a lot and could be broken down further into smaller functions.

void gauss_eliminate(double *a, double *b, double *x, int n)
{
/* mat_elem() and swap_row() are helpers defined elsewhere in the Rosetta Code listing */
#define A(y, x) (*mat_elem(a, y, x, n))
    int i, j, col, row, max_row, dia;
    double max, tmp;

    /* forward elimination with partial pivoting */
    for (dia = 0; dia < n; dia++) {
        max_row = dia, max = A(dia, dia);

        /* find the row with the largest pivot in this column */
        for (row = dia + 1; row < n; row++)
            if ((tmp = fabs(A(row, dia))) > max)
                max_row = row, max = tmp;

        swap_row(a, b, dia, max_row, n);

        /* eliminate the entries below the pivot */
        for (row = dia + 1; row < n; row++) {
            tmp = A(row, dia) / A(dia, dia);
            for (col = dia + 1; col < n; col++)
                A(row, col) -= tmp * A(dia, col);
            A(row, dia) = 0;
            b[row] -= tmp * b[dia];
        }
    }

    /* back substitution */
    for (row = n - 1; row >= 0; row--) {
        tmp = b[row];
        for (j = n - 1; j > row; j--)
            tmp -= x[j] * A(row, j);
        x[row] = tmp / A(row, row);
    }
#undef A
}

My experience with C++ has been limited, mostly schoolwork with CodeBlocks and Eclipse; I prefer using JetBrains these days. And I’ve never written tests in it, so after I set up a new repo the first thing I did was spend some time figuring out Google Test before I wrote my first line of code. I started with making sure I could load files, then started writing output helpers, overloading the ostream operator and creating a print() function.

Let me say: Test Driven Development is HARD. It requires a lot of thought up front about what it is that you are trying to do. I started off with a todo list:

- call GaussianElimination function
- read file from file system
- get size from file
- create matrix(size)
- load vector data from file
- create 2d vector array size N
- initialize matrix with values 

and started working through each of them, going through the red light/green light cycle: writing a test that would fail, then implementing the code that would make that test pass — and NOTHING MORE. Like any discipline, it’s hard. But the effect is amazing. Having a magic button that lets you change code and get [ PASSED ] back afterward is exhilarating.

I’ll admit that I cheated a bit as the deadline approached. I hadn’t implemented proper comparison operators for the Matrix class, so I was checking everything by eyeball before I got to the point where the code worked and I could submit it for credit. Still, the result was a far cry from the way I usually operate, with a bunch of manually entered code.

I’ll share more in a further post.

Learning to fly

I’ve been on a bit of a kick with Robert C. Martin’s work lately. Martin, AKA “Uncle Bob”, is the author of several books on coding, including a couple of classics in the software development field. I’ve watched several of his lectures on YouTube recently, and have been reading through Clean Code the last couple of days. It’s really making me realize how garbage the things I’ve been writing lately are, and I’m filled with an immense urge to go back and completely refactor everything that I’ve worked on over the past few weeks.

Of course, having a robust integration test suite is absolutely necessary for any kind of refactoring, which is not something I’ve been terribly disciplined about recently. I’m proud to say that I am taking a strict TDD approach to my latest class assignment in C++, although it has slowed me down a great deal. The hardest part is figuring out how to write tests. Sure, I could go and write a massive 200-line function that would take input and perform the Gaussian Elimination on it, but since this is part of a larger test suite that we’ll use for our final exams, I want to make the code more modular. For example, see the difference between this big 75-line single main statement, and this one. The latter could still be broken out into smaller functions, according to Uncle Bob, but it’s a step in the right direction.

There were two reasons that I went back to school to finish my degree. The first was that I thought I needed a BS after my name in order to get my resume past the gatekeeping algorithms at some firms. I’ve since come to the realization that I have no desire to go to work at any large enterprise or other organization where this would be a factor — six figures be damned. The second was that I felt like I was running into roadblocks with my own development projects. They were basically huge convoluted procedural things. Even when I tried to adopt OOP principles, they were still a mess. I felt like I needed to go back to school and go through the curriculum to get where I needed to be.

I don’t think it’s quite worked out the way I wanted it to. Now, don’t get me wrong, I think earning a degree in ‘Computer Science’ has been valuable, but it’s not quite what I expected. I think one of the intro Unix classes really broke my block when it comes to working with Linux, and that’s a skill that I have definitely appreciated. But I think the focus on Java and C++ is behind the times.

I recently had a conversation with one of my professors about my surprise that there hadn’t been any focus on software design patterns. (I’m still working my way through the Gang of Four.) He told me that there was a bit of disagreement within the department between those who wanted to focus on theory and those who wanted more actual engineering and development. So far the balance of power has lain with the theoretical side, which is why the focus is on the math: big-O notation, data structures, and discrete finite automata.

Even so, I’m still surprised that I feel like I’ve gotten more out of a couple of 30-year-old videos on Lisp than I have out of the classes that I’m going $20K+ in debt for. All I wanted to do was write better code, so that I can make programs do what I want them to do. The ideas I’ve had were beyond my ability to complete, and I was looking for ways to increase my knowledge. I’m probably being unfair to the university, since some of the more business-end document writing (requirements, software specification documents, use cases, &c.) has already helped me in some of my professional interactions.

At the end of the day, it’s about sitting down with an IDE and writing those magic lines of code that make the computer do what I want.

Programmer Discipline

So my productivity has been shot to hell the last two days while I try to familiarize myself with and set up not one but two new programming environments. I’ve got JavaScript for the CCXT/Safe.Trade library, and I just got assigned a C++ module for one of my classes.

I have a somewhat convoluted setup. I like to work from one of two machines. My desktop is for gaming and personal or school projects, and my laptop has a Windows VM that I use for my day job. I also have an Ubuntu server that I’m running a file share and other services on. It’s got Docker running over ssh, but I was pounding my head today trying to figure out how to get IntelliJ to talk to it so I could use the integrated run tools instead of the copy/paste garbage I’ve been dealing with as I try to catch up on 20 years of Javascript changes and Node.

For one of my final classes I’ve got to implement Gaussian Elimination in C++ as part of a larger library that will be part of my final grade. I said goodbye to CodeBlocks and Eclipse a while back, but I haven’t started a project in C++ in years; the only time I’ve looked at it at all has been for the PennyKoin updates. I’ve never spent the time to understand makefiles and linking, so I just spent a painful hour trying to get Googletest integrated with this new project. Because of course I’m going to write a test before I put down anything more complicated than ‘hello world’.

Of course I am.

I’ve spent the last week going through a series of videos on Clean Code by Robert “Uncle Bob” Martin. It’s a good series that I really enjoyed. Martin is really good up on stage — and funny — and I was disappointed when I finished the last one and realized that there weren’t any more. There’s much more for sale on his CleanCoder site that I might dive into, but I want to read his Clean Code and Clean Architecture books first.

Highly recommended if you have several hours to spare.

I came to realize that the tests I wrote for the GBTC Estimator were too tightly coupled to the module code, and that the module code was coupled to the input (IEX via the Pandas DataReader class). So I’ve been trying to decouple it so that it works with a dataframe from my broker’s API. I’m taking some hints from a mocking talk I saw that made me realize I need to break out my dependencies even more.

Automating value average stock investing

I spent most of the winter break working on automating a value averaging algorithm that I wrote about several months ago. Back in October we started scaling into three positions that we identified based on predictions we had done earlier using Facebook’s Prophet. My goal was to develop a protocol and work out any kinks in the process manually while I built out code that would eventually take over. While I’m not ready to release the modules to the public yet, I have managed to get the general order calculation and order placement up and running.

To start, I set up a Google Sheet with the details of each position: start date, number of days to run, and the total amount to invest. I used Alexander Elder’s Two Percent Rule, as usual, to come up with this number; essentially, each position would be small enough that I wouldn’t need to set up stop losses. From there, the sheet would keep track of the number of business days (as a proxy for trading days) and would compute the target position size for that day. I would update a cell with the current instrument price, and the sheet would compute whether my holding was above or below the target, and calculate the buy or sell quantity accordingly.
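
In code, the daily math the sheet performs boils down to something like the sketch below. This is a simplified stand-in for the spreadsheet logic, not the production module, and it ignores the proration and fill-tracking wrinkles I get into below.

# Simplified sketch of the spreadsheet's daily math -- not the production module.
import numpy as np

def daily_target(total_to_invest, start_date, days_to_run, today):
    """Target position value for today, using business days as a proxy for trading days."""
    elapsed = np.busday_count(start_date, today)
    elapsed = min(max(elapsed, 0), days_to_run)
    return total_to_invest * elapsed / days_to_run

def shares_to_trade(target_value, current_shares, price):
    """Positive means buy, negative means sell."""
    current_value = current_shares * price
    return round((target_value - current_value) / price)

# Thirty business days into a $6,000, 60-day position:
# daily_target(6000, '2019-10-14', 60, '2019-11-25') -> 3000.0
# shares_to_trade(3000.0, 60, 42.80) -> 10  (about $428 of stock to buy)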

After market open, I would update the price for each stock and put in the orders for each position. This took a few minutes each day, and became part of my morning routine over the past two months or so. Ideally, this process should have only taken five minutes out of my day, but we ran into some challenges due to the decisions we made that required us to rework things and audit our order history several times.

The first of these concerned the type of orders we placed. I decided that I didn’t want to market buy everything, and instead put in ‘good-until-cancelled’ limit orders. When there was no spread between the bid and the ask, I would just match whichever end I was on, and if there was a spread I would put my order price one penny inside it. As a result, some orders would go unfilled, and it took some overly complicated spreadsheet calculations to keep track of which orders were filled, what my actual number of shares was ‘supposed’ to be, and so on. I also started using a prorated target, based on the number of days with actual filled orders, which became a problem to track. Also, some days there were large spreads, and my buy orders were way lower than anything that would get filled. There were times when the price fell over a few days and picked some of these up, but keeping track of the filled and unfilled orders was a huge pain in the butt.
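
The pricing rule itself is simple enough to express in code. This is a hypothetical helper, not what actually runs in my sheet or scripts:

# Sketch of the limit-price rule described above -- a hypothetical helper, not production code.
def limit_price(side, bid, ask, tick=0.01):
    """Match the touch when there's no spread; otherwise step one penny inside it."""
    if ask - bid < tick:
        return bid if side == "buy" else ask   # no spread: just match my side of the book
    if side == "buy":
        return round(bid + tick, 2)            # one penny above the bid
    return round(ask - tick, 2)                # one penny below the ask

# limit_price("buy", 42.50, 42.56) -> 42.51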

One of the reasons that it took me so long to develop a working product was the challenges I had with existing Python support for my brokerage. The only feasible module that I could find on PyPI had basic functionality and required a lot of work. It had no order-placing capabilities, so I had to write those. I also got lost working through Ameritrade’s non-compliant schema definitions, and I almost gave up hope entirely when I found out that they were getting bought out. The module still needs a lot of improvements before it can run in a completely automated manner, but more on that later.

So far I’ve got just under a thousand lines of code — not as many tests as I should have written — that lets me process a list of positions: tuples of stock ticker, days to run, start date, and total capital to invest. It calculates the ideal target, gets the current value of the position, calculates the difference and the number of shares to buy or sell, and then places the order. I’m still manually keeping an eye on things and tracking my orders in the sheet as I’ve been doing, but there’s too much of a discrepancy between the Python algorithm and my spreadsheet. I don’t plan on wading through my transaction history to try to program around all of the mistakes and adjustments that I made during the development process, so I’ll just have to live without the prorated targets for the time being.

I think the priority for the next few commits will be improving the brokerage module. Right now it requires Chromedriver to generate the authentication tokens; this can be done with straight-up requests sessions instead. There’s also no error checking; session expiration is a common problem, and I had to write a function to refresh the session without reauthenticating. So the first priority will be getting the order placement calls and token-handling improvements put in, and submitting a PR back to the main module.
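
The session-expiration handling I have in mind looks roughly like the decorator below. The refresh call itself is broker-specific, so refresh_session here is a hypothetical stand-in rather than anything that exists in the module today.

# Generic sketch of retry-on-expired-session -- refresh_session is a hypothetical
# stand-in for the broker-specific token refresh, not part of the existing module.
import functools

def with_session_refresh(refresh_session):
    """Retry an API call once after refreshing the access token on a 401 response."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            response = func(*args, **kwargs)
            if getattr(response, "status_code", None) == 401:
                refresh_session()                   # reuse the refresh token; no full re-auth
                response = func(*args, **kwargs)    # one retry with the new access token
            return response
        return wrapper
    return decorator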

From there, I’d like to clean up the Quicktype-generated objects and get them moved over to the brokerage package where they belong. I don’t know that most people are going to want to use Python objects instead of dictionaries, but I put enough work into them that I want them out there.

Lastly, I’ll need to figure out how to separate any of the broker-specific function calls from the value averaging functions. Right now it’s too intertwined to be used for anything other than my brokerage, so I’ll see about getting it generalized in such a way that it can be used with Tensortrade or other algorithmic trading platforms.

I’m not sure how much of this I can get done over the spring. Classes for my final semester start next Monday, and it will be May before I’m done. But I will keep posting updates.

QuickType and Ameritrade’s API.

My life goal of automating my job out of existence continues unabated. I’ve been spending a lot of time dealing with the APIs of the various vendors that we work with, and I’ve spent a lot of time poring over JSON responses. Most of these are multi-level structures, which usually leads to clunky accessor code like object['element']['element']. I much prefer the more elegant dot notation of object.element.element, but getting from JSON to objects hasn’t been something I’ve wanted to spend much time on. Sure, there are a few ways to do this using standard Python, but QuickType is by far the best solution out there.
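
Just to illustrate the difference between the two styles (the field names here are only loosely modeled on an account response, so treat them as illustrative):

# Toy illustration -- field names are illustrative, not an exact Ameritrade response.
from dataclasses import dataclass

raw = {"securitiesAccount": {"currentBalances": {"cashBalance": 1000.0}}}

# the clunky dictionary version
cash = raw["securitiesAccount"]["currentBalances"]["cashBalance"]

# what I'd rather write once the JSON is mapped onto generated classes
@dataclass
class CurrentBalances:
    cash_balance: float

@dataclass
class SecuritiesAccount:
    current_balances: CurrentBalances

account = SecuritiesAccount(current_balances=CurrentBalances(cash_balance=1000.0))
cash = account.current_balances.cash_balance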

I’ve been using the web-based version for the past few days to create an object library for Ameritrade’s API. Now, first off, I’m probably going overboard and violating YAGNI (you ain’t gonna need it) principles by trying to include everything that the API can return, but it’s been a good excuse to learn more about JSON schemas.

JSON schema with resultant Python code on right.

One of the things that I wish I’d caught earlier is that the recommended workflow in Quicktype is to start with example JSON data and convert it to a JSON schema before going from that schema to your target language. I’d been trying to go straight from JSON to Python, and there were some problems. First off, the Ameritrade schema has a lot more types than I’ll need: there are two subclasses of securities account, and five different ones for the various instrument classes. I only need a small subset of that, but thankfully Quicktype automatically combines these. Secondly, Ameritrade’s response summary, both the schema and the JSON examples, isn’t grouped in a way that can be parsed efficiently. I spent countless hours trying to combine things into a schema that is properly referenced and compiles cleanly.

But boy, once it did. Quicktype does a great job of generating code that can parse JSON into Python objects. There are handlers for all of the various data types, and Quicktype will actually type-check everything from ints to lists, dicts to unions (for handling Nones), and will serialize classes back out to JSON as well. Sub-object parsing works very well. And even if you don’t do Python, it outputs to an impressive number of languages.
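
The generated code looks roughly like this. I’m paraphrasing from memory, so treat it as illustrative of the pattern rather than Quicktype’s exact output, and the Quote class and its fields are just a toy example.

# Paraphrased from memory of Quicktype's Python output -- illustrative, not exact.
from dataclasses import dataclass
from typing import Any, Optional

def from_float(x: Any) -> float:
    # the generated helpers type-check values before converting them
    assert isinstance(x, (float, int)) and not isinstance(x, bool)
    return float(x)

def from_optional_str(x: Any) -> Optional[str]:
    if x is None:
        return None
    assert isinstance(x, str)
    return x

@dataclass
class Quote:
    symbol: Optional[str]
    last_price: float

    @staticmethod
    def from_dict(obj: Any) -> "Quote":
        return Quote(
            symbol=from_optional_str(obj.get("symbol")),
            last_price=from_float(obj.get("lastPrice")),
        )

    def to_dict(self) -> dict:
        return {"symbol": self.symbol, "lastPrice": self.last_price}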

One problem stemming from my decision to use Ameritrade’s response summary JSON code instead of their schema is that the example code uses 0 instead of 0.0 wherever a float would be applicable. This led to Quicktype generating its own schema using integers instead of the JSON schema float equivalent, number. Additionally, Ameritrade doesn’t designate any properties as required, whereas Quicktype assumes everything in your example JSON is, which has led to a lot of failed tests.

Next, I’ll likely figure out how to run Quicktype locally via the CLI, and work out some sort of build process to keep my object code in sync with my schema definitions. There’s been a lot of copypasta going on the past few days, and having it auto-update and run tests when the schema changes seems like a good pipeline opportunity. I’ve also got to spend some more time understanding how to tie together complex schemas. Ameritrade’s documentation isn’t up to standard, so figuring out how to break them up into separate JSON objects and reference them efficiently will be crucial if I’m going to finish converting the endpoints that I need for my project.

That said, Quicktype is a phenomenal tool, and one that I am probably going to use for other projects that interface with REST APIs.

Stock price forecasting using FB’s Prophet: Part 3

In our previous posts (part 1, part 2) we showed how to get historical stock data from the Alpha Vantage API, use Pickle to cache it, and prep it in Pandas. Now we are ready to throw it into Prophet!

So, after loading our main.py file, we get ticker data by passing the stock symbol to our get_symbol function, which will check the cache and get daily data going back as far as is available via AlphaVantage.

>>> symbol = "ARKK"
>>> ticker = get_symbol(symbol)
./cache/ARKK_2019_10_19.pickle not found
{'1. Information': 'Daily Prices (open, high, low, close) and Volumes', '2. Symbol': 'ARKK', '3. Last Refreshed': '2019-10-18', '4. Output Size': 'Full size', '5. Time Zone': 'US/Eastern'}
{'1: Symbol': 'ARKK', '2: Indicator': 'Simple Moving Average (SMA)', '3: Last Refreshed': '2019-10-18', '4: Interval': 'daily', '5: Time Period': 60, '6: Series Type': 'close', '7: Time Zone': 'US/Eastern'}
{'1: Symbol': 'ARKK', '2: Indicator': 'Relative Strength Index (RSI)', '3: Last Refreshed': '2019-10-18', '4: Interval': 'daily', '5: Time Period': 60, '6: Series Type': 'close', '7: Time Zone': 'US/Eastern Time'}
./cache/ARKK_2019_10_19.pickle saved

Running Prophet

Now we’re not going to do anything here with the original code other than wrap it in a function that we can call again later. Our alpha_df_to_prophet_df() function renames our datetime index and close price series data columns to the columns that Prophet expects. You can follow the original Medium post for an explanation of what’s going on; we just want the fitted history and forecast dataframes in our return statement.

def prophet(ticker, fcast_time=360):
    ticker = alpha_df_to_prophet_df(ticker)
    df_prophet = Prophet(changepoint_prior_scale=0.15, daily_seasonality=True)
    df_prophet.fit(ticker)
    df_forecast = df_prophet.make_future_dataframe(periods=fcast_time, freq='D')
    df_forecast = df_prophet.predict(df_forecast)
    return df_prophet, df_forecast

>>> df_prophet, df_forecast = prophet(ticker)
Initial log joint probability = -11.1039
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      99       3671.96       0.11449       1846.88           1           1      120   
...
    3510       3840.64   3.79916e-06       20.3995   7.815e-08       0.001     4818  LS failed, Hessian reset 
    3534       3840.64   1.38592e-06       16.2122           1           1     4851   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance

The whole process runs within a minute. Even twenty years of Google daily data can be processed quickly.

The last thing we want to do is concat the forecast data back onto the original ticker data and Pickle it back to our file system. We rename our index back to ‘date’, as it was before we modified it, then join it to the original Alpha Vantage data.

def concat(ticker, df_forecast):
    df = df_forecast.rename(columns={'ds': 'date'}).set_index('date')[['trend', 'yhat_lower', 'yhat_upper', 'yhat']]
    frames = [ticker, df]
    result = pd.concat(frames, axis=1)
    return result

Seeing the results

Since these are Pandas dataframes, we can use matplotlib to see the results, and Prophet also includes Plotly support. But as someone who looks at live charts in TradingView throughout the day, I’d like something more responsive. So we loaded the Bokeh library and created the following function to match.

ARKK plot using matplotlib. Static only.
ARKK plot in Plotly. Not great. UI is clunky and doesn’t work well in my dev VM browser.
def prophet_bokeh(df_prophet, df_forecast):
    p = figure(x_axis_type='datetime')
    p.varea(y1='yhat_lower', y2='yhat_upper', x='ds', color='#0072B2', source=df_forecast, fill_alpha=0.2)
    p.line(df_prophet.history['ds'].dt.to_pydatetime(), df_prophet.history['y'], legend="History", line_color="black")
    p.line(df_forecast.ds, df_forecast.yhat, legend="Forecast", line_color='#0072B2')
    save(p)

>>> output_file("./charts/{}.html".format(symbol), title=symbol)
>>> prophet_bokeh(df_prophet, df_forecast)
ARKK plot in Bokeh. Can easily zoom and pan. Lovely.

Putting it all together

Our ultimate goal here is to be able to process large batches of stocks, downloading the data from AV and processing it in Prophet in one go. For our initial run, we decided to start with the bundle of stocks in the ARK Innovation ETF, so we copied the holdings into a Python list and created a couple of functions: one to process an individual stock, and another to process the list. Everything in the first function should be familiar except for two things. First, we added a check for the ‘yhat’ column to make sure that we didn’t inadvertently reprocess any individual stocks while we were debugging. Second, we refactored get_filename, which just adds the stock ticker plus today’s date to a string; it’s used in get_symbol during the Alpha Vantage call, as well as here when we save the Prophet-ized data back to the cache.

def process(symbol):
    ticker = get_symbol(symbol)
    if 'yhat' in ticker:
        print("DF exists, exiting")
        return
    df_prophet, df_forecast = prophet(ticker)
    output_file("./charts/{}.html".format(symbol), title=symbol)
    prophet_bokeh(df_prophet, df_forecast)
    result = concat(ticker, df_forecast)
    file = get_filename(symbol, CACHE_DIR) + '.pickle'
    pickle.dump(result, open(file, "wb"))
    return

Finally, our process_list function. We had a bit of a wrinkle at first: since we’re using the free AlphaVantage API, we’re limited to 5 API calls per minute, and since we’re making three of them in each get_symbol() call, we get an exception if we run the loop more than once in sixty seconds. I could have just gotten rid of the SMA and RSI calls, but I ultimately decided to calculate the duration of each loop and sleep until the minute was up. Obviously not the most elegant solution, but it works.

def process_list(symbol_list):
    for symbol in symbol_list:
        start = time.time()
        process(symbol)
        end = time.time()
        elapsed = end - start
        print("Finished processing {} in {}".format(symbol, elapsed))
        if elapsed > 60:
            # we already burned through the minute, no need to wait
            continue
        elif elapsed < 1:
            # cached symbol, no API calls were made, so no need to throttle
            continue
        else:
            print('Waiting...')
            time.sleep(60 - elapsed)
            continue

So from there we just pass our list of ARKK stocks, go for a bio-break, and when we come back we’ve got a cache of Pickled Pandas data and Bokeh plots for about thirty stocks.

Where do we go now

Now, I’m not putting too much faith in the results of the Prophet data; we didn’t do any customization, and we just wanted to see what we could do with it. In the days since I started writing up this series, I’ve been thinking about ways to pick the winners from these plots via a function call. So far I’ve come up with this discount function, which determines the discount of the current price of an asset relative to Prophet’s yhat prediction band.

Continuing with ARKK:

def calculate_discount(current, minimum, maximum):
    return (current - minimum) * 100 / (maximum - minimum)

>>> result['discount'] = calculate_discount(result['4. close'], result['yhat_lower'], result['yhat_upper'])
>>> result.loc['20191016']
1. open           42.990000
2. high           43.080000
3. low            42.694000
4. close          42.800000
5. volume     188400.000000
SMA               44.409800
RSI               47.424600
trend             41.344573
yhat_lower        40.632873
yhat_upper        43.647911
yhat              42.122355
discount          71.877276
Name: 2019-10-16 00:00:00, dtype: float64

A negative number for the discount indicates that the current price is below the prediction band, and may be a buy. Likewise, anything over 100 is above the prediction range and is overpriced, according to the model. We did ultimately pick two of the ARKK holdings that were well below the prediction range and had promising long-term forecasts, and we’ve started scaling in modestly while we see how things play out.

If we were more cautious, we’d do more backtesting, running limited time slices through Prophet and comparing forecast accuracy against the historical data. Additionally, we’d like to figure out a way to weigh our discount calculation against the accuracy projections.

There’s much more to explore off of the original Medium post. We haven’t even gotten into integrating Alpha Vantage’s cryptoasset calls, nor have we done any of the validation and performance metrics that are part of the tutorial. It’s likely that parts 4 and 5 of this series will follow. Ultimately, though, our interest is to get into actual machine learning frameworks such as TensorFlow and see what we can come up with there. While we understand the danger of placing too much weight on trained models, I do think there may be value in using these frameworks as screeners. Coupled with the value averaging algorithm that we discussed here previously, we may have a good strategy for long-term investing. And anything that lets me quantify things and remove the emotional factor is good as well.


I’ve learned so much doing this small project. I’m not sure how much more we’ll do with Prophet per se, but the Alpha Vantage API is very useful, and I’m guessing that I’ll be doing a lot more with Bokeh in the future. During the last week I’ve also discovered a new Python project that aims to provide a unified framework for coupling various equity and crypto exchange APIs with pluggable ML components, and use them to execute various trading strategies. Watch this space for discussion on that soon.

Stock price forecasting using FB’s Prophet: Part 2

Facebook’s Prophet module is a trend forecasting library for Python. We spent some time over the last week going through it via this awesome introduction on Medium, but decided to do some refactoring to make it more reusable. Previously, we set up our Pipenv virtual environment, separated sensitive data from our source code using dotenv, and started working with Alpha Vantage’s stock price and technical indicator API. In this post we’ll save our fetched data using Pickle and do some dataframe manipulations in Pandas. Part 3 is also available now.

Pickling our API results

When we left off, we had just written our get_time_series function, to which we pass a method name like 'get_daily' and the symbol of the stock that we would like to retrieve. We also have our get_technical function that we can use to pull any of the dozens of indicators available through Alpha Vantage’s API. Following the author’s original example, we can load Apple’s price history, simple moving average, and RSI with the following calls:

symbol = 'AAPL'
ticker = get_time_series('get_daily', symbol, outputsize='full')
sma = get_technical('get_sma', symbol, time_period=60)
rsi = get_technical('get_rsi', symbol, time_period=60)

We’ve now got three dataframes. In the original piece, the author shows how you can export and import a dataframe using Pandas’ .to_csv and read_csv functions. Saving the data is a good idea, especially during this stage of development, because it allows us to cache our data and reduce the number of API calls. (Alpha Vantage’s free tier allows 5 calls per minute, 500 a day.) However, using CSV to save Pandas dataframes is not recommended, as you will lose index and dtype metadata. Python’s Pickle module will serialize the data and preserve it whole.
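
Here’s a quick throwaway illustration of the difference; this snippet isn’t part of the library, it just shows what gets lost in the CSV round trip.

# Throwaway illustration of why we pickle instead of round-tripping through CSV.
import pandas as pd

df = pd.DataFrame({'4. close': [230.09]},
                  index=pd.DatetimeIndex(['2019-10-10'], name='date'))

df.to_csv('ticker.csv')
restored = pd.read_csv('ticker.csv')
print(restored.index)    # RangeIndex -- the DatetimeIndex has become an ordinary string column

df.to_pickle('ticker.pickle')
restored = pd.read_pickle('ticker.pickle')
print(restored.index)    # DatetimeIndex preserved, dtypes intact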

For our implementation, we will create a get_symbol function, which will check a local cache folder for a copy of the ticker data and load it. Our file naming convention uses the symbol string plus today’s date. Additionally, we concat our three dataframes into one using Pandas concat function:

def get_symbol(symbol):
    CACHE_DIR = './cache'
    # check if cache exists
    symbol = symbol.upper()
    today = datetime.now().strftime("%Y_%m_%d")

    file = CACHE_DIR + '/' + symbol + '_' + today + '.pickle'
    if os.path.isfile(file):
        # load pickle
        print("{} Found".format(file))
        result = pickle.load(open(file, "rb"))
    else:
        # get data, save to pickle
        print("{} not found".format(file))
        ticker = get_time_series('get_daily', symbol, outputsize='full')
        sma = get_technical('get_sma', symbol, time_period=60)
        rsi = get_technical('get_rsi', symbol, time_period=60)

        frames = [ticker, sma, rsi]
        result = pd.concat(frames, axis=1)
        pickle.dump(result, open(file, "wb"))
        print("{} saved".format(file))
    return result

Charts!

The original author left out all his chart code, so I had to figure things out on my own. No worries.

result = get_symbol("goog")
plt.plot(result.index, result['4. close'], result.index, result.SMA, result.index, result.RSI)
plt.show()
Google stock price (blue), 20-day moving average (orange) and RSI (green)

Since the RSI is such a small number relative to the stock price, let’s chart it separately.

    plt.subplot(211, title='Price')
    plt.plot(result.index, result['4. close'], result.index, result.SMA)
    plt.subplot(212, title="RSI")
    plt.plot(result.index, result.RSI)
    plt.show()
Much better.

We saved both of these in a plot_ticker function for reuse in our library. Now, I am no expert on matplotlib, and have only done some basic stuff with Plotly in the past. I’m probably spoiled by looking at TradingView’s wonderful chart tools and dynamic interface, so being able to drag and zoom around in the results is really important to me from a usability standpoint. So we’ll leave matplotlib behind from here, and I’ll show you how I used Bokeh in the next part.

Framing our data

We already showed how we concat our price, SMA and RSI data together earlier. Let’s take a look at our dataframe metadata. I want to show you the columns, the dtype of those columns, as well as that of the index. Tail is included just for illustration.

>>> ticker.columns
Index(['1. open', '2. high', '3. low', '4. close', '5. volume', 'SMA', 'RSI'], dtype='object')

>>> ticker.dtypes
1. open      float64
2. high      float64
3. low       float64
4. close     float64
5. volume    float64
SMA          float64
RSI          float64
dtype: object

>>> ticker.index
DatetimeIndex(['1999-10-18', '1999-10-19', '1999-10-20', '1999-10-21',
               '1999-10-22', '1999-10-25', '1999-10-26', '1999-10-27',
               '1999-10-28', '1999-10-29',

>>> ticker.tail()
            1. open  2. high  3. low  4. close   5. volume       SMA      RSI
date                                                                         
2019-10-09   227.03   227.79  225.64    227.03  18692600.0  212.0238  56.9637
2019-10-10   227.93   230.44  227.30    230.09  28253400.0  212.4695  57.8109

Now, we don’t need all of this for Prophet. In fact, it only looks at two columns: a datetime column labeled ‘ds’, and the series you want to forecast, a float, labeled ‘y’. In the original example, the author renames and recasts the data, but that is likely because of the metadata loss when importing from CSV, and isn’t strictly needed. Additionally, we’d like to preserve our original dataframe as we test our procedure code, so we’ll pass a copy.

def alpha_df_to_prophet_df(df):
    prophet_df = df.get('4. close')\
        .reset_index(level=0)\
        .rename(columns={'date': 'ds', '4. close': 'y'})

    # not needed since dtype is correct already
    # df['ds'] = pd.to_datetime(df['ds'])
    # df['y'] = df['y'].astype(float)
    return prophet_df

>>> alpha_df_to_prophet_df(ticker).tail()
             ds       y
5026 2019-10-09  227.03
5027 2019-10-10  230.09
5028 2019-10-11  236.21
5029 2019-10-14  235.87
5030 2019-10-15  235.32

In the prophet_df assignment we’re selecting only the ‘4. close’ price column, which is returned with the original DatetimeIndex. We reset the index, which turns it into a ‘date’ column. Finally, we rename the columns accordingly.


And that’s it for today! Next time we will be ready to take a look at Prophet. We’ll process our data, use Bokeh to display it, and finally write a procedure which we can use to process data in bulk.