Fearless refactoring

C++ is a much more complicated language than I ever imagined. I’d had a little bit of exposure to it earlier in college, and I hated it because of the amount of setup that was required to get it running. We were introduced to CodeBlocks and Eclipse, but both of them just seemed so clunky. Figuring out compiler options and makefiles, and trying to get programs to compile on my home Ubuntu development workstation, on the school’s Windows RDS environment, and on the professor’s autograder was just too much. So when I really started diving into Python, it was like coming up from being underwater too long and getting that breath of air.

Working on the Pennykoin Cryptonote codebase got me a bit more comfortable with it. I still didn’t understand half of what I saw. Half of the difficulty was the semantics of the language, and half was just trying to find my way around such a large codebase. Eventually I was able to figure out what I was looking for and make the changes that I needed to make. I never really felt comfortable making those changes, and even less so publishing and releasing them. That’s because the Pennykoin codebase had no tests.

I’ve spent the last few days working on some matrix elimination code for my numerical methods class. During class, the professor would hastily write some large, procedural mess to demonstrate Gaussian elimination or Jacobi iteration. Not only did I struggle to understand what he was doing (and why), but he often ran into problems of his own and we had to debug things during lecture, which I thought was a waste of class time.

As I’d been on an Uncle Bob kick during that time, I decided I would take a TDD approach to my code, and began what’s turned out to be a somewhat arduous process of abstracting and decoupling the professor’s examples into something that had test coverage and allowed me to follow DRY principles. Did I mention that our base matrix class had to use C-style arrays built from pointers to pointers? Yes, it was a slog. Rather than being able to use iterators over standard library arrays, every matrix operation involves nested for loops. I’ve gone mad trying to figure out what needs dereferencing, and spent far too long tracing strange stack exceptions. (Watch what happens when you have an endl; at the end of a print function and call another endl; immediately after calling that function…)

I started out working on the Gaussian elimination function, then realized that I needed to pull my left hand side matrix member out as its own class. Before I did that I tried to create my own vector class for the right hand side. So I pulled that out, writing tests first. Then I started on my new matrix class. I ran into problems including a pointer array of my vector class. For reasons that I’ll not get into, I kept the C-style arrays. I slowly went through my existing test cases for the Gaussian class, making sure that I recreated the relevant ones in the matrix class. Input and output stream operators, standard array loaders (for the tests themselves), equality, inequality and copy functions were copied or rewritten. After one last commit to assure myself that I had what I needed, I swapped out the double** lhs member for matrix lhs, commented out the code within the relevant Gaussian functions, and replaced it with calls to lhs.swapRows(). Then I ran the tests.

And it worked.

Uncle Bob talks about having that button of truth that you can hit to know that the code works, and how it changes the way you develop. I’m not sure if he uses the word fearless, but that’s how it feels. Once the test said OK, I erased the commented code. Commit. Don’t like the name of this function? Shift+F6, rename, test, commit. These two functions have different names, but do the same thing, with different parameter types? Give them the same name and trust the compiler to tell the difference. Test OK? Commit.

It’s quite amazing.

I spent several hours over the past few days working on adding an elementary matrix to the matrix elimination function, and I made various small changes to the code, adding what I needed (tests first!) and making small refactors to make the code clearer. I’ve had to step into the debugger a few times, but it’s going well. There’s still one large function block that I’ve been unable to break down because of some convoluted logic, but I’m hoping to tackle it today before moving on. And I’m confident that no matter what changes I make, I’ll know immediately whether they work or not.

Fair Open Source

Last night I had the pleasure of meeting Travis Oliphant, one of the primary creators of NumPy and founder of Anaconda. He’s currently the CEO of OpenTeams, a company attempting to change the relationship between open source software and the companies that build on top of it. I found out about the lecture and was interested in it because of an article I had read in Wired about technology’s free rider problem, and went to the event without knowing much at all about Mr. Oliphant. I soon found out who he was and was very grateful that I had come. I’ve spent a lot of time using NumPy, and I’ll admit I was a bit starstruck.

Travis’s lecture grew out of his experience working on NumPy. He basically gave up the tenure track at Brigham Young University to work on it, and had to find other ways to support his family for the two years that he was working on the initial release. As was noted elsewhere, much of the tech boom over the past 20 years has been built on top of the contributions of FOSS developers like Travis and others. He’s a big believer in profit, and thinks that the lack of financial incentives in the FOSS space has caused several problems, including developer burnout, which leads to a lack of proper maintenance of these projects. Many of these projects, like NumPy, have become crucially important to the scientific and business community.

Travis Oliphant’s PyCon 2019 Lightning Talk about Quansight

Oliphant’s goal is to make open source sustainable. Quansight is a venture fund for companies that rely on OSS. One of the companies it has funded is a public benefit corporation called FairOSS, which hopes to support OSS developers through contributions from companies that use OSS. He’s also doing something very similar with OpenTeams, hoping to follow Red Hat’s model of supporting open source by providing support contracts for various projects.

These are all very worthy goals, and I was both impressed and inspired by his talk. It’s opened up some interesting career opportunities. I recently took my first developer payment through GitCoin, and it was a bit of a rush. Getting paid to work on open source software seems like an awesome opportunity, and I’ll be keeping an eye on this for potential post-graduate plans.

Becoming a Git-xpert

I have been trying to get a grip on the Pennykoin CLI code base for some time. One of the problems that I’ve had is that the original developer had a lot of false starts and stops, and there are a lot of orphan branches like this:

Taken with GitKraken

If that wasn’t bad enough, at some point they decided to push the current code to a new repo, and lost the entire starting commit history. Whether this was intentional or not, I can’t say. It’s made it very tricky for me to backtrack through the history of the code and figure out where bugs were introduced. So problem number one that I’m dealing with is how to link these two repos together so that I have a complete history to search through.

Merging two branches

So we had two repos, which we’ll call pk_old and pk_new. I originally tried to merge the repos together using branches, but I either wound up with the old repo as the last commit, or with the new repo and none of the old history. I spent a lot of time going over my bash history file and playing with using my local directories as remote sources, deleting and starting over. Then I found out that there was indeed a common commit between these two repos, and that all I had to do was add the old remote with the --tags option to pull in everything.

mkdir pk_redux
cd pk_redux
git init
# PK_NEW_URL stands for the URL of the new repo (pk_new), not the old one
git remote add -f pk_new "$PK_NEW_URL" --tags
git merge pk_new/master
git remote add -f pk_old https://github.com/Pennykoin/Pennykoin-old.git --tags

Now, I probably could have gotten away with just cloning the pk_new repo instead of initializing an empty directory and adding the remote, but the end result should be the same. A quick check of the tags between the two original repos and my new one showed that everything was there.

The link between the two repos

Phantom branches

One of the things that we have to do as part of our pk_redux, as we’re calling it, is set up new repos that we actually have control over. This time around, everything will be set up properly as part of governance, so that I’m not the only one with keys to the kingdom in case I go missing. I want to take advantage of GitLab’s integrated CI/CD, as we’ve talked about before, so I set up a new group and pkcli repo. I pushed the code base up, and saw all the tags, but none of the branches were there.

The issue ultimately comes down to the fact that git branches are just pointers to specific commits in a repository’s history. A fetch pulls the commits down from a remote and records where the remote’s branches point as remote-tracking references, but it doesn’t create local branches for them unless I actually check them out. Only after I created these tracking branches in my local repo could I push them to the new remote origin.

Fixing Pennykoin

So now that I’ve got a handle on this repo, my next step is to hunt some bugs. I’ll probably have to do some more work to try and de-orphan some of these early commits in the repo history, because that will be instrumental in tracking down changes to the Cryptonote parameters. These changes are likely the cause of the bootstrap issue that exists. My other priority is figuring out if we can unlock the bugged coins. From there I’d like to implement a test suite, and make sure that there are proper branching workflows for code changes.

Frustrations

I’m a bit perturbed right now. I went back to a Django project I hadn’t worked on in two weeks and could not get my PyCharm interpreter working properly. I’d updated from the Community Edition to the Professional Edition during that time, which I’m not sure had anything to do with it, but this failed session brings me to another source of frustration that I need to get off my chest.

There are 3, maybe 4 ways that one might need to interact with a Django app in PyCharm. The first is the Python console itself. The second is the regular command terminal. The third would be the various run configurations that one can set up. And the fourth would be the Django console that PyCharm Pro enables. My issue is that each of these has its own environment variable settings! Maybe it’s just my inexperience showing through here, but I tend to use several of these when I’m working. I have a run configuration for the test server, the Django console for migrations and tests, and a terminal window that’s actually running the Django shell, so that I can muck around with code while I’m figuring things out.

I don’t know if I’m an idiot or what, but it just seems extremely inefficient, and I have got to be missing something.

Working alone

Last weekend I finally got around to reading Two Scoops of Django, and it was very interesting. I wish I had picked it up earlier. I think I first started really delving into the Django framework about 3 months ago or so, and I’ve really enjoyed tinkering around with the models and ORM. I’ve done a bit with the forms and views, but I’ve spent a lot more time trying to draft out some data models for various projects and get a feel for how things work. I’ve fallen into my usual trap of getting too caught up in tools to actually deliver anything yet, but I’ve got two projects that I am primarily working on. I’ve been very disciplined about spending at least an hour each day on one of them.

Part of me thinks I should just focus on one to the exclusion of the other, and plow through. “Starting is easy, finishing is hard,” as Jason Calacanis says. The other voice in my head is telling me that as long as I’m pushing forward on one of them or the other, it doesn’t matter, since the skills I’m learning on each will translate to the other. The last few days have felt like my wheels are spinning though, as it seems I spent more time sharpening my ax than I did actually cutting down trees. I spent what feels like two whole days just trying to figure out how to set up cookiecutter-django the way I wanted it, another day or two trying to figure out why pipenv doesn’t work properly in PyCharm, and then another trying to figure out how to get Celery to work. Yesterday it was all about how to properly clone a 3rd party Django app so that I can make some modifications to it. And I’ve spent hours trying to figure out how to do my tests, what needs testing and what doesn’t. Endless hours on Medium reading everything I could find related to any of the above.

But as long as I can sit down and work on something, I tell myself I’m making progress and becoming an actual developer. I’ve talked about discipline previously, and that discipline is paying off with my day job as well, whether it’s PowerShell scripts or more Python API wrappers. The hardest thing about it for me is the solitary nature of what I’m doing. Not having a team or partner on these projects ultimately means that I have no one to bounce ideas off of in real time. The best I can hope for is to dump something out on StackExchange and hope that someone gets back to me. Most of the time, just explaining the question well enough for someone else to understand it spurs the kind of subconscious creativity that leads to a solution.

There have been many false starts already, but I’m starting to get there.

Currently, with a fintech app I’m working on, I’m trying to determine how to expand a cryptocurrency wallet app designed for Bitcoin and other assets that use its RPC interface. The asset that I’m working with is a fork of a privacy coin with the un-shielded send functionality disabled. So I’ve got to figure out the simplest way to update all the calls in this library so that they’ll use the shielded commands for this asset while retaining the existing commands for the legacy assets. So far, I’ve decided to try adding a boolean field to the currency model and an if clause to the Celery tasks to choose between the two based on the boolean. That requires modifying code in each of the various functions. While it’s simple, it seems to violate one of the core principles of Django, which is don’t repeat yourself (DRY). It seems to me that there is another way, such as adding a decorator to each of these functions or using a strategy pattern, to do that bit of logic in a way that would be easier to implement. Maybe even without having to fork the 3rd party app in the first place.
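To make the strategy idea a bit more concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: the Currency class stands in for the Django model, shielded_only is the proposed boolean field, FakeRpc replaces the real wallet daemon, and the command names send_transparent and send_shielded are placeholders, not the actual RPC methods of any coin or of the 3rd party app.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Currency:
    """Hypothetical stand-in for the Django currency model."""
    symbol: str
    shielded_only: bool = False  # the proposed boolean field


class FakeRpc:
    """Records calls instead of talking to a real wallet daemon."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple]] = []

    def call(self, method: str, *params) -> str:
        self.calls.append((method, params))
        return f"{method}-ok"


def transparent_send(rpc, address, amount):
    # Placeholder command name for the legacy, Bitcoin-style path.
    return rpc.call("send_transparent", address, amount)


def shielded_send(rpc, address, amount):
    # Placeholder command name for the shielded path.
    return rpc.call("send_shielded", address, amount)


# The strategy table: the only place that maps the flag to behavior.
SEND_STRATEGIES = {False: transparent_send, True: shielded_send}


def send(rpc, currency, address, amount):
    """Pick the send strategy from the currency's flag and run it."""
    strategy = SEND_STRATEGIES[currency.shielded_only]
    return strategy(rpc, address, amount)


if __name__ == "__main__":
    rpc = FakeRpc()
    send(rpc, Currency("BTC"), "addr1", 1.5)
    send(rpc, Currency("PRIV", shielded_only=True), "addr2", 2.0)
    print(rpc.calls)
```

The point of the table is that each task would call send() and never look at the boolean itself, so a third kind of asset would mean one new function and one new table entry rather than another if clause in every task. Whether this can be done without forking the app is still an open question for me.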

We shall see.

Free JetBrains software for academic developers

So I’m pretty happy because today I found out that JetBrains is offering free licenses to their entire software library for students and faculty members. I’ve been using PyCharm Community Edition for some time now, and am really glad to have access to the Professional version with all the plugin and Django support. I actually purchased a CLion license a year ago or so. They make really good software, and I encourage everyone to check it out.