I don’t want to get too tied up in the usual diary-journaling and talk about how horribly I slept last night. Instead, I have the beginnings of a longer-form post about Solana and data-oriented programming. I’m not sure how much I can say quite yet, as I’m just scratching the surface and still forming my ideas.
The old joke about blockchains as really slow databases is funny because it’s true. Solana, however, has always been focused on providing enterprise-level performance to the network. It’s got a long way to go from a stability and performance standpoint, but I think the core team and the community are doing a great job so far.
I’ve never been a programmer, per se. The best I can say for myself is that I’ve known enough about various programming languages over the years to be dangerous. My demonstrable knowledge is limited to various procedural scripts and smaller OOP-style programs written in Python, JS, bash, or, dare I say, PowerShell.
One of the benefits of being on the Star Atlas team, my first at an actual software/game development company, is that I’m surrounded by a ton of really smart people, and our tech leads are really solid. I recently asked them if they considered what we were doing to be OOP, or if there was some other term they would use to describe it. The phrase data-oriented design was bandied about by a couple of people, and that sent me down my current path of research.
I’m currently reading Richard Fabian’s Data-Oriented Design book online, and while I’m only a couple of chapters in right now, I have also been reading up on various Entity Component Systems (ECS), which are basically data-oriented systems that are somewhat popular in game development. If you want a primer on OOP vs. ECS, this 2018 RustConf keynote by Catherine West does a pretty good job of explaining it.
Basically, OOP ties data and behavior together into classes, whereas ECS keeps them separate. Entities are little more than an ID, and each ID is linked to various data-only structs, the components. Behavior is kept entirely separate in systems, which simply perform read and write operations on some set of these components. The simplest example I’ve seen is a physics engine that applies velocity. There are components for position, heading, and speed, and the velocity system iterates through every entity that has all three components, reads the heading and speed of each one, then calculates its new position based on how much time has elapsed.
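To make that concrete for myself, here is a minimal sketch of the velocity example in Python. The names (`Position`, `Heading`, `Speed`, `velocity_system`) and the choice of one dictionary per component type are my own assumptions for illustration, not how any particular ECS library does it.

```python
import math
from dataclasses import dataclass

# Components: data only, no behavior. All names here are hypothetical.
@dataclass
class Position:
    x: float
    y: float

@dataclass
class Heading:
    radians: float

@dataclass
class Speed:
    units_per_second: float

# One store per component type, keyed by entity ID (the entity is just an int).
positions: dict[int, Position] = {}
headings: dict[int, Heading] = {}
speeds: dict[int, Speed] = {}

def velocity_system(dt: float) -> None:
    """System: behavior only. Touches every entity that has all three components."""
    for entity in positions.keys() & headings.keys() & speeds.keys():
        pos = positions[entity]
        heading = headings[entity]
        speed = speeds[entity]
        # New position = old position + direction * speed * elapsed time.
        pos.x += math.cos(heading.radians) * speed.units_per_second * dt
        pos.y += math.sin(heading.radians) * speed.units_per_second * dt

# Entity 1 has all three components, so it moves.
positions[1] = Position(0.0, 0.0)
headings[1] = Heading(0.0)
speeds[1] = Speed(2.0)

# Entity 2 only has a position, so the system skips it.
positions[2] = Position(5.0, 5.0)

velocity_system(dt=0.5)
```

Notice that nothing "owns" the movement logic. The entity with only a position is simply never matched by the system, so there is no class hierarchy to decide what can and can’t move.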
The advantages of this data-oriented approach are numerous from a design and implementation standpoint, and there are also performance benefits: it allows pipelining and tends to reduce cache misses. Since each system operates on homogeneous data types, that data can be packed together more efficiently and processed in tight, predictable loops.
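A rough way to picture the "packed together" part is the difference between a list of per-entity records and one contiguous buffer per field. The sketch below uses Python’s standard `array` module purely to show the layout. Python itself won’t demonstrate the cache behavior, and the field names and the `advance` function are made up for the example.

```python
from array import array

# Array-of-structs: each entity is its own record, fields interleaved.
ships_aos = [
    {"x": 0.0, "speed": 1.0},
    {"x": 10.0, "speed": 2.0},
    {"x": 20.0, "speed": 3.0},
]

# Struct-of-arrays: each field lives in its own contiguous, typed buffer.
# Index i across the buffers plays the role of the entity.
xs = array("d", [0.0, 10.0, 20.0])       # "d" = 8-byte floats
speeds = array("d", [1.0, 2.0, 3.0])

def advance(xs: array, speeds: array, dt: float) -> None:
    """Walks two homogeneous buffers in lockstep and updates positions in place."""
    for i in range(len(xs)):
        xs[i] += speeds[i] * dt

advance(xs, speeds, dt=2.0)
```

In a compiled language the second layout is what lets the hardware stream through the data, since the loop only ever touches the fields it needs.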
The parallels with Solana are numerous, and to me, it was not immediately obvious coming into this space why Solana’s programming paradigm felt so different and unusual. For the uninitiated, Solana programs are stateless and operate on a number of behavior-less accounts, which contain state only. These accounts aren’t limited to holding just one data component, but usually hold all of the state data belonging to a user or other program.
Going back to the blockchain-as-database analogy, ECS systems operate using a query system, and Fabian spends a lot of time talking about data normalization and relational databases. Now, one interesting thing about Solana programs is that they have no knowledge of, or ability to find, accounts elsewhere on the blockchain. In fact, all of the accounts needed to perform a computation, that is, all of the data, have to be passed in with the program instruction from an off-chain client. Basically, one must query or enumerate these accounts using an off-chain client via RPC calls to Solana validators. Then they can be bundled into various program instructions and sent as signed transactions to effect various state changes.
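Here is a toy model of that flow in plain Python. To be clear, none of this is the Solana SDK: the `Account` class, the `ledger` dictionary standing in for chain state, and the `transfer_fuel` and `client_send` functions are all hypothetical, and signing, fees, and ownership checks are left out entirely. It only illustrates the shape of the idea, where the program holds no state and only sees the accounts the client hands it.

```python
from dataclasses import dataclass

@dataclass
class Account:
    owner: str   # which program is allowed to modify the data
    data: dict   # state only, no behavior

# Toy stand-in for on-chain state, keyed by account address.
ledger = {
    "alice_ship": Account(owner="game_program", data={"fuel": 10}),
    "bob_ship": Account(owner="game_program", data={"fuel": 3}),
}

def transfer_fuel(accounts: list[Account], amount: int) -> None:
    """Stateless 'program': it cannot look anything up, it only sees its inputs."""
    source, destination = accounts
    if source.data["fuel"] < amount:
        raise ValueError("insufficient fuel")
    source.data["fuel"] -= amount
    destination.data["fuel"] += amount

def client_send(addresses: list[str], amount: int) -> None:
    """Off-chain 'client': finds the accounts, then passes them to the program."""
    accounts = [ledger[address] for address in addresses]  # stands in for RPC lookups
    transfer_fuel(accounts, amount)

client_send(["alice_ship", "bob_ship"], amount=4)
```

The program here looks a lot like an ECS system: a function over data it is given, with the "query" step pushed out to whoever calls it.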
If one considers entities as primary keys in a database, then you might say that the primary keys in Solana are created from pubkeys, whether they belong to a user or another program. The analogy is a bit forced here, but one can combine a user’s pubkey, a program’s key, and a bump seed to generate unique account addresses. These addresses are deterministic, so it’s unnecessary to keep a registry of them. The client simply rehashes the inputs and does a null check against the program-derived account. My understanding is a bit weak here: one can also enumerate all of the accounts owned by a particular program, but this is relatively costly if you’re only dealing with one user.
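The "rehash and null check" idea can be sketched with nothing but `hashlib`. This is a simplification based on my current understanding: the real derivation also rejects any hash that lands on the ed25519 curve and searches bump values downward from 255 until it finds one that doesn’t, and real keys are 32 bytes. The sketch skips the curve check and takes the bump as given, and `derive_address`, the placeholder keys, and the `accounts` dictionary are all made up for illustration.

```python
import hashlib

PDA_MARKER = b"ProgramDerivedAddress"

def derive_address(seeds: list[bytes], bump: int, program_id: bytes) -> bytes:
    """Simplified derivation: SHA-256 over seeds, bump, program ID, and a marker.

    Assumption: omits the off-curve check that real derivation performs.
    """
    hasher = hashlib.sha256()
    for seed in seeds:
        hasher.update(seed)
    hasher.update(bytes([bump]))
    hasher.update(program_id)
    hasher.update(PDA_MARKER)
    return hasher.digest()

# Placeholder byte strings, not real keys.
program_id = b"game-program-placeholder"
alice = b"alice-pubkey-placeholder"
bob = b"bob-pubkey-placeholder"

# Toy account store: only Alice has an account so far.
alice_address = derive_address([b"ship", alice], bump=255, program_id=program_id)
accounts = {alice_address: {"fuel": 10}}

# Later, a client recomputes the address from the same inputs. No registry needed.
recomputed = derive_address([b"ship", alice], bump=255, program_id=program_id)
alice_account = accounts.get(recomputed)

# The "null check": Bob's address is just as easy to compute, but nothing is there.
bob_address = derive_address([b"ship", bob], bump=255, program_id=program_id)
bob_account = accounts.get(bob_address)
```

So the user’s pubkey plus a seed works like a composite primary key: anyone who knows the inputs can compute where the row would be and check whether it exists.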
I’m still fleshing out these ideas, but it’s obvious that Solana is a data-centric system, compared to the EVM in which data and behavior are more closely intertwined. Experienced Solana developers probably implicitly understand this, but for those coming from OOP and more traditional (e.g. university-taught) computer science backgrounds, this comparison between Solana and an ECS might need to be more explicitly stated.
For now, I will continue to explore, learn, and do some experimentation around data storage, cross-program calls, and RPC queries, to benchmark how well Solana holds up when programs are designed around strict single-field data accounts. Using an off-chain client to query and batch process large numbers of these components across users may allow us to develop a large-scale game engine that can support a large user base. Whether this is possible remains to be seen.