Well, I’ve made some real progress over the past couple of days. JARVIS is operational. Buggy, but operational. I already had the Discord bot working and querying the GPT and Replicate APIs, but yesterday I gave it a memory.

It works by loading the chat history of a named channel into a buffer. The buffer has a max size, and each message in the history gets added to it. The buffer is a FIFO queue of Message objects, each with an author and message content. Each time a message is added, the queue is checked: if the word count of the queue plus the new message exceeds the max, the oldest messages get dropped.
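The buffer described above can be sketched roughly like this. The names (`Message`, `ContextBuffer`) and the drop-oldest-until-it-fits behavior are my reading of the description, not the actual implementation:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    author: str
    content: str


class ContextBuffer:
    """FIFO queue of messages, capped by total word count."""

    def __init__(self, max_words: int):
        self.max_words = max_words
        self.queue: deque[Message] = deque()
        self.word_count = 0

    def add(self, message: Message) -> None:
        self.queue.append(message)
        self.word_count += len(message.content.split())
        # Drop the oldest messages until the buffer fits under the cap again.
        while self.word_count > self.max_words and len(self.queue) > 1:
            dropped = self.queue.popleft()
            self.word_count -= len(dropped.content.split())
```

Capping by words rather than message count is a reasonable stand-in for the token limits the GPT API actually enforces.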

So when the bot loads, it ingests the message history in the channel. Then, when a new message comes along, it gets added to the buffer. That’s the setup. After a new message is added to the queue, I check to see if it mentions the bot. If it does, the bot sends the entire queue, which I call a context buffer, along to GPT, together with any additional prompts I want prepended or appended.
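The prompt-assembly step might look something like the following. The “Author: content” transcript format and the prepend/append strings are assumptions for illustration, not the actual layout used:

```python
def build_prompt(history, prepend="", append=""):
    """Flatten the context buffer into a single GPT prompt.

    history: list of (author, content) pairs from the context buffer.
    prepend/append: optional prompt text placed before/after the transcript.
    """
    transcript = "\n".join(f"{author}: {content}" for author, content in history)
    # Skip empty sections so we don't emit stray blank lines.
    parts = [p for p in (prepend, transcript, append) if p]
    return "\n\n".join(parts)
```

In a discord.py bot this would typically be called from the `on_message` event handler, after checking whether the new message mentions the bot.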

Early testing yesterday was very promising. It’s quirky and weird, but we were really happy with the output. It was very fun.

I also did my first fine-tuning experiment. I had a Word doc with some content, a technical paper if you will. I broke the document up by newlines, then added the lines one by one to a JSONL document with blank prompts. I followed the OpenAI instructions and had the job queued in less than an hour. The queue itself took a very long time, so I forgot about it until late in the evening.
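The JSONL preparation step is simple enough to sketch. This assumes the legacy OpenAI fine-tuning record shape of `{"prompt": ..., "completion": ...}`; the leading space on each completion follows OpenAI’s data-preparation guidance, and `doc_to_jsonl` is a made-up name:

```python
import json


def doc_to_jsonl(text: str) -> str:
    """Split a document on newlines and emit one blank-prompt
    fine-tuning record per non-empty line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = [{"prompt": "", "completion": " " + ln} for ln in lines]
    return "\n".join(json.dumps(r) for r in records)
```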

It worked. The immediate problem, though, was that I forgot my stop signals, so the model doesn’t know when to shut up. (That’s a good analogy for my own motormouth, I’m sure.) So I’ll need to retrain it today.
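The retraining fix amounts to appending a fixed stop sequence to every completion in the training file, then passing that same string as the `stop` parameter at inference time. A minimal sketch, assuming the legacy JSONL format; the `\n###\n` separator is the conventional choice from OpenAI’s fine-tuning guidance, not necessarily what I’ll use:

```python
import json

STOP = "\n###\n"  # stop sequence the model will learn to emit


def add_stop_signal(jsonl_text: str) -> str:
    """Append the stop sequence to each completion in a JSONL file."""
    fixed = []
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        if not record["completion"].endswith(STOP):
            record["completion"] += STOP
        fixed.append(json.dumps(record))
    return "\n".join(fixed)
```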

As a proof of concept though, I am extremely pleased. I had been fretting over whether fine-tuning a model was this simple, and it appears that it is. Generating quality content and curating the results is the hard part.

JARVIS is going to be a fairly complex system, with multiple models and context systems, plus hooks into various modality systems (txt2txt, txt2img, speech2txt, &c…). Things are going to get crazy very fast. Feedback loops are going to be very important here, and that’s where the recursive functions are going to get really wild.
