Experiments with voice first

3rd gen. black Amazon Echo Dot speaker

Getting used to talking to computers, one word at a time

I’ve been following Brian Roemmele on Twitter and LinkedIn recently. He has one of the most interesting feeds I’ve seen, with access to a lot of archive material and old videos from the early days of tech, and it’s a real delight. He’s a forward-thinking, futurist type, concerned with the evolution of the human mind as it relates to technology. He’s also a big proponent of voice as the primary method of interacting with machines, what he calls voice first. His theory is that voice is the most direct channel to one’s thoughts, more so than writing or typing. It’s an idea that’s been sticking in my head lately.

I grew up on Star Trek, so the idea of speaking commands to a computer has always been attractive, but implementations of it have been lacking. My experiences with Dragon NaturallySpeaking some ten-odd years ago were cumbersome; the voice training and high error rates seemed inefficient. I’ve always been a fast typist and preferred to just bang out the words instead. Thinking back, it was usually the lawyers and doctors who preferred the app. I assumed it was because they were too “busy” to be bothered with typing things up themselves, and preferred to dictate to their paralegals instead.

Of course, when Siri hit the iPhone I played around with it. The occasional voice command was nice for sending brief text messages and such, but I tended to fall back to banging thoughts out with my twin thumbs on the keyboard. Lately, though, I’ve started relying on it more, mostly to avoid physically picking up my phone and looking at it. Asking it for the weather, or even the price of bitcoin, in the morning helps me keep to my habits. The longer I can go without unlocking the phone, the better.

Watching the kids interact with voice first has been interesting. Younger, who still has a toddler’s lisp, can’t type, of course, so watching her talk to Siri to pull things up is usually amusing. It’s also about the only way she can interact with the Fire TV, since she hasn’t quite figured out the remote beyond pausing and unpausing a show. And while I’ve been driving Elder to pick up typing via Typing.com, I’ve started to wonder how much she’ll actually need it.

I’ve been watching the advent of smart speakers like the Amazon Echo and Google Home with some trepidation. It’s bad enough that I’ve got a personal tracking device in my pocket most of the time; the privacy ramifications of a smart speaker were always too much for me. I’d also get a kick out of walking into someone’s house and ordering fifty-five gallons of lube via their smart speaker, à la XKCD.

Perhaps it was warming to the ability to summon Siri without picking up my phone, or maybe it was watching the kids rely on it, but we finally picked up an unused Echo Dot from family over the weekend and set it up on a shelf between the kitchen and living room. I’m sure the primary reason was that we’ve all been enjoying Amazon Music lately, and I wanted a way to play it without having the TV on all the time. And if it keeps the girls off their screens, I’m fine with it.

I still haven’t reconciled the privacy implications of it. The worry that Amazon workers might be listening to our recordings is lessened by the fact that the device sits in our common room. There’s no way I would put one of these things in my bedroom; I don’t even keep my phone in the bedroom. (I’ll admit I keep the iPad next to the bed, for reading nonfiction.) There’s also the ethical quandary of Amazon’s business and labor practices, but that’s hard to avoid; the best we can do is acknowledge it and keep it in mind when we spend our money there.

Roemmele claims to have used voice first for everything from writing to coding, but I can’t quite see myself going all-in on it. My mouth seems to travel faster than my brain most of the time. I’m not sure whether you’d call this a habit or a trait, but I prefer to write, since it slows me down and forces me to think things through a bit more. I’ve heard Neil Gaiman say similar things about writing with a quill pen.

I imagine that voice first is much better for entering a flow state. One of the tenets I try to adhere to when writing is to write fast, bad, and wrong, and fix the mistakes after the draft is finished. In practice, though, I tend to stare at the words on the screen as they appear, and wind up correcting those red, squiggly underlines as they pop up. Experimenting with not looking at the screen while typing might help. So maybe dictating without worrying whether the transcription is right would be a better way to work.

I’ve been a musician for twenty-five years, and sometimes while messing around with a new song I’m writing, I’ll just start making up lyrics as I go along. A rock freestyle, if you will. I’ve written complete verses off the cuff this way, only to realize at the end that I had no recollection of what I’d just said. That’s more a failure to keep a recording running while I’m playing, but it’s one case where I’m sure voice first would be a better workflow for me.

Of course, if voice first is the most effective path to getting thoughts down, why stop there? One of my high school English teachers told me an old saw: “good writing is pure thought put to page.” So how long before we can read thoughts directly off the mind? I know some of the finest minds in science are working on it. Whether externally via EEG or via a direct brain-computer interface, I’m sure the next decade or two will bring more advances in sharing our thoughts with others.

I still like writing on a page, though.