
Smart assistants as the new interface to computers

Looking back, the first paradigm for interacting with a computer was the command-line prompt. Given how constrained these machines were in memory capacity and processing speed, it “naturally” emerged as the simplest possible way -- in the sense of ease of implementation -- to provide an interactive interface to the computer.

That simplicity didn't come for free: the end-user was encumbered with learning shell syntax and commands. Beyond that, it was also necessary to build a somewhat sophisticated mental model around “alien” concepts such as files, directories, and programs: the terms in which the computer itself operated.

After that, the next, more approachable interface was the graphical one: programs could now display their capabilities visually, lending themselves to exploratory learning and tighter feedback loops.

A few years later, the indirection of the mouse as a pointing device was challenged by the touchscreen (which even toddlers are capable of operating). Touchscreen-operated computing devices in active use now far outnumber mouse-operated ones.

The trend is clear: the computing device has grown in sophistication over the decades, fueled by advances in hardware and software. This increased complexity in the machine has allowed greater simplicity -- that is, ease of use -- for the end-user.

This advance is generally afforded by removing indirections (think mouse vs. touchscreen) and by simplifying away non-essential concepts (think autosaving vs. remembering to save, or the disappearance of the very concept of a “digital file” in modern applications and services).

Now we're standing on the cusp of another major shift: it has been demonstrated that there is automated technology capable of “translating” long-winded, imprecise natural human speech into software-backed capabilities.

What could be more natural than talking to the computer and having it understand and do what you mean?

On the simpler side, these can be operations that you could have done yourself on your computer: “clear the trash bin”, “switch to dark theme”, “open Firefox”.
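
As a minimal sketch of what this could look like under the hood -- assuming a Linux desktop with GNOME tooling, and assuming some speech layer has already mapped the user's utterance onto an intent label; the labels here are made up for illustration -- the assistant's job reduces to dispatching a recognized intent onto a concrete command:

    import subprocess

    # Hypothetical mapping from recognized intents to concrete commands.
    # The intent labels are illustrative; the commands assume a GNOME desktop.
    INTENT_TO_COMMAND = {
        "clear_trash": ["gio", "trash", "--empty"],
        "dark_theme": ["gsettings", "set", "org.gnome.desktop.interface",
                       "color-scheme", "prefer-dark"],
        "open_firefox": ["firefox"],
    }

    def handle(intent: str) -> None:
        command = INTENT_TO_COMMAND.get(intent)
        if command is None:
            raise ValueError(f"no handler for intent: {intent}")
        subprocess.run(command, check=True)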

But people don't ultimately want to use a computer to clear its trash, switch its visual theme, or open applications: people want to solve their real-life problems, which typically reside outside the computer.

With this in mind, the more interesting ramification is the possibility of goal-directed, multi-step agents, which carry out the reasonable steps to fulfill some higher-level goal: “make a reservation at such and such restaurant next Saturday”, “give me a summary of today's news”, “order groceries based on my last month's purchases”, “keep an eye on air ticket prices to such and such place between such and such dates”.
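
At its core, such an agent can be sketched as a loop: a model proposes the next step toward the goal, a runtime executes it through some tool, and the observation is fed back until the goal is declared fulfilled. Everything below -- the propose_step callable standing in for the model, and the tool set -- is a hypothetical illustration, not any particular product's API:

    from typing import Callable

    # Hypothetical goal-directed agent loop: the model picks the next
    # step, the runtime executes it, and the observation is fed back.
    def run_agent(goal: str,
                  propose_step: Callable[[str, list[str]], str],
                  tools: dict[str, Callable[[str], str]],
                  max_steps: int = 10) -> list[str]:
        history: list[str] = []
        for _ in range(max_steps):
            step = propose_step(goal, history)  # e.g. "search: flights to Lisbon"
            if step == "done":
                break
            tool_name, _, argument = step.partition(": ")
            observation = tools[tool_name](argument)
            history.append(f"{step} -> {observation}")
        return history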