The idea that you could type what you want to happen into a magic input box and it would just happen is an attractive one.
UX Today
Unless a user is specifically “searching” for something (such as in a catalog), they will almost always look for the things they can click, in order to perform the actions they want.
In the software development world, this is less universal. Instead of every application being a graphical user interface, many are terminal-based and driven entirely by the keyboard, whether through keyboard shortcuts (e.g. emacs and vi) or text following a “command” (i.e. “arguments”).
In some graphical tools, such as IDEs, there might just be a magic input box that allows searching all text, settings, plugins, and all actions the user can perform. A user can declare their intent, and the (single) action is performed.
Command Palettes
The most widely used command palette might be macOS Spotlight, though it’s a bit of a limited experience. Many tools are beginning to integrate these input boxes, like Slack and Notion - most even use the same shortcut: Ctrl/Cmd + K.
More recently, there have been companies that encourage you to leverage a “command palette” as a central part of the experience, such as Superhuman, Linear, and Vimcal. The result is a UX where every action in the application is discoverable in a search-field experience, often customized to what the user is currently doing or looking at.
And this is spreading.
There are now products that are being built and released where the command palette is the user interface, such as Raycast and Lazy, and they are certainly trending.
Not only do they shift how a user interacts with applications - toward declaring what should happen - but they also embrace the idea of extensibility, where developers can improve and extend the possibilities of the base experience, just like IDEs, Figma, Notion, Slack, and more. The company building the app doesn’t need to build everything themselves; they just need to build something extensible and foster a community of developers.
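To make that extensibility idea concrete, here’s a rough sketch of what registering a command with a palette might look like. The `Command` shape, `registerCommand`, and `search` are hypothetical illustrations, not any particular product’s SDK.

```typescript
// Hypothetical command-palette extension API, for illustration only.

interface Command {
  id: string;
  title: string;
  keywords: string[];          // extra terms the fuzzy search should match
  run: () => Promise<void>;    // the action performed when the user selects it
}

const registry = new Map<string, Command>();

function registerCommand(command: Command): void {
  registry.set(command.id, command);
}

// A third-party developer extends the base experience with a new command.
registerCommand({
  id: "issues.assign-to-me",
  title: "Assign issue to me",
  keywords: ["ticket", "mine", "take"],
  run: async () => {
    // In a real extension this would call the host application's API.
    console.log("Issue assigned to current user");
  },
});

// The palette itself is just a search over the registry.
function search(query: string): Command[] {
  const q = query.toLowerCase();
  return Array.from(registry.values()).filter(
    (c) => c.title.toLowerCase().includes(q) || c.keywords.some((k) => k.includes(q))
  );
}

console.log(search("assign").map((c) => c.title)); // ["Assign issue to me"]
```

The app only has to own the registry and the search box; the community fills in the commands.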
Type a few words, and what you want to happen happens - without having to open and interact with the application that actually does the heavy lifting… which sounds a lot like another recent trend: Large Language Models (LLMs).
LLMs (and Pitfalls)
A bunch of smart people have been working hard to research and develop machine learning models that can “understand text”. One of the breakthrough approaches is called the “Transformer”.
These same ideas were used to create GPT-3, an amazing model that is being used to do all kinds of incredible things and power a number of startups. It’s a language model with a massive amount of information embedded in it, which means it can coherently answer questions, solve problems, write stories, and more.
DeepMind had the idea to separate the language model from the information while maintaining the functionality with RETRO. This is a fantastic direction: training these models can cost seven figures (training GPT-3 supposedly cost almost $5M), and any change to the embedded information would require retraining or fine-tuning (i.e. additional cost and risk of performance degradation). If the data could be updated at any time, and the model just understood how to find and present the information, that would be a huge step forward.
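As a rough sketch of that idea - assuming a hypothetical document store and model client, not RETRO’s actual architecture - the application could fetch relevant passages at query time and hand them to the model as context:

```typescript
// "Keep the knowledge outside the model": look up relevant documents at query
// time and pass them to the model as context. `DocumentStore` and
// `LanguageModel` are hypothetical interfaces, not any real library.

interface DocumentStore {
  // Returns the most relevant passages for a query (e.g. via vector search).
  retrieve(query: string, topK: number): Promise<string[]>;
}

interface LanguageModel {
  complete(prompt: string): Promise<string>;
}

async function answerWithRetrieval(
  question: string,
  store: DocumentStore,
  model: LanguageModel
): Promise<string> {
  // 1. Fetch up-to-date information; editing the store requires no retraining.
  const passages = await store.retrieve(question, 3);

  // 2. The model only needs to know how to read the passages and respond.
  const prompt = [
    "Answer the question using only the context below.",
    ...passages.map((p, i) => `Context ${i + 1}: ${p}`),
    `Question: ${question}`,
    "Answer:",
  ].join("\n");

  return model.complete(prompt);
}
```

Updating what the system “knows” then becomes a data update, not a training run.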
Another wildly popular use of these language models is in Stable Diffusion (how it works) and other generative image models such as DALL·E, which produce those incredible images you’ve no doubt seen around the web - the kind of thing you’d expect a fantastic artist to create.
In all of the above cases, “getting what you want” isn’t as straightforward as you might think. The difference between a jumbled mess and a work of art, or between a nonsensical answer and a well-reasoned, well-explained one, might come down to a few words in the “prompt” - the text the user gives to the model. This has created a new area of technical expertise called “Prompt Engineering” (e.g. a book to help people write better prompts for Stable Diffusion, and a blog post on writing better prompts for GPT-3).
So, this raises the question: is natural language going to be a more effective input medium than the fuzzy search, grammar, or domain-specific language (example) historically used for text-based input mechanisms?
As it often does, it may come down to abstraction. If an application can abstract away from the end-user the voodoo and magic phrases the model needs in its prompt to perform well, while also retaining the freedom to update the underlying data that powers those interactions, then there just might be something there.
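A minimal sketch of that kind of abstraction, with made-up style modifiers for an image model: the user types a plain description, and the application quietly wraps it in the prompt engineering before it ever reaches the model.

```typescript
// Hide the prompt "voodoo" from the end-user by wrapping their plain request
// in an engineered template. The modifiers below are illustrative, not a
// recommended recipe for any specific model.

const STYLE_MODIFIERS =
  "highly detailed, dramatic lighting, trending on artstation, 4k";

const NEGATIVE_PROMPT = "blurry, low quality, extra limbs";

function buildImagePrompt(userDescription: string): {
  prompt: string;
  negativePrompt: string;
} {
  return {
    prompt: `${userDescription.trim()}, ${STYLE_MODIFIERS}`,
    negativePrompt: NEGATIVE_PROMPT,
  };
}

// The user just types "a lighthouse in a storm"; the app supplies the rest.
const request = buildImagePrompt("a lighthouse in a storm");
console.log(request.prompt);
// "a lighthouse in a storm, highly detailed, dramatic lighting, trending on artstation, 4k"
```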
For useful things to happen when a user types stuff into a magical input box, though, the system needs to understand how to perform a series of actions on the user’s behalf to achieve the desired goal.
Task Automation
Fortunately, performing a series of actions to achieve a desired goal is a well-explored realm, and continues to be a hot space.
There are startups like Magical that help individuals automate (and simplify) tasks they perform regularly, like data entry; Zapier, which connects a ton of different services and lets you build workflows to automate manual work; and tools like Mimica, which focus on task and process mining - much more oriented toward large enterprise tasks and processes - and automatically create automated workflows by watching and learning from users performing the tasks to be automated.
And, as you might guess, there are companies working on Task-Oriented LLMs, like Adept, where you can describe a goal / action and it will perform it.
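One way to picture that flow - purely as an illustration, not how Adept or anyone else actually works internally - is a model that turns a natural-language goal into a structured plan of actions, which the application then executes step by step:

```typescript
// Hypothetical goal-to-actions flow: a planner (e.g. an LLM prompted to emit
// structured output) produces a list of actions, and the app executes them.

type Action =
  | { kind: "open"; app: string }
  | { kind: "search"; query: string }
  | { kind: "click"; target: string }
  | { kind: "type"; text: string };

interface Planner {
  // e.g. a model prompted to return a JSON list of actions for a goal
  planFromGoal(goal: string): Promise<Action[]>;
}

async function performGoal(goal: string, planner: Planner): Promise<void> {
  const plan = await planner.planFromGoal(goal);
  for (const action of plan) {
    await execute(action); // each step maps to a concrete UI or API operation
  }
}

async function execute(action: Action): Promise<void> {
  switch (action.kind) {
    case "open":
      console.log(`Opening ${action.app}`);
      break;
    case "search":
      console.log(`Searching for "${action.query}"`);
      break;
    case "click":
      console.log(`Clicking ${action.target}`);
      break;
    case "type":
      console.log(`Typing "${action.text}"`);
      break;
  }
}
```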
This is starting to look a lot like a “Declarative UX”.
A Standardized Interface
Zapier provides, for many cases, the automation of complex interactions between applications. This might involve triggering an action in one application when a certain event takes place in another, or transferring key information from one application’s format to another.
For inter-application operations to become more widespread and accessible, applications will likely need to be built with the expectation that input may arrive from an external source, that any action can be subscribed to, and that data can be exported (if the user allows it). This sounds eerily similar to the API you might see for a SaaS product, but many types of software still act more like a walled garden.
When it becomes natural for many applications to be involved in complex processes requested by an end-user, it seems reasonable that there will be a movement towards applications becoming a suite of atomic behaviors.
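A sketch of what such a suite might look like, with hypothetical interfaces rather than any existing standard: each behavior declares what it needs, can be invoked externally, and emits events other tools can subscribe to.

```typescript
// An application exposed as a suite of atomic behaviors (illustrative only).

interface AtomicAction<Input, Output> {
  name: string;
  description: string;              // what an external caller (or model) reads
  execute(input: Input): Promise<Output>;
}

type Listener = (payload: unknown) => void;

class EventBus {
  private listeners = new Map<string, Listener[]>();

  subscribe(event: string, listener: Listener): void {
    const existing = this.listeners.get(event) ?? [];
    existing.push(listener);
    this.listeners.set(event, existing);
  }

  publish(event: string, payload: unknown): void {
    for (const listener of this.listeners.get(event) ?? []) {
      listener(payload);
    }
  }
}

// Example: a calendar app exposing one atomic behavior.
const bus = new EventBus();

const createEvent: AtomicAction<{ title: string; start: string }, { id: string }> = {
  name: "calendar.createEvent",
  description: "Create a calendar event with a title and start time",
  async execute(input) {
    const id = Math.random().toString(36).slice(2);
    bus.publish("calendar.eventCreated", { id, ...input });
    return { id };
  },
};

// Another tool (a palette, an automation service) can react to the action.
bus.subscribe("calendar.eventCreated", (e) => console.log("created:", e));
void createEvent.execute({ title: "Coffee", start: "2024-01-01T09:00" });
```

Once actions are this granular and described in plain language, they become easy for both automation tools and language models to discover and compose.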
Trust and Transparency
A quirk of the LLM experience is that every query is just like typing a search into Google and clicking “I’m Feeling Lucky”. This definitely doesn’t yield an experience that you can trust as an end-user.
A good declarative UX should either yield an exact action with a clear definition and result, or, for more complex or nebulous actions, provide a clear summary of what is about to be performed and stop to prompt the user at key decision moments, such as making a purchase.
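As a sketch of that pattern - the types here are illustrative - the assistant could summarize the full plan up front and pause for approval at any sensitive step:

```typescript
// "Summarize, then confirm": show the user what is about to happen, and wait
// for approval before any step with real-world side effects.

interface PlannedStep {
  description: string;        // human-readable summary, e.g. "Buy 2 tickets ($84)"
  sensitive: boolean;         // purchases, deletions, sending messages, etc.
  run: () => Promise<void>;
}

// In a real UI this would be a dialog; here it's a stand-in callback.
type ConfirmFn = (summary: string) => Promise<boolean>;

async function executeWithConfirmation(
  plan: PlannedStep[],
  confirm: ConfirmFn
): Promise<void> {
  // Show the full plan up front so the user knows what is about to happen.
  console.log("Planned actions:\n" + plan.map((s) => `- ${s.description}`).join("\n"));

  for (const step of plan) {
    if (step.sensitive && !(await confirm(step.description))) {
      console.log(`Skipped: ${step.description}`);
      continue;
    }
    await step.run();
  }
}
```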
UX Tomorrow
Over the years, many companies have tried to build products that allow the use of natural language to achieve some task, and the experiences seem to have fallen flat.
Many people have some form of smart speaker in their house that does a pretty impressive job of converting speech to text, but is incredibly disappointing when it comes to doing more than the equivalent of a Google search or other very basic actions. Though Amazon did manage to make buying something easy with their speakers, they don’t work anything like J.A.R.V.I.S. All-voice interfaces draw many parallels to typing text into a magic input box, and fall solidly within the “Declarative UX” bucket; at this point, many of the same problems need to be solved.
With the rise of consumer-accessible automation, large language models, and command palettes, user experiences will likely lean more and more heavily on performing key functionality by typing text into a prompt.