Note: The irony of physically typing and reading an article about the future of voice and conversational computing is not lost on us. If you want to listen to this article, click the play button below. If you want to have a conversation about the contents of this article, you can do so through this Eleven Labs conversational agent, trained on Tom's voice and POV.

Imagine a world where, all day long, we are having conversations with computers. In the car, we could dictate emails, queue up a playlist, or get directions without ever taking our hands off the wheel. No more fumbling with cords, menus, or Bluetooth connections. On our phones or laptops, we could go fingers- and hands-free, speaking a series of instructions naturally, as if to a friend, to purchase tickets, update a spreadsheet, or send files to colleagues. No more mice, trackpads, or tiny phone keypads. We will also be able to have those conversations completely silently, thanks to brain-machine interfaces or silent speech technologies.
On some level, the future of conversational computing is already here. Voice assistants like Amazon’s Alexa, Apple’s Siri, and Google Assistant let us control music, search the web, dictate messages, or adjust the lights with our voice. Some cars already offer built-in navigation and entertainment systems that respond to spoken commands.
But these are early, often clunky, and siloed experiences. In the age of generative AI, voice is poised to become a primary interface for many daily computing tasks, with screens and keyboards taking a backseat. Ultra-intuitive and frictionless, conversational computing represents a leap forward in human-computer interaction, unlocking possibilities far beyond today’s voice assistants.
An invisible user interface
Every major wave of computing has been driven by a breakthrough user interface that made powerful technology accessible to non-technical users. In the ’80s, the Macintosh popularized the graphical user interface (GUI), allowing people to see and click their way through programs rather than type arcane commands. A few decades later, smartphones brought touch-based interactions, freeing us from clunky keyboards and putting powerful computers in our pockets.
Now, generative AI is making conversation the new user interface. Talking to technology requires zero training and no special skills; we have, after all, spent most of our lives perfecting it. It’s as natural as speaking to another person. As investor Naval Ravikant has put it, “The promise of AI is no UI.” You simply express what you need, and the AI does the rest. Instead of learning how to use software, software will learn how to understand us.
Why now?
Talking to computers isn’t new. Customer service hotlines and early dictation software have been around for decades. But accuracy was poor (“I’m sorry I didn’t get that”) and experiences were frustrating.
The shift is happening now because breakthroughs in deep learning have pushed speech recognition to near-human accuracy, and because large language models (LLMs) let those systems generate human-like, context-aware responses in almost any language, nearly instantly. This means computers can not only understand what you say, but also respond naturally, interpret tone, and carry out multi-step tasks. Soon, what now takes several taps or clicks could be done with a single sentence.
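To make that concrete, here is a minimal sketch of the loop such a system runs for each spoken turn. The helpers transcribe, generate_reply, and synthesize are hypothetical placeholders for whichever speech-to-text, LLM, and text-to-speech services a product actually uses; the point is the shape of the pipeline, not any particular vendor.

```python
# A sketch of one conversational turn: listen, transcribe, reason, speak.
# transcribe(), generate_reply(), and synthesize() are hypothetical stand-ins
# for real speech-to-text, LLM, and text-to-speech services.

def transcribe(audio: bytes) -> str:
    """Speech-to-text: turn captured microphone audio into a transcript."""
    return "turn off the lights and remind me to call Sam at nine"  # placeholder

def generate_reply(history: list[dict]) -> str:
    """LLM step: produce a context-aware reply (and decide on any actions)
    from the full conversation history, not just the latest utterance."""
    return "Done. The lights are off, and I'll remind you at nine."  # placeholder

def synthesize(text: str) -> bytes:
    """Text-to-speech: render the reply as audio to play back to the user."""
    return text.encode()  # placeholder

def one_turn(audio: bytes, history: list[dict]) -> bytes:
    """A single spoken sentence replaces what would otherwise be several taps."""
    history.append({"role": "user", "content": transcribe(audio)})
    reply = generate_reply(history)
    history.append({"role": "assistant", "content": reply})
    return synthesize(reply)

if __name__ == "__main__":
    history: list[dict] = []
    one_turn(b"<microphone capture>", history)
    print(history)
```

Real products layer on wake-word detection, streaming audio, and tool calls for the multi-step tasks mentioned above, but the conversational core stays this simple: speech in, speech out, with the model carrying the context between turns.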
The demand is already clear. WhatsApp users send more than 7 billion voice messages every day, and nearly half of young adults use voice notes weekly. The software industry is racing to add conversational layers to everything, from shopping apps to productivity tools.
The other accelerant is hardware: Microphones are everywhere. Every smartphone, laptop, smart speaker, earbud, and modern car can capture your voice. Smart TVs, thermostats, doorbells, security cameras, and even refrigerators are now potential voice interfaces. Your voice can become a universal remote for your entire digital environment. And any device can become a conversational partner.
Why voice is the ultimate interface
Each new interface in computing expanded what we could do. Voice will supercharge those possibilities because it’s the most human, most instinctive form of communication. For starters, it’s fast. We can speak three to four times faster than we type, closing the gap between thought and action. It’s no surprise that people often prefer leaving voice messages or voice notes when they have a lot to say, rather than laboriously tapping it out. As one tech CEO quipped, “We’ve been waiting a long time for our thumbs to catch up with our thoughts.”
Voice is also a rich mode of communication. It carries tone, emphasis, and emotion — information that’s harder to convey through text. In healthcare, for instance, patients could triage symptoms by speaking to an AI doctor on the phone, with AI able to assess levels of concern and even pain through someone’s voice.
Finally, conversational computing offers new levels of accessibility. It works for nearly everyone: older adults who might struggle with today’s technology, people with motor impairments, and young children still learning to read and write. Imagine toddlers indulging their insatiable curiosity with AI tutors, peppering them with questions like, “Why is it raining?” and “How does the toilet work?” AI can also adapt to people with speech impediments, learning their distinct manner of communicating.
The next frontier: Silent speech and brain-computer interfaces
Future advances are likely to extend accessibility even further, to people without any capacity for speech. Innovative projects are exploring “pre-vocal” or silent speech interfaces: ways to capture what someone intends to say without actually speaking out loud. For instance, one of our portfolio companies in Israel is developing a wearable that lets people mouth words silently and converts the neuromuscular signals into text, enabling fully private voice commands. This would allow people to “speak” to their devices, say, during a meeting or in public, without anyone else hearing.
Going a step further, brain-computer interfaces (BCIs) are aiming for thought itself as the input. For example, Neuralink, another of our portfolio companies, is working on implants that translate neural activity directly into text or speech. The company recently demonstrated a paralyzed woman communicating at 90 words per minute — nearly half the speed of natural conversation — just by thinking about speaking. These “mind-to-speech” advancements, which represent a huge leap from earlier systems that took many seconds per word, hint at a future where the bottleneck between our ideas and their expression is virtually eliminated.
Collapsing the gap between intention and action
Talking (and eventually thinking) won’t be the only ways we interact with computers in the future, but they’re poised to become the dominant ones. Conversational computing could unleash a new era of self-expression and creativity. Consider the possibilities of composing music, crafting stories, or designing visuals simply by brainstorming aloud with AI. In software development, our portfolio company Bolt.new allows anyone to build apps by describing features in plain language, representing a radical democratization of software creation.
The way we interact with technology is on the brink of its biggest transformation yet. Natural language and voice, and eventually silent speech or thought itself, won’t just change how we use computers. They will collapse the distance between intention and action, transforming how we create, connect, learn, and work together.