
  • Opinions
  • 30 April 2025
  • 9 min read
  • Words: Northzone

Iron Man's AI Assistant Might Just Be the Future of Work

Many have rightfully praised the movie Her for its foresight in predicting the emotional and societal impact of artificial intelligence. But when it comes to AI in the workplace, it’s Iron Man’s J.A.R.V.I.S., the AI that seamlessly manages Tony Stark’s schedule, home, lab, and Iron Man suit, that feels more aligned with what we’re seeing emerge today. It’s less about companionship and more about an intelligent, always-on co-pilot for getting things done.

Within the startup ecosystem, there are glimpses of what the first versions of J.A.R.V.I.S. might look like. A few companies have started building personal AI assistants or AI-augmented virtual assistants that can operate with high fidelity. We believe this is the start of highly personalized AI assistants that can predict and complete tasks on behalf of individuals: the world’s best executive assistant, available to everyone. A few key trends underpin our excitement about the space:

  • Voice AI has advanced to the point where you can have low-latency, highly realistic conversations. This is crucial, as personal assistants will need to live in each communication channel – Slack, voice, email, etc. – to be truly ubiquitous and helpful. Additionally, voice interfaces allow applications to collect and reference conversational data that was previously unavailable to them.
  • We have reached an inflection point with the Model Context Protocol (MCP), which will dramatically lower the barriers to accessing services that AI personal assistants will need. This will allow AI assistant developers to focus their resources on fine-tuning their task control and communication models, rather than assigning resources to build connectors to services.
  • Longer context windows (Gemini currently leads with 1 million tokens) allow AI assistants to draw on more context when making decisions.
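To make the MCP point above concrete, here is a toy illustration. Real MCP servers speak JSON-RPC over stdio or HTTP via the official SDKs; the sketch below only mimics the "list tools, call a tool by name" shape of the protocol, with a hypothetical stubbed tool, to show why an assistant developer no longer needs a bespoke connector per service: any server that answers these two methods is immediately usable.

```python
# Toy stand-in for an MCP tool server. Real MCP is JSON-RPC over
# stdio/HTTP; we only mimic the tools/list + tools/call pattern.
# The tool name and its stubbed result are hypothetical.
TOOLS = {
    "find_free_slots": {
        "description": "Return open meeting slots for a given day",
        "fn": lambda args: ["09:00", "14:30"],  # stubbed calendar lookup
    },
}

def handle(request: dict) -> dict:
    """Dispatch a tools/list or tools/call request, MCP-style."""
    if request["method"] == "tools/list":
        # The assistant discovers capabilities instead of hard-coding them.
        return {"tools": [
            {"name": n, "description": t["description"]}
            for n, t in TOOLS.items()
        ]}
    if request["method"] == "tools/call":
        tool = TOOLS[request["params"]["name"]]
        return {"content": tool["fn"](request["params"].get("arguments", {}))}
    return {"error": "unknown method"}

print(handle({"method": "tools/list"}))
print(handle({"method": "tools/call", "params": {"name": "find_free_slots"}}))
```

Because discovery and invocation follow one shared shape, the assistant's code stays the same no matter which service sits behind the server.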

We have identified two distinct ways in which companies are building towards this J.A.R.V.I.S. future:

  • Task-Executing Assistants: There is a group of startups that focus on using admin assistance as an initial wedge. One approach is to sell premium-priced remote assistants—real humans equipped with AI tooling to complete a larger number of tasks on behalf of their clients. Others are building AI-first assistants that initially focus on scheduling, with no human-in-the-loop involvement. Eventually, both of these strategies will converge, and companies will offer full AI assistants that can preempt and execute tasks for customers.
  • Context-Aware Intelligence Systems: Some companies are exploring voice-based systems as a new interface to capture richer context from conversations. Rather than focusing solely on transcription or summarization, the ambition is to enable more intelligent and ambient interactions over time. As voice becomes a more natural mode of input, these systems could support more intelligent workflows by helping users with greater awareness when executing tasks. The space remains early, but we see long-term potential as voice data becomes more structured and actionable.

We aren’t yet sure which path is better, but we’re excited about both. A few of our key questions as we continue to dive deeper into this opportunity are:

  • Where is there greater value? In context-aware systems, or in executing actions?
  • Can both of the product paths we outlined co-exist? If there can only be one type of assistant, which starting point do we believe will be stickier and earn the right to occupy the other’s space?
  • What are the nuances in how people want to interact with these AI assistants? Will it be through voice? Slack? Email? GUIs? All of the above?
Task-Executing Assistants

With LLMs now able to reasonably mimic the diction of an average person, and task-executing agents seemingly on the horizon, AI assistants are gaining popularity. As more of our internal communications and corporate data are exposed via streamlined APIs, these assistants can be prompted to access key data to perform (at first) basic tasks on behalf of users.

There are two main strategies we have seen to date. First, there are AI-native assistants that start with basic tasks, such as scheduling meetings. These products are largely activated via email prompts and are, for the time being, more limited in the scope of what they can do for users. Second, there are human virtual assistants who leverage AI tools to perform a greater range of tasks more quickly. While both strategies start from different places, we believe they share the same goal: to gather idiosyncratic data on users’ preferences and ultimately offer a highly tailored AI assistant that can act as an extension of the user themselves.

The two sets of strategies also target different demographics. People who are accustomed to interacting with assistants and want white-glove service might opt for the “safer” human-in-the-loop option (Athena charges thousands of dollars a month). Those who have historically not had access to an assistant or are more price-sensitive might opt for the fully automated offerings (Skej charges $10 a month).

Ultimately, we think that these businesses can be powerful conduits for the maturation of agent technology. Tool connectivity will shift power towards companies that directly own the customer relationship, something that stands to benefit Task-Executing Assistants. As consumers become accustomed to AI assistants completing tasks that are feasible with today’s architecture, these assistants can eventually “earn the right” to handle more extensive tasks in the future, such as booking travel, ordering gifts, or other ancillary tasks that people may want to offload.

What’s Interesting About This Direction
  • It taps into existing workflows: People are used to interacting with email and chat. Using those as mediums to generate tasks is a straightforward method for deploying this product.
  • They are perfect conduits for continued agentic advancement: What consumers want is a good Swiss Army Knife, a flexible layer that can determine which model or tool performs each task best. As AI models and tools mature and become interchangeable, the execution of individual tasks will also improve. The corkscrew or small knife within the Swiss Army Knife might change, but users will keep the orchestration layer itself.
  • Text as the initial medium of communication should limit the surface area of potential mistakes: Because users interact with these assistants through text, the companies can control the instructions fed into the “goal” agents in a way that should limit hallucinations and mistakes.
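The "Swiss Army Knife" point above can be sketched in a few lines. All names here are hypothetical; the idea is simply that individual handlers (the "blades") are swappable as better models or tools appear, while the routing layer, and the user relationship it embodies, persists.

```python
from typing import Callable

class AssistantRouter:
    """Toy orchestration layer: routes each intent to its current best handler."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[str], str]] = {}

    def register(self, intent: str, handler: Callable[[str], str]) -> None:
        # Swap in whichever model or tool currently does this task best;
        # the router itself never changes.
        self._handlers[intent] = handler

    def handle(self, intent: str, request: str) -> str:
        if intent not in self._handlers:
            return f"no handler for {intent!r}"
        return self._handlers[intent](request)

router = AssistantRouter()
# A stubbed scheduling handler; tomorrow it could be a different model/tool.
router.register("schedule", lambda r: f"booked: {r}")
print(router.handle("schedule", "30min with Alex on Friday"))
```

Replacing the lambda with a call to a newer model changes nothing for the user, which is the sense in which the orchestration layer "earns the right" to keep the relationship.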
Open Questions
  • What are the killer use cases outside of scheduling that these companies can grow on top of?
  • What sort of leeway will customers give these assistants if they make mistakes? How quickly can the companies iterate to ensure people don’t churn?
  • How will the AI assistant companies monetize task completion?
Context-Aware Systems

Advancements in transcription and voice AI have given rise to a new class of tools that process audio data from meetings, calls, and conversations. These tools primarily focus on summarization and transcription, making spoken content more accessible and searchable.

Companies like Gladia are developing infrastructure to support these capabilities, offering real-time, multilingual transcription services for various applications. Similarly, Hume AI is exploring the integration of emotional intelligence into voice interfaces, aiming to make interactions with AI more natural and personalized.

In the realm of verticalized solutions, Winn.AI provides a sales-focused assistant that automates CRM updates and offers real-time guidance during meetings. On the other hand, Amie emphasizes design and user experience, integrating calendars and to-do lists into a seamless interface.

While these developments are promising, it’s important to note that the field is still in its early stages. The majority of tools have not yet moved beyond enhanced (and pretty awesome) summarization and transcription. The potential for richer, context-aware systems exists, but realizing this vision will require further innovation and user adoption.
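As a toy illustration of what "structured and actionable" voice data might mean: scan a meeting transcript for spoken commitments and emit follow-up tasks with owners. A real system would use an LLM rather than this stand-in regex, and all names and the sample transcript below are hypothetical.

```python
import re

# Matches spoken commitments like "I'll ..." or "I will ...".
# A regex is a crude stand-in for the language model a real product would use.
ACTION_CUE = re.compile(r"\bI(?:'ll| will) (.+?)(?:\.|$)", re.IGNORECASE)

def extract_actions(transcript: list[tuple[str, str]]) -> list[dict]:
    """transcript: (speaker, utterance) pairs -> structured follow-up tasks."""
    actions = []
    for speaker, utterance in transcript:
        for match in ACTION_CUE.finditer(utterance):
            actions.append({"owner": speaker, "task": match.group(1)})
    return actions

meeting = [
    ("Ana", "I'll send the revised deck by Thursday."),
    ("Ben", "Great. I will follow up with legal."),
]
print(extract_actions(meeting))
```

The output is the kind of structured artifact, tasks with owners rather than raw audio, that a context-aware assistant could act on.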

What’s Interesting About This Direction
  • Underutilized Audio Data: Historically, conversations have been transient, with valuable insights lost after the fact. Capturing and processing this data can unlock new opportunities for understanding and collaboration.
  • Improved User Interfaces: Recent tools offer more intuitive and user-friendly interfaces, making it easier for users to interact with and benefit from audio data.
  • Emerging Infrastructure: The development of robust infrastructure, like Gladia’s transcription services and Hume AI’s emotional intelligence capabilities, lays the groundwork for more advanced applications in the future.
Open Questions
  • What are the compelling use cases that will drive adoption beyond transcription and summarization?
  • How will users respond to systems that attempt to interpret and respond to emotional cues?
  • What are the privacy and ethical considerations in capturing and analyzing conversational data?
  • How can these tools integrate seamlessly into existing workflows without causing disruption?

We’re incredibly excited about what the future of AI-enabled assistants has in store. The products that exist today are the worst they’ll ever be, and yet are already incredibly powerful. If you are building anything that intersects with the theses we outlined above, please reach out!

– Molly, Nick, Naseem