One of our favorite parts of the job is when founders surprise us with ideas we've never thought about.
But, given the time we spend working on and thinking about AI products, we do have some thoughts on where important companies might lie and what challenges they'll need to solve. To be clear, we don't think you have to work on any of these ideas to get into Embed or to pitch us. We share them here as a way to show our thinking about where opportunity lies and how we evaluate ideas. We'll continue to update this list as we formalize more of our thinking.
Your mom receives a panicked, live call from you, from a spoofed number, asking for a wire transfer to post your bail. What does she do?
As voice and video recordings are increasingly available online or otherwise capturable by malicious actors, AI voice cloning, avatar and image generation technologies can be abused. Voice, video and images of government identity are used to authenticate users by banks, insurers and other financial institutions. We believe there are potential opportunities for both enterprise and consumer protection services. From a research perspective, like many security battles, this may be a continual game of cat-and-mouse rather than one won by silver bullet defenses that cannot be reverse-engineered. Provenance is a technically interesting solution, but an untenable one in the short term because it requires mass adoption. We need synthetic media detection.
For businesses, authentication is a key part of user experience, and the category winner here will need to execute not only on detection accuracy but also on ease of integration into existing identity orchestration, databases, risk and fraud systems. The media industry could also be an early adopter segment, and a societally important (if less economically important) buyer.
On the (pro)sumer front, we think there is demand for a mobile application that provides deepfake detection, Caller ID, spam blocking and perhaps even live translation or agentic experiences for call handling. Owning the call client could also enable some relatively trivial defenses (e.g. automated callbacks). While few consumer security companies have reached mass adoption, this feels like a critical new threat vector. Our friends have already experienced voice-based transaction fraud, and the problem will only intensify from here. Save your grandparents. Save yourselves.
There's an ongoing debate, given the impressive performance and realism of OpenAI's Sora, over whether foundation models given sufficient scale and data can self-learn core physics principles. While philosophically interesting, it's clear that this approximation of physical properties is still insufficient to solve a clearly valuable potential use case: manufacturing asset generation.
In order to be useful, a CAD assistant needs to generate 3D assets that are extremely precise, fit a number of real-world constraints (and are optimized along some dimensions), and are also cost-effective & manufacturable. These are often processes that are still difficult for experts, much less AI models that still don't have a real understanding of "physics".
We think this is a problem worth tackling despite this challenge, for a few reasons. First, we think it's possible to build assistive tooling that either helps engineers narrow down their design space more efficiently or performs some set of menial tasks for them as a starting point. Second, while Sora and older methods like NeRFs and Gaussian Splatting clearly lack sufficient physics understanding, we're optimistic that combining generative models with post-processing "clean-up" work, such as simulation for validation, can reduce the burden on zero-shot model output. And finally, the market is enormous; the vast majority of the revenue of Autodesk, one of the largest CAD players, comes from their engineering & construction division ($1.1 billion in 2021). Even minor improvements here, if integrated well into workflow, can be valuable.
Fintech onboarding teams face a unique tradeoff: customers are impatient and want to start using their account as soon as possible, but good diligence just takes time. Compliance teams today are forced to manually review submitted information, do financial investigation work to validate data, and prepare rigorous audit logs, all under a time crunch.
Foundation models with access to some simple web browsing-like APIs can already emulate some of what a human compliance agent might do in their investigation, and, at minimum, can parallelize workstreams and surface potential red flags. Fraud teams have historically trained models to flag suspicious activity, and those models rely heavily on static features like domain & email address. Next generation models enable them to "seek out" real-time data, synthesize findings, and make a (more accurate) decision in a fraction of the time.
For example, a common workflow for KYB teams: upon receipt of customer application, verify that the business hasn't had an interaction with sanctioned/restricted foreign governments. Models today are already capable of ingesting news sources & LinkedIn profiles, looking for evidence of a relationship, and flagging potential risks. Another common workflow to investigate potential customers is to analyze company websites, given non-functional websites can be markers for fraud.
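As a rough illustration of the website check mentioned above, the sketch below turns the result of loading a company's site into a list of red flags for a human reviewer. Everything here is an assumption made for illustration: the `FetchResult` fields, the placeholder phrases, the `min_chars` threshold and the `website_red_flags` helper are hypothetical, and a real system would sit behind an actual crawler and a compliance team's own rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FetchResult:
    """Hypothetical summary of one attempt to load a company website."""
    status_code: Optional[int]  # None means the request never completed
    body_text: str


# Assumed phrases that often appear on parked or unfinished sites.
PLACEHOLDER_PHRASES = ("domain for sale", "coming soon", "under construction")


def website_red_flags(result: FetchResult, min_chars: int = 200) -> List[str]:
    """Return human-readable red flags; an empty list means none were found."""
    if result.status_code is None:
        return ["site unreachable"]
    flags = []
    if result.status_code >= 400:
        flags.append(f"HTTP error {result.status_code}")
    text = result.body_text.lower()
    if len(text.strip()) < min_chars:
        flags.append("very little page content")
    for phrase in PLACEHOLDER_PHRASES:
        if phrase in text:
            flags.append(f"placeholder text: {phrase!r}")
    return flags
```

The output is deliberately a list of reasons rather than a single score, so that it could feed the audit log a compliance team has to keep anyway.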
Healthcare is a mess of unstructured data: conversations, clinical notes, coverage policies. Click fatigue and physician burnout from operating today’s charting products are becoming untenable.
Breakthroughs in transcription, speaker diarization, translation, summarization, LLM web navigation/form filling, and general retrieval and reasoning will revolutionize the patient encounter and claims / prior auth workflows. Healthcare is full of transactional, yet manual human encounters. Benefits verification, patient reminders, and claim appeals are still handled by manual human interactions today. AI offers healthcare, with its brittle processes and aging systems of record, a chance to leapfrog forward.
While clinical applications require a rigorous approach to safety, rapid advances against "human benchmarks" e.g. medical licensing exams, suggest the problem is shifting from science to engineering. We believe these applications, from the administrative to the clinical, are likely to benefit from domain-specific fine-tuning.
Small businesses today list their products across a variety of different platforms (e.g. Amazon, eBay, Shopify, etc). There exist a host of consultancies and digital asset managers today that help them manage their digital presence across all of these channels, some of which attempt to help SMBs understand their customers, optimize their advertising spend, market and price their products more efficiently. There's a lot of potential improvement left on the table in the form of product photography, descriptions, richer metadata generation, or more innovative ways to present products.
Foundation models enable the at-scale generation of alternative product listings and also therefore A/B testing of variants to improve conversion and potentially optimize for different audiences. With some improvement in technology, we believe it will also be possible to generate videos and other assets that convert better than anything manually generated today. By crawling marketplaces and experimenting with listings, a selling assistant might even be able to provide recommendations for new products to build or untapped audiences.
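To make the A/B testing step concrete, here is a minimal sketch of how a selling assistant might compare two listing variants, assuming it can observe views and purchases for each. It uses a standard two-proportion z-test; the function name and the example numbers are illustrative only, and a production system testing many generated variants at once would also need to correct for multiple comparisons.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the conversion-rate difference, B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no conversions (or all conversions) in both arms
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: original listing vs. a generated variant.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
```

A small p-value suggests the generated variant's lift is unlikely to be noise; with few views per variant, the test will usually be inconclusive, which is one reason scale matters here.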
One of the emergent use cases with the initial wave of LLMs has been a mix of therapy and personal coaching. What should you say to the friend who just lost a loved one? How do you have a hard conversation with your boss? How should you approach that person you're interested in?
General-purpose LLMs do a passable job at this today, but aren't perfect. Security and privacy are key - you probably want to have a WhatsApp-like commitment to privacy with disappearing messages and expiring history, but you also want the product to have high-level recollection of prior conversations. Safety is even more important for sensitive conversations. And from a brand and positioning perspective, you'd also have to normalize getting advice from a machine vs. a trained human.
A generally useful strategy for building new products is to observe emergent behaviors in horizontal technologies (like chatbots) and then build a specialized experience for those behaviors - this makes us optimistic that there is space for a dedicated solution. The space also likely supports a mix of B2C and B2B2C business models (e.g., employer-provided therapy or coaching services).
If you've ever had the honor (read: misfortune) of being placed on on-call rotation, you've experienced the dreaded 2 am PagerDuty alert. Still groggy, you pulled up a dashboard with a bunch of failing services, poorly written logs, and angry messages from your manager. If you were lucky, you had a pretty good idea of what the issue was and a previously written set of steps to resolve it, and you could be back in bed within the hour. If you were unlucky, you'd spend hours debugging in the middle of the night.
We think there's an opportunity for automated root cause analysis to substantially improve incident resolution time, and the experience for engineers. A lightweight "agent" with access to logs and metrics can, to start, retrieve relevant information (e.g. service statuses, past error logs, similar prior incidents) and suggest fixes based on what previous resolutions were. After an incident, the same agent could be used to provide a "best practice" resolution for future incidents and generate a post mortem. Long term, agents might even be able to automatically fix common recurring issues.
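The "similar prior incidents" retrieval step can be sketched very simply. The example below ranks past incidents by word overlap (Jaccard similarity) with the alert text; the incident record shape (`summary`, `fix`) is a hypothetical one, and a real product would more likely use embeddings over logs and runbooks than bag-of-words matching.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word set for a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_incidents(alert, past, top_k=3):
    """Return up to top_k past incidents sharing words with the alert, best first."""
    query = tokens(alert)
    scored = [(jaccard(query, tokens(p["summary"])), p) for p in past]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


# Hypothetical incident history.
PAST = [
    {"id": 1, "summary": "checkout service 500 errors after deploy", "fix": "roll back deploy"},
    {"id": 2, "summary": "disk full on logging host", "fix": "rotate logs"},
]
matches = similar_incidents("checkout service returning 500 errors", PAST)
```

The `fix` field of the top matches is what an agent would surface as a suggested resolution.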
While agents, broadly, don't (yet) work well, we think this area might be easier to tackle. For one, existing runbooks, which explain steps to resolve common issues, make bootstrapping much easier. More broadly, a lot of debugging is hypothesizing what might have gone wrong and looking for evidence. A basic information-only agent can already help substantially by testing hypotheses; by the time you roll out of bed, your debugging assistant can already tell you "it's not DNS," "all AWS AZs look good," and one day even "it's not a cascading cache invalidation."
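The hypothesis-testing loop described above might look something like the following sketch. Each hypothesis is paired with a cheap, read-only check against current metrics; the metric names and thresholds are made up for illustration, and in practice the checks would query real monitoring systems.

```python
from typing import Callable, Dict, List, Tuple

# A check returns True when the evidence rules the hypothesis OUT.
Check = Callable[[Dict[str, float]], bool]

# Hypothetical hypotheses with assumed metric names and thresholds.
CHECKS: List[Tuple[str, Check]] = [
    ("DNS", lambda m: m["dns_error_rate"] < 0.01),
    ("AWS AZ outage", lambda m: m["unhealthy_azs"] == 0),
    ("cascading cache invalidation", lambda m: m["cache_miss_rate"] < 0.5),
]


def rule_out(metrics: Dict[str, float], checks: List[Tuple[str, Check]]):
    """Split hypotheses into (ruled_out, still_open) given current metrics."""
    ruled_out, still_open = [], []
    for name, check in checks:
        try:
            (ruled_out if check(metrics) else still_open).append(name)
        except KeyError:
            # Missing data means the hypothesis cannot be ruled out.
            still_open.append(name)
    return ruled_out, still_open


ruled_out, still_open = rule_out({"dns_error_rate": 0.0, "unhealthy_azs": 0}, CHECKS)
```

Treating missing data as "still open" keeps the assistant conservative: it only tells the on-call engineer "it's not DNS" when it has actually looked.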
We were early investors in Harvey, in part because it was clear that despite legal firms historically being minor buyers of technology, LLMs would allow massive automation of such a text-based industry. Continuing down one path from that thesis, we think there's potential to automate certain end-to-end, transactional services provided by law firms.
For example, we think it's possible to build an "Immigration Firm in a Box" -- a system of models (supported by some human help to start out) that ingests your employment & personal data, files your immigration claim, advises you on possible visa options, and can answer questions about status and progress. Clippy for Immigration might even be able to provide more available (and cheaper) opinions than current systems.
Another example might be basic trademark search and filing, or preliminary sales contract markup based on precedent - always frustratingly slow.
We're sure there are lots of variants of this pattern that we're not familiar with yet, but we think this shape of company is really exciting.
There are a bunch of queries that are broken in the current search paradigm. If you want to know what washer/dryer combo to buy or what to do in New York City for three days with kids, you either spend hours reading through dozens of tabs littered with ads and terrible UIs or you just end up on a trusted site like Wirecutter (or worse, you do both).
LLMs are really good at quickly reading and synthesizing hundreds of pages of content on any topic. Is there a good consumer product experience to be built around building a dynamic, interactive, comprehensive query experience for high-intent, high-consideration purchases?
The first challenge will be building a product experience that is so much better than existing search experiences that it develops a cult following. The second will be figuring out distribution. But if you can solve both, this is a very valuable and lucrative problem to solve.
Recruiting and allocating hourly workers across restaurants, retail, field services, and warehousing today often involves a manager posting paper flyers and texting team members about changes, playing human tetris to fill a shift. Recruiting, employment compliance/admin and logistics require a backend database and scalable workflows, but the next generation of workforce software shouldn’t put that burden on managers or workers. The preferred interface for the field isn’t a clunky mobile workflow app; it’s natural language chat over SMS, and we now have models good enough to engage/screen applicants and (help) beat the game of tetris.
Why isn't there a Copilot-like experience for the rest of your computing experience yet? A browser extension that learns your writing style and makes you 10x faster at email and anything else you have to author. The ideal experience would be deeply personalized based on everything you've ever written and could author an entire email with just a couple of words of context.
While it might seem that incumbents have an unassailable distribution advantage here, Grammarly has shown you can build a large, independent business here. Incumbents are also likely to be slow and cautious in launching this, creating space for new entrants.
Speed and privacy will be key, likely necessitating a hybrid local/cloud approach.
The physical security monitoring industry remains stuck in the past. Organizations and consumers deploy millions of cameras, but the last generation of companies is still moving storage to the cloud and creating seamless networking gateways (a huge improvement), and penetration of sophisticated computer vision remains minimal.
Hardware and storage should be rethought from the ground up in the age of semantic video understanding, and powerful on-device models. A full-stack security services firm could see more, cost less, and offer a step-function better experience.
Businesses (in particular small businesses) do not answer about half the calls they receive, but inbound calls are often their most important source of leads. Everyone has experienced this.
Use cases range from home services qualification to informational updates, from restaurant reservations to appointment-booking, from order tracking and stock checks to bill collection. These critical customer experiences are widespread, scoped and transactional. Voice generation quality and LLM capability are approaching the ability to handle many transactional calls. What’s missing is the last mile — distribution, customer journey design, guardrails and workflow automation.
Code generation might be the most obvious area for language models to make a large impact. Beyond being an in-domain problem for AI practitioners, and an obviously valuable mostly-text format, code models also benefit from the rigid structure of code as a language and the ability to leverage compilation & testing checks as a mechanism to provide feedback to models. The work of developers is so valuable (and expensive) that automating or accelerating even small portions of it is incredibly valuable as well.
Empirically, this has been partially true; one of the first AI products to get real traction was GitHub Copilot, and it is still among the most successful today, with over a million developers using it. More recently, ChatGPT has proven to be a useful assistant for writing and editing code. But the list of products with widespread usage roughly stops there; from surveys of engineers in our network, we haven't found any other code development products that have gotten widespread adoption.
A gap that we see in the market that's particularly exciting is the ability to go from a human description of an issue to a draft solution, in code, to the problem. We've, of course, seen some exciting open source projects like AutoPR and GPT Engineer working on this problem, but we believe that there exist some deeper technical challenges that need to be tackled in order to solve this well. Some examples below:
We're excited to meet folks who have insights on how to solve these problems well (or believe you don’t need to in order to generate high quality code)!
Over the last five years, an increasingly large slice of security solutions have “shifted left,” born out of a realization that placing security checks at the end of the software development lifecycle results in waste and a larger communication burden. As part of that, tooling that integrates into integration & deployment processes or, ideally, software development itself, has proven to be extremely valuable. Static analysis tools, that automatically look for vulnerabilities and potentially fix them, have been a large part of that.
While extremely useful, the major issue with static tooling so far has been the high rate of false positives. While machines can often flag potential issues, it requires context like code structure, deployment status, and even historical application traffic to determine whether a potential vulnerability is harmless or urgent. In practice, static tooling sometimes has such high false positive rates that engineers tend to ignore its findings entirely.
At larger tech companies (e.g. Google, Microsoft), we've heard of internal tooling that automatically triages and prioritizes issues identified by other systems. We think that language models may generalize well enough to bring this technology to smaller organizations as well. We also believe there are interesting related opportunities, such as building automatic remediation of identified issues and cloud resource provisioning as a result of the independent trend towards infrastructure as code.
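To illustrate what context-aware triage could mean in its simplest form, the sketch below re-scores scanner findings using the kinds of context mentioned above (code structure, deployment status, traffic). The `Finding` fields, the multipliers and the threshold are all hypothetical and hand-picked; this is not a description of any company's internal tooling, and the opportunity described here is for a language model to infer this context rather than rely on fixed weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    rule_id: str
    severity: int           # 1 (low) to 4 (critical), as reported by the scanner
    reachable: bool         # is the flagged code called from an entry point?
    deployed: bool          # is the code running in production?
    receives_traffic: bool  # has the affected endpoint seen real requests?


def priority(f: Finding) -> float:
    """Scale scanner severity by context; the weights are illustrative assumptions."""
    score = float(f.severity)
    score *= 1.0 if f.reachable else 0.2
    score *= 1.0 if f.deployed else 0.5
    score *= 1.5 if f.receives_traffic else 1.0
    return score


def triage_findings(findings: List[Finding], threshold: float = 2.0) -> List[Finding]:
    """Drop findings below the threshold and return the rest, most urgent first."""
    kept = [f for f in findings if priority(f) >= threshold]
    return sorted(kept, key=priority, reverse=True)
```

Under these assumed weights, a critical finding in unreachable, undeployed code scores lower than a medium finding on a live path, which is the kind of reordering that makes the alert list worth reading.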
LLMs can now plan against objectives (poorly) and carry on an engaging conversation - even be Sensible, Specific, Interesting and Factual (SSIF). What would a game world populated by AIs be like? If “The Sims” and the engagement with AI girlfriends, AI celebrities, and AI therapists are any indication, it would be wildly fun.
What if the next generation of entertainment is personalized generations? If one likes to look at pictures of “cats where they shouldn't be,” let's generate them. In an era where one can increasingly produce any media (images, audio, video, memes), mass personalization feels within reach.
Video is a major social, informational, educational, and marketing medium, and the fastest growing. Digital video ad spend is projected to rise 17% in 2023 to $55 billion (per IAB). However, production of “commercial” video remains prohibitively difficult and expensive. Short form, simple commercial video can cost $1,000 to $50,000+ to produce from start to finish, and the majority of commercial video is created by agencies and professionals.
Demand dramatically outstrips “supply” of video production. Only ~3,000 brand advertisers globally create video ads, but there are 250M video creation and editing web searches per year in English.
AI will revolutionize and democratize video production, editing, personalization and understanding. Video is a challenging frontier of AI research; it is computationally costly, there's limited input data, we are still figuring out how to ensure temporal consistency, and it deserves new interfaces for control. But the frontier is advancing rapidly, and we're interested in companies that both push that frontier and cleverly leverage these technologies in usable products today: from indexing/semantic understanding, to captioning and translation, to style transfer, to generated backgrounds, avatars and even product videos from 3D models, there's a treasure trove of technical capability. The product opportunity (to cleverly cross the usefulness chasm with the capabilities we already have) is equally important.
Language models benefit a great deal from access to "reliable web data" -- knowledge bases offer explicit checks against hallucination, especially when combined with some research-driven methods of revising (e.g. Gao et al 2022, Peng et al 2023). They also allow for citations to externally verifiable material, which are valuable both to build user trust and also to expand on first answers with reliable source material.
However, current web content APIs lack the flexibility & feature set required to power large scale web applications. Consider, for example, what sets of technology would be required to build a clone of ChatGPT with web browsing. While many startups use SerpAPI (or one of its many competitors), there doesn't exist a web search API that has access to page content, parsed outlinks from the page, or even edit history. This set of features is clearly useful for more expansive language model applications, but, at least at first glance, would also be extremely helpful for many of the personal assistant style applications we can think of.
Another variant of this problem is in systematic crawl and parsing -- today, companies sign one-off contracts to crawl & parse a pre-negotiated set of fields through third party providers. A modern crawl company could offer those sets of data, along with the orchestration to ask arbitrary questions of that dataset (e.g. "on each page that discusses AirPods, what's the sentiment?"). We think this power will be useful not just in e-commerce, but in a wide variety of other use cases, like pharmaceutical companies looking to gather data on side-effect frequency or market research firms assessing the success of a new product launch.
LLMs have the potential to transform financial and accounting software from databases to context-aware, proactive processors. These models could shift the human expert's role from manual “rules engine” to strategic oversight.
The initial success of domain-specific models such as BloombergGPT on financial NLP tasks (such as ConvFinQA), the “code interpreter” approach to increasing accuracy of calculations, as well as early research results of using specialized LLMs for tasks such as transaction classification are all encouraging.
We think this is a technically rich and commercially valuable application area: requiring robust interactions with PDFs and tabular data, increased domain-specific reasoning, task-specific research and engineering, and definite need for workflow product beyond the chatbox. From a data perspective, we’re particularly excited that global accounting, tax, financial reporting and compliance standards are all codified in natural language, with corresponding large crawl-able datasets of compliant examples. Some tasks that could be interesting starting points:
A high volume of HR events lead to end-user communication: new hires, exits, role changes, promotions, location changes, manager changes, and payroll/benefits changes. Large companies have hundreds of folks whose jobs are primarily to notify employees of these events, verify documents, answer questions, and update records in HRIS systems, often under the titles of HR Operations, Talent Support Operations, Talent Systems Coordinators, Employee Support Coordinators, Compliance Coordinators, and HR Service Desk.
Whatever the titles, we think these teams can be 10X more efficient — and deliver a dramatically better, faster employee experience. Over the past decade, companies have built “service catalogs” and “service request forms” to digitize their processes, but these still create too much manual operational burden.
The next “intranet” isn't a portal at all, but is instead a conversational search box that can intelligently retrieve in-context, localized, access-control aware answers from enterprise documentation and systems of record (and then, accurately updates those records). IT and HR processes are tightly intertwined, but HR is particularly poorly served, and ever harder for increasingly global/hybrid organizations.
A domain populated with process documentation, ever-changing compliance needs, complex policy application, forms, and natural language communication is ripe for attack by LLMs.
There are many (promising!) startups working on solving customer support problems, beginning with a common set of simpler use cases, like processing returns on e-commerce sites or basic questions about planning on travel sites. We think that this is a large and promising market, but also believe there is a unique and new opportunity to target a more challenging & sophisticated set of "technical customer service" requirements, e.g. issues with MongoDB, Databricks, GitHub, etc.
These issues are currently extremely expensive for companies to deal with, often requiring staffing (multiple!) full time engineers to support or "forward-deployed" roles. And, existing customer support solutions are unlikely to support this workflow; in order to solve for the technical support use case well, we think a startup would likely have to do multiple of the following:
An early version of this product might serve as a "debugging copilot" for the engineers currently working in that role, and, over time, enable them to spend more of their time actively building and deploying new product as opposed to purely on customer support. We also think it's possible that targeting the top end of this market would lead a startup to build rigorous infra and eval that would enable them to serve the more traditional (i.e. less technical) use cases as well.
We're excited to talk to folks working both on the technical and more general variants of this problem!