Startup Ideas

One of our favorite parts of the job is when founders surprise us with ideas we've never thought about.

But given the time we spend working on and thinking about AI products, we do have some thoughts on where important companies might lie and what challenges they'll need to solve. To be clear, we don't think you have to work on any of these ideas to get into Embed or to pitch us. We share them here as a way to show our thinking about where opportunity lies and how we evaluate ideas. We'll continue to update this list as we formalize more of our thinking.

Apply to Embed here!

Many, Many, Materials

In a paper last year, DeepMind trained a graph network that predicted more stable structures than the history of materials science research had previously discovered*. Another paper from folks at Berkeley and the Lawrence Berkeley Labs built an “autonomous laboratory” that proposes a novel compound, synthesizes and characterizes it, and then begins the cycle again with an updated proposed structure. Together, the papers demonstrate a really interesting capability in academic settings.

Novel material proposal and synthesis has obvious applications from reducing the use of toxic materials in battery synthesis to designing more energy efficient methods to produce already commonly used components. Historically, this capability has been bespoke and owned by manufacturing and mechanical engineering groups within large companies; this capability of material and process optimization is so novel yet powerful that it may create a new market serving the diverse set of use cases across industries.

We’re excited to meet founders with deep domain expertise and with insights on what sets of markets might be good early targets.

*Notably, follow-up papers have argued that many of these structures are functionally equivalent to others already present in existing databases. Nonetheless, the result demonstrates interesting capabilities.

A New Age for Human Data

Demand for human data is clearly exploding, but the distribution of tasks is changing. As one human data lead at a large lab put it, five years ago, anyone with a fifth-grade reading comprehension level was a useful labeler for models; today, frontier models would only hire authors and poets. From our discussions with large labs and startups working in verticals with domain-specific reasoning (e.g. financial data, scientific literature, etc.), there’s rapidly growing demand for a new class of high-skill labeling, characterized by complex tasks that can only be completed by a highly educated or skilled population (e.g. software developers, biology PhDs, etc.).

We think there’s an interesting class of problems for these startups to solve, from credentialing and skill assessment to rater motivation for otherwise highly skilled and well-compensated professionals. Convincing doctors to spend hours a day labeling scans requires not just upping their hourly rate, but also convincing them that they’re contributing to a broader mission and that the quality of their work matters. With human data budgets for the large labs entering the eight and even nine figure range, we’re confident there’s a large market that can support many winners and specialization in specific data types.

Web Data APIs

Models, however intelligent, still need access to live, reliable information. As much as the world’s knowledge can theoretically be encoded and made available in model weights, many of the inputs models need change in real time: current pricing for concert tickets, whether an item is available in store, recent news events about a topic. Access to citable content both builds trust with users and reliably improves the correctness of model outputs.

However, current web content APIs lack the flexibility and feature set required to power large scale web applications. Consider, for example, what technology would be required to build a clone of ChatGPT with web browsing. While many startups use SerpAPI (or one of its many competitors), there doesn't exist a web search API that has access to page content, parsed outlinks from the page, or even edit history. This set of features is clearly useful for more expansive language model applications, but would also be extremely helpful for many of the personal assistant style applications we can think of.
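To make the missing feature set concrete, here is a minimal sketch of what a single result from such an API might carry: page content, parsed outlinks, and a simple edit history. Everything in it is an assumption for illustration (the `PageResult` and `Revision` shapes, the `build_page_result` helper, the field names); it does not describe SerpAPI or any existing product, and it uses only the Python standard library.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


@dataclass
class Revision:
    fetched_at: str  # timestamp of the crawl that first saw this version
    text: str


@dataclass
class PageResult:
    # Hypothetical response record; field names are illustrative only.
    url: str
    title: str
    text: str
    outlinks: List[str] = field(default_factory=list)
    revisions: List[Revision] = field(default_factory=list)


class _PageParser(HTMLParser):
    """Collects the title, visible text, and absolute outlinks of a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.outlinks: List[str] = []
        self.title_parts: List[str] = []
        self.text_parts: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                link = urljoin(self.base_url, href)  # resolve relative links
                if link not in self.outlinks:
                    self.outlinks.append(link)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title_parts.append(data)
        elif data.strip():
            self.text_parts.append(data.strip())


def build_page_result(url: str, html: str, fetched_at: str,
                      previous: Optional[PageResult] = None) -> PageResult:
    """Parse one crawl of a page, extending the history only if the text changed."""
    parser = _PageParser(url)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    revisions = list(previous.revisions) if previous else []
    if not revisions or revisions[-1].text != text:
        revisions.append(Revision(fetched_at=fetched_at, text=text))
    return PageResult(url=url, title="".join(parser.title_parts).strip(),
                      text=text, outlinks=parser.outlinks, revisions=revisions)
```

Passing the previous `PageResult` back in on each recrawl is one simple way to accumulate edit history; a real index would need storage, deduplication, and diffing well beyond this sketch.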

We know building crawlers at scale is really damn hard — which is also why we don’t think every agent company around is likely to do it themselves. Historically, the only companies with anywhere near complete and live updated crawls have either been search engines (Google, Bing, Yandex) or search engine adjacent (Amazon, Facebook). We think there’s an interesting opportunity to bootstrap an index, starting with focused crawls within specific verticals and leveraging signal from customers.

Always Pick Up the Phone

Businesses (in particular small businesses) do not answer about half the calls they receive, but inbound calls are often their most important source of leads. Everyone has experienced this.

Use cases range from home services qualification to informational updates, from restaurant reservations to appointment-booking, from order tracking and stock checks to bill collection. These critical customer experiences are widespread, scoped and transactional. Voice generation quality and LLM capability are approaching the ability to handle many transactional calls. What’s missing is the last mile — distribution, customer journey design, guardrails and workflow automation.

Should this be developer infrastructure, horizontal SMB application, or a rethought full stack vertical solution? You tell us.

Autonomous HR (and IT) Helpdesk

A high volume of HR events leads to end-user communication: new hires, exits, role changes, promotions, location changes, manager changes, and payroll/benefits changes. Large companies have hundreds of folks whose jobs are primarily to notify employees of these events, verify documents, answer questions, and update records in HRIS systems, often under the titles of HR Operations, Talent Support Operations, Talent Systems Coordinators, Employee Support Coordinators, Compliance Coordinators, and HR Service Desk.

Whatever the titles, we think these teams can be 10X more efficient — and deliver a dramatically better, faster employee experience. Over the past decade, companies have built “service catalogs” and “service request forms” to digitize their processes, but these still create too much manual operational burden.

The next “intranet” isn't a portal at all, but is instead a conversational search box that can intelligently retrieve in-context, localized, access-control aware answers from enterprise documentation and systems of record (and then accurately update those records). IT and HR processes are tightly intertwined, but HR is particularly poorly served, and the problem is getting ever harder for increasingly global/hybrid organizations.
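As a rough illustration of what "localized, access-control aware" retrieval could mean in practice, the sketch below filters documents by the employee's groups and country before ranking them. The `Doc` and `Employee` types, the `retrieve` function, and the keyword-overlap scoring are all hypothetical stand-ins; a real system would likely use a proper search or embedding index and the company's actual permission model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]  # groups permitted to read this document
    country: str                    # "*" means the policy applies everywhere


@dataclass(frozen=True)
class Employee:
    groups: FrozenSet[str]
    country: str


def retrieve(query: str, employee: Employee, docs: List[Doc],
             top_k: int = 3) -> List[str]:
    """Return ids of the best-matching documents the employee may see."""
    terms = set(query.lower().split())
    scored = []
    for doc in docs:
        # Enforce access control before scoring, so restricted text
        # never reaches the ranking (or a downstream model) at all.
        if not (doc.allowed_groups & employee.groups):
            continue
        # Localization: skip policies written for another country.
        if doc.country not in ("*", employee.country):
            continue
        score = len(terms & set(doc.text.lower().split()))
        if score:
            scored.append((score, doc.doc_id))
    # Highest score first; ties broken by id for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]
```

The design choice worth noting is that filtering happens before ranking rather than after: an answer generated from documents the employee cannot read is a leak even if the document itself is never shown.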

A domain populated with process documentation, ever-changing compliance needs, complex policy application, forms, and natural language communication is ripe for attack by LLMs.