One of our favorite parts of the job is when founders surprise us with ideas we've never thought about.
But, given the time we spend working on and thinking about AI products, we do have some views on where important companies might be built and what challenges they'll need to solve. To be clear, we don't think you have to work on any of these ideas to get into Embed or to pitch us. We share them here as a way to show our thinking about where opportunity lies and how we evaluate ideas. We'll continue to update this list as we formalize more of our thinking.
Companies are increasingly adopting vector search solutions (e.g. Pinecone, Chroma, Vectara), each with different claims to optimized ANN implementations, but comparatively little work has been done to optimize the embedding space itself. We believe that there remains substantial quality lift from constructing better embeddings, both by reducing the number of dimensions required (and therefore cost & latency) and by improving performance & controllability. We can think of a couple of avenues that are under-explored here.
We think this space is very young, with lots of potential still untapped, and would love to meet folks working on this problem or related ones.
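To make the dimensionality-reduction avenue concrete, here is a minimal sketch (assuming scikit-learn, with random vectors standing in for real embeddings) that compresses vectors with PCA and measures how much top-10 retrieval agreement survives:

```python
# A minimal sketch: compress embeddings with PCA and check how much retrieval
# quality survives. The random vectors are stand-ins for real embeddings.
import numpy as np
from sklearn.decomposition import PCA

def top_k(query_vecs, doc_vecs, k=10):
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(q @ d.T), axis=1)[:, :k]  # cosine top-k indices

docs = np.random.randn(10_000, 1536)    # stand-in for document embeddings
queries = np.random.randn(100, 1536)    # stand-in for query embeddings

pca = PCA(n_components=256).fit(docs)   # 6x fewer dimensions -> cheaper index
docs_s, queries_s = pca.transform(docs), pca.transform(queries)

base, small = top_k(queries, docs), top_k(queries_s, docs_s)
overlap = np.mean([len(set(a) & set(b)) / 10 for a, b in zip(base, small)])
print(f"top-10 overlap after 6x compression: {overlap:.0%}")
```

On real embeddings, the interesting question is how aggressively you can compress before that overlap, and downstream quality, degrades.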
Code generation might be the most obvious area for language models to make a large impact. Beyond being an in-domain problem for AI practitioners working in an obviously valuable, mostly-text format, code models benefit from the rigid structure of code as a language and from the ability to leverage compilation & testing checks as a feedback mechanism. The work of developers is so valuable (and expensive) that automating or accelerating even small portions of it creates enormous value.
Empirically, this has been partially true; one of the first AI products to get real traction was GitHub Copilot, and it remains among the most successful today, with over a million developers using it. More recently, ChatGPT has proven to be a useful assistant for writing and editing code. But the list of products with widespread usage roughly stops there; from surveys of engineers in our network, we haven't found any other code development products that have gotten widespread adoption.
A gap in the market that we find particularly exciting is the ability to go from a human description of an issue to a draft solution in code. We've, of course, seen some exciting open source projects like AutoPR and GPT Engineer working on this problem, but we believe that there are deeper technical challenges that need to be tackled in order to solve this well. Some examples below:
We’re excited to meet folks who have insights on how to solve these problems well (or who believe they don’t need to be solved in order to generate high quality code)!
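To make the shape of the problem concrete, here is a minimal sketch of an issue-to-draft-patch loop that uses test failures as the feedback signal; `complete()`, `apply_patch()`, and `revert_patch()` are hypothetical placeholders, not a real product:

```python
import subprocess

def complete(prompt: str) -> str: ...    # placeholder: call your code model of choice
def apply_patch(diff: str) -> None: ...  # hypothetical helper: apply the unified diff
def revert_patch(diff: str) -> None: ... # hypothetical helper: roll the attempt back

def run_tests() -> tuple[bool, str]:
    proc = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def draft_fix(issue: str, relevant_code: str, max_attempts: int = 3) -> str:
    feedback = ""
    for _ in range(max_attempts):
        patch = complete(
            f"Issue:\n{issue}\n\nRelevant code:\n{relevant_code}\n\n"
            f"Previous test output:\n{feedback}\n"
            "Return a unified diff that resolves the issue."
        )
        apply_patch(patch)
        passed, feedback = run_tests()  # compilation & tests as the feedback signal
        if passed:
            return patch                # open a draft PR from here
        revert_patch(patch)
    raise RuntimeError("no passing patch found; escalate to a human")
```

Even this toy loop surfaces the hard parts: selecting the relevant code from a large repository, and deciding what to do when the tests don't actually cover the issue.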
Over the last five years, an increasingly large slice of security solutions have “shifted left,” born out of a realization that placing security checks at the end of the software development lifecycle results in waste and a larger communication burden. As part of that shift, tooling that integrates into integration & deployment pipelines or, ideally, into software development itself has proven to be extremely valuable. Static analysis tools that automatically look for vulnerabilities (and potentially fix them) have been a large part of that.
While extremely useful, the major issue with static tooling so far has been its high rate of false positives. Machines can often flag potential issues, but determining whether a potential vulnerability is harmless or urgent requires context like code structure, deployment status, and even historical application traffic. In practice, static tooling sometimes has such high false-positive rates that engineers tend to ignore it entirely.
At larger tech companies (e.g. Google, Microsoft), we've heard of internal tooling that automatically triages and prioritizes issues identified by other systems. We think that language models may generalize well enough to bring this technology to smaller organizations as well. We also believe there are interesting related opportunities, such as automatic remediation of identified issues and, given the independent trend towards infrastructure as code, automated cloud resource provisioning.
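As a sketch of what that triage layer could look like, assuming a placeholder `complete()` model call and illustrative context fields:

```python
import json

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

def triage(finding: dict, code_snippet: str, deploy_info: str) -> dict:
    response = complete(
        "You are triaging a static-analysis finding. Respond with JSON "
        '{"verdict": "urgent" | "low" | "false_positive", "reason": "..."}.\n'
        f"Finding: {json.dumps(finding)}\n"
        f"Code context:\n{code_snippet}\n"
        f"Deployment context: {deploy_info}"
    )
    return json.loads(response)

# Illustrative usage: the model sees the code and deployment context a human
# reviewer would want before deciding whether the alert matters.
verdict = triage(
    {"rule": "sql-injection", "file": "api/search.py", "line": 42},
    code_snippet='query = f"SELECT * FROM docs WHERE name = {user_input}"',
    deploy_info="service is internet-facing and handles untrusted input",
)
```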
LLMs can now plan against objectives (poorly) and carry on an engaging conversation - even be Sensible, Specific, Interesting and Factual (SSIF). What would a game world populated by AIs be like? If “The Sims” and the engagement with AI girlfriends, AI celebrities, and AI therapists are any indication, it would be wildly fun.
What if the next generation of entertainment is personalized generations? If someone likes to look at pictures of “cats where they shouldn't be,” let's generate them. In an era where one can increasingly produce any media (images, audio, video, memes), mass personalization feels within reach.
Legacy enterprise software without rich APIs is an enduring reality. Updating, integrating, and automating the fragmented set of software that companies need to operate is risky and expensive. Thus, too many people toil away at the Sisyphean tasks of data entry and digital paper-pushing.
For a few years, Robotic Process Automation (RPA), which promised to automate manual digital work by operating as a programmable “human user” bot in software, captured the attention of enterprises and of hopeful automation-minded nontechnical folks everywhere (“citizen developers”). However, this was a mirage, and RPA has underdelivered. These bots are expensive to design and brittle; they are more often broken than functional, end up requiring centralized development by IT in “centers of excellence,” and demand 6:1 services-to-software spend to implement. They are band-aids, not the future.
Process discovery (analysis of ERP logs to baseline, benchmark and identify business process gaps) was an improvement on manual observation and interviews, but required bespoke implementation and provided an incomplete picture.
There is opportunity to build next-generation products that are robust and useful on both the analysis/documentation and automation fronts: LLMs can increasingly document actions, take diverse inputs (user events, logs, DOM, code, natural language policies), plan actions, use software tools, choose APIs, and generate code.
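As a rough sketch of one such automation step, the model plans the next UI action from a natural-language task and the current application state; `complete()` is a placeholder model call and the browser wrapper is hypothetical:

```python
import json

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

ACTIONS = ["click", "type", "select", "done"]

def next_action(task: str, dom: str, history: list[dict]) -> dict:
    plan = complete(
        f"Task: {task}\nActions so far: {json.dumps(history)}\n"
        f"Visible elements:\n{dom}\n"
        f'Reply with JSON {{"action": one of {ACTIONS}, '
        '"target": "<css selector>", "value": "<text>"}'
    )
    return json.loads(plan)

def run(task: str, browser) -> None:
    history: list[dict] = []
    while True:
        # `browser` is a hypothetical wrapper over Playwright/Selenium that can
        # return a pruned DOM and execute click/type/select actions.
        step = next_action(task, browser.pruned_dom(), history)
        if step["action"] == "done":
            break
        browser.execute(step)
        history.append(step)
```

Unlike a classic RPA selector script, the plan is regenerated from the live DOM at each step, which is exactly what could make these bots less brittle.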
Sales reps today spend hours on active research to narrow down long lead lists to high-likelihood targets, but still end up wasting an enormous amount of time in outreach and on calls with companies that, for one reason or another, aren't the right fit. Existing services in this space, like ZoomInfo and 6sense, answer basic questions (how large companies are, where they operate) and offer a small set of pre-selected intent signals. We think the lack of personalization leaves a lot of signal on the table that would allow sales reps to be more efficient and effective.
To help illustrate the need, imagine being a sales rep for the latest and greatest new database, which just launched a brand new Go client. Knowing who in your pipeline uses Go in their stack would enable much more effective outreach. Knowing which of your competitors a prospect currently uses would enable much more personalized, and therefore effective, outbound. And knowing who at the company works on the infrastructure team, or who led their latest migration effort, would enable targeting down to individuals at the organization. But today, there's no better way to get those answers than for sales reps to spend hours combing through company blogs and social media posts.
We think a startup that built directed crawlers & infrastructure to answer arbitrary questions (posed in natural language) would enable sales reps to research and prioritize amongst a large set of prospects much more effectively. Historically, companies could do no better than surfacing structured data that had to be manually parsed from each page, but language models allow for much better generalization! We’re excited to talk to startups working in this space or with a unique insight into how sales teams will adopt this kind of tool into their workflow.
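A minimal sketch of the core primitive, with a placeholder `complete()` model call and example.com URLs standing in for a prospect's real pages (real crawling needs rendering, politeness, and far better extraction):

```python
import requests

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

def answer_about_company(question: str, urls: list[str]) -> str:
    pages = []
    for url in urls:
        html = requests.get(url, timeout=10).text
        pages.append(f"[source: {url}]\n{html[:5000]}")  # naive truncation
    return complete(
        f"Question: {question}\n\nPages:\n" + "\n\n".join(pages) +
        "\nAnswer concisely and cite the source URL, or say 'unknown'."
    )

print(answer_about_company(
    "Does this company use Go anywhere in its stack?",
    ["https://example.com/engineering-blog", "https://example.com/careers"],
))
```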
Video is a major social, informational, educational, and marketing medium, and the fastest growing one. Digital video ad spend is projected to rise 17% in 2023 to $55 billion (per IAB). However, production of “commercial” video remains prohibitively difficult and expensive. Short-form, simple commercial video can cost $1,000 to $50,000+ to produce from start to finish, and the majority of commercial video is created by agencies and professionals.
Demand dramatically outstrips “supply” of video production. Only ~3,000 brand advertisers globally create video ads, but there are 250M video creation and editing web searches per year in English.
AI will revolutionize and democratize video production, editing, personalization, and understanding. Video is a challenging frontier of AI research; it is computationally costly, there's limited input data, we are still figuring out how to ensure temporal consistency, and it deserves new interfaces for control. But the frontier is advancing rapidly, and we're interested in companies that both push that frontier and cleverly leverage this technology in usable products today: from indexing/semantic understanding, to captioning and translation, to style transfer, to generated backgrounds, avatars, and even product videos from 3D models, there's a treasure trove of technical capability. The product opportunity (to cleverly cross the usefulness chasm with the capabilities we already have) is equally important.
In every working enterprise organization, there are a number of roles that have infrequent but “large swing” utility. Think of, for example, the role of compliance, legal, or security in most companies. Adding the coordination cost of getting every PRD reviewed by someone on the compliance team often doesn't seem worth the loss of momentum & velocity for the product team, until a months-long effort gets killed late in development by a fundamental compliance failure. In response to similar challenges, as part of the shift left seen in security over the past couple of years, an increasing number of teams have appointed “security champions” whose responsibility is to represent the interests of the security team more broadly.
We think very lightweight “agents” that represent the point of view of organizations, people, or even individual documents can offer a solution to these kinds of challenges. The implementation can range from automatic document editing & commenting (e.g. “consider your encryption strategy”) to Slack channels where developers can ask what someone from another organization would think of their approach. These agents would help distribute knowledge across teams while also freeing up the core legal/compliance/security team to take on larger initiatives and do more focused work.
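As a sketch of the document-review variant, assuming a placeholder `complete()` model call and an illustrative comment schema:

```python
import json

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

def review_document(prd_text: str, policy_text: str) -> list[dict]:
    raw = complete(
        "You represent the compliance team. Flag passages in this PRD that "
        "conflict with or ignore the policy. Respond with a JSON list of "
        '{"quote": "...", "comment": "...", "severity": "blocker" | "note"}.\n'
        f"Policy:\n{policy_text}\n\nPRD:\n{prd_text}"
    )
    return json.loads(raw)

# Illustrative inputs; in practice these come from the team's document stores.
comments = review_document(
    prd_text="We will log full chat transcripts to a third-party analytics tool...",
    policy_text="Customer content may not leave approved data processors...",
)
for c in comments:
    print(f'[{c["severity"]}] "{c["quote"][:60]}..." -> {c["comment"]}')
```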
We think of these agents as the next step from “chat with your document” style use cases, capturing a more specific enterprise workflow. We're excited to talk to folks working on similar problems or who have unique points of view on where and how to integrate!
Managing the security operations center (SOC) is a constant pain for CISOs. They are trapped between the need to write highly specific alerting rules and automations, and the burden of handling too many false positives with too few skilled analysts.
We believe that the application of LLMs can help scale incident resolution. Models can be used to generate and update rules, automations, and even integration code for security tools. They can deduplicate alerts and identify false positives by using agents to actively explore possible causes (e.g. examining logs to find the expired-password error code that explains an authentication failure). A copilot can suggest and automate investigation steps (e.g. query generation), as well as retrieve context, summarize events, handle ticketing, and produce documentation.
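One narrow slice, deduplicating alerts before they hit an analyst's queue, is easy to sketch; `embed()` is a placeholder embedding call and the similarity threshold is illustrative:

```python
import numpy as np

def embed(text: str) -> list[float]: ...   # placeholder: call your embedding model

def dedupe(alerts: list[str], threshold: float = 0.92) -> list[str]:
    vecs = np.array([embed(a) for a in alerts], dtype=float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    keep: list[int] = []
    for i in range(len(alerts)):
        sims = [float(vecs[i] @ vecs[j]) for j in keep]
        if not sims or max(sims) < threshold:
            keep.append(i)   # genuinely new alert; everything else is a duplicate
    return [alerts[i] for i in keep]
```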
There is a massive opportunity to reimagine the SOC, reducing operational toil and increasing effectiveness by leveraging model reasoning. What is a large SOC if not a mixture of experts?
Language models benefit a great deal from access to "reliable web data" -- knowledge bases offer explicit checks against hallucination, especially when combined with research-driven methods of revision (e.g. Gao et al. 2022, Peng et al. 2023). They also allow for citations to externally verifiable material, which are valuable both to build user trust and to expand on first answers with reliable source material.
However, current web content APIs lack the flexibility & feature set required to power large-scale web applications. Consider, for example, what set of technologies would be required to build a clone of ChatGPT with web browsing. While many startups use SerpAPI (or one of its many competitors), there doesn't exist a web search API that has access to page content, parsed outlinks from the page, or even edit history. This set of features is clearly useful for more expansive language model applications but, at least at first glance, would also be extremely helpful for many of the personal assistant style applications we can think of.
Another variant of this problem is in systematic crawling and parsing -- today, companies sign one-off contracts to crawl & parse a pre-negotiated set of fields through third party providers. A modern crawl company could offer those sets of data, along with the orchestration to ask arbitrary questions of that dataset (e.g. "on each page that discusses AirPods, what's the sentiment?"). We think this power will be useful not just in e-commerce, but in a wide variety of other use cases, like pharmaceutical companies looking to gather data on side-effect frequency or market research firms assessing the success of a new product launch.
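A sketch of that orchestration layer: the same natural-language question is mapped over every parsed page and returns structured rows instead of pre-negotiated fields (`complete()` is a placeholder model call):

```python
import json

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

def ask_crawl(question: str, pages: list[dict]) -> list[dict]:
    rows = []
    for page in pages:   # pages look like [{"url": ..., "text": ...}, ...]
        raw = complete(
            f"Page:\n{page['text'][:6000]}\n\nQuestion: {question}\n"
            'Reply with JSON {"answer": "...", "confidence": 0.0-1.0}, '
            'or {"answer": null} if the page is irrelevant.'
        )
        result = json.loads(raw)
        if result.get("answer"):
            rows.append({"url": page["url"], **result})
    return rows

# e.g. ask_crawl("What sentiment does this page express about AirPods?", crawl)
```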
LLMs have the potential to transform financial and accounting software from databases to context-aware, proactive processors. These models could shift the human expert's role from manual “rules engine” to strategic oversight.
The initial success of domain-specific models such as BloombergGPT on financial NLP tasks (such as ConvFinQA), the “code interpreter” approach to increasing the accuracy of calculations, and early research results from using specialized LLMs for tasks such as transaction classification are all encouraging.
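The "code interpreter" pattern is straightforward to sketch: instead of letting the model do arithmetic in free text, it writes code that we execute. `complete()` is a placeholder model call, and anything real needs proper sandboxing around `exec`:

```python
import csv, io

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

def answer_with_code(question: str, table_csv: str) -> float:
    code = complete(
        f"CSV data:\n{table_csv}\nQuestion: {question}\n"
        "Write a Python function `solve(rows: list[dict]) -> float` that "
        "computes the answer exactly. Return only the code."
    )
    rows = list(csv.DictReader(io.StringIO(table_csv)))
    namespace: dict = {}
    exec(code, namespace)   # never do this unsandboxed in production!
    return namespace["solve"](rows)

csv_data = "date,amount\n2023-01-01,1200.50\n2023-02-01,-300.25\n"
total = answer_with_code("What is the net total of all amounts?", csv_data)
```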
We think this is a technically rich and commercially valuable application area: it requires robust interactions with PDFs and tabular data, increased domain-specific reasoning, task-specific research and engineering, and workflow products that go well beyond the chat box. From a data perspective, we’re particularly excited that global accounting, tax, financial reporting, and compliance standards are all codified in natural language, with corresponding large crawl-able datasets of compliant examples. Some tasks that could be interesting starting points:
For many of the managers and team leads we know, one of the largest time sucks is summarizing the activities of their reports for different stakeholders in their organization. For example, a tech lead for a home page redesign needs to communicate slightly different messages to the VP of engineering, the marketing team, the customer support team, and the CEO. In practice, much of this communication happens asynchronously (or not at all), leading to tedious report-writing and lost context across organization lines. This pain is felt up and down the organization too, as reports need to (pardon the pun) report their work as well!
We think large language models offer an interesting opportunity to alleviate much of this pain. If connected to the primary places where employees do work (GitHub for engineers, Notion/Coda for PMs, etc.), a tool could automatically capture what work an employee has done recently and communicate it (with appropriate context) to stakeholders. For example, a marketing manager might not care about the specifics of why the home page redesign is delayed, but they're certainly affected by a change in timeline. Style shift to different roles is well within the capabilities of language models today, and access to the systems of record where work is completed allows for an audit trail as well.
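A minimal sketch, using GitHub's public commits endpoint as the system of record, a placeholder `complete()` model call, and an illustrative repository name:

```python
import requests

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

def status_update(repo: str, audience: str) -> str:
    commits = requests.get(
        f"https://api.github.com/repos/{repo}/commits", params={"per_page": 20}
    ).json()
    log = "\n".join(c["commit"]["message"].splitlines()[0] for c in commits)
    return complete(
        f"Recent commits:\n{log}\n\n"
        f"Write a short status update for {audience}. Include only what they "
        "care about: timeline changes for marketing, risks for the VP, and so on."
    )

print(status_update("acme/home-page-redesign", "the marketing team"))  # illustrative repo
```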
In the longer term, we also believe this function could offer a better organizational picture of where time and effort are being spent and allow decision makers to better understand and influence direction.
A high volume of HR events leads to end-user communication: new hires, exits, role changes, promotions, location changes, manager changes, and payroll/benefits changes. Large companies have hundreds of folks whose jobs are primarily to notify employees of these events, verify documents, answer questions, and update records in HRIS systems, often under titles like HR Operations, Talent Support Operations, Talent Systems Coordinator, Employee Support Coordinator, Compliance Coordinator, and HR Service Desk.
Whatever the titles, we think these teams can be 10X more efficient — and deliver a dramatically better, faster employee experience. Over the past decade, companies have built “service catalogs” and “service request forms” to digitize their processes, but these still create too much manual operational burden.
The next “intranet” isn't a portal at all, but a conversational search box that can intelligently retrieve in-context, localized, access-control-aware answers from enterprise documentation and systems of record (and then accurately update those records). IT and HR processes are tightly intertwined, but HR is particularly poorly served, and the problem only gets harder for increasingly global/hybrid organizations.
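A sketch of the access-control-aware piece, which matters more here than raw answer quality: filter by entitlements before the model ever sees a document. `embed()` and `complete()` are placeholder model calls, and the ACL model is deliberately naive:

```python
import numpy as np

def embed(text: str) -> list[float]: ...   # placeholder: call your embedding model
def complete(prompt: str) -> str: ...      # placeholder: call your LLM of choice

def answer(question: str, docs: list[dict], user_groups: set[str]) -> str:
    visible = [d for d in docs if d["allowed_groups"] & user_groups]
    if not visible:
        return "No documentation you can access covers this."
    q = np.array(embed(question), dtype=float)
    scored = sorted(visible, key=lambda d: -float(q @ np.array(embed(d["text"]))))
    context = "\n\n".join(d["text"] for d in scored[:3])
    return complete(f"Context:\n{context}\n\nEmployee question: {question}\n"
                    "Answer using only the context above.")

docs = [{"text": "Parental leave policy (Germany): ...",
         "allowed_groups": {"employees-de"}}]
print(answer("How much parental leave do I get?", docs, {"employees-de"}))
```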
A domain populated with process documentation, ever-changing compliance needs, complex policy application, forms, and natural language communication is ripe for attack by LLMs.
There are many (promising!) startups working on solving customer support problems, beginning with a common set of simpler use cases, like processing returns on e-commerce sites or answering basic planning questions on travel sites. We think that this is a large and promising market, but also believe there is a unique and new opportunity to target a more challenging & sophisticated set of "technical customer service" requirements, e.g. issues with MongoDB, Databricks, GitHub, etc.
These issues are currently extremely expensive for companies to deal with, often requiring them to staff (multiple!) full-time engineers in support or "forward-deployed" roles. And existing customer support solutions are unlikely to support this workflow; in order to solve for the technical support use case well, we think a startup would likely have to do multiple of the following:
An early version of this product might serve as a "debugging copilot" for the engineers currently working in that role and, over time, enable them to spend more of their time actively building and deploying new product rather than purely on customer support. We also think it's possible that targeting the top end of this market would lead a startup to build rigorous infra and evals that would enable them to serve the more traditional (i.e. less technical) use cases as well.
We're excited to talk to folks working both on the technical and more general variants of this problem!
Foundation models have become increasingly multi-modal; we started with text, then image, and now there's a whole host of applications from signal processing to video generation. One class of outputs that has proven consistently challenging, however, is 3D models, specifically ones with high enough fidelity to be used in precise end applications (e.g. construction, manufacturing).
Beyond simply meeting some set of requirements that a user specifies, these generated 3D assets have demanding precision requirements, sometimes down to the millimeter, must be manufacturable, and often have a complex optimization space. These processes are still difficult for experts, much less for AI models that don't yet have a real understanding of "physics."
We think this is a problem worth tackling despite the challenge, for a couple of reasons: first, the vast majority of the revenue of Autodesk, one of the largest CAD players, comes from their engineering & construction division ($1.1 billion in 2021); second, we think it's possible to build assistive tooling that either helps engineers narrow down their design space more efficiently or performs some set of menial tasks for them as a starting point; finally, we're optimistic that combining generative models (e.g. NeRFs, DreamBooth) with "clean up" work, like simulation for validation and other post-processing, can reduce the burden on zero-shot model output.
Software systems continue to grow in complexity, and observability data grows in lockstep. The burden on DevOps teams is ever-heavier, but observability remains a stubbornly difficult domain for AI. Therein lies the opportunity: lack of semantic meaning in metrics/trace data, the difficulty of adapting transformers to learn long-term temporal dependencies, handling dynamic structure and high dimensionality, extracting meaning from high cardinality logs without overfitting, granularity variations, and sparsity of important events are just a few of the big technical challenges.
A full-scale effort to train models that understand context from logs, metrics, traces, tickets, code, deployments, docs and discussions (over time!) is a worthwhile and ambitious mission. We think the next generation of observability is human-readable, proactive, and less noisy. We are encouraged by strides in related fields (e.g. LLM-based security alert triage) and believe that increasing model understanding of code is a breakthrough for understanding systems that…rely on code.
Things that are clearly possible:
The medium-term future is even more promising. One can imagine agents that process an anomaly, contextualize it, prioritize it, generate a hypothesis about it, and test fixes. Can a model figure out if the issue in production is due to a database connection pool exhaustion, a DNS issue, or an intern pushing code? Only one way to find out.
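The first step is easy to sketch: flag a metric excursion with a plain z-score, then hand the surrounding context to a model to hypothesize. `complete()` is a placeholder model call, and real systems would need far richer context:

```python
import statistics

def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

def is_anomalous(series: list[float], z_threshold: float = 4.0) -> bool:
    mu = statistics.mean(series[:-1])
    sigma = statistics.stdev(series[:-1])
    return sigma > 0 and abs(series[-1] - mu) / sigma > z_threshold

def hypothesize(metric: str, series: list[float], events: list[str]) -> str:
    return complete(
        f"Metric '{metric}' jumped from ~{statistics.mean(series[:-1]):.0f} "
        f"to {series[-1]:.0f}.\nRecent events:\n" + "\n".join(events) +
        "\nRank the three most likely root causes, each with a concrete check."
    )

latency = [212, 198, 205, 201, 1340]   # p95 latency in ms, illustrative
if is_anomalous(latency):
    print(hypothesize("p95_latency_ms", latency,
                      ["deploy api@4f2c1 at 14:02", "db connection pool at 100%"]))
```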
Marketing teams have generally been run as siloed organizations, with individual sections of the team owning different channels and the production of content for those channels (e.g. Facebook, Instagram, Google, YouTube). Historically, this has made sense -- each audience is different, and the same core message has to be adapted to the individual preferences of those communities/formats. However, this leads to natural inefficiencies, for example:
Language models offer a clear solution! Style shift across mediums is well within the abilities of foundation models, if not zero-shot, then certainly with some amount of fine-tuning. In a similar vein, a single unified model also allows teams to generate and leverage a central data set by testing messaging more uniformly across their platforms.
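As a sketch of that style shift, with a placeholder `complete()` model call and illustrative channel constraints:

```python
def complete(prompt: str) -> str: ...   # placeholder: call your LLM of choice

CHANNELS = {
    "instagram": "two punchy sentences, casual tone, 2-3 hashtags",
    "google_search_ad": "headline under 30 chars, description under 90 chars",
    "youtube_preroll": "15-second voiceover script, conversational",
}

def localize(core_message: str) -> dict[str, str]:
    return {
        channel: complete(
            f"Core message: {core_message}\n"
            f"Rewrite it for {channel}. Constraints: {style}."
        )
        for channel, style in CHANNELS.items()
    }

variants = localize("Our new running shoe is 20% lighter with the same cushioning.")
```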
Beyond automation, we also think there are exciting opportunities in dynamic personalization and better optimization, and we're always excited to hear from folks who understand marketing workflows better than we do.