
Archetypes of LLM apps

What businesses are actually doing with AI

I recently returned from a trip to San Francisco. While there, I presented to the innovation group of a large insurance company about how startups are applying AI.

This post shares the presentation I gave. In addition to the written version, I've recorded an audio version of it here, too.

Listen to a recording of the presentation on Apple Podcasts, Spotify, Overcast, Pocket Casts, Castbox, Google Podcasts, Amazon Music, or many other players.


Archetypes of LLM apps

We all know that ChatGPT can write essays, suggest travel itineraries, and draft emails. These tasks are powerful, but only scratch the surface of what LLMs can do.

In the past year, a wave of startups has emerged, leveraging AI to rethink industries ranging from software development to operations to marketing. These companies are building entirely new categories of AI-driven tools, seeking to disrupt current businesses.

In this presentation, I’ll explore the emerging archetypes of LLM-powered apps: the core techniques, architectures, and approaches shaping the next generation of products.

I’m Philip, and I write a blog called Contraption Company. For the past two years, I've been the CTO of Find AI, a startup building a search engine for people and companies. We’re pushing OpenAI’s technology to its extremes—making over one hundred million requests this year alone—and uncovering innovative ways to apply its power. Today, I’ll share some of those lessons and ideas with you.

Goal: Understand what people are actually doing with LLMs

By the end of this presentation, my goal is for you to understand how businesses are applying LLMs in practice. With this toolkit of patterns, you’ll be able to identify opportunities in your own company to improve efficiency with LLMs and decide whether it makes sense to build or buy solutions.

1. Building blocks, 2. Basic applications, 3. Advanced applications

We'll go through three parts in this presentation, from basic to advanced.

In part one, we'll review building block technologies - like chat, embeddings, semantic search, fine tuning, and some non-LLM tools.

In part two, we'll look at basic applications of LLMs that power most startups - such as code generation, text to SQL, summarization, advanced moderation, text generation, analysis, intent detection, and data labeling.

In part three, we'll review advanced applications of LLMs that represent more frontier applications: retrieval-augmented generation (RAG), agents, and swarms.

1. Building blocks

First, let's review foundational "building block" technologies that power LLM apps.

Chat: Input text → LLM → Output text

Chat lies at the heart of most LLM applications. As we review advanced techniques like “intent detection” and “retrieval-augmented generation,” the underlying interface is still chat: input text is processed by an LLM, which generates output text.
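To make that loop concrete, here is a minimal sketch using OpenAI's Python SDK; the model choice and prompts are placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.7,  # higher values are more creative, lower values more predictable
    messages=[
        {"role": "system", "content": "You are a helpful travel assistant."},
        {"role": "user", "content": "Suggest a three-day itinerary for San Francisco."},
    ],
)

print(response.choices[0].message.content)  # the output text
```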

Input text: Includes both instructions and user messages; billed for length; can fit up to 10 books into context

Input text typically consists of both instructions and user-provided data. Hosted model providers like OpenAI and Anthropic charge for the length of the input text, which is measured in “tokens.” The size of the input text varies by model.

Currently, Google’s Gemini 1.5 Pro model offers the largest input capacity, handling up to 2 million tokens—roughly equivalent to the text of 10 books. For example, it can process the entire Harry Potter series in its input and perform tasks like generating a chapter-by-chapter summary of the spells used. However, it’s important to note that recall isn’t perfect, and including large volumes of context can sometimes reduce performance.

LLM: Standard models like GPT-4o, Claude, LLaMA; "Mini" versions about 1/10 cost but not as smart; "Temperature" controls randomness

The primary models in use today are OpenAI’s GPT-4o, Anthropic’s Claude, and Meta’s LLaMA. All of these models generally offer comparable performance.

Smaller, more cost-efficient models, such as GPT-4o-mini, are also available. These require less computational power, enabling higher throughput on the same hardware. As a rule of thumb, these "mini" models typically cost about 1/10th as much as standard models, but are less accurate.

LLMs include a “temperature” parameter that developers can adjust for each request. This parameter controls the randomness of the output: higher temperatures produce more creative responses, while lower temperatures yield more predictable results.

Output text: Billed for length; Length limited to ~14 pages of text; Can return structured data (JSON)

LLMs output text, and hosted providers charge for the length of that output. However, output is typically capped at roughly 14 pages of text, so output length tends to contribute far less to overall costs than input length.

While we typically think of output from LLMs as plain text sentences, they can also return structured data using formats like JSON. Providers such as OpenAI have introduced tools to enforce specific output formats, ensuring reliability and accuracy. This capability allows you to transform unstructured data into structured formats or request multiple data points in a single call, streamlining tasks that would otherwise require separate queries.
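As a sketch of what structured output can look like with OpenAI's Python SDK (the prompt, keys, and example text are hypothetical), JSON mode asks the model to return a parseable object:

```python
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # ask the model to return valid JSON
    messages=[
        {
            "role": "user",
            "content": (
                "Extract the company name, employee count, and country from this text, "
                "returning a JSON object with keys name, employees, country: "
                "'Acme Corp is a 75-person manufacturer based in Germany.'"
            ),
        },
    ],
)

data = json.loads(response.choices[0].message.content)
print(data["name"], data["employees"], data["country"])
```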

Pricing: Self-hosting is expensive and tedious; Hosted models (OpenAI, Anthropic): Good benchmark is $0.01 per call of standard model, but pass in 10 books (max context) and cost ~$2.50

Among the major model makers today, OpenAI and Anthropic provide hosted solutions, where the companies manage the infrastructure, and you pay per request. In contrast, Meta’s LLaMA is open-source, giving you the flexibility to run it on your own servers.

Based on our experience using OpenAI’s GPT-4o at Find AI, a useful mental model is that a typical LLM call costs around one cent, assuming standard input and output sizes. However, if you process a large amount of data—such as the full text of all the Harry Potter books—the cost can rise to approximately $2.50 per call.

Hosted model providers offer enterprise-grade support. For example, Microsoft can deploy a dedicated instance of an OpenAI model for you, ensuring privacy, HIPAA compliance, and other enterprise requirements.

Self-hosting a model involves significant complexity, requiring you to forecast capacity, deploy and manage servers, and optimize request routing. Due to these challenges, many businesses rely on vendors to handle these tasks, further blurring the line between hosted and self-hosted models.

For context, one H100 GPU, often considered the workhorse for high-performance AI workloads and recommended for models like LLaMA, typically costs around $2,500 per month on a cloud provider.

Embeddings convert text into arrays of numbers

The next building block is embeddings. Embeddings are algorithms used in LLM applications, though they are not themselves LLMs. They convert text into numerical representations that capture its underlying meaning, enabling us to measure the relatedness of text using mathematics.

Embedding algorithms transform text into vectors, which are essentially points in a multi-dimensional space. These vectors encode meaning as a series of numbers, allowing us to determine how similar two pieces of text are based on their proximity in this space.

OpenAI offers some of the most advanced embedding algorithms available today. Their latest model returns 3,072-dimensional vectors, can process inputs in multiple languages, and is widely used to extract and compare textual meaning. However, there are many different embedding algorithms, and it’s crucial to use the same algorithm consistently across your text for accurate results.

Vectors from embeddings can be graphed, and distance from other points represents relatedness - so "Cat" and "Dog" would be closer together than "Sandwich"

By measuring the distance between points, we can determine how closely related different concepts are. For example, “cat” and “dog” sit closer to each other than either does to “sandwich,” reflecting their greater similarity in meaning. LLM applications leverage embeddings to enable searches based on semantic meaning rather than just keywords.
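Here is a rough sketch of that comparison in Python, using one of OpenAI's embedding models; the helper names are my own:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Convert text into a vector using an OpenAI embedding model."""
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(result.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat, dog, sandwich = embed("cat"), embed("dog"), embed("sandwich")
print(cosine_similarity(cat, dog))       # relatively high: related concepts
print(cosine_similarity(cat, sandwich))  # lower: unrelated concepts
```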

Semantic search is meaning-based. Traditional search uses Levenshtein distance, which handles misspellings well. Semantic search uses embeddings and cosine distance, and can relate words based on meaning.

Historically, search applications have relied on keyword-based approaches to find relevant text. Tools like Elasticsearch and Algolia use this traditional method, often employing algorithms such as Levenshtein distance to measure relatedness. This approach works well for locating exact or similar keywords—for example, searching “dog breeds” might return “list of dog breeds.” However, it might miss relevant results like “poodles” and mistakenly include irrelevant ones like “hot dog ingredients.”

Semantic search represents a new generation of search technology, widely used in LLM applications. Instead of focusing on keywords, it evaluates meaning by measuring the cosine distance between embedded vectors. With semantic search, a query like “dog breeds” would correctly identify “poodles” as relevant while excluding “hot dog ingredients.”

As you explore LLM applications, it’s important to understand that semantic search is a foundational technology powering many of them.

Vector databases power semantic search - pgvector, Pinecone, and Milvus are major ones.

As semantic search becomes integral to many LLM applications, specialized databases for storing and searching vectors are gaining traction. Some options, like pgvector, are free and open source, serving as an extension to the widely used PostgreSQL database. Others, such as Pinecone and Milvus, are standalone vector databases designed specifically for this purpose.
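For illustration, a nearest-neighbor query against pgvector can look roughly like this from Python; the table, columns, and connection string are hypothetical:

```python
import psycopg2
from openai import OpenAI

client = OpenAI()
conn = psycopg2.connect("dbname=app")  # hypothetical database
cur = conn.cursor()

# Hypothetical table: documents(id, content, embedding vector(1536))
query = "dog breeds"
query_vector = client.embeddings.create(
    model="text-embedding-3-small", input=query
).data[0].embedding
vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"

# "<=>" is pgvector's cosine-distance operator; smaller distance means more related
cur.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT 3",
    (vector_literal,),
)
for (content,) in cur.fetchall():
    print(content)
```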

Storing vectors can be resource-intensive because they don’t compress well, and maintaining fast search speeds requires computationally expensive algorithms.

At Find AI, we initially implemented semantic search using pgvector alongside our application data. However, we found that 90% of our disk space and 99% of our CPU were consumed by vector calculations, resulting in slow performance. Eventually, we transitioned to Pinecone's managed vector database optimized for this workload. While it significantly improved performance, it also became more expensive than our primary application database.

A takeaway is that there are infrastructure costs to running LLM applications beyond the LLMs themselves, and these can be substantial.

Fine tuning: Take existing model, and train it further; Most common use case: Train cheap models to do one task as well as an expensive model; Slow and costly, but valuable at scale

Fine-tuning is an important concept in working with LLMs. It allows you to take a pre-trained model and further train it for your specific use case. This can be done with both hosted and self-hosted models. One common approach is to fine-tune a less expensive model to perform a specific task at a level comparable to a more costly model.

However, fine-tuning comes with significant trade-offs. The process is often slow and expensive, and it can be difficult to assess whether fine-tuning has introduced negative impacts on the model’s performance in other areas. For these reasons, I typically recommend avoiding fine-tuning until you have a mature AI program. It’s better thought of as a scaling tool rather than a starting point for developing AI applications.

Other tools: Moderation, Voice, Images, and Batch

As the final part of the “Building Blocks” section, I want to highlight a few tools provided by Anthropic and OpenAI that, while not LLMs themselves, can play an important role in LLM applications.

Moderation: Both OpenAI and Anthropic offer advanced moderation APIs that can review text and flag potential safety issues. These tools are sophisticated enough to differentiate between nuanced phrases like “I hope you kill this presentation” and “I hope you kill the presenter.” Many LLM applications integrate these moderation endpoints as a preliminary step before executing application logic.

Voice: Speech-to-text and text-to-speech technologies have become quick and reliable, enabling most text-based applications to be seamlessly adapted into voice-based ones. It’s worth noting, however, that most voice-driven LLM applications work by first converting voice to text and then using the same text-based LLM tools discussed here. Essentially, it’s just a different user interface.

Image generation: Image generation has advanced significantly and is a powerful tool often used alongside LLMs. While not directly powered by LLMs, it complements many AI-driven applications, expanding their functionality.

Batch processing: Hosted model providers like OpenAI offer discounts—up to 50%—if you allow a 24-hour turnaround for requests instead of requiring immediate responses. This can be particularly useful for background tasks, such as data analysis. By taking advantage of batch processing, you can dramatically lower costs, especially for tasks that don’t need real-time results.
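For example, here is a minimal sketch of the moderation check using OpenAI's Python SDK; the input string is one of the phrases mentioned above:

```python
from openai import OpenAI

client = OpenAI()

result = client.moderations.create(
    model="omni-moderation-latest",
    input="I hope you kill this presentation",
)

print(result.results[0].flagged)     # True if any safety category was triggered
print(result.results[0].categories)  # per-category breakdown (violence, harassment, ...)
```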

2. Basic applications

Next, we'll review some basic LLM applications.

Code generation: "Write a python function that tells me whether a number is prime"

The first major archetype of LLM applications is code generation. LLMs excel at tasks ranging from generating basic functions to making contextual modifications across multiple files and even building full-stack features. By analyzing multiple files as input, these models can maintain consistency and streamline development workflows.
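For reference, the slide's prompt ("Write a python function that tells me whether a number is prime") typically yields something like this:

```python
def is_prime(n: int) -> bool:
    """Return True if n is a prime number."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # only odd divisors up to the square root need to be checked
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True
```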

Code generation: GitHub Copilot and Cursor are leading tools, and Google says that 25% of its new code is now written by AI.

The most prominent tool in this space is Microsoft’s GitHub Copilot, with over one million paying customers. Another example is Cursor, a code-writing tool with integrated AI capabilities that can generate code, develop features, and perform semantic search across codebases. It’s incredible. Even Google reports that 25% of its new code is already being written by AI.

AI-powered code generation has brought a step-function increase in productivity, making it an essential tool for developers. As one CTO of a billion-dollar company told me, “Developers who haven’t adopted AI are now considered low-performers.” While we’ll explore various startups and tools in this presentation, I want to emphasize that AI is no longer a “future” tool in software development—it’s already the standard.

Text to SQL: "Write a SQL query that calculates a cost per claim, grouped by year and zip code"

A notable category of code generation is text-to-SQL. AI excels at generating database queries, making it possible for even non-technical users to easily ask questions of data stores and warehouses. LLMs can analyze available data structures, including tables and columns, and generate complex, advanced queries. I rely heavily on AI for SQL queries, and there have been instances where it produced queries I initially thought were impossible.
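A rough sketch of how that works; the schema, table, and prompt are hypothetical, and generated SQL should be reviewed before it runs anywhere important:

```python
from openai import OpenAI

client = OpenAI()

schema = "claims(claim_id, member_zip_code, claim_date, total_cost)"
question = "Calculate cost per claim, grouped by year and zip code."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You write PostgreSQL queries. Return only SQL, with no explanation."},
        {"role": "user", "content": f"Schema:\n{schema}\nQuestion: {question}"},
    ],
)

sql = response.choices[0].message.content
print(sql)  # review or validate before running against a production database
```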

Text to SQL is replacing filter-based UIs

Text-to-SQL can even improve customer-facing applications. Traditional filter interfaces—commonly used to narrow data in tables via dropdowns, typeaheads, and tags—are a staple of CRMs, customer support tools, and similar platforms. These interfaces work by generating SQL queries behind the scenes to retrieve results.

With AI, these cumbersome filter-based UIs are being replaced by natural language input. Users can now enter queries like “Companies in the USA with 50-100 employees,” and the AI automatically generates the appropriate SQL query, eliminating the need for complex and bloated interfaces.

Summarization: "Read this webpage and give me a summery in 3 key bullet points"

Summarization is one of the core strengths of LLMs. By providing text, you can receive concise, high-quality summaries. Summarizations can also be structured, such as condensing a news article into a tweet or transforming a historical article into a timeline.

Summarization example: An email newsletter generated by code.

Here’s an example of an email newsletter created using my software, Booklet. It analyzes new posts and discussions in a community and generates all the content automatically. The subject line, titles, and summaries in the email are all generated by AI. This newsletter is sent to thousands of people daily—completely automated, with no human intervention.

Advanced moderation: "Our public forum rules disallow questions about billing, or any posts that include PII. Analyze this post and tell me if it is allowed:"

Earlier, I mentioned that model providers offer free safety-focused moderation tools. However, LLMs can also be leveraged to build more advanced, rule-based moderation systems. For example, in a customer support forum, you can provide the community rules to an LLM and have it review posts to ensure compliance. These automated community management systems are quick and reliable.

Interestingly, most moderation applications also prompt the LLM to provide a reason for its judgment. Asking the model to explain its decisions not only adds transparency but often improves its accuracy.

Text generation: "Write a help doc about how to file a reimbursement for a child. Use the articles we already wrote about how to file for a reimbursement, but make a more specific version for this particular use case."

The next archetype is text generation, where LLMs excel at creating new content. One particularly effective use case is combining two existing documents into a cohesive new one. For instance, if you have a document titled “How to File a Reimbursement” and another titled “How to Add Your Child to Your Account,” you can prompt an LLM to generate a new article, such as “How to File a Reimbursement on Behalf of a Child.”

Text generation: LoopGenius, Copy.ai, and Jasper are leading products in this space - but Google search results are becoming crowded with AI content.

Text generation is a key feature in many marketing startups. For example, LoopGenius leverages LLMs to automatically generate, test, and refine Facebook ads. Tools like Copy.ai and Jasper focus on creating content for marketing pages, helping businesses improve their SEO strategies.

However, AI-generated content is flooding the internet. It’s now easier than ever for companies to add millions of pages to their websites, leading to an oversaturation of material. As a result, it’s likely that Google will adapt its algorithms to address the proliferation of AI-driven content.

Analysis: "Here is my job description, here is a candidate resume - tell me whether the candidate matches all requirements"

The next archetype is analysis, where LLMs can evaluate data and provide decisions. For example, you can ask ChatGPT to compare a job description and a resume to analyze whether a candidate is a good match for the role—and it performs this task remarkably well.

Analysis startups include Find AI and Applicant AI

At Find AI, we also leverage analysis. When you run a search like “Startup founders who have a dog,” the system asks OpenAI to review profiles one by one and determine, “Is this person a startup founder who has a dog?”

Currently, recruiting is one of the most common use cases for analysis. Many companies rely on AI for initial applicant screening, significantly streamlining the hiring process. Applicant AI is one example, but many similar tools are emerging in this space.

Intent detection: "We have 3 customer support departments. Given a user query, which department should I connect them to?"

Intent detection is one of my favorite applications of LLMs. We’ve all encountered traditional phone menus that say, “Press 1 for new customer enrollment, press 2 for billing, press 3 for sales,” and so on. AI can replace this process by simply asking, “Why are you calling?” and then routing the caller to the appropriate department. This technique, where AI maps a user’s input to a predefined set of options, is known as intent detection.
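Here is a minimal intent-detection sketch; the department names and model are placeholders:

```python
from openai import OpenAI

client = OpenAI()

DEPARTMENTS = ["enrollment", "billing", "sales"]  # hypothetical departments

def detect_intent(user_message: str) -> str:
    """Map a free-form message onto one of a fixed set of options."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # routing should be predictable, not creative
        messages=[
            {
                "role": "system",
                "content": "Classify the caller's request into exactly one of: "
                + ", ".join(DEPARTMENTS)
                + ". Reply with the department name only.",
            },
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(detect_intent("I was charged twice for last month's premium"))  # likely "billing"
```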

Intent detection: Allows LLMs to navigate a decision tree

Intent detection is a foundational technique widely used in more advanced AI applications because it enables systems to navigate decision trees. Many customer interactions are essentially a series of decisions, and LLMs can make these processes feel seamless by converting them into natural language exchanges. At Find AI, for example, every search begins with an intent detection step, where we ask the LLM, “Is this query about a person or a company?”

Intent detection startups include Observe AI and PolyAI

Call centers have been early adopters of intent detection, integrating it into customer support and sales workflows. Companies like Observe AI and PolyAI are reimagining these functions with solutions that blend the strengths of LLMs and human agents.

Data labeling: "Analyze the below customer chat conversation, and apply any labels that apply from list"

LLMs are increasingly being used in analytics for data labeling, a critical task in tools like customer support and sales systems. Tags and labels help track things like feature requests or objections, tasks that previously required customer support agents to spend significant time manually tagging conversations. Now, LLMs can automate this process entirely.

This capability is particularly useful for analyzing historical data. For example, you could instruct an LLM to review all past customer conversations and identify instances where a company requested an API.

At Find AI, we use LLMs to label every search after it’s run, applying tags like “Person at a particular company” or “Location-based search.”

Data labeling also pairs well with the Batch processing capability discussed earlier. By allowing up to 24 hours for a response, you can significantly reduce costs while efficiently processing large volumes of data.
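As a sketch of what that combination looks like with OpenAI's Batch API (the file name, labels, and conversation text are hypothetical):

```python
import json
from openai import OpenAI

client = OpenAI()

conversations = {
    "conv-1": "Hi, do you offer an API for exporting our data?",
    "conv-2": "I was billed twice this month and need a refund.",
}

# One JSONL line per request, in the format the Batch API expects
with open("label_requests.jsonl", "w") as f:
    for conv_id, text in conversations.items():
        body = {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Apply any labels that fit from: feature-request, billing-question, bug-report. Reply with a comma-separated list."},
                {"role": "user", "content": text},
            ],
        }
        f.write(json.dumps({
            "custom_id": conv_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": body,
        }) + "\n")

batch_file = client.files.create(file=open("label_requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the 24-hour window is what earns the discount
)
print(batch.id)  # poll client.batches.retrieve(batch.id) later for results
```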

Data labeling startups include Snorkel and Scale

Building LLMs required massive amounts of human-labeled data, leading to the rise of companies over the past decade that specialize in data labeling, such as Scale AI and Snorkel AI. Interestingly, many of these tools, which were once entirely human-driven, have now evolved to incorporate both AI and human-based labeling systems. As a result, there is now a robust ecosystem of reliable tools available for data labeling, combining the efficiency of AI with the precision of human input.

3. Advanced applications

In the final section, we’ll explore advanced applications of LLMs, focusing on complex and cutting-edge techniques at the forefront of AI development.

Retrieval-augmented generation (RAG) lets LLMs retrieve data. Input text → Retrieved info → LLM → Output text

The first advanced technique we’ll cover is retrieval-augmented generation (RAG). This approach enables an LLM to retrieve relevant information to improve its responses. After a user inputs a query, the system retrieves relevant data, feeds it into the model along with the query, and generates a more accurate output.

Use case is to add knowledge, like help docs: "I have a question" + Help docs → LLM → Answer

A common use case for RAG is improving help documentation. For example, if a user asks, “How do I submit an expense report?” we want the LLM to access relevant documents about expense reporting to provide the correct answer. However, including all help docs in every query would be prohibitively expensive, and overwhelming the context with too much information could decrease accuracy.

The solution is embeddings

The goal of RAG is to retrieve and include only the most relevant documents—perhaps two or three—to assist with the query. This is achieved using the foundational technologies of embeddings and vector databases.

To retrieve info, you can take help docs → embed paragraphs → add to a vector database, then embed input text to look up relevant information

Here’s how most RAG applications work: beforehand, all data (such as help docs) is broken down into smaller chunks, like paragraphs. Each chunk is embedded and stored in a vector database. When a user asks a question like “How do I file an expense report?” the system retrieves only the most relevant articles from the database. By feeding this targeted information into the LLM, RAG enhances the response.
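Sketching that flow end to end in Python, with hypothetical help-doc paragraphs and a naive in-memory index standing in for a vector database:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(result.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 1 (ahead of time): chunk the help docs into paragraphs and embed each one
paragraphs = [
    "To file an expense report, open the Expenses tab and attach your receipts.",
    "Managers approve expense reports within three business days.",
    "To reset your password, use the link on the login page.",
]
index = [{"text": p, "vector": embed(p)} for p in paragraphs]

# Step 2 (per query): embed the question and retrieve the most relevant chunks
question = "How do I file an expense report?"
query_vec = embed(question)
top_chunks = sorted(index, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)[:2]

# Step 3: pass only those chunks to the LLM along with the question
context = "\n\n".join(c["text"] for c in top_chunks)
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided help-doc excerpts."},
        {"role": "user", "content": f"Help docs:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```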

Retrieval-augmented generation is everywhere

RAG is foundational to many LLM applications today because it allows companies to incorporate unique, business-specific information into responses while keeping costs manageable. This technique is already widely used in customer support tools, such as Intercom’s chatbots, and powers other AI-driven applications like Perplexity AI.

In many ways, RAG is the core method businesses use to tailor AI systems to their specific logic and needs.

Agents are LLMs with access to tools (credit: Quick Start Guide to Large Language Models by Sinan Ozdemir)

The next advanced technique is Agents, which have become a hot topic in the AI space. If you visit a startup accelerator today, you’ll likely find a dozen startups touting their agent-based solutions, many of them raising millions in funding.

The definition of an agent remains somewhat fluid, but I like the one from the Quick Start Guide to Large Language Models: an agent is an LLM with access to tools. These tools define the agent’s functionality and can, in theory, be anything.

ChatGPT is an agent with four tools - Bio, Dall-E, Python, and Web

The most popular agent today is ChatGPT. If you ask ChatGPT about the tools it has access to, it will list: Bio for memory, DALL-E for image generation, Python for executing code, and Web for internet searches. This is also the key difference between ChatGPT and the OpenAI API: these four tools are not available to API users.

Developers can write tools for agents with code - such as retrieving data, submitting an expense report, or issuing a refund. Function calling libraries make it easier.

Developers can create tools for agents using code, enabling a wide range of functionalities—from retrieving data and submitting forms to processing refunds. These tools can incorporate user-specific context and include safeguards and limitations to ensure proper usage.
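Here is a minimal function-calling sketch with OpenAI's Python SDK; the expense-report tool is hypothetical and only illustrates the pattern:

```python
import json
from openai import OpenAI

client = OpenAI()

def submit_expense_report(amount: float, description: str) -> str:
    """Hypothetical tool: in a real agent this would call an internal API."""
    return f"Expense report for ${amount:.2f} ({description}) submitted."

tools = [
    {
        "type": "function",
        "function": {
            "name": "submit_expense_report",
            "description": "Submit an expense report on behalf of the user.",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {"type": "number"},
                    "description": {"type": "string"},
                },
                "required": ["amount", "description"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    messages=[{"role": "user", "content": "Please expense my $42.50 team lunch."}],
)

call = response.choices[0].message.tool_calls[0]  # the model chose to call the tool here
args = json.loads(call.function.arguments)
print(submit_expense_report(**args))              # the developer's code actually runs it
```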

Agents can even use computers: scroll, click, log in, fill out forms, and look up information. Anthropic's Computer Use product simplifies this.

LLMs can now even interact with computers, extending their capabilities beyond traditional tasks. Robotic Process Automation (RPA) has long allowed developers to automate actions like browsing websites or performing operations. However, agents are taking this further. For instance, Anthropic’s new Computer Use feature gives LLMs a computer, allowing them to perform tasks such as web browsing, clicking buttons, and responding to error messages.

This advancement has significant implications. Compared to traditional RPA tools, agents are less fragile and far more adaptable, making them better suited to dynamic and complex workflows.

Agents are the edge of AI today, with companies like Veritas Labs, AiSDR, and Cognition leading the space.

Agents represent the cutting edge of AI today, with startups equipping LLMs with a wide range of tools to tackle complex tasks. Veritas Labs is developing agents to automate healthcare operations and customer support. AiSDR has created a virtual salesperson that autonomously finds leads, sends emails, responds to customer inquiries, and schedules meetings. Meanwhile, Cognition AI has introduced Devin, touted as “the world’s first AI software engineer,” capable of accepting tasks and writing the code needed to complete them.

Agents are pushing the boundaries of LLM technology, enabling some of the first fully autonomous LLM applications.

Swarms are AI agents that can collaborate

The final advanced application I want to discuss is the concept of Swarms—AI agents that collaborate to achieve a shared goal. OpenAI introduced this idea, along with the name, through an open-source project called “Swarm.” The core concept is to have a team of specialized AI agents that work together, each focusing on specific tasks.

An example Swarm for an expense report: User → Expense report submission agent → Expense report approval agent (accesses past reports, and can message dinner attendees to confirm attendance) → Reimbursement agent (connects to bookkeeping)

For example, imagine a swarm designed for handling expense reports. One agent could guide users through submitting expense reports, another could review and approve them by accessing relevant data (like past reports or messaging team members), and a third could handle reimbursements, including sending payments and updating bookkeeping. By dividing tasks among multiple agents, you can enhance safety and control—such as ensuring the expense review agent only processes documents and doesn’t access subjective information from the submitter.
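The sketch below is not OpenAI's Swarm library, just a hand-rolled illustration of the handoff idea using hypothetical agents and a simple text convention:

```python
from openai import OpenAI

client = OpenAI()

AGENTS = {
    "submission": "You help the user fill in an expense report, then hand off to 'approval'.",
    "approval": "You check the report against policy, then hand off to 'reimbursement'.",
    "reimbursement": "You confirm payment and update bookkeeping, then hand off to 'done'.",
}

def run_agent(name: str, conversation: list) -> str:
    """Run one specialized agent; it replies and names who to hand off to next."""
    system = AGENTS[name] + " End your reply with a line 'HANDOFF: <next agent or done>'."
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system}] + conversation,
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    return reply.rsplit("HANDOFF:", 1)[-1].strip().rstrip(".").lower()

conversation = [{"role": "user", "content": "I need to expense a $120 team dinner from Tuesday."}]
agent = "submission"
for _ in range(5):  # safety cap on the number of handoffs
    agent = run_agent(agent, conversation)
    if agent not in AGENTS:  # "done" or anything unexpected ends the loop
        break
```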

Swarms are the (near) future of Gen AI applications

Swarms represent the near future of generative AI applications. As agent platforms mature and standards for agent collaboration emerge, the adoption of swarms will likely become widespread, unlocking new possibilities for AI-driven workflows.

Goal: Understand what people are actually doing with LLMs

The goal of this presentation was to help you understand what people are actually doing with LLMs.

Recap: 1. Building blocks, 2. Basic applications, 3. Advanced applications

We covered building blocks, such as chat, embeddings, and semantic search. Then, we explored basic applications such as code generation, summarization, moderation, analysis, intent detection, and data labeling. Finally, we explored advanced applications - such as RAG, agents, and swarms.

Archetypes of LLM apps by Contraption Company

Understanding the archetypes of LLM applications can help you identify opportunities to improve business processes and workflows with AI. Additionally, the discussion around hosted versus self-hosted solutions, along with potential vendors, should equip you to make informed decisions about when to build versus buy and how to evaluate the sophistication of various tools.

In software engineering, AI is already the present—not the future—and I believe we’ll see this same transformative impact extend across many other functions and industries. Thank you for taking the time to explore these ideas with me today.

Americano at Sightglass Coffee in San Francisco

We recently released a Search API for Find AI, and announced its integration with Clay.com.

APIs allow developers to interact with a product using code. There are many different ways to build an API, and many tools to make it easier for customers to adopt. In this post, I’ll take you behind the scenes of how we built the Find AI API, from its technical foundations to the tools we used to simplify developer adoption.

I'll start with the end customer experience, then work our way back to the internal architecture.

Explainer video

Here's a video I created to explain how to use the Find AI API:

I recorded the video using a DJI Pocket 3 with their lavalier mic and edited it in Descript. I recorded almost an hour of footage - so I spent a lot of time editing the video to be as short and clear as possible. Descript's text-based editing tool, originally designed for podcasts, made it easy to scan through retakes and figure out which was best.

Video tutorials help people understand the end-to-end integration process of an API before diving into detailed documentation. Every customer who has integrated with our API started by watching this video, so it was a good use of time.

Generated client libraries

In the video, calling the API looks straightforward because you can install a Find AI client library and write import FindAI from "find-ai" to make requests. Client libraries save developers time by abstracting away boilerplate code and reducing the need to read extensive documentation.

We provide official client libraries in Python, Node, Ruby, and Go, making integration accessible across multiple programming languages.

Companies like Stripe and OpenAI have set a high standard, making libraries a key part of their developer ecosystems. So, developers now expect client libraries in multiple languages whenever integrating with a new API. Writing and maintaining these libraries manually, however, would be both time-consuming and error-prone.

This is where Stainless comes in. Stainless reads our OpenAPI specification and automatically generates client libraries in multiple languages. The tool was created by Alex, who built similar systems at Stripe, and it now powers libraries for OpenAI, Anthropic, and Cloudflare.

For the Find AI API, every single user I’ve spoken to relies on one of our Stainless-generated client libraries. If you’re building an API, providing robust client libraries isn’t just a nice-to-have—it’s the new standard.

Interactive docs

Clear documentation is the foundation of a great developer experience. It helps users understand how to interact with your API and what data they can send or retrieve.

Initially, we used Swagger to generate our API documentation. Swagger reads an OpenAPI spec and creates interactive docs, allowing users to input their API key and test endpoints directly. This interactivity is a fantastic way for developers to use an API before writing any code. It’s also what I use in the explainer video. Our Swagger docs are still available at usefind.ai/api/docs.

However, Swagger had limitations. Its design felt dated, and it wasn’t easy to add additional text or media to guide users through setup or multi-step calls.

To address this, we switched to Mintlify for our primary documentation. Mintlify offers the same interactive features as Swagger but provides more flexibility for customization. For example, we embedded the explainer video and added step-by-step guides to explain each function in detail.

Mintlify’s docs are now our main resource, available at usefind.ai/docs. They’re clean, easy to navigate, and SEO-friendly, thanks to an installation via Cloudflare Workers.

OpenAPI at the core

When designing the Find AI API, we opted for a RESTful architecture. While newer paradigms like GraphQL and gRPC are gaining popularity, we chose REST because the ecosystem has largely standardized around OpenAPI for documenting APIs.

The OpenAPI specification serves as the backbone of our API ecosystem. It’s a machine-readable file that defines what the API can do. When we update the spec, tools like Stainless automatically regenerate client libraries in multiple languages, and our documentation on Mintlify and Swagger updates automatically.

This unified workflow ensures that our API is consistent and always up-to-date for developers.

Usage-based billing

One of the key architectural decisions we made was to adopt usage-based billing. We wanted our pricing model to reflect the value provided to users. For example, if a query requests 100 matches but only 50 exist, the user is billed for 50. This ensures fairness and aligns costs with usage.

To implement this model, we used Stripe’s usage-based billing. Setting up usage tracking and integrating with Stripe was surprisingly straightforward. Customers simply add a credit card to begin using the API, and Stripe charges their credit card weekly based on their usage.
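As a rough sketch of metered billing with Stripe's Python library (the API key, subscription item ID, and quantity are placeholders, and this may not reflect Find AI's exact setup):

```python
import time
import stripe

stripe.api_key = "sk_live_..."  # placeholder key

# After a search completes, report how many matches were actually delivered
matches_returned = 50  # e.g. the customer asked for 100 but only 50 existed

stripe.SubscriptionItem.create_usage_record(
    "si_XXXXXXXX",              # the customer's metered subscription item (placeholder)
    quantity=matches_returned,  # bill only for what was delivered
    timestamp=int(time.time()),
    action="increment",
)
```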

This approach has worked well for our customers and ensures a seamless payment experience while scaling with their needs.

Demo mode

Another important decision was to include a demo environment in the API. Since using the full API requires adding a credit card, we wanted to provide a way for developers to experiment with the product risk-free.

To achieve this, we allow developers to issue a demo-mode API key. This key returns placeholder data (e.g., results like example.com) without incurring any costs. It’s particularly useful for mimicking the API’s functionality in development environments.

Looking at our analytics, however, demo mode hasn’t been widely used. Most developers were comfortable testing with the production API and seemed hesitant to rely on test-mode data. If I were to rebuild the API, I’d likely skip the demo mode entirely.

Try it out

If you want to incorporate search of people and companies into your application, check out the Find AI API.

Having a TJ

The importance of a passionate first customer
La Cabra roastery in Brooklyn

Recent events have reminded me of a phrase I’ve long used in the startup world: “Having a TJ.”

Before Staffjoy became a company, it was just a side project. Our first user was TJ, whose biggest challenge was scheduling his workforce. Every week, TJ would meet with us to explain his problems. We’d show him what we were working on, and he’d provide invaluable feedback. TJ became the lifeblood of our startup—a real person with a real problem, collaborating with us to find a solution. Over time, “TJ” evolved into a metaphorical persona representing our customer base: “What would TJ want?”

Our minimum viable product at Staffjoy involved just emailing spreadsheets of schedules back and forth with TJ. Despite its simplicity—and perhaps clunkiness—he was happy to use it because we were addressing his core workforce management issues. TJ wasn’t distracted by unnecessary features; he cared about solving his problem.

With TJ’s help, we built an app, got into the Y Combinator Fellowship, raised a seed round, and helped more customers. TJ’s feedback and enthusiasm were instrumental in guiding Staffjoy from an idea into a venture we worked on for two years.

Many startups fail to secure even a single customer or create something that one person genuinely wants. Having a “TJ” keeps a company focused on solving real problems for real people. Individuals like TJ validate assumptions, offer honest feedback to prioritize work, answer spontaneous questions, and become references for future customers. They confirm that the company is tackling a genuine need. Once you’ve built something that satisfies TJ, you can seek out more customers like them.

In other companies I’ve been involved with, there’s always been that “TJ”—the first customer who has a problem, collaborates on the solution, and then champions your product. If you’re building a startup and don’t yet have a passionate user, I recommend focusing on finding that early adopter who can provide feedback. If you can’t find such a user, perhaps you’re addressing the wrong problem.

Later, as the industry shifted amid consolidations and shutdowns, TJ was laid off. Responding to the market dynamics, we pivoted, but we struggled to find another TJ and ultimately shut down. Losing a “TJ” can be a canary in the coal mine for a startup.

A passionate early customer keeps a startup team motivated and working on the right thing. Most startups focus on growth too early and fail to make something that a single customer wants. The TJ lesson is that a successful product starts with one customer, and that one customer’s love of the product is rooted in a problem they desperately want your help solving.

It's better to have 100 users love you than 1 million kinda like you. The true seed of scale is love, and you can't buy it, hack it, or game it. A product that is deeply loved is one that can scale. 
- Sam Altman

Innovation versus distribution

The race between startups and incumbents
Eiffel tower with olympic rings

Earlier this year, I attended a talk in NYC by Vinay Hiremath, co-founder of Loom. He explained a mental model that's stuck with me.

Here’s the model: When a startup competes with an incumbent, it has an innovative product but seeks distribution. The incumbent has distribution—all its customers—but seeks innovation. So, they race: the startup tries to capture the incumbent’s customers before the incumbent can develop a better product.

Sometimes, the innovator wins, such as when Google surpassed Yahoo or the iPhone overtook BlackBerry.

Other times, the incumbent prevails. In the case of Slack vs. Microsoft Teams, Microsoft Teams now reports about ten times as many daily active users as Slack. Salesforce has also stood the test of time against many innovators.

Some ongoing races include Linear vs. Jira and ChatGPT vs. Google.

To win with innovation, small companies need to be hard to copy (like Figma), have strong network effects (like Facebook), or be ignored by incumbents (such as Lyft eschewing taxi laws).

Big tech companies should not be underestimated. They have become skilled at building products and often let startups do the hard work of validating new markets before they compete. They sometimes engage in tactics that are unethical and potentially illegal, such as cloning features to stifle emerging competitors—a strategy Instagram notoriously employed against Snapchat and later TikTok. These actions often go unchecked because if the incumbent dominates the market, the startup may not have the resources or time to pursue legal action.

I often think about this model because it applies well to many markets. As a startup, you should always ask, “Can somebody just copy this?” As an incumbent, you should ask, “Are we nimble enough to keep our product competitive?” Either way, the first step to winning a race is recognizing that you’re in one.

Internal tools of Find AI

Technical presentation at an AI meetup
Manhattan rooftop during the 2024 solar eclipse

This week, I presented at the Mindstone AI meetup in NYC about internal tools we built at Find AI. We use OpenAI extensively to build a search engine for people and companies - making millions of daily LLM requests.

In this presentation, I covered two internal tools we built to improve our understanding and usage of OpenAI. The first is a semantic search engine we built on top of OpenAI Embeddings to understand the performance and accuracy of vector-based semantic search. The second is a qualitative model evaluation tool we built to compare the performance of different AI models for our use cases. These tools are internal research products that have never been shown publicly.

I recorded the presentation, which you can watch on Youtube.

Wine craft

2024 harvest season in Alsace
Rainbow in Colmar, France

Earlier this month, I traveled to the Alsace wine region of France to explore the craft of wine. Their harvest season had just officially kicked off, so winemakers were beginning to pick grapes and produce their 2024 vintage.

I love finding people who focus on mastery of one skill. Winemaking is one of the classic crafts, and Alsace is a historic region filled with tradition. Many of the winemakers came from a multi-generational lineage of producers.

Even amid the tradition and rules, I saw innovation. In a region known for its white wines, four producers had successfully lobbied for the government to award grand cru designations to their Pinot Noir wines. I visited some of these producers and felt their renewed sense of autonomy.

I brought a DJI Pocket 3 camera to document the visit and turned my footage into a little video about a day in Alsace. Take a look:

Watch the video on Youtube.

How I use data to optimize AI apps

A video collaboration between Find AI and Velvet
Flower shop in Paris

At Find AI, we use OpenAI a lot. Last week, we made 19 million requests.

Understanding what's happening at that scale can be challenging. It's a classic OODA loop:

  • Observe what our application is doing and which systems are triggering requests
  • Orient around what's happening, such as which models are the most costly in aggregate
  • Decide how to make the system more efficient, such as by testing a more efficient model or shorter prompt
  • Act by rolling out changes

Velvet, an AI Gateway, is the tool in our development stack that enables this observability and optimization loop. I worked with them this week to produce a video about how we use data to optimize our AI-powered apps at Find AI.

The video covers observability tools in development, cost attribution, using the OpenAI Batch API, evaluating new models, and fine-tuning. I hope it's a useful resource for people running AI models in production.

Watch the video on the Velvet Youtube.

Is fractional work the future?

A conversation with Taylor Crane
Soho House Copenhagen

Today, I'm sharing a conversation with Taylor Crane, founder of FractionalJobs.io. Fractional work, loosely defined as "ongoing part-time engagements," has been a growing trend in the technology industry.

The label "fractional work" is relatively new, but I've been interested in part-time work for years. In 2016, I built Staffjoy using part-time contractors. In 2017, I founded Moonlight to help companies hire part-time contractors. Last year, I launched the FRCTNL community for part-time tech workers. Today, my current company, Find AI, has an official fractional work program and works with five fractionals.

In this conversation, Taylor and I discuss:

  • Why companies hire part-time workers
  • What fractional workers do with the rest of their time
  • Productivity and whether 40 hour/week employment applies to knowledge work
  • Whether junior workers should pursue part-time work
  • How tech companies may structure themselves in the future to take advantage of fractional workers

Watch on Youtube. Listen to a recording of this conversation on Apple Podcasts, Spotify, or other podcast players.