LLMs in Production: Key Takeaways For Founders

Photo Credits: Obed Obwoge

As large language models (LLMs) rapidly advance, more and more companies are looking to leverage their power in real-world applications. But the path from an impressive demo to reliable production is filled with challenges around data security, performance, cost, and trust.

To unpack these issues, Flybridge recently co-hosted an event with our portfolio company Portkey at the offices of Gunderson Dettmer, bringing together expert practitioners to share their hard-earned lessons from the field. This summary brings you into the conversation. If you have questions or want to be included in future Flybridge events, send a note to hello@flybridge.com.

Key Insights

Prioritizing Data Security and Privacy

A major theme was the heightened focus on data security and privacy when working with LLMs. As Aaron Rubin, Partner at Gunderson Dettmer, noted, "There's a renewed focus on what service providers are going to do with data. This focus on, 'are you going to use it to train your model or not?', has been sort of the evergreen question over the last nine months to a year."

Synthetic data, and where it is headed, also factors into how companies build and train models. Chris Brown from NVIDIA added, "That's been an area where more people have been trying to dive in to figure out how do we build stuff here where I don’t have all of this existing proprietary data, and I want to build something out of it."

Strategies shared by the panelists to address this included:

  • Deploying models on-premises rather than in the cloud

  • Contractually committing not to use customer data for model training

  • Pursuing security certifications and frameworks such as SOC 2 Type II, ISO, and NIST

  • Licensing high-quality third-party datasets to augment model training

Fine-tuning for Performance and Cost

While foundation models like GPT-4 are highly capable out of the box, many practitioners are finding success with fine-tuning smaller models for specific tasks. Rohit Agarwal, co-founder & CEO of Portkey, shared, "The best way is to fine-tune your own smaller model for that specific task. Most people start their proof of concept with a best-in-class model, like a PaLM or GPT-4, just to get things working. But then the next step from there is to get clear on the product and realize you don't need all of the capabilities of these advanced models."

This approach can yield significant benefits in performance, cost, and even security, since sensitive data stays local. However, it requires more technical sophistication to pull off. The panelists called out tooling that makes this process more accessible to developers as a key need.

The Importance of Testing and Monitoring

For high-stakes applications in areas like law, finance, and medicine, being able to trust the outputs of an LLM system is paramount. The panelists emphasized the importance of extensive testing before deploying to production and continuously monitoring live systems for quality and reliability.

Sarmad Qadri, co-founder and CEO at LastMile AI, elaborated, "You can have a series of evaluators that you define, and those are your test cases. You can define like test data that you test this on, test indexes that you ingest and transform. And you can, if you have small enough evaluators, then you can run this thousands of times every day, cheaply enough, and get a score and a dashboard for your application."
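To make the idea of evaluators as test cases concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not LastMile AI's product or any panelist's implementation: the names TestCase, is_non_empty, mentions_required_terms, run_suite, and fake_app are all hypothetical, and fake_app is a canned stand-in for a real LLM call. Each evaluator is a small, cheap function that scores one output between 0.0 and 1.0, and the suite averages those scores into numbers a dashboard could track.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class TestCase:
    prompt: str
    must_mention: tuple[str, ...]  # terms a good answer should contain


# An evaluator scores one output for one test case: 0.0 (fail) to 1.0 (pass).
Evaluator = Callable[[TestCase, str], float]


def is_non_empty(case: TestCase, output: str) -> float:
    """Pass if the application returned any non-whitespace text."""
    return 1.0 if output.strip() else 0.0


def mentions_required_terms(case: TestCase, output: str) -> float:
    """Fraction of the required terms that appear in the output."""
    if not case.must_mention:
        return 1.0
    text = output.lower()
    hits = sum(term.lower() in text for term in case.must_mention)
    return hits / len(case.must_mention)


def run_suite(
    app: Callable[[str], str],
    cases: list[TestCase],
    evaluators: list[Evaluator],
) -> dict[str, float]:
    """Run every case through the app once, then average each evaluator."""
    outputs = [(case, app(case.prompt)) for case in cases]
    return {
        ev.__name__: mean(ev(case, out) for case, out in outputs)
        for ev in evaluators
    }


def fake_app(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; replace with your own client.
    canned = {
        "Summarize the refund policy.":
            "Refunds are available within 30 days of purchase.",
    }
    return canned.get(prompt, "")


CASES = [
    TestCase("Summarize the refund policy.", ("refund", "30 days")),
    TestCase("Summarize the shipping policy.", ("shipping",)),
]

scores = run_suite(fake_app, CASES, [is_non_empty, mentions_required_terms])
print(scores)
```

Because the stand-in application only answers the first prompt, both evaluators average out to 0.5 here. In practice the same suite would run on a schedule against the live application, with the test data and evaluators growing as new failure modes are found.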

Other best practices shared included:

  • Having subject matter experts carefully review outputs for accuracy

  • Providing clear guidelines to users on the capabilities and limitations of the system

  • Integrating real-time human feedback to continually improve performance

  • Building in 'guardrails' to detect issues like hallucinations or outdated information
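The guardrails idea in the last bullet can be sketched in a few lines of Python. This is a deliberately crude illustration, not a method described by the panelists: it assumes a retrieval-style application where each answer comes with its source passages, and the names Source, stale_sources, unsupported_numbers, and check_answer are hypothetical. One check flags sources older than a cutoff (outdated information); the other flags numbers in the answer that appear in no source (a rough proxy for one kind of hallucination).

```python
import re
from dataclasses import dataclass
from datetime import date

NUMBER = re.compile(r"\d+(?:\.\d+)?")


@dataclass(frozen=True)
class Source:
    text: str
    last_updated: date


def stale_sources(sources: list[Source], today: date, max_age_days: int) -> list[Source]:
    """Return the sources last updated more than max_age_days before today."""
    return [s for s in sources if (today - s.last_updated).days > max_age_days]


def unsupported_numbers(answer: str, sources: list[Source]) -> list[str]:
    """Return numbers in the answer that appear in none of the source texts."""
    known = set(NUMBER.findall(" ".join(s.text for s in sources)))
    return [n for n in NUMBER.findall(answer) if n not in known]


def check_answer(
    answer: str,
    sources: list[Source],
    today: date,
    max_age_days: int = 365,  # assumed threshold; tune per use case
) -> list[str]:
    """Return a list of guardrail flags; an empty list means no issue found."""
    flags = []
    if stale_sources(sources, today, max_age_days):
        flags.append("outdated_source")
    if unsupported_numbers(answer, sources):
        flags.append("unsupported_number")
    return flags


today = date(2024, 6, 1)
old_source = [Source("The filing fee is 400 dollars.", date(2022, 1, 15))]
fresh_source = [Source("The filing fee is 400 dollars.", date(2024, 5, 1))]

flagged = check_answer("The filing fee is 450 dollars.", old_source, today)
clean = check_answer("The filing fee is 400 dollars.", fresh_source, today)
print(flagged, clean)
```

A flagged answer might be routed to a subject matter expert or shown with a warning rather than blocked outright. Real guardrails typically go well beyond string matching, but the shape is the same: cheap checks that run on every response before it reaches the user.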

An Ecosystem of Both Foundation and Targeted Models

Looking to the future, the panelists saw an important role for both massive foundation models and more specialized, targeted models. Chris compared it to the search engine landscape: "You're of course going to have ultra specialized models that focus on specific industries and specific use cases... like LexisNexis for legal searches, you know, that will still exist... But Google also exists. I don't think this is an either/or thing."

Sarmad predicted that model choice itself would likely fade into the background over time: "I think we'll stop thinking about models as much as we do right now because these applications that use AI will become more and more complex and sophisticated... It's like, I don't think about how a website is hosted. Like, is it on AWS? Like, is it using Lambdas? Like, who cares, right? It's an infrastructure point."

The Takeaway: Focus on Value Creation

Ultimately, the panelists all echoed the same key point: while the underlying technology is fascinating and rapidly evolving, the true measure of success is creating value for customers. As Chris put it, "You should never think about your startup as a technology experiment. Your startup is a business, so what tools you use in that business are tertiary or worse to what you're actually able to do for customers."

By focusing on deeply understanding real customer needs, rigorously testing and monitoring for quality and reliability, and building trust through transparency, startups can harness the power of LLMs to build production-ready applications that deliver real value. The infrastructure will continue to improve and evolve behind the scenes, but those customer fundamentals will always remain essential.

Cheraé Robinson

Cheraé Robinson is Head of Community at Flybridge.

Prior to Flybridge, she was founder and CEO of Tastemakers Africa, an experiences marketplace connecting travelers to creatives in African cities. Flybridge was the largest investor in her seed round. Her creative approach to building a dynamic marketplace for authentic trips and tours led to collaborations with AFROPUNK, Uber, Radisson Blu, South African Airways, Soho House, and Facebook.

Named one of "50 People Changing The Way We Travel" by Condé Nast Traveler, she has been featured in Forbes, Paper Magazine, Essence, Entrepreneur, Black Enterprise, Vogue, Fast Company, and others. She has shared her vision for the world at TEDx Euston and the House of Beautiful Business.

Prior to taking the leap into entrepreneurship, Cheraé built a career in international development. Fusing her background in science with her knack for storytelling, convening, and connecting, she spent time at the CDC, CARE, CIMMYT, and Keep A Child Alive.

She is co-host of "It Just Got Real", a podcast for unconventional, creative, women entrepreneurs, sits on the board of Birthright Africa, is an angel investor, regularly writes on food and cocktail happenings in NYC, and continues to be a bridge-builder to the African continent.

Cheraé holds a B.S. in Biology from Morgan State University.
