LLMs in Production: Key Takeaways For Founders
As large language models (LLMs) rapidly advance, more and more companies are looking to leverage their power in real-world applications. But the path from an impressive demo to reliable production is filled with challenges around data security, performance, cost, and trust.
To unpack these issues, Flybridge recently co-hosted an event with our portfolio company Portkey at the offices of Gunderson Dettmer, bringing together expert practitioners to share their hard-earned lessons from the field. This summary brings you into the conversation. If you have questions or want to be sure you're included in future Flybridge events, send a note to hello@flybridge.com.
Speakers
Panel Discussion
Rohit Agarwal, Co-Founder & CEO, Portkey (backed by Flybridge & Lightspeed)
Yoni Sebag, Co-Founder & COO, Noetica AI (backed by Flybridge & Lightspeed)
Sarmad Qadri, Co-Founder & CEO, LastMile AI (backed by Gradient)
Chris Brown, AI and Startups at NVIDIA
Aaron Rubin, Partner at Gunderson Dettmer
Company Demos
Key Insights
Prioritizing Data Security and Privacy
A major theme was the heightened focus on data security and privacy when working with LLMs. As Aaron Rubin, Partner at Gunderson Dettmer, noted, "There's a renewed focus on what service providers are going to do with data. This focus on 'are you going to use it to train your model or not?' has been sort of the evergreen question over the last nine months to a year."
Synthetic data, and where it is headed, also factors into how companies are building and training models. Chris Brown from NVIDIA added, "That's been an area where more people have been trying to dive in to figure out how do we build stuff here where I don't have all of this existing proprietary data, and I want to build something out of it."
Strategies shared by the panelists to address this included:
Deploying models on-premises rather than in the cloud
Contractually committing not to use customer data for model training
Pursuing security certifications like SOC 2 Type II, ISO, and NIST
Licensing high-quality third-party datasets to augment model training
Fine-tuning for Performance and Cost
While foundation models like GPT-4 are highly capable out of the box, many practitioners are finding success with fine-tuning smaller models for specific tasks. Rohit Agarwal, Co-Founder & CEO of Portkey, shared, "The best way is to fine-tune your own smaller model for that specific task. Most people start their proof of concept with a best-in-class model, like a PaLM or GPT-4, just to get things working. But the next step from there is for you to get clear on the product and realize you don't need all of the capabilities of these advanced models."
This approach can yield significant benefits in terms of performance, cost, and even security by keeping sensitive data local. However, it does require more technical sophistication to pull off. Investing in tooling to make this process more accessible to developers was called out as a key need.
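To make this concrete, below is a minimal sketch of what task-specific fine-tuning can look like, using the open-source Hugging Face Transformers stack. Nothing here is from the panel: the model name, the train.csv/test.csv files, and the three-label setup are placeholder assumptions. The point is simply that a small, narrowly trained model can stand in for a general-purpose API call on a single well-defined task, while keeping the training data in-house.

```python
# Minimal task-specific fine-tuning sketch (illustrative assumptions throughout).
# Assumes train.csv and test.csv each have "text" and "label" columns.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "distilbert-base-uncased"  # small model; pick whatever fits your latency and cost budget

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Hypothetical labeled examples for the one task your product actually needs.
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-task-model",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        evaluation_strategy="epoch",
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
trainer.save_model("finetuned-task-model")  # the model, and your data, never leave your infrastructure
```

The same idea extends to generative tasks with parameter-efficient methods like LoRA; the trade-off the panel highlighted is that this route demands more engineering effort than calling a hosted frontier model.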
The Importance of Testing and Monitoring
For high-stakes applications in areas like law, finance, and medicine, being able to trust the outputs of an LLM system is paramount. The panelists emphasized the importance of extensive testing before deploying to production and continuously monitoring live systems for quality and reliability.
Sarmad Qadri, Co-Founder & CEO at LastMile AI, elaborated, "You can have a series of evaluators that you define, and those are your test cases. You can define test data that you test this on, test indexes that you ingest and transform. And if you have small enough evaluators, then you can run this thousands of times every day, cheaply enough, and get a score and a dashboard for your application."
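As an illustration of that pattern (our own sketch, not LastMile AI's API), an evaluator can be as small as a function that scores one output, and a test suite is just a list of prompts with cheap, checkable expectations. `call_model` and the example checks below are hypothetical placeholders:

```python
# Small-evaluator sketch: fixed test cases, cheap checks, one trackable score.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    prompt: str
    expected_keyword: str  # a cheap, checkable expectation for this case

def call_model(prompt: str) -> str:
    # Placeholder: swap in your real LLM call (API, gateway, or local model).
    return "stub answer mentioning the refund policy"

# Evaluators are just small scoring functions, cheap enough to run thousands of times a day.
def keyword_present(output: str, case: TestCase) -> float:
    return 1.0 if case.expected_keyword.lower() in output.lower() else 0.0

def not_too_long(output: str, case: TestCase) -> float:
    return 1.0 if len(output) < 2000 else 0.0

EVALUATORS: list[Callable[[str, TestCase], float]] = [keyword_present, not_too_long]

def run_suite(cases: list[TestCase]) -> float:
    scores = []
    for case in cases:
        output = call_model(case.prompt)
        scores.extend(evaluator(output, case) for evaluator in EVALUATORS)
    return sum(scores) / len(scores)  # one number to chart on a dashboard

if __name__ == "__main__":
    cases = [
        TestCase(prompt="What is our refund window?", expected_keyword="refund"),
        TestCase(prompt="Summarize the indemnification clause.", expected_keyword="indemnification"),
    ]
    print(f"suite score: {run_suite(cases):.2f}")
```

Running a suite like this on every prompt, model, or retrieval change turns "did we break anything?" into a number you can watch over time.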
Other best practices shared included:
Having subject matter experts carefully review outputs for accuracy
Providing clear guidelines to users on the capabilities and limitations of the system
Integrating real-time human feedback to continually improve performance
Building in 'guardrails' to detect issues like hallucinations or outdated information (a simple sketch of this idea follows the list)
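To illustrate the guardrail idea from the last bullet (again a sketch, not any specific vendor's implementation), a retrieval-grounded application can check how much of an answer is actually supported by the retrieved context before returning it, and fall back to a safe response when support is low:

```python
# Naive grounding guardrail sketch; the 0.6 threshold is arbitrary and would be tuned on labeled examples.
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer words that also appear in the retrieved context."""
    answer_words = {w.lower().strip(".,") for w in answer.split()}
    context_words = {w.lower().strip(".,") for w in context.split()}
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

def guarded_answer(answer: str, context: str, threshold: float = 0.6) -> str:
    if grounding_score(answer, context) < threshold:
        return "I'm not confident in that answer based on the available sources."
    return answer
```

Production guardrails are usually more sophisticated (entailment models, citation checks, freshness metadata), but the shape is the same: an automated check that sits between the model and the user.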
An Ecosystem of Both Foundation and Targeted Models
Looking to the future, the panelists saw an important role for both massive foundation models and more specialized, targeted models. Chris compared it to the search engine landscape, "You're of course going to have ultra specialized models that focus on specific industries and specific use cases... like LexisNexis for legal searches, you know, that will still exist... But Google also exists. I don't think this is an either/or thing."
Sarmad predicted that model choice itself will likely fade into the background over time, "I think we'll stop thinking about models as much as we do right now because these applications that use AI will become more and more complex and sophisticated... It's like, I don't think about how a website is hosted. Is it on AWS? Is it using Lambdas? Who cares, right? It's an infrastructure point."
The Takeaway: Focus on Value Creation
Ultimately, the panelists all echoed the same key point: while the underlying technology is fascinating and rapidly evolving, the true measure of success is creating value for customers. As Chris put it, "You should never think about your startup as a technology experiment. Your startup is a business, so what tools you use in that business are tertiary or worse to what you're actually able to do for customers."
By focusing on deeply understanding real customer needs, rigorously testing and monitoring for quality and reliability, and building trust through transparency, startups can harness the power of LLMs to build production-ready applications that deliver real value. The infrastructure will continue to improve and evolve behind the scenes, but those customer fundamentals will always remain essential.