LOCATION ADDRESS (Hybrid: in person and on Zoom)
In-person at Hacker Dojo, 855 Maude Ave, Mountain View, CA 94043
***For faster entry*** at the Dojo, please [read the Dojo policies](https://tinyurl.com/9cn8sevt) and when you sign up, state “I accept the [Hacker Dojo policies](https://tinyurl.com/9cn8sevt)”
Online on Zoom
[https://acm-org.zoom.us/j/94167775732?pwd=RFZPQVA5RFpTTnE3RGlYU2VYejNtdz09](https://acm-org.zoom.us/j/94167775732?pwd=RFZPQVA5RFpTTnE3RGlYU2VYejNtdz09)
On YouTube:
[https://youtube.com/live/LFx8K7oZkk8?feature=share](https://youtube.com/live/LFx8K7oZkk8?feature=share)
AGENDA
6:30 Doors open, food and networking (we invite honor system contributions)
7:00 SFBayACM upcoming events, introduce the speaker
7:10 Speaker presentation starts
8:15 – 8:30 finish, depending on Q&A
TALK DESCRIPTION
RAG using Milvus, HuggingFace, LangChain, Ragas, with or without OpenAI
Abstract:
You’ve heard good data matters in Machine Learning, but does it matter for Generative AI applications? Corporate data often differs significantly from the general Internet data used to train most foundation models. Join me for a Python demo tutorial on building a customizable RAG (Retrieval Augmented Generation) stack using the open source Milvus vector database, LangChain, Ragas, and HuggingFace, optionally with Zilliz Cloud and OpenAI.
Learn best practices and advanced techniques to optimize GenAI workflows with your own data.
What you’ll learn:
* How to build a customizable open source RAG (Retrieval Augmented Generation) chatbot in Python with the Milvus vector database, LangChain, Ragas, and HuggingFace models, optionally with Zilliz Cloud and OpenAI.
* Best practices around embedding text data (“embedding” in AI is like “featurization” in ML).
* Best practices around vector indexing and search.
* Best practices around RAG evaluation with Ragas.
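For readers new to the topic, the retrieval step behind the bullets above can be sketched in a few lines of plain Python. This is only a toy illustration, not the tutorial notebook and not the Milvus, LangChain, or Ragas API: the `embed` and `top_k` helpers, the word-count "embedding", and the sample documents are all assumptions made up for this sketch. A real stack would use a learned embedding model and a vector database index instead of brute-force search.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def top_k(query_vec, doc_vecs, k=1):
    """Brute-force search: rank documents by inner product.

    For unit-length vectors the inner product equals cosine similarity.
    """
    scores = [
        (sum(q * d for q, d in zip(query_vec, dv)), i)
        for i, dv in enumerate(doc_vecs)
    ]
    return sorted(scores, reverse=True)[:k]

# Hypothetical mini-corpus standing in for "your own data".
docs = [
    "milvus stores vectors and builds an index",
    "ragas scores answers from a rag pipeline",
    "birds are fun to watch while hiking",
]
vocab = sorted({w for d in docs for w in d.split()})
doc_vecs = [embed(d, vocab) for d in docs]

# Retrieve the closest document and place it in the prompt as context.
query = "how does milvus index vectors"
hits = top_k(embed(query, vocab), doc_vecs, k=1)
best_doc = docs[hits[0][1]]
prompt = f"Answer using this context:\n{best_doc}\n\nQuestion: {query}"
print(prompt)
```

The same three stages (embed, search, build the prompt) are what the listed tools handle at scale; the generation and evaluation steps are left out of this sketch.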
The tutorial notebook will be linked here: [https://github.com/milvus-io/bootcamp/tree/master/bootcamp](https://github.com/milvus-io/bootcamp/tree/master/bootcamp)
Tutorial instructions will be similar to these, but more focused on running locally: [https://docs.google.com/document/d/1yetuGEkYqh_1rAYEBXFAnwsFClMAIQFx1erLHHKXTLg](https://docs.google.com/document/d/1yetuGEkYqh_1rAYEBXFAnwsFClMAIQFx1erLHHKXTLg)
Slides will be similar to these: [https://docs.google.com/presentation/d/1hpiaiVMHm4oQr5P86NhcrL0qXwBIdWhHEZOlWKySHyM](https://docs.google.com/presentation/d/1hpiaiVMHm4oQr5P86NhcrL0qXwBIdWhHEZOlWKySHyM)
SPEAKER BIO:
I have 6+ years of experience building AI and ML systems with math and coding. My mission is to help developers and customers use these tools (with fewer heartaches than I had teaching myself) to organize and search unstructured data, such as images, video, text, and audio, using LLM and multi-modal apps. I enjoy learning new technologies and tools and solving challenging problems.
As a Developer Advocate, I use my skills in Python, HuggingFace, PyTorch, Spark, RLlib, Ray distributed computing, and vector databases to create and share engaging and informative content, such as tutorials, demos, blogs, and talks. I also manage the Bay Area Unstructured Data meetup group, where I organize events and foster a community of enthusiasts and experts in the field.
Outside of work, I enjoy hiking and bird watching. In my background photo: Australian bustard, spotted near Cairns, Australia.