LOCATION ADDRESS (Hybrid: in person or by Zoom, you choose)
Hacker Dojo
855 Maude Ave
Mountain View, CA 94043
(For faster sign-in, read the [Hacker Dojo policies](https://tinyurl.com/9cn8sevt). When you sign up, state “I accept the Hacker Dojo policies”.)
To join remotely, you can submit questions via Zoom Q&A. The Zoom link:
https://acm-org.zoom.us/j/93784936096?pwd=QVZETjVBN0ZuZnNsZ0F2VFdTL3FkUT09
AGENDA
6:30 Doors open, food
7:00 SFBayACM upcoming events, speaker introduction
7:10 Presentation starts
8:15-8:30 Finish, depending on Q&A
ABSTRACT
The ongoing expansion of large language models (LLMs) brings degraded runtime performance, heightened memory requirements, and increased computational demands, creating a growing urgency for efficient quantization to compress LLMs into a more compact form (https://stackoverflow.blog/2023/08/23/fitting-ai-models-in-your-pocket-with-quantization/). Additionally, optimization across platforms is pivotal for enhancing the accessibility of LLMs. BigDL-LLM (https://github.com/intel-analytics/BigDL) is designed to make efficient LLM development accessible to all Intel platform users, spanning CPUs to GPUs and clients to the cloud.
BigDL-LLM is an open-source library for running large language models (LLMs) with low-bit optimizations (FP4/INT4/NF4/FP8/INT8) on Intel XPUs, for any PyTorch model, with very low latency and a small memory footprint. It incorporates a variety of low-bit techniques, including llama.cpp, GPTQ, bitsandbytes, QLoRA, and more. With BigDL-LLM, users can build and run LLM applications for both inference and fine-tuning using standard PyTorch APIs (e.g., Hugging Face Transformers and LangChain) on Intel platforms. Meanwhile, a wide range of models (such as LLaMA/Llama 2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, Dolly/Dolly-v2, Bloom, StarCoder, Whisper, InternLM, Baichuan, QWen, MOSS, etc.) have already been verified and optimized with BigDL-LLM.
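As a taste of the API the talk will demonstrate, here is a minimal sketch of loading a Hugging Face model with INT4 optimization via BigDL-LLM's drop-in Transformers wrapper (the model path and prompt are illustrative placeholders; consult the BigDL-LLM repository for current usage):

```python
# Minimal sketch: BigDL-LLM's AutoModelForCausalLM substitutes for the
# transformers class of the same name and applies INT4 quantization at
# load time via load_in_4bit=True.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any supported HF model

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is quantization?", return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```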
The presentation will walk the audience through optimizing a Llama 2 model using the BigDL-LLM library and offer a practical session on deploying a Llama 2 chatbot on an Intel laptop. A detailed walkthrough of the material will subsequently be covered as part of a broader workshop on LLM agents. We invite everyone to join us in exploring this exciting journey with Intel BigDL-LLM.
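For a flavor of the hands-on chatbot portion, a minimal single-turn chat loop on top of the model and tokenizer loaded in the sketch above might look like the following (a sketch only, assuming Llama 2's [INST] chat prompt convention; the actual workshop material may differ):

```python
# Hypothetical chat loop: wraps each user message in Llama 2's [INST] format,
# generates a reply, and strips the prompt tokens from the output.
while True:
    user_msg = input("You: ")
    if user_msg.strip().lower() in {"quit", "exit"}:
        break
    prompt = f"[INST] {user_msg} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(inputs.input_ids, max_new_tokens=128)
    reply = tokenizer.decode(output[0][inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)
    print(f"Bot: {reply.strip()}")
```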
SPEAKER BIOS
Jiao (Jennie) Wang is an AI Framework Engineer on the Machine Learning Platform team at Intel, working in the area of AI and big data analytics. She is a key contributor to developing and optimizing distributed ML/DL frameworks and provides customer support for end-to-end AI solutions.
Guoqiong Song is an AI Frameworks Engineer at Intel, focused on building end-to-end AI applications within the AI Software Engineering department. She holds a PhD in Atmospheric and Oceanic Sciences from UCLA, specializing in quantitative modeling and data analysis. Previously, she worked at Verizon as a data scientist.
https://www.linkedin.com/in/guoqiong-song-903aa759/