How to Fine-Tune Llama 2 Using LoRA, RAG with Llama 2, and Learn about CLIP and ImageBind
Plus, September calendar of events, the Sherry Code, and Deci Digest
What’s up, Community!
Huge shout out to Jayeeta Putatunda for the amazing AMA session today. The title of the talk was The Role of Data Scientists in LLMOps and LLM Lifecycles, but as you can see from the audience questions, we covered A LOT more ground than that.
You can catch the replay below if you missed it. (I wasn’t supposed to stream it to my personal YouTube, but I accidentally did!)
🗓️ We’ve got more events like this all month. Here’s a quick rundown of the September calendar of events:
September 13th - Builders of YOLO-NAS w/ Ritesh Kanjee. Ritesh Kanjee of Augmented Startups joins us for an end-to-end project where you'll learn how to build an object detection system using YOLO-NAS! You can register for this event here.
September 15th - Hacky Hours in Discord with Matt MacFarlane
September 19th - Hacky Hours in Discord with Prakhar Thakur: Generating Synthetic Data for Computer Vision Tasks
September 20th - Harnessing AI Agents, A Deep Dive with Abi Aryan. Abi is an expert in Generative Agents. She’ll join us for an AMA session. You can register for this event here.
September 22nd - Hacky Hours in Discord with Abby Morgan: SAM + Stable Diffusion for Text-to-Image Inpainting
September 27th - From Prototypes to Products: Leveraging LLMs for Applications with Suhas Pai. In this AMA session, Suhas will answer questions about understanding language models and exploiting their strengths to build valuable products. You can register for this here.
September 29th - Knowledge Distillation using SuperGradients with Harpreet. In this session, you'll gain an understanding of Knowledge Distillation, a pivotal technique for improving model efficiency. You can register for this here.
🧐 What’s in this edition?
🗞️ The Sherry Code (News headlines)
📰 The Deci Digest (Research and repositories)
🗞️ The Sherry Code: Your Weekly AI Bulletin
Shout out to Sherry for sharing her top picks for the AI news headlines you need to know about!
Sherry is an active member of the community. She’s at all the events, shares resources on Discord, and is an awesome human being.
Show some support and follow her on Instagram, Twitter, and Threads.
Introducing Code Llama, a state-of-the-art large language model for coding: Code Llama is a coding-focused language model that outperforms other open models on coding benchmarks. It comes in three versions: a base model, one specialized for Python, and another fine-tuned to follow natural language instructions. With the ability to generate code from text prompts, Code Llama is a powerful model designed specifically for coding tasks.
Bringing the world closer together with a foundational multimodal model for speech translation: Meta has launched SeamlessM4T, a foundational multimodal model that translates and transcribes across speech and text in many languages. The model is open-source and trained on the largest open multilingual dataset.
GPT-3.5 Turbo fine-tuning and API updates: OpenAI's GPT-3.5 Turbo now supports fine-tuning, letting developers customize its performance; fine-tuned versions can rival GPT-4 on specific narrow tasks. OpenAI also emphasizes safety, with measures to reduce toxicity and bias.
Introducing ChatGPT Enterprise: OpenAI's ChatGPT Enterprise delivers enhanced security, data analysis, and encryption with GPT-4 access. It's SOC2 compliant, features an admin console, SSO, and domain verification, and ensures data privacy.
Ideogram launches AI image generator with impressive typography: Ideogram, a new AI image generation startup, offers unique text generation within images. Founded by ex-Google Brain researchers, it raised $16.5 million and may attract graphic designers with its typography feature.
Teaching with AI: Open AI has released a teaching guide on AI, providing insights into ChatGPT's functionality, limitations, the effectiveness of AI detectors, and addressing bias.
Identifying AI-generated images with SynthID: DeepMind, with Google Cloud, launched SynthID to watermark and identify AI-generated images. The tool embeds invisible watermarks, distinguishing real from AI-created content and preventing misinformation.
Multimodal Models: Exploring Training Techniques and Innovations through CLIP and ImageBind
The human brain integrates data from various sources to create a coherent narrative that shapes our perceptions and actions. Multimodal deep learning models aim to imitate this process on a larger scale.
This article explores these models, their applications, training techniques, and two prominent models, CLIP by OpenAI and ImageBind by Meta Research, and discusses the practical aspects of implementing these models.
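Before you dive into the article, here's the core CLIP idea in a few lines: an image embedding is compared against candidate caption embeddings by cosine similarity, and a softmax over the scaled similarities yields zero-shot classification probabilities. This is a minimal, dependency-free sketch; the toy vectors below are made up for illustration, whereas in the real model they come from CLIP's image and text encoders.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (hypothetical; CLIP would produce these with its encoders).
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
}

# Zero-shot classification: softmax over image-text similarities,
# scaled by a temperature (CLIP learns this scale during training).
logit_scale = 100.0
labels = list(text_embs)
logits = [logit_scale * cosine(image_emb, text_embs[k]) for k in labels]
probs = softmax(logits)
best = labels[max(range(len(labels)), key=lambda i: probs[i])]
print(best)
```

The same recipe generalizes to ImageBind, which aligns six modalities (not just image and text) into one shared embedding space.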
How to Fine-tune Llama 2 with LoRA for Question Answering: A Guide for Practitioners
An evolution of its predecessor, Llama 1, Meta's Llama 2 is a large language model with variants scaling up to 70 billion parameters. It boasts a longer context length and introduces Grouped Query Attention (GQA), which improves the inference scalability of the model.
This guide will walk you through fine-tuning Llama 2 with LoRA for Question Answering tasks.
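The core LoRA idea behind the guide fits in a few lines: the frozen weight matrix W is augmented with a trainable low-rank update scaled by alpha/r, so only the small matrices A and B receive gradients. The tiny matrices below are purely illustrative, not a real model; in practice you'd apply this per attention layer with a library such as Hugging Face PEFT.

```python
def matmul(X, Y):
    # Plain-Python matrix multiply: (m x k) @ (k x n) -> (m x n).
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_effective_weight(W, A, B, alpha, r):
    # LoRA: W_eff = W + (alpha / r) * (B @ A).
    # W stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Frozen 2x2 base weight with rank-1 adapters (r = 1).
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[0.5, 0.5]]          # r x d_in
B = [[1.0], [0.0]]        # d_out x r
W_eff = lora_effective_weight(W, A, B, alpha=2.0, r=1)
print(W_eff)
```

Because r is much smaller than the weight dimensions, the trainable parameter count drops by orders of magnitude, which is what makes fine-tuning a 70B-parameter model tractable on modest hardware.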
📰 The Deci Digest
🧠 Virginia Tech and Microsoft propose the "Algorithm of Thoughts," an innovative approach that guides LLMs along algorithmic reasoning routes to usher in contextual learning.
📊 Meta has released FACET, an AI benchmark designed to evaluate the "fairness" of computer vision models that classify and detect things in photos and videos, including people.
🍔 Samsung Food, an AI-based app with over 160,000 recipes, launches in 104 countries and eight languages.
👁️ Alibaba Group researchers have introduced Qwen-VL, a powerful vision language model that can understand text and images. It accepts image, text, and bounding box inputs and outputs text and bounding box as well.
😎 Researchers have developed SMPLitex, a technique to estimate and manipulate the complete 3D representation of a person based on only one image.
💻 Open Interpreter enables LLMs to run local code in Python, JavaScript, Shell, and more, using a ChatGPT-like interface within the terminal by running $ interpreter.
🦅 The Technology Innovation Institute in the United Arab Emirates has developed the world's most powerful open LLM, Falcon 180B, claiming the top spot on Hugging Face's leaderboard for open-access LLMs.
📝 Zoom has launched AI Companion, which assists users in composing chat responses, saves time and allows users to summarize discussions for those who join late.
YOLO-NAS is Still King in Object Detection
TL;DR: What’s New in the YOLO-NAS Architecture?
QSP and QCI blocks combine the advantages of re-parameterization and 8-bit quantization. These blocks, which rely on an approach suggested by Chu et al., allow for minimal accuracy loss during post-training quantization.
Deci’s proprietary NAS technology, AutoNAC, was used to determine optimal sizes and structures of stages, including block type, number of blocks, and number of channels in each stage.
A hybrid quantization method that selectively quantizes certain parts of a model, reducing information loss and balancing latency and accuracy. Standard quantization affects all model layers, often leading to significant accuracy loss. Our hybrid method optimizes quantization to maintain accuracy by only quantizing certain layers while leaving others untouched. Our layer selection algorithm considers each layer’s impact on accuracy and latency and the effects of switching between 8-bit and 16-bit quantization on overall latency.
A pre-training regimen includes automatically labelled data, self-distillation, and large datasets.
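As a rough illustration of the hybrid idea (not Deci's actual layer-selection algorithm, which is proprietary), the sketch below applies symmetric 8-bit post-training quantization to one layer while leaving an accuracy-sensitive layer in full precision. The layer names, weights, and sensitivity flags are hypothetical.

```python
def quantize_8bit(values):
    # Symmetric 8-bit post-training quantization of one weight tensor.
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Hypothetical per-layer weights; "sensitive" layers skip quantization.
layers = {
    "stem": ([0.12, -0.50, 0.33], False),   # quantize to 8-bit
    "head": ([0.01, -0.02, 0.015], True),   # accuracy-sensitive: keep as-is
}

restored = {}
for name, (w, sensitive) in layers.items():
    if sensitive:
        restored[name] = w                  # untouched (higher precision)
    else:
        q, s = quantize_8bit(w)
        restored[name] = dequantize(q, s)

err = max(abs(a - b) for a, b in zip(layers["stem"][0], restored["stem"]))
print(f"max roundtrip error on quantized layer: {err:.5f}")
```

The roundtrip error on the quantized layer is bounded by half a quantization step, while the sensitive layer is reproduced exactly; a real selection algorithm would weigh each layer's accuracy impact against its latency savings, as described above.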
The YOLO-NAS architecture is available under an open-source license. Its pre-trained weights are available for research use (non-commercial) through SuperGradients, Deci's PyTorch-based, open-source computer vision training library.
Check out the starter notebook here.
3 AI Research Content Creators Reveal Their Work Process
With thousands of AI publications coming out every month, it can be difficult for data scientists, machine learning engineers, and AI practitioners to keep up.
How do you stay current on trending papers and state-of-the-art in your subfields of interest? Do you have enough time to read it all? You can check proceedings of conferences for interesting titles, use an arXiv feed/recommender system, talk to the community for recommendations, or set up Google Scholar notifications.
Another way to stay on top of the latest AI papers is by following relevant people on social media, which is what this blog is all about. Our DevRel Manager, Harpreet Sahota, contacted three content creators who read, synthesize, and share key research in various AI subfields.
Shubham Saboo is an AI evangelist, the author of two books, and a data scientist. He is currently the Head of DevRel at Tenstorrent Inc.
Sebastian Raschka is a deep learning and AI researcher, programmer, author, and educator. Making AI and DL more accessible, he is the Lead AI Educator at Lightning AI.
Cameron R. Wolfe, Ph.D., is the Director of AI at Rebuy. He is a researcher interested in DL and passionate about explaining scientific concepts to others.
Continue reading to see their insights into their work and how it impacts the bigger AI community.
In Case You Missed It
Last week, the guys from AI Makerspace came by to teach the community how to do Retrieval Augmented Generation (RAG) with Llama 2.
You can catch the recording here 👇🏽
That’s it for this week!
Let me know how I’m doing.
Cheers,