Exclusive interview with a researcher, and how to fine-tune a base LLM for instruction following
Plus upcoming community events, news headlines, and cutting-edge research
What’s up, Community!
🧐 What’s in this edition?
🗞️ News headlines
📰 The Deci Digest (Research and repositories)
How to fine-tune a base LLM with QLoRA
🤨 (Opinion)
A poll - Let me know how I did this week
🗓️ Upcoming Community Events
Nov 20 - How to Fine-tune a Base LLM for Retrieval Augmented Generation (RAG): In a webinar, Deci and Ai Bloks will demonstrate the integration of DeciLM-6B, fine-tuned for RAG, into a RAG workflow using Ai Bloks' open-source library, llmware. The webinar will focus on use cases in financial services, legal, and compliance.
The webinar will provide hands-on experience with code samples, highlight every component of this state-of-the-art open-source RAG system, and show how to customize it for your workflows.
Nov 20 - How to Chat with Image Data Using the New GPT-4 Vision API. Discover how GPT-4's Vision API transforms image data analysis with AI. Join a live demo and code-sharing session covering integration, strategies, and best practices.
🤖 Nov 22 - Agents: LangChain vs. OpenAI Assistants. Learn to develop complex LLM apps with our very own community member Chris Alexiuk! LangChain builds reasoning apps using Chain-of-Thought prompting for LLMs; combined with ReAct, it enables complex agent-like apps. OpenAI's Assistants API simplifies creating agent-like apps. Ideal for LLM Ops practitioners and builders wanting to develop agent-like systems.
🎥 Community Newsletter Exclusive: Darwin Bautista on Scene Text Recognition with Permuted Autoregressive Sequence Models
The following interview is a conversation I had with Darwin Bautista at ECCV (European Conference on Computer Vision) about his paper "Scene Text Recognition with Permuted Autoregressive Sequence Models."
Bautista's work addresses scene text recognition (STR), which is challenging because text in natural scenes varies in font style, orientation, shape, and illumination, and is often occluded.
His paper presents a novel approach called PARSeq (Permuted Autoregressive Sequence models). This method is notable for its unified structure, which combines context-free non-autoregressive (NAR) and context-aware autoregressive (AR) inference and iterative refinement using bidirectional context.
This contrasts with previous methods that used separate language and fusion models.
Here are some highlights from our conversation and his research.
🏋🏼♀️ Challenges in STR: He highlighted challenges such as recognizing partially obscured or incomplete text. Incorporating language context helps the model infer such text, much as humans do.
🙆🏽 Approach to STR: His approach combines vision and language models. Initially, a vision model makes predictions, which a language model then refines. He discovered that using a mixture of autoregressive and non-autoregressive techniques from NLP (Natural Language Processing) improves STR.
🔍 Findings and Innovations: Bautista's research showed that training models on real data containing misoriented or vertical text improved performance significantly compared to models trained only on synthetic data. He proposed establishing more challenging benchmarks for STR as current methods perform well on standard benchmarks.
🤖 Transformer-Based Model: PARSeq uses a Transformer architecture with a 12-layer Vision Transformer (ViT) encoder and a single-layer decoder, offering a fresh perspective on deep learning model design.
📈 State-of-the-Art Results: Learn how PARSeq achieved remarkable accuracy on STR benchmarks, excelling in recognizing arbitrarily oriented text, a key challenge in real-world applications.
🌐 Practical Applications: Find out how Bautista's research paves the way for practical applications, particularly in augmented reality and assistive technologies, highlighting the real-world impact of this research.
💡 Future Directions and Accessibility: Get insights into the potential future developments and the availability of this research for further exploration, with code and data accessible to the public.
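As a toy illustration of the "permuted autoregressive" idea from the interview (this is not Bautista's code, and it simplifies the paper's scheme): during training, a PARSeq-style model samples several decoding orders for the same target string, always including the canonical left-to-right order, so a single model supports both AR and NAR-style inference.

```python
import numpy as np

# Toy sketch of permuted-order sampling, in the spirit of PARSeq's training
# scheme (hypothetical and simplified; the paper also pairs each sampled
# order with its mirror image, which is omitted here).
def sample_decoding_orders(seq_len, k, rng):
    """Return k distinct decoding orders over positions 0..seq_len-1.

    The first order is always canonical left-to-right, so the model
    retains standard autoregressive behavior; the rest are random
    permutations that expose it to arbitrary decoding orders.
    """
    orders = [np.arange(seq_len)]
    while len(orders) < k:
        perm = rng.permutation(seq_len)
        if not any(np.array_equal(perm, o) for o in orders):
            orders.append(perm)
    return orders

rng = np.random.default_rng(42)
orders = sample_decoding_orders(seq_len=5, k=3, rng=rng)
print(orders[0])  # [0 1 2 3 4] -- the canonical AR order is always included
```

Each sampled order defines a different attention mask during training, which is how one set of weights learns to decode in any order.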
🗞️ Your Weekly AI Bulletin
🧠 Have you ever imagined a workspace where artificial intelligence streamlines every search and simplifies your workflow? Dropbox and Nvidia are joining forces to turn this vision into reality. In a recent announcement, Dropbox revealed its collaboration with Nvidia to enhance its platform's productivity using advanced AI tools. This partnership is set to revolutionize how Dropbox customers interact with their cloud content by integrating Nvidia's cutting-edge AI technology.
🤖 Ever wondered if the cold precision of AI can be warmed up with a touch of human empathy in customer service? The founders of Siena AI are betting on it, redefining the customer service game for merchants. In a recent TechCrunch article, we dive into how Siena AI, co-founded by Andrei Negrau and Lisa Popovici, is tackling the notorious reputation of chatbots in customer service. With their background in e-commerce and software development for Shopify merchants, they've crafted an AI-powered solution that promises a machine's efficiency but with a human's understanding and empathy. This innovative approach aims to streamline customer service interactions without sacrificing the personal touch that brands have worked hard to cultivate.
🌱 Have you ever wondered how urban farming could revolutionize our cities and our plates? A recent article delves into the burgeoning world of high-tech urban agriculture, painting a picture of a future where our food grows up, quite literally, around us. The article explores the innovative approaches to urban farming, where technology and agriculture meet to create sustainable food systems within city landscapes. It highlights how these green initiatives are reshaping unused urban spaces and bringing fresh produce closer to consumers, reducing food miles and potentially lowering carbon footprints.
👁️ Could the eyes be a window to the heart's health? Ehsan Vaghefi, the CEO of Toku, seems to think so, and his personal story is as compelling as the technology his company is developing. Dive into how a childhood surrounded by the visually impaired led Vaghefi to innovate in the field of ocular imaging, aiming to revolutionize how we detect cardiovascular diseases. In a heartfelt narrative, we learn that Vaghefi's inspiration stems from his father's blindness due to congenital glaucoma. Rather than becoming a clinician, Vaghefi chose the technology path, founding Toku to leverage AI in diagnosing health conditions through the eye. Toku's flagship product, CLAiR, is an AI-powered retina scan that can non-invasively assess cardiovascular risks within seconds, potentially integrating into routine eye exams.
🤖 What does a change in leadership mean for a pioneering AI company like OpenAI? Today's news might give us a glimpse into the future of AI innovation. In an unexpected turn of events, OpenAI has seen a significant shift at the top: Sam Altman, the CEO and a board member, has been dismissed, and Mira Murati has stepped in as interim CEO. Murati, with a rich background in engineering and product development, has been a critical player in the tech industry, with stints at Tesla and Leap Motion before joining OpenAI. Her appointment comes at a critical moment for the company, which is known for groundbreaking AI tools like ChatGPT and DALL-E.
🧘🏽 YOLO-NAS Pose Has Arrived!
⭐️ Go and star SuperGradients, the official home of YOLO-NAS-Pose on GitHub
📓 Go and try it yourself with the quickstart notebook
🧑🏽💻 Train the model on custom data with the fine-tuning notebook
🤗 Try the demo on Hugging Face Spaces
📰 The Deci Digest
🖼️ Meta reveals various advancements regarding Emu, its first foundation model for image generation. New tools enable more control over image editing via text instructions and a new method for text-to-video generation.
🎼 Also, from Meta AI, researchers present a model that can produce 3D spatial audio for full human bodies. The system takes audio data from headset microphones and body positioning, generating a 3D audio environment around the individual. Data and code will be available by Dec 10th, 2023.
📔 An Adobe Research and Australian National University team introduces the Large Reconstruction Model (LRM). This pioneering model predicts the 3D structure of an object using only one image within just 5 seconds.
🔉 Qwen-Audio, the multimodal iteration of Alibaba Cloud's Qwen large model series, accepts various audio formats such as human speech, natural sounds, music, and text inputs, generating text as its output.
👔 Google’s AI-powered Search Generative Experience (SGE) is slowly rolling out to more countries. To help users find unique products, its shopping features use AI to generate gift ideas or fashion items that users can explore and purchase.
👨🏻🏫 How to Fine-Tune a Base LLM Using QLoRA
Are you ready to code? Because you’ll be doing a lot of that in this hands-on tutorial! Here’s what you’ll learn:
How the QLoRA magic works
The innovations that QLoRA introduced
How to set up your config files for QLoRA and peft
Hyperparameters for QLoRA
How to prepare the model for training
Training the model using SFTTrainer
Overcoming deployment challenges
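Before diving into the tutorial, the core of the "QLoRA magic" can be sketched in a few lines of plain numpy: the base weights stay frozen (and, in QLoRA, quantized to 4 bits), while training only touches a pair of small low-rank matrices. All names and shapes below are hypothetical, chosen for readability.

```python
import numpy as np

# Toy illustration of the low-rank update at the heart of LoRA/QLoRA.
rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 16           # hidden size, adapter rank, LoRA alpha
W = rng.standard_normal((d, d))  # frozen (and, in QLoRA, 4-bit quantized) base weight

# LoRA trains two small matrices instead of W itself.
# B starts at zero, so at initialization the adapter is a no-op.
A = rng.standard_normal((r, d))
B = np.zeros((d, r))

def effective_weight(W, A, B, alpha, r):
    """Base weight plus the scaled low-rank adapter update."""
    return W + (alpha / r) * (B @ A)

# At init the effective weight equals the frozen base weight...
assert np.allclose(effective_weight(W, A, B, alpha, r), W)

# ...and after training nudges B, only 2*d*r adapter parameters have
# changed, not the d*d base parameters.
B_trained = 0.01 * rng.standard_normal((d, r))
W_eff = effective_weight(W, A, B_trained, alpha, r)
print(W_eff.shape, A.size + B.size, W.size)  # (8, 8) 32 64
```

At realistic scale (d in the thousands, r of 8 to 64), this is why the trainable parameter count drops by orders of magnitude.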
👩🏾💻 The Webinar Version: Join me on December 5th
Read the blog and want to go deeper? Join me for a webinar on December 5th, where I’ll fine-tune for a different use case, talk about evaluating the fine-tuned model, go into more depth, and answer your questions.
Specialized Fine-Tuning: Adapt LLMs for niche tasks using labeled data.
Introduction to Instruction Tuning: Enhance LLM capabilities and controllability.
Dataset Preparation: Format datasets for effective instruction tuning.
BitsAndBytes & Model Quantization: Optimize memory and speed with the BitsAndBytes library.
PEFT & LoRA: Understand the benefits of the PEFT library from Hugging Face and the role of LoRA in fine-tuning.
TRL Library Overview: Delve into the TRL (Transformer Reinforcement Learning) library's functionalities.
SFTTrainer Explained: Navigate the SFTTrainer class by TRL for efficient supervised fine-tuning.
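The pieces listed above wire together roughly like this. This is a configuration sketch using the 2023-era transformers/peft/trl APIs, not the webinar's exact code; the model id, dataset name, and hyperparameter values are placeholders you would swap for your own.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer

# 4-bit NF4 quantization with double quantization: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "Deci/DeciLM-6b"  # placeholder; use your chosen base model
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA adapters: train small low-rank matrices, leave the 4-bit base frozen.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type="CAUSAL_LM",
)

# SFTTrainer (from TRL) handles supervised fine-tuning on a text field.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("your/instruction-dataset", split="train"),  # placeholder
    dataset_text_field="text",
    max_seq_length=512,
    peft_config=peft_config,
    args=TrainingArguments(
        output_dir="qlora-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```

The blog and webinar walk through what each of these knobs does and how to evaluate the result.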
That’s it for this week!
Let me know how I’m doing.
Cheers,