When to fine-tune an LLM, how one lightweight coding model is punching above its weight class, and more!
Plus a framework for choosing between open and closed source LLMs, YOLO-NAS meets SAM, headlines, and cool GitHub repos
🧐 What’s in this edition?
🗞️ The Sherry Code (News headlines)
📰 The Deci Digest (Research and repositories)
🤨 Do You Need to Fine-Tune an LLM, or Is Prompting Enough? (Opinion)
A poll - Let me know how I did this week
What’s up, Community!
The launch of DeciCoder has been a HUGE success!
In the week since launch, it’s been downloaded 2,600+ times, gained 146 Hearts on HuggingFace, trended at the number-four spot on HuggingFace Models, and even got shoutouts from the HuggingFace co-founders.
🤗 Thank you for all your support! 🤗
You can demo the model on HuggingFace spaces here.
Speaking of the demo… are you a Gradio expert? I was trying to get the demo to show two code output boxes, comparing DeciCoder to SantaCoder with the generation streaming to both outputs, but I couldn’t figure out how.
If you can help, please let me know. Send me an email here.
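In the meantime, here’s a minimal sketch of the layout I was going for, with dummy generators standing in for the two models (swap in real streaming, e.g. transformers’ TextIteratorStreamer running in threads). The streaming-to-two-boxes wiring is the part I’d love a second pair of eyes on:

```python
import time
import gradio as gr

def stream_two(prompt):
    # Dummy generators standing in for DeciCoder and SantaCoder; swap in
    # real model streaming (e.g. transformers' TextIteratorStreamer).
    deci_out, santa_out = "", ""
    for i in range(10):
        deci_out += f"# deci line {i}\n"
        santa_out += f"# santa line {i}\n"
        time.sleep(0.1)
        yield deci_out, santa_out  # yielding a tuple updates both boxes

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    with gr.Row():
        deci_box = gr.Code(label="DeciCoder", language="python")
        santa_box = gr.Code(label="SantaCoder", language="python")
    prompt.submit(stream_two, inputs=prompt, outputs=[deci_box, santa_box])

demo.queue().launch()
```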
Upcoming events this August:
Generative AI Events
• August 31st w/ The ML Makerspace: Building with LLaMA 2. You can register for that here.
YOLO-NAS Events
• August 25th w/ Harpreet: An overview of Quantization, Post-Training Quantization, and Quantization-Aware Training, and how to implement these techniques using SuperGradients. You can register for that here.
Hacky Hours
All hacky hour sessions take place in the Discord community. You can join with this link if you’re not already on the Discord server.
• August 25th w/ Prakhar: How to Use DINOv2 for Downstream Tasks. This will be streamed in the Discord community.
• August 30th w/ Ran Zilberstein, VP of Engineering at Deci: Learn how to optimize your deep learning models for maximum speed and efficiency. Through real-world examples and practical demonstrations, you’ll discover how to implement various techniques in your DL projects on edge devices to achieve faster processing speeds and unlock new possibilities. You can register for that here.
🗞️ The Sherry Code: Your Weekly AI Bulletin
Shout out to Sherry for sharing her top picks for the AI news headlines you need to know about!
Sherry is an active member of the community. She’s at all the events, shares resources on Discord, and is an awesome human being.
Show some support and follow her on Instagram, Twitter, and Threads.
Introducing Project IDX by Google
Google has unveiled Project IDX, a cloud-based coding workspace powered by generative AI. It offers AI-assisted code generation, completion, translation, and explanation to boost developer productivity, and it supports multi-browser web previews so developers can verify consistent behaviour across devices and operating systems.
Microsoft kills Cortana in Windows as it focuses on next-gen AI
In a strategic shift, Microsoft has opted to discontinue Cortana in favour of generative AI offerings such as Bing Chat. This move coincides with the integration of ChatGPT-powered features, born of its collaboration with OpenAI, directly into the Windows 11 ecosystem.
GPTBot: the web crawler of OpenAI
Website owners can now easily block OpenAI's web crawler, GPTBot. Adding the snippet below to a site's robots.txt file prevents GPTBot from accessing and crawling the site's content for AI training.
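For reference, the directive from OpenAI's documentation blocks GPTBot site-wide:

```
User-agent: GPTBot
Disallow: /
```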
NVIDIA and Hugging Face to Connect Millions of Developers to Generative AI Supercomputing
NVIDIA and Hugging Face have announced a collaboration giving AI developers access to high-performance GPUs tailored for deep learning. The partnership integrates NVIDIA's DGX Cloud, extending a powerful platform for developers to leverage in their AI endeavours.
Open-Source LLMs vs. APIs: 7 Crucial Factors to Decide Your Generative AI Strategy
Do you have plans to develop applications based on LLMs?
You have two options: Utilize a closed-source model like GPT-4 via an API, or build upon an appropriate pretrained open-source LLM.
But which path should you choose?
Below is a high-level summary of the seven crucial elements you must understand and evaluate to make an informed choice.
Read a more detailed overview of the advantages and drawbacks of each and how factors such as the specifics of your application and the nature of your business influence this decision.
Read more below👇🏽
PS, save and share this image on Twitter or LinkedIn!
📰 The Deci Digest
🔤 Boston University researchers introduce Platypus, a family of fine-tuned and merged LLMs. It demonstrates impressive results in quantitative LLM metrics for various model sizes.
🇹🇷 Researchers from Turkey present Inst-Inpaint, an image inpainting model that, given a natural language instruction, determines which object to remove and removes it in a single step.
💼 McKinsey debuts a Generative AI tool of its own: Lilli, a chatbot for employees that provides information, insights, data, and plans, and suggests the most suitable in-house experts for consulting projects.
How to Perform Image Segmentation using YOLO-NAS and Segment Anything (SAM)
Object detection and image segmentation are paving the way for groundbreaking advancements.
In the tutorial below, you’ll learn how to use the Segment Anything Model (SAM), which offers impressive segmentation capabilities, and the game-changing YOLO-NAS together!
Learn more about these technologies and their real-world applications. Check out the blog now for the comprehensive tutorial!
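As a taste of the approach, here's a rough sketch of the core idea: use YOLO-NAS detections as box prompts for SAM. The image path is a placeholder, the SAM checkpoint must first be downloaded from Meta's segment-anything repo, and prediction attribute names can vary across super-gradients versions, so treat this as a starting point and see the blog for the full walkthrough.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry
from super_gradients.training import models

# Pretrained YOLO-NAS detector (COCO weights download on first use).
detector = models.get("yolo_nas_l", pretrained_weights="coco")

# SAM checkpoint: download from Meta's segment-anything repo first.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

# 1) Detect objects. Attribute names below match recent super-gradients
#    releases but may differ in yours.
result = detector.predict(image)
boxes = result.prediction.bboxes_xyxy  # (N, 4) boxes in xyxy format

# 2) Prompt SAM with each detected box to get a per-object mask.
predictor.set_image(image)
masks = []
for box in boxes:
    mask, _, _ = predictor.predict(box=np.asarray(box), multimask_output=False)
    masks.append(mask[0])  # (H, W) boolean mask for this object
```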
🤨 Do You Need to Fine-Tune an LLM, or Is Prompting Enough?
Deep learning engineers and scientists constantly grapple with the choice of whether to fine-tune an LLM or whether a mere prompt would suffice.
This dilemma isn't just technical; it also pertains to cost, efficiency, and adaptability. I came across this tweet by Rachel Woods, which discusses when to fine-tune a model versus when prompting a model is enough.
Here are my main takeaways from Rachel’s tweet:
LLMs are impressive precisely because they can learn tasks and behaviours from a well-crafted prompt.
Purpose of Fine-tuning:
It's for teaching LLMs specific tasks or behaviours.
It's not for teaching LLMs new knowledge. For that, use Retrieval, storing data externally and pulling relevant chunks to provide context to the LLM (see the sketch after these takeaways).
When is Fine-tuning Useful?:
Tasks too complex to encapsulate in brief examples or prompts. If mastering a task would take someone weeks of hands-on experience rather than just reading the theory, fine-tuning might be the route to consider.
Challenges of Fine-tuning:
It should be approached as a full-fledged machine learning project, not a source of magic results: that means rigorous dataset design, managing training and test data, battling overfitting, and other intricate considerations, even as the tools for fine-tuning evolve.
Cost Factor: A practical implication. A smaller model fine-tuned for a specific task could lead to substantial savings over using a behemoth like GPT-4, especially for large-scale tasks such as customer support triaging or public data analysis.
Rachel's recommendation: before investing in fine-tuning, ensure the task can't be achieved with prompts.
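To make the retrieval point above concrete, here's a minimal sketch of the idea: store documents externally and pull the most relevant chunks at question time. Everything here is illustrative; embed() is a toy stand-in for a real embedding model such as sentence-transformers.

```python
import numpy as np

# Toy stand-in for a real embedding model: hashes words into a unit vector.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

documents = [
    "DeciCoder is a 1B-parameter code generation model.",
    "YOLO-NAS is an object detection architecture.",
    "Fine-tuning adapts a pretrained model to a specific task.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    # Dot product of unit vectors = cosine similarity.
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("What is DeciCoder?"))
```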
I’m part of an LLMOps boot camp taught by the LLM Wizard 🧙♂️ Chris Alexiuk and others. I wanted his perspective on fine-tuning vs. prompting and Rachel’s tweet, so I asked him on that community’s Discord server.
My main takeaways from Chris were:
• To shift domains or adapt a model to a specific domain, continued pre-training or other systems like RAQA (Retrieval Augmented Question Answering) are more apt than fine-tuning.
• When using an off-the-shelf instruct-tuned model, typically what the average developer can access, the "just prompt it" paradigm gains relevance, especially since closed API systems generally can't be fine-tuned by the end user.
Prompting is sufficient when:
The few-shot examples needed for relatively good performance fit within a modest share of the context window (a quick token-budget sketch follows these criteria).
Fine-tuning becomes necessary when:
You intend to use a RAQA system, but the few-shot examples take up so much context window space that they crowd out the retrieved context the model needs to derive answers, compromising performance.
There's a need to adapt to a task the model struggles to generalize to, leading to multiple retries per interaction.
A task demands high-confidence structured outputs.
Incorporating a language not part of the original model's training is required.
A more compact model, which might not perform well initially, is preferred to reduce inference costs.
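Chris's sufficiency criterion is easy to sanity-check in code. A minimal sketch, assuming a transformers tokenizer; the model, task, examples, and 2048-token window are all illustrative:

```python
from transformers import AutoTokenizer

# Any tokenizer gives a rough estimate; gpt2 is just a convenient stand-in.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Hypothetical few-shot examples for a sentiment task.
FEW_SHOT = [
    ("Great product, works perfectly!", "positive"),
    ("Broke on day two. Never again.", "negative"),
]

def few_shot_prompt(query: str) -> str:
    shots = "\n".join(f"Review: {q}\nSentiment: {a}" for q, a in FEW_SHOT)
    return f"{shots}\nReview: {query}\nSentiment:"

prompt = few_shot_prompt("Shipping was slow but the product is solid.")
used = len(tokenizer.encode(prompt))
window = 2048  # illustrative; check your model's context length

# If the examples alone eat a big share of the window, fine-tuning may pay off.
print(f"{used}/{window} tokens ({used / window:.0%}) used by the prompt")
```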
While fine-tuning does come with its set of challenges and costs, in many real-world situations, it remains an indispensable tool in the arsenal of a deep learning professional.
I came across a recent edition of The Batch, where Andrew Ng offers a pragmatic step-by-step approach to this debate:
Start with Prompting: An efficient way to prototype rapidly, and it can bring an application to life within minutes.
Few-shot Prompting: When regular prompting doesn't yield the desired outcomes, incorporating a few examples can enhance results.
Fine-tuning: It's an intricate process but can offer specificity, especially when using a custom dataset (a minimal sketch follows this list).
Pretraining: The pinnacle of complexity, creating custom LLMs from scratch. It's resource-intensive but can yield specialized models for niche domains.
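Since fine-tuning is where the complexity jumps, here's a minimal sketch of what it looks like with the transformers Trainer. gpt2 and the two-example ticket-routing "dataset" are illustrative placeholders; a real project needs far more data, plus the evaluation rigour Rachel described.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

# Model name and the toy dataset below are illustrative placeholders.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

texts = [
    "Ticket: Refund not received.\nRoute to: billing",
    "Ticket: App crashes on launch.\nRoute to: engineering",
]

class SFTDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
        self.input_ids = enc["input_ids"]
        self.attention_mask = enc["attention_mask"]
        # Causal LM objective: predict the input itself; ignore padding in loss.
        self.labels = self.input_ids.clone()
        self.labels[self.attention_mask == 0] = -100

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, i):
        return {"input_ids": self.input_ids[i],
                "attention_mask": self.attention_mask[i],
                "labels": self.labels[i]}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=SFTDataset(texts),
)
trainer.train()
```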
The proliferation of open-source or near-open-source LLMs means more choices for developers.
As Ng mentions, this offers a spectrum of complexity and cost, from simple prompting to the exhaustive process of pretraining. As tools and techniques mature, the options for developers will only expand, requiring careful consideration of the trade-offs involved.
Another interesting blog by Jessica Yao talks about the allure of fine-tuning.
Fine-tuning is often considered the go-to method for tailoring base LLMs to specific applications. Yao identifies two primary reasons:
Desire for Additional Structure/Style: Tailoring the LLM to specific tasks or providing answers in a chosen format.
Need for Additional Source Knowledge: Equipping the base LLM with knowledge outside its original training scope, especially if such knowledge isn't widely available.
However, while intuitively appealing, the perceived benefits of fine-tuning may not always justify the associated costs, especially given the rapid advancements in the field.
Yao suggests two powerful alternatives:
Few-shot Prompting: Efficiently guiding the LLM to perform specific tasks using a handful of relevant examples. This addresses the desire for additional structure or style without the heavy investment of fine-tuning.
Retrieval-augmented Generation (RAG): Enhancing the LLM's ability to answer questions about unfamiliar topics by fetching relevant external information. This method can cater to the need for additional source knowledge.
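Put together, RAG is just the retrieval sketch from earlier plus a prompt-assembly step. A minimal, self-contained sketch with hypothetical retrieved chunks:

```python
# Hypothetical retrieved chunks — in practice these come from a retriever
# like the embed()/retrieve() sketch earlier in this issue.
retrieved_chunks = [
    "DeciCoder is a 1B-parameter code generation model.",
    "It was trained on Python, Java, and JavaScript code.",
]

question = "What is DeciCoder?"
context = "\n".join(retrieved_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)
print(prompt)  # hand this to any LLM — no fine-tuning required
```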
One of the primary deterrents to fine-tuning is the associated time, effort, and potential obsolescence, especially in the face of rapidly advancing state-of-the-art models.
As Yao astutely points out, the temporary competitive advantage gained through fine-tuning might soon be eclipsed by newer, more potent models that other companies can use without additional effort.
To Fine-Tune or Not to Fine-Tune?
Fine-tuning is a potent tool, but it comes with its complexities.
Rachel Woods highlights its primary purpose: teaching LLMs specific tasks or behaviours but not necessarily injecting new knowledge. However, the line blurs for tasks that may require weeks for a human to grasp; here, fine-tuning could play a pivotal role.
Jessica Yao urges caution.
The rapid evolution in LLMs means today's competitive edge might be tomorrow's standard feature. Is the resource investment in fine-tuning justified then? Not always, according to her.
Andrew Ng shares similar reservations, indicating that transitioning to fine-tuning from prompting, especially with proprietary models, introduces added complexity.
Yet, scenarios do exist where fine-tuning holds merit.
As Jessica Yao suggests, stringent accuracy requirements or the need for fast edge inference can justify the investment.
Like Chris Alexiuk, I look forward to future LLMs integrating the best architecture, data, and fine-tuning methods.
Conclusion
While the allure of fine-tuning is undeniable, industry leaders emphasize the essence of starting simple, using prompts effectively, and judiciously deciding when to invest deeper.
Jessica Yao emphasizes the value of lighter-touch approaches, while Chris Alexiuk and Rachel Woods offer more nuanced takes: Alexiuk underscores the structural significance of fine-tuning, and Woods highlights the economic and practical implications of the choice. With his pragmatic methodology, Andrew Ng encourages developers to start simple and escalate in complexity only when necessary.
With LLMs rapidly evolving, today's strategies might need reevaluation tomorrow, making adaptability and continuous learning the true constants in this journey.
That’s it for this week!
Let me know how I’m doing.
Cheers,