In today's rapidly growing world of conversational AI, developers often look for ways to combine multiple models seamlessly, whether to diversify outputs or to serve different purposes within the same application. One such scenario involves running several large language models (LLMs) locally and switching between them at runtime. In this article, we'll explore a method to set up and switch between two local LLMs, Zephyr 7B and Mistral 7B, using the Chainlit and LangChain libraries.
Before we delve into the actual code, let's understand our tools: Chainlit provides the chat UI and per-session state, LangChain supplies the prompt templates and chains, and CTransformers (through LangChain's wrapper) runs quantized GGUF/GGML models locally on the CPU.
Let's get started!
```python
import os
from langchain.llms import CTransformers
from langchain import PromptTemplate, LLMChain
import chainlit as cl
from configs.directory import LOCAL_LLMS
```
We begin by importing the necessary modules and packages. Note that `configs.directory` is not a library module but part of this project: it exposes a `LOCAL_LLMS` dictionary that maps model names to the model files on disk.
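That module isn't shown here, so as a rough illustration it might look something like the following (the directory layout and quantized file names are assumptions; point them at wherever you downloaded your GGUF models):

```python
# configs/directory.py -- hypothetical example; adjust paths to your setup.
from pathlib import Path

MODELS_DIR = Path("models")

# Maps a short model name to the corresponding model file on disk.
LOCAL_LLMS = {
    'zephyr-7b-beta': MODELS_DIR / 'zephyr-7b-beta.Q4_K_M.gguf',
    'mistral-7b-openorca': MODELS_DIR / 'mistral-7b-openorca.Q4_K_M.gguf',
}
```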
```python
zllm = LOCAL_LLMS['zephyr-7b-beta']
mllm = LOCAL_LLMS['mistral-7b-openorca']
llms = [zllm, mllm]
```
Here, we extract the paths to our local Zephyr and Mistral models.
```python
config = {
    "max_new_tokens": 2048,
    "repetition_penalty": 1.1,
    "temperature": 0.5,
    "top_k": 50,
    "top_p": 0.9,
    "stream": True,
    "threads": int(os.cpu_count() / 2)
}
```
We set up a configuration dictionary to customize the models' behavior: the maximum number of new tokens to generate, the repetition penalty, sampling temperature, top-k/top-p sampling, streaming, and the number of CPU threads (here, half the available cores).
```python
zllm_init = CTransformers(model=str(zllm), model_type='mistral', **config)
mllm_init = CTransformers(model=str(mllm), model_type='mistral', **config)
```
This is where we initialize our LLMs using the `CTransformers` class from LangChain. Both models use `model_type='mistral'` because Zephyr 7B is a fine-tune of Mistral 7B.
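Before wiring these into Chainlit, it can be worth a quick smoke test in a REPL or a separate script. A minimal sketch, assuming the model files loaded correctly (the prompt text is arbitrary, and CPU inference can take a while):

```python
# Smoke test: the LangChain LLM wrappers are directly callable.
print(zllm_init("Explain in one sentence what a quantized model is."))
print(mllm_init("Explain in one sentence what a quantized model is."))
```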
```python
template = """
Question:
{question}
"""

query = "What is the meaning of life?"
```
We're setting up a simple template that structures our questions.
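The `query` variable above is just a sample question. To see exactly what the chain will send to the model, you can fill the template by hand; a quick illustration:

```python
# Purely illustrative: the chains below do this on every incoming message.
prompt = PromptTemplate(template=template, input_variables=['question'])
print(prompt.format(question=query))
# Question:
# What is the meaning of life?
```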
```python
@cl.on_chat_start
def main():
    prompt = PromptTemplate(template=template, input_variables=['question'])
    zllm_chain = LLMChain(llm=zllm_init, prompt=prompt, verbose=True)
    mllm_chain = LLMChain(llm=mllm_init, prompt=prompt, verbose=True)

    cl.user_session.set('zllm_chain', zllm_chain)
    cl.user_session.set('mllm_chain', mllm_chain)
    # Zephyr is the active model for a new session by default.
    cl.user_session.set('llm_chain', zllm_chain)
```
When a chat session starts, we create an LLM chain for each model from the shared prompt template, store both in the Chainlit user session, and set the Zephyr chain as the default. Incoming messages can then be routed to whichever model is currently selected.
```python
@cl.on_message
async def on_message(message):
    llm_chain = cl.user_session.get('llm_chain')

    if message.content == 'zllm':
        await cl.Message(content="Set to Zephyr").send()
        cl.user_session.set('llm_chain', cl.user_session.get('zllm_chain'))
    elif message.content == 'mllm':
        await cl.Message(content="Set to Mistral").send()
        cl.user_session.set('llm_chain', cl.user_session.get('mllm_chain'))
    else:
        res = await llm_chain.acall(
            message.content,
            callbacks=[cl.AsyncLangchainCallbackHandler()]
        )
        await cl.Message(content=res['text']).send()
```
This is the core of our chatbot. When a message is received:

- If the text is exactly `zllm`, we confirm the switch and make the Zephyr chain the active one for this session.
- If the text is exactly `mllm`, we do the same for the Mistral chain.
- Anything else is treated as a question and passed to the currently selected chain, and the model's answer is sent back to the chat.
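With everything in place, you can save the script (for example as `app.py`; the filename is just an assumption) and launch it with `chainlit run app.py -w`, then switch models mid-conversation simply by typing `zllm` or `mllm`.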
By employing Chainlit and LangChain, we've built a simple yet effective chatbot that can switch between two powerful local LLMs on the fly. This approach showcases the flexibility of working with local models to cater to diverse conversational needs.
Developers can expand on this foundation by adding more models, customizing prompts, or introducing advanced features like context retention and multi-turn conversations. The sky's the limit!