In today's rapidly growing world of conversational AI, developers often look for ways to combine multiple models seamlessly, whether to diversify outputs or to serve different purposes within the same application. One such scenario involves running several large language models (LLMs) locally and switching between them at runtime. In this article, we'll explore a method to set up and switch between two local LLMs, Zephyr 7B and Mistral 7B, using the Chainlit and LangChain libraries.
Before we delve into the actual code, let's understand our tools: Chainlit provides the chat UI and per-session state, LangChain supplies the prompt templates and chains, and CTransformers (through LangChain's wrapper) runs quantized GGUF/GGML models locally on the CPU.
Let's get started!
```python
import os
from langchain.llms import CTransformers
from langchain import PromptTemplate, LLMChain
import chainlit as cl
from configs.directory import LOCAL_LLMS
```
We begin by importing the necessary modules and packages. Note that `configs.directory` is not a library module but part of this project: it exposes a `LOCAL_LLMS` dictionary that maps model names to the model files on disk.
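That module isn't shown here, so as a rough illustration it might look something like the following (the directory layout and quantized file names are assumptions; point them at wherever you downloaded your GGUF models):

```python
# configs/directory.py -- hypothetical example; adjust paths to your setup.
from pathlib import Path

MODELS_DIR = Path("models")

# Maps a short model name to the corresponding model file on disk.
LOCAL_LLMS = {
    'zephyr-7b-beta': MODELS_DIR / 'zephyr-7b-beta.Q4_K_M.gguf',
    'mistral-7b-openorca': MODELS_DIR / 'mistral-7b-openorca.Q4_K_M.gguf',
}
```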
```python
zllm = LOCAL_LLMS['zephyr-7b-beta']
mllm = LOCAL_LLMS['mistral-7b-openorca']
llms = [zllm, mllm]
```
Here, we extract the paths to our local Zephyr and Mistral models.
```python
config = {
    "max_new_tokens": 2048,
    "repetition_penalty": 1.1,
    "temperature": 0.5,
    "top_k": 50,
    "top_p": 0.9,
    "stream": True,
    "threads": int(os.cpu_count() / 2)
}
```
We set up a configuration dictionary to customize the models' behavior: the maximum number of new tokens to generate, the repetition penalty, sampling temperature, top-k/top-p sampling, streaming, and the number of CPU threads (here, half the available cores).
```python
zllm_init = CTransformers(model=str(zllm), model_type='mistral', **config)
mllm_init = CTransformers(model=str(mllm), model_type='mistral', **config)
```
This is where we initialize our LLMs using the `CTransformers` class from LangChain. Both models use `model_type='mistral'` because Zephyr 7B is a fine-tune of Mistral 7B.
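Before wiring these into Chainlit, it can be worth a quick smoke test in a REPL or a separate script. A minimal sketch, assuming the model files loaded correctly (the prompt text is arbitrary, and CPU inference can take a while):

```python
# Smoke test: the LangChain LLM wrappers are directly callable.
print(zllm_init("Explain in one sentence what a quantized model is."))
print(mllm_init("Explain in one sentence what a quantized model is."))
```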
```python
template = """
Question:
{question}
"""

query = "What is the meaning of life?"
```
We're setting up a simple template that structures our questions.
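The `query` variable above is just a sample question. To see exactly what the chain will send to the model, you can fill the template by hand; a quick illustration:

```python
# Purely illustrative: the chains below do this on every incoming message.
prompt = PromptTemplate(template=template, input_variables=['question'])
print(prompt.format(question=query))
# Question:
# What is the meaning of life?
```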
```python
@cl.on_chat_start
def main():
    prompt = PromptTemplate(template=template, input_variables=['question'])
    zllm_chain = LLMChain(llm=zllm_init, prompt=prompt, verbose=True)
    mllm_chain = LLMChain(llm=mllm_init, prompt=prompt, verbose=True)

    cl.user_session.set('zllm_chain', zllm_chain)
    cl.user_session.set('mllm_chain', mllm_chain)
    # Zephyr is the active model for a new session by default.
    cl.user_session.set('llm_chain', zllm_chain)
```
When a chat session starts, we create an LLM chain for each model from the shared prompt template, store both in the Chainlit user session, and set the Zephyr chain as the default. Incoming messages can then be routed to whichever model is currently selected.
```python
@cl.on_message
async def on_message(message):
    llm_chain = cl.user_session.get('llm_chain')

    if message.content == 'zllm':
        await cl.Message(content="Set to Zephyr").send()
        cl.user_session.set('llm_chain', cl.user_session.get('zllm_chain'))
    elif message.content == 'mllm':
        await cl.Message(content="Set to Mistral").send()
        cl.user_session.set('llm_chain', cl.user_session.get('mllm_chain'))
    else:
        res = await llm_chain.acall(
            message.content,
            callbacks=[cl.AsyncLangchainCallbackHandler()]
        )
        await cl.Message(content=res['text']).send()
```
This is the core of our chatbot. When a message is received:

- If the text is exactly `zllm`, we confirm the switch and make the Zephyr chain the active one for this session.
- If the text is exactly `mllm`, we do the same for the Mistral chain.
- Anything else is treated as a question and passed to the currently selected chain, and the model's answer is sent back to the chat.
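With everything in place, you can save the script (for example as `app.py`; the filename is just an assumption) and launch it with `chainlit run app.py -w`, then switch models mid-conversation simply by typing `zllm` or `mllm`.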
By employing Chainlit and LangChain, we've built a simple yet effective chatbot that can switch between two powerful local LLMs on the fly. This approach showcases the flexibility of working with local models to cater to diverse conversational needs.
Developers can expand on this foundation by adding more models, customizing prompts, or introducing advanced features like context retention and multi-turn conversations. The sky's the limit!