Building a Chat User Interface with Streamlit

Streamlit Chat User Interface

Madhan Kumar
7 min read · Oct 8, 2024

Introduction

In recent years, Streamlit has gained popularity among developers for building interactive web applications, especially for data science and machine learning projects. This open-source library allows developers to create stunning web applications with minimal code and no frontend experience. One exciting use case for Streamlit is building a chat user interface (UI) that can interact with users and respond to queries.

In this article, we will explore what Streamlit is, its key features, and how to create a chat user interface using the code provided below.

What is Streamlit?

Streamlit is a Python library that simplifies the process of building web applications. Its main features include:

  • Ease of Use: Streamlit allows developers to create web apps quickly using simple Python scripts (see the short example after this list).
  • Real-time Interactivity: The framework enables real-time interaction with the application, allowing users to see updates without needing to refresh the page.
  • Integrated Components: Streamlit provides a range of built-in components (like sliders, buttons, and charts) that enhance user experience.
  • Data Visualization: Built-in support for popular visualization libraries, such as Matplotlib, Plotly, and Altair, makes it easy to present data visually.
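
As a quick illustration of that "minimal code" claim, here is a tiny standalone example (not part of the chat app we build below) that you could save as app.py and launch with streamlit run app.py:

import streamlit as st

st.title("Hello, Streamlit")

# The script reruns from top to bottom on every interaction,
# so the squared value updates as soon as the slider moves.
number = st.slider("Pick a number", 1, 10, 5)
st.write(f"{number} squared is {number ** 2}")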

Setting Up the Environment

To get started with Streamlit, ensure you have Python installed. You can then install Streamlit and other required libraries using pip:

pip install streamlit llama-index llama-index-llms-ollama llama-index-embeddings-huggingface
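
The code later in this article talks to a local Llama model through Ollama, so you will also need Ollama itself installed and running, with the model pulled before you launch the app (the model tag below matches the one used in the code):

ollama pull llama3.2:3b-instruct-q8_0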

Building a Chat User Interface

In this section, we will build the chat user interface with Streamlit. The full script is shown first, followed by a section-by-section breakdown of what each part does.

import os
import base64
import gc
import tempfile
import uuid

import streamlit as st
from llama_index.core import Settings, PromptTemplate, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Initialize session state for unique session identification and file caching
if "session_id" not in st.session_state:
    st.session_state.session_id = uuid.uuid4()
    st.session_state.document_cache = {}

session_id = st.session_state.session_id

# Load the language model with a caching mechanism
@st.cache_resource
def initialize_llm():
    return Ollama(model="llama3.2:3b-instruct-q8_0", request_timeout=120.0)

# Reset the chat state and clear garbage
def reset_chat_state():
    st.session_state.chat_history = []
    gc.collect()

# Display the PDF in the sidebar with an embedded preview
def render_pdf(file):
    st.markdown("### PDF Preview")
    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
    pdf_html = f"""
    <iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="100%" type="application/pdf"
            style="height:100vh;">
    </iframe>
    """
    st.markdown(pdf_html, unsafe_allow_html=True)

# Sidebar for document upload and processing
with st.sidebar:
    st.header("Upload a Document")
    uploaded_file = st.file_uploader("Select a PDF file", type="pdf")

    if uploaded_file:
        try:
            # Temporarily store the uploaded file for processing
            with tempfile.TemporaryDirectory() as temp_dir:
                file_path = os.path.join(temp_dir, uploaded_file.name)

                # Save the uploaded file
                with open(file_path, "wb") as f:
                    f.write(uploaded_file.getvalue())

                # Create a unique key for the file to manage state
                file_key = f"{session_id}-{uploaded_file.name}"
                st.write("Processing and indexing document...")

                # Process and cache the document if it hasn't been processed already
                if file_key not in st.session_state.document_cache:
                    if os.path.exists(temp_dir):
                        loader = SimpleDirectoryReader(
                            input_dir=temp_dir,
                            required_exts=[".pdf"],
                            recursive=True
                        )
                    else:
                        st.error('File not found. Please check and try again.')
                        st.stop()

                    # Load document data
                    documents = loader.load_data()

                    # Initialize LLM and embedding model
                    llm = initialize_llm()
                    embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)

                    # Create an index over the loaded data
                    Settings.embed_model = embedding_model
                    document_index = VectorStoreIndex.from_documents(documents, show_progress=True)

                    # Create a streaming query engine that uses the LLM
                    Settings.llm = llm
                    query_engine = document_index.as_query_engine(streaming=True)

                    # Define a custom prompt template for QA
                    qa_prompt_template_str = (
                        "Context information is provided below.\n"
                        "---------------------\n"
                        "{context_str}\n"
                        "---------------------\n"
                        "Based on the context, please respond concisely. "
                        "If the answer is unknown, respond with 'I don't know'.\n"
                        "Query: {query_str}\n"
                        "Answer: "
                    )
                    qa_prompt_template = PromptTemplate(qa_prompt_template_str)
                    query_engine.update_prompts(
                        {"response_synthesizer:text_qa_template": qa_prompt_template}
                    )

                    # Cache the query engine for future use
                    st.session_state.document_cache[file_key] = query_engine
                else:
                    # Load the query engine from the cache
                    query_engine = st.session_state.document_cache[file_key]

                # Confirm the document is ready for interaction and display the PDF
                st.success("Document successfully indexed. Ready for queries!")
                render_pdf(uploaded_file)

        except Exception as e:
            st.error(f"Error processing document: {e}")
            st.stop()

# Main chat interface layout
col1, col2 = st.columns([6, 1])

with col1:
    st.header("Document Chat Interface")

with col2:
    st.button("Clear Chat", on_click=reset_chat_state)

# Initialize chat history
if "chat_history" not in st.session_state:
    reset_chat_state()

# Display chat history
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Handle user input for queries
if user_input := st.chat_input("Enter your question:"):
    # Guard: a document must be uploaded and indexed before querying
    if uploaded_file is None:
        st.warning("Please upload a PDF in the sidebar before asking a question.")
        st.stop()

    # Append the user's query to the chat history
    st.session_state.chat_history.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Process the query and stream the LLM response
    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""

        # Stream the LLM's response chunk by chunk
        query_result = query_engine.query(user_input)
        for chunk in query_result.response_gen:
            full_response += chunk
            response_placeholder.markdown(full_response + "▌")

        response_placeholder.markdown(full_response)

    # Add the assistant's response to the chat history
    st.session_state.chat_history.append({"role": "assistant", "content": full_response})

Code Breakdown

  1. Imports:

import os
import base64
import gc
import tempfile
import uuid

import streamlit as st
from llama_index.core import Settings, PromptTemplate, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
  • The necessary libraries are imported, including os, base64, gc, tempfile, and uuid for file handling and session management.
  • Libraries from llama_index are imported to handle document indexing and querying.
  • The streamlit library is imported for building the web interface.

2. Session State Initialization:

if "id" not in st.session_state:
st.session_state.id = uuid.uuid4()
st.session_state.file_cache = {}

session_id = st.session_state.id
client = None
  • A unique session ID is generated using uuid if it doesn't already exist in st.session_state.
  • A dictionary named document_cache is initialized to cache the query engine built for each uploaded document.

3. Loading the Language Model:

@st.cache_resource
def initialize_llm():
    return Ollama(model="llama3.2:3b-instruct-q8_0", request_timeout=120.0)
  • The initialize_llm function loads the Llama model specified in the code (llama3.2:3b-instruct-q8_0) through Ollama.
  • It is cached using @st.cache_resource to optimize performance by preventing re-initialization on every rerun (a standalone illustration follows).
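
To see what @st.cache_resource buys us, here is a small standalone sketch (the slow_connection function is a made-up stand-in for an expensive object such as an LLM client). Its body runs only once per server process, no matter how many times the script reruns:

import time
import streamlit as st

@st.cache_resource
def slow_connection():
    # Stand-in for something expensive to build, e.g. loading an LLM
    time.sleep(3)
    return {"created_at": time.time()}

conn = slow_connection()        # 3-second delay only on the very first run
st.write(conn["created_at"])    # Same timestamp on every subsequent rerun

if st.button("Rebuild resource"):
    slow_connection.clear()     # Drop the cached object; the next run rebuilds it
    st.rerun()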

4. Resetting the Chat State:

def reset_chat_state():
    st.session_state.chat_history = []
    gc.collect()
  • The reset_chat_state function clears the chat history and releases unused memory with gc.collect().
  • It resets chat_history in st.session_state so the conversation starts fresh.

5. Rendering PDF Files:

def render_pdf(file):
    st.markdown("### PDF Preview")
    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
    pdf_html = f"""
    <iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="100%" type="application/pdf"
            style="height:100vh;">
    </iframe>
    """
    st.markdown(pdf_html, unsafe_allow_html=True)
  • The render_pdf function takes a PDF file as input, encodes it in Base64, and embeds it in an iframe via a data: URI.
  • This allows users to view the uploaded document directly in the app.

6. Sidebar for Document Upload:

with st.sidebar:
    st.header("Upload a Document")
    uploaded_file = st.file_uploader("Select a PDF file", type="pdf")

    if uploaded_file:
        try:
            with tempfile.TemporaryDirectory() as temp_dir:
                file_path = os.path.join(temp_dir, uploaded_file.name)

                with open(file_path, "wb") as f:
                    f.write(uploaded_file.getvalue())

                file_key = f"{session_id}-{uploaded_file.name}"
                st.write("Processing and indexing document...")

                if file_key not in st.session_state.document_cache:
                    if os.path.exists(temp_dir):
                        loader = SimpleDirectoryReader(
                            input_dir=temp_dir,
                            required_exts=[".pdf"],
                            recursive=True
                        )
                    else:
                        st.error('File not found. Please check and try again.')
                        st.stop()

                    documents = loader.load_data()

                    llm = initialize_llm()
                    embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)
                    Settings.embed_model = embedding_model
                    document_index = VectorStoreIndex.from_documents(documents, show_progress=True)

                    Settings.llm = llm
                    query_engine = document_index.as_query_engine(streaming=True)

                    qa_prompt_template_str = (
                        "Context information is provided below.\n"
                        "---------------------\n"
                        "{context_str}\n"
                        "---------------------\n"
                        "Based on the context, please respond concisely. "
                        "If the answer is unknown, respond with 'I don't know'.\n"
                        "Query: {query_str}\n"
                        "Answer: "
                    )
                    qa_prompt_template = PromptTemplate(qa_prompt_template_str)
                    query_engine.update_prompts(
                        {"response_synthesizer:text_qa_template": qa_prompt_template}
                    )

                    st.session_state.document_cache[file_key] = query_engine
                else:
                    query_engine = st.session_state.document_cache[file_key]

                st.success("Document successfully indexed. Ready for queries!")
                render_pdf(uploaded_file)
        except Exception as e:
            st.error(f"Error processing document: {e}")
            st.stop()
  • A sidebar is created for users to upload PDF files.
  • When a file is uploaded, it is temporarily saved to a scratch directory for processing.
  • The PDF is loaded with SimpleDirectoryReader, indexed into a VectorStoreIndex, and exposed through a streaming query engine (a minimal, Streamlit-free version of this flow is sketched after this list).
  • An embedding model is initialized, and a prompt template is attached to the query engine to format queries sent to the LLM.
  • The uploaded document is displayed using the render_pdf function after successful processing.
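
To make the LlamaIndex part easier to follow in isolation, here is a minimal command-line sketch of the same indexing-and-querying flow with Streamlit removed (the ./docs folder is just an example path containing a few PDFs):

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Same models as the app
Settings.llm = Ollama(model="llama3.2:3b-instruct-q8_0", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)

# Load and index every PDF in the example folder
documents = SimpleDirectoryReader(input_dir="./docs", required_exts=[".pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask one question (non-streaming here, for simplicity)
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)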

7. Main Chat Interface:

col1, col2 = st.columns([6, 1])

with col1:
    st.header("Document Chat Interface")

with col2:
    st.button("Clear Chat", on_click=reset_chat_state)
  • The main chat interface is structured with two columns: one for the chat header and another for a button that clears the chat.
  • The reset_chat_state function is linked to the button, allowing users to start a fresh conversation.

8. Displaying Chat History:

# Initialize chat history
if "chat_history" not in st.session_state:
    reset_chat_state()

# Display chat messages from history on app rerun
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
  • The chat history is initialized and then replayed from st.session_state.chat_history on every rerun.
  • Each message is rendered based on its role (user or assistant) using Streamlit’s chat message component.

9. Handling User Input:

# Handle user input for queries
if user_input := st.chat_input("Enter your question:"):
    # Guard: a document must be uploaded and indexed before querying
    if uploaded_file is None:
        st.warning("Please upload a PDF in the sidebar before asking a question.")
        st.stop()

    # Append the user's query to the chat history
    st.session_state.chat_history.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Process the query and stream the LLM response
    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""

        # Stream the LLM's response chunk by chunk
        query_result = query_engine.query(user_input)
        for chunk in query_result.response_gen:
            full_response += chunk
            response_placeholder.markdown(full_response + "▌")

        response_placeholder.markdown(full_response)

    # Add the assistant's response to the chat history
    st.session_state.chat_history.append({"role": "assistant", "content": full_response})
  • A chat input box captures user queries; a small guard stops the run with a warning if no document has been uploaded yet.
  • User queries are appended to the chat history and displayed immediately.
  • The assistant’s response is generated by the query engine and streamed back chunk by chunk.
  • A placeholder is updated as the response streams in, so the user sees the answer appear in real time (a stripped-down echo-bot version of this chat loop is sketched below).
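
If the retrieval pieces get in the way of understanding the chat mechanics, this self-contained sketch (a toy echo bot, not part of the app) shows the same st.chat_input / st.chat_message / session_state pattern on its own:

import streamlit as st

st.header("Echo Bot")

# Chat history survives reruns because it lives in session_state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Re-render the conversation on every rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Each submitted message triggers a rerun with the new input
if user_input := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    reply = f"You said: {user_input}"
    with st.chat_message("assistant"):
        st.markdown(reply)
    st.session_state.messages.append({"role": "assistant", "content": reply})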

Running the Application

To run the complete application:

  1. Ensure the required packages are installed and the Ollama model has been pulled (see the setup section above).
  2. Save the entire code snippet into a Python file (e.g., chat_app.py).
  3. Use the terminal to navigate to the directory containing the file and run:

streamlit run chat_app.py

  4. The application will launch in your web browser, allowing you to upload a document and chat with the model about its contents.

This approach opens up numerous possibilities for creating intelligent, interactive applications that enhance user experience through natural language processing and document interaction.

Conclusion

This code provides a robust framework for creating a chat user interface using Streamlit and an LLM for intelligent responses. By following the code structure and explanation, developers can easily customize and extend the functionality of this chat interface to suit their needs. With Streamlit’s powerful capabilities, building interactive applications becomes more accessible and efficient.
