Building a Chat User Interface with Streamlit

Introduction
In recent years, Streamlit has gained popularity among developers for building interactive web applications, especially for data science and machine learning projects. This open-source library allows developers to create stunning web applications with minimal code and no frontend experience. One exciting use case for Streamlit is building a chat user interface (UI) that can interact with users and respond to queries.
In this article, we will explore what Streamlit is, its key features, and how to create a chat user interface using the code walked through below.
What is Streamlit?
Streamlit is a Python library that simplifies the process of building web applications. Its main features include (a minimal example follows this list):
- Ease of Use: Streamlit allows developers to create web apps quickly using simple Python scripts.
- Real-time Interactivity: The framework enables real-time interaction with the application, allowing users to see updates without needing to refresh the page.
- Integrated Components: Streamlit provides a range of built-in components (like sliders, buttons, and charts) that enhance user experience.
- Data Visualization: Built-in support for popular visualization libraries, such as Matplotlib, Plotly, and Altair, makes it easy to present data visually.
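To make these features concrete, here is a minimal sketch of a Streamlit app that combines a slider, a chart, and a button; the labels and data are purely illustrative:
import numpy as np
import pandas as pd
import streamlit as st

st.title("Streamlit in a Few Lines")

# A slider: changing it reruns the script with the new value
points = st.slider("Number of points", min_value=10, max_value=200, value=50)

# A built-in chart component renders the data with no frontend code
data = pd.DataFrame(np.random.randn(points, 2), columns=["x", "y"])
st.line_chart(data)

# A button: this branch runs on the rerun triggered by the click
if st.button("Say hello"):
    st.write("Hello from Streamlit!")
Saving this as a file and launching it with streamlit run is all that is needed to get an interactive page.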
Setting Up the Environment
To get started with Streamlit, ensure you have Python installed. You can then install Streamlit and other required libraries using pip:
pip install streamlit llama-index llama-index-llms-ollama llama-index-embeddings-huggingface
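The Llama model used in this article is served through Ollama, so the Ollama runtime must also be installed and running locally, with the model weights pulled once beforehand. Assuming Ollama is installed, that is:
ollama pull llama3.2:3b-instruct-q8_0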
Building a Chat User Interface
In this section, we will build a chat user interface with Streamlit. The complete application code is shown first, followed by a section-by-section breakdown of how each part works.
import os
import base64
import gc
import tempfile
import uuid

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.core import PromptTemplate
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

import streamlit as st

# Initialize session state for unique session identification and document caching
if "session_id" not in st.session_state:
    st.session_state.session_id = uuid.uuid4()
    st.session_state.document_cache = {}

session_id = st.session_state.session_id

# Load the language model once and cache it across reruns
@st.cache_resource
def initialize_llm():
    return Ollama(model="llama3.2:3b-instruct-q8_0", request_timeout=120.0)

# Reset the chat state and release unused memory
def reset_chat_state():
    st.session_state.chat_history = []
    gc.collect()

# Display the uploaded PDF in the sidebar as an embedded preview
def render_pdf(file):
    st.markdown("### PDF Preview")
    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
    pdf_html = f"""
        <iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="100%" type="application/pdf"
                style="height:100vh;">
        </iframe>
    """
    st.markdown(pdf_html, unsafe_allow_html=True)

# Sidebar for document upload and processing
with st.sidebar:
    st.header("Upload a Document")
    uploaded_file = st.file_uploader("Select a PDF file", type="pdf")

    if uploaded_file:
        try:
            # Temporarily store the uploaded file for processing
            with tempfile.TemporaryDirectory() as temp_dir:
                file_path = os.path.join(temp_dir, uploaded_file.name)

                # Save the uploaded file to the temporary directory
                with open(file_path, "wb") as f:
                    f.write(uploaded_file.getvalue())

                # Create a unique key for the file to manage state
                file_key = f"{session_id}-{uploaded_file.name}"
                st.write("Processing and indexing document...")

                # Process and cache the document if it hasn't been processed already
                if file_key not in st.session_state.document_cache:
                    if os.path.exists(temp_dir):
                        loader = SimpleDirectoryReader(
                            input_dir=temp_dir,
                            required_exts=[".pdf"],
                            recursive=True
                        )
                    else:
                        st.error("File not found. Please check and try again.")
                        st.stop()

                    # Load the document data
                    documents = loader.load_data()

                    # Initialize the LLM and embedding model
                    llm = initialize_llm()
                    embedding_model = HuggingFaceEmbedding(
                        model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True
                    )

                    # Create an index over the loaded data
                    Settings.embed_model = embedding_model
                    document_index = VectorStoreIndex.from_documents(documents, show_progress=True)

                    # Create a streaming query engine using the LLM
                    Settings.llm = llm
                    query_engine = document_index.as_query_engine(streaming=True)

                    # Define a custom prompt template for question answering
                    qa_prompt_template_str = (
                        "Context information is provided below.\n"
                        "---------------------\n"
                        "{context_str}\n"
                        "---------------------\n"
                        "Based on the context, please respond concisely. "
                        "If the answer is unknown, respond with 'I don't know'.\n"
                        "Query: {query_str}\n"
                        "Answer: "
                    )
                    qa_prompt_template = PromptTemplate(qa_prompt_template_str)
                    query_engine.update_prompts(
                        {"response_synthesizer:text_qa_template": qa_prompt_template}
                    )

                    # Cache the query engine for future use
                    st.session_state.document_cache[file_key] = query_engine
                else:
                    # Reuse the query engine from the cache
                    query_engine = st.session_state.document_cache[file_key]

                # Confirm the document is ready for interaction and display the PDF
                st.success("Document successfully indexed. Ready for queries!")
                render_pdf(uploaded_file)
        except Exception as e:
            st.error(f"Error processing document: {e}")
            st.stop()

# Main chat interface layout
col1, col2 = st.columns([6, 1])
with col1:
    st.header("Document Chat Interface")
with col2:
    st.button("Clear Chat", on_click=reset_chat_state)

# Initialize the chat history on first load
if "chat_history" not in st.session_state:
    reset_chat_state()

# Display the chat history on every rerun
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Handle user input for queries
if user_input := st.chat_input("Enter your question:"):
    # Append the user's query to the chat history and display it
    st.session_state.chat_history.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Process the query and stream the LLM response
    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""

        # Stream the LLM's response chunk by chunk
        query_result = query_engine.query(user_input)
        for chunk in query_result.response_gen:
            full_response += chunk
            response_placeholder.markdown(full_response + "▌")
        response_placeholder.markdown(full_response)

    # Add the assistant's response to the chat history
    st.session_state.chat_history.append({"role": "assistant", "content": full_response})
Code Breakdown
1. Imports:
import os
import base64
import gc
import tempfile
import uuid
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.core import PromptTemplate
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import streamlit as st
- The standard-library modules `os`, `base64`, `gc`, `tempfile`, and `uuid` handle file management, memory cleanup, and session identification.
- The `llama_index` modules handle document loading, indexing, and querying.
- The `streamlit` library builds the web interface.
2. Session State Initialization:
if "session_id" not in st.session_state:
    st.session_state.session_id = uuid.uuid4()
    st.session_state.document_cache = {}

session_id = st.session_state.session_id
- A unique session ID is generated with `uuid` if one does not already exist in `st.session_state`.
- A dictionary named `document_cache` is initialized to cache processed documents (their query engines) for reuse across reruns.
3. Loading the Language Model:
@st.cache_resource
def initialize_llm():
    return Ollama(model="llama3.2:3b-instruct-q8_0", request_timeout=120.0)
- The `initialize_llm` function loads the Llama model specified in the code (`llama3.2:3b-instruct-q8_0`) through Ollama.
- It is decorated with `@st.cache_resource`, which optimizes performance by preventing re-initialization on every rerun.
4. Resetting the Chat State:
def reset_chat_state():
    st.session_state.chat_history = []
    gc.collect()
- The `reset_chat_state` function clears the chat history stored in `st.session_state.chat_history`.
- `gc.collect()` is called to release memory that is no longer referenced.
5. Rendering PDF Files:
def render_pdf(file):
    st.markdown("### PDF Preview")
    base64_pdf = base64.b64encode(file.read()).decode("utf-8")
    pdf_html = f"""
        <iframe src="data:application/pdf;base64,{base64_pdf}" width="100%" height="100%" type="application/pdf"
                style="height:100vh;">
        </iframe>
    """
    st.markdown(pdf_html, unsafe_allow_html=True)
- The `render_pdf` function takes the uploaded PDF, encodes it as Base64, and embeds it in an iframe.
- This allows users to preview the uploaded document directly in the app.
6. Sidebar for Document Upload:
with st.sidebar:
    st.header("Upload a Document")
    uploaded_file = st.file_uploader("Select a PDF file", type="pdf")

    if uploaded_file:
        try:
            with tempfile.TemporaryDirectory() as temp_dir:
                file_path = os.path.join(temp_dir, uploaded_file.name)
                with open(file_path, "wb") as f:
                    f.write(uploaded_file.getvalue())

                file_key = f"{session_id}-{uploaded_file.name}"
                st.write("Processing and indexing document...")

                if file_key not in st.session_state.document_cache:
                    if os.path.exists(temp_dir):
                        loader = SimpleDirectoryReader(
                            input_dir=temp_dir,
                            required_exts=[".pdf"],
                            recursive=True
                        )
                    else:
                        st.error("File not found. Please check and try again.")
                        st.stop()

                    documents = loader.load_data()
                    llm = initialize_llm()
                    embedding_model = HuggingFaceEmbedding(
                        model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True
                    )

                    Settings.embed_model = embedding_model
                    document_index = VectorStoreIndex.from_documents(documents, show_progress=True)

                    Settings.llm = llm
                    query_engine = document_index.as_query_engine(streaming=True)

                    qa_prompt_template_str = (
                        "Context information is provided below.\n"
                        "---------------------\n"
                        "{context_str}\n"
                        "---------------------\n"
                        "Based on the context, please respond concisely. "
                        "If the answer is unknown, respond with 'I don't know'.\n"
                        "Query: {query_str}\n"
                        "Answer: "
                    )
                    qa_prompt_template = PromptTemplate(qa_prompt_template_str)
                    query_engine.update_prompts(
                        {"response_synthesizer:text_qa_template": qa_prompt_template}
                    )

                    st.session_state.document_cache[file_key] = query_engine
                else:
                    query_engine = st.session_state.document_cache[file_key]

                st.success("Document successfully indexed. Ready for queries!")
                render_pdf(uploaded_file)
        except Exception as e:
            st.error(f"Error processing document: {e}")
            st.stop()
- A sidebar is created for users to upload PDF files.
- When a file is uploaded, it is saved to a temporary directory for processing.
- The PDF is loaded with `SimpleDirectoryReader`, an embedding model builds a `VectorStoreIndex` over it, and a streaming query engine with a custom prompt template is created and cached (a standalone sketch of this pipeline follows the list).
- The uploaded document is displayed with the `render_pdf` function after successful processing.
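To see the indexing-and-query pipeline in isolation, here is a minimal sketch that runs the same steps outside Streamlit; it assumes a local ./data folder containing one or more PDFs and uses the same Ollama and HuggingFace models as above:
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure the same models the Streamlit app uses
Settings.llm = Ollama(model="llama3.2:3b-instruct-q8_0", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)

# Load PDFs from a local folder (assumed path) and build an in-memory index
documents = SimpleDirectoryReader(input_dir="./data", required_exts=[".pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask a single question against the indexed documents
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)
The Streamlit app wraps exactly this flow in an upload widget, a cache keyed by session and file name, and a streaming chat loop.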
7. Main Chat Interface:
col1, col2 = st.columns([6, 1])
with col1:
    st.header("Document Chat Interface")
with col2:
    st.button("Clear Chat", on_click=reset_chat_state)
- The main chat interface is laid out in two columns: one for the chat header and one for a button that clears the chat.
- The `reset_chat_state` function is bound to the button, letting users start a fresh conversation.
8. Displaying Chat History:
# Initialize the chat history on first load
if "chat_history" not in st.session_state:
    reset_chat_state()

# Display the chat history on every rerun
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
- The chat history is initialized if needed and re-rendered from `st.session_state.chat_history` on each rerun of the app.
- Each message is displayed according to its role (user or assistant) using Streamlit's chat message component.
9. Handling User Input:
# Handle user input for queries
if user_input := st.chat_input("Enter your question:"):
    # Append the user's query to the chat history and display it
    st.session_state.chat_history.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""

        # Stream the LLM's response chunk by chunk
        query_result = query_engine.query(user_input)
        for chunk in query_result.response_gen:
            full_response += chunk
            response_placeholder.markdown(full_response + "▌")
        response_placeholder.markdown(full_response)

    # Add the assistant's response to the chat history
    st.session_state.chat_history.append({"role": "assistant", "content": full_response})
- A chat input box captures user queries.
- Each query is appended to the chat history and displayed immediately.
- The assistant's response is generated by the query engine and streamed into a placeholder, so the answer appears as it is produced, enhancing the user experience.
Running the Application
To run the complete application:
1. Ensure you have installed the necessary packages.
2. Save the complete code listing into a Python file (e.g., chat_app.py).
3. In a terminal, navigate to the directory containing the file and run:
streamlit run chat_app.py
4. The application will open in your web browser, where you can upload a document and chat with the model.
This approach opens up numerous possibilities for creating intelligent, interactive applications that enhance user experience through natural language processing and document interaction.
Conclusion
This code provides a robust framework for creating a chat user interface using Streamlit and an LLM for intelligent responses. By following the code structure and explanation, developers can easily customize and extend the functionality of this chat interface to suit their needs. With Streamlit’s powerful capabilities, building interactive applications becomes more accessible and efficient.