How to Build a Custom ChatGPT Clone with Python and Streamlit

Quick answer: You can build your own custom ChatGPT clone in about 200 lines of Python using the OpenAI API for the language model and Streamlit for the chat interface. The setup takes under an hour. Your custom ChatGPT clone includes conversation history, streaming responses, configurable model settings, and your own branding. No web development experience needed.

Introduction

ChatGPT costs $20 a month for the Plus plan. The API that powers it costs a fraction of a cent per conversation. For developers and small teams, that math opens an obvious door. Build your own interface, pay only for what you use, and customize everything.

A custom ChatGPT clone is not a toy project. It is the foundation for AI-powered customer support bots, internal knowledge assistants, code generation tools, and specialized tutoring systems. Every company that deploys a branded AI assistant starts with this architecture. A frontend that collects user input. A backend that sends it to an LLM. And a streaming response that feels real-time.

This guide walks through every line of code to build exactly that. By the end, you will have a fully functional chatbot running on localhost, ready for deployment.

What You Will Need

The project requires three things you probably already have.

Python 3.9 or later. Download from python.org if you do not have it.

An OpenAI API key. Go to platform.openai.com, create an account, add a payment method, and generate an API key. The cost will be pennies for personal use. GPT-4o-mini costs $0.15 per million input tokens and $0.60 per million output tokens. A typical conversation costs less than a cent.

A text editor. VS Code, PyCharm, or even Notepad will work.

That is it. No frontend framework. No database. No cloud deployment. Just Python and three pip packages.

Step 1: Set Up the Project

Create a project folder and install the dependencies.

mkdir chatgpt-clone
cd chatgpt-clone
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

pip install openai streamlit python-dotenv

Create a file named `.env` in the project root to store your API key.

OPENAI_API_KEY=sk-your-api-key-here

Never commit this file to git. Add `.env` to your `.gitignore`.
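
A missing or placeholder key otherwise shows up only as an authentication error on the first request. Here is a minimal sketch of a fail-fast check. The helper name `require_api_key` is hypothetical (it is not part of the project or of any library), and the assumption is that you would call it right after `load_dotenv()`.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise with a readable message.

    Hypothetical helper for illustration; not part of any library.
    """
    key = (env.get(name) or "").strip()
    # Reject an empty value and the placeholder from the .env example
    if not key or key == "sk-your-api-key-here":
        raise RuntimeError(
            f"{name} is missing or still set to the placeholder. "
            "Add your real key to the .env file in the project root."
        )
    return key


# Example with an explicit mapping instead of the real environment
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))  # prints sk-example
```

Passing the environment as an argument keeps the check easy to try without touching your real key.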

Step 2: The API Wrapper

Create `chat_engine.py`. This file handles all communication with OpenAI.

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

SYSTEM_PROMPT = """You are a helpful AI assistant. You provide clear, 
accurate, and well-structured answers. You keep responses concise unless 
the user asks for detail. You admit when you are unsure."""

def get_response(messages, model="gpt-4o-mini", temperature=0.7):
    """
    Generate a streaming response from the OpenAI API.
    
    Args:
        messages: List of message dicts (role, content)
        model: Model identifier
        temperature: 0.0 (precise) to 2.0 (creative)
    
    Yields:
        Content chunks as they arrive from the API
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        stream=True,
        max_tokens=4096
    )
    
    for chunk in stream:
        # Some chunks can arrive with an empty choices list, so guard before indexing
        if chunk.choices and chunk.choices[0].delta.content is not None:
            yield chunk.choices[0].delta.content

def get_available_models():
    """Return list of models this API key can access."""
    try:
        models = client.models.list()
        return [m.id for m in models if m.id.startswith("gpt")]
    except Exception as e:
        print(f"Error fetching models: {e}")
        return ["gpt-4o-mini", "gpt-4o", "gpt-3.5-turbo"]

The key detail here is streaming. Without `stream=True`, the API waits until the entire response is generated before sending it. That can take 10 to 30 seconds for a long answer. With streaming enabled, tokens arrive as the model generates them, and the UI updates in real time. This is what makes the experience feel responsive.
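
To see what streaming means for the interface without calling the API, here is a small sketch that uses a stand-in generator. `fake_stream` and `render_incrementally` are illustrative names, not part of the project. The point is that each chunk extends the text the user already sees.

```python
import time


def fake_stream(text, delay=0.0):
    """Stand-in for a streaming API call: yields four characters at a time."""
    for i in range(0, len(text), 4):
        time.sleep(delay)  # simulates the gap between chunks
        yield text[i:i + 4]


def render_incrementally(chunks):
    """Accumulate chunks and record every partial view of the reply."""
    full_response = ""
    frames = []
    for chunk in chunks:
        full_response += chunk
        frames.append(full_response)  # what the user would see at this moment
    return full_response, frames


final, frames = render_incrementally(fake_stream("Hello, world"))
print(frames)  # ['Hell', 'Hello, w', 'Hello, world']
```

Each entry in `frames` corresponds to one screen update. A non-streaming call would produce only the last one.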

The system prompt sets the personality and constraints. You can change this to make your clone behave differently. A customer support bot might have “You work for Acme Corp and only answer questions about our products.” A coding tutor might have “You explain concepts with examples and never write production code without explaining it first.”
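
As a sketch of how a different personality could be swapped in, the helper below replaces the system message at the start of a message list. The names `with_system_prompt` and `PERSONAS` are hypothetical; the two prompts are the examples from the paragraph above.

```python
def with_system_prompt(messages, prompt):
    """Return a new message list whose first message is the given system prompt.

    Replaces an existing leading system message, or inserts one if absent.
    The input list is left unchanged.
    """
    system = {"role": "system", "content": prompt}
    if messages and messages[0]["role"] == "system":
        return [system] + messages[1:]
    return [system] + list(messages)


PERSONAS = {
    "support": "You work for Acme Corp and only answer questions about our products.",
    "tutor": "You explain concepts with examples and never write production code "
             "without explaining it first.",
}

history = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hi"},
]
updated = with_system_prompt(history, PERSONAS["support"])
print(updated[0]["content"])
print(len(updated))  # 2: system message replaced, user message kept
```

Returning a new list rather than mutating the old one makes it easy to keep the original conversation if the user switches back.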

Step 3: The Streamlit UI

Create `app.py`. This is the entire user interface.

import streamlit as st
from chat_engine import get_response, get_available_models, SYSTEM_PROMPT

# Page configuration
st.set_page_config(
    page_title="AI Assistant",
    page_icon=":robot_face:",
    layout="wide"
)

# Custom CSS for better chat UI
st.markdown("""

""", unsafe_allow_html=True)

# Initialize session state
if "messages" not in st.session_state:
    st.session_state.messages = [
        {"role": "system", "content": SYSTEM_PROMPT}
    ]

if "model" not in st.session_state:
    st.session_state.model = "gpt-4o-mini"

if "temperature" not in st.session_state:
    st.session_state.temperature = 0.7

# Header
st.title("AI Assistant")

# Sidebar settings
with st.sidebar:
    st.header("Settings")

    model_options = get_available_models()
    selected_model = st.selectbox(
        "Model",
        model_options,
        index=model_options.index(st.session_state.model)
        if st.session_state.model in model_options else 0
    )

    temperature = st.slider(
        "Temperature",
        min_value=0.0,
        max_value=2.0,
        value=st.session_state.temperature,
        step=0.1,
        help="Lower = more precise, Higher = more creative"
    )

    st.session_state.model = selected_model
    st.session_state.temperature = temperature

    if st.button("Clear Conversation", type="secondary"):
        st.session_state.messages = [
            {"role": "system", "content": SYSTEM_PROMPT}
        ]
        st.rerun()

    st.divider()
    st.markdown("### Usage Tips")
    st.markdown("- GPT-4o-mini is fast and cheap for everyday use")
    st.markdown("- GPT-4o is smarter but slower and more expensive")
    st.markdown("- Lower temperature for factual answers")
    st.markdown("- Higher temperature for creative writing")

    st.divider()
    st.markdown("Built with Streamlit + OpenAI API")

# Display chat history
for message in st.session_state.messages:
    if message["role"] != "system":
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("Type your message here..."):
    # Add user message to history and display it
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Generate assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""

        for chunk in get_response(
            st.session_state.messages,
            model=st.session_state.model,
            temperature=st.session_state.temperature
        ):
            full_response += chunk
            # Trailing "|" acts as a typing cursor while the reply streams in
            message_placeholder.markdown(full_response + "|")

        message_placeholder.markdown(full_response)

    # Save complete response to history
    st.session_state.messages.append(
        {"role": "assistant", "content": full_response}
    )

That is the entire UI. Streamlit handles all the HTML, CSS, and JavaScript. The chat interface, the model selector, the temperature slider, the clear button. Everything is handled by about 80 lines of Python.

The session state persists the conversation across interactions. Each time the user sends a message, the entire conversation history is sent to the API. This is what gives the chatbot memory within a session. The API does not store state. The client does.
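
Because the whole history is resent on every turn, long chats grow in cost. One common mitigation, sketched here under the assumption that dropping the oldest exchanges is acceptable, is to send only the system message plus the most recent turns. `trim_history` is a hypothetical helper, and the cutoff is a message count, not a token count.

```python
def trim_history(messages, max_turns=10):
    """Keep the system message plus the most recent `max_turns` exchanges.

    One exchange is a user message and the assistant reply (two messages).
    """
    system = [m for m in messages if m["role"] == "system"][:1]
    if max_turns <= 0:
        return system
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-2 * max_turns:]


history = [{"role": "system", "content": "You are helpful."}]
for i in range(15):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_turns=3)
print(len(history), len(trimmed))  # 31 7
print(trimmed[1]["content"])       # question 12
```

In the app, the trimmed list would be what you pass to the API call, while the full list stays in session state so the on-screen history is unaffected.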

Step 4: Add a Cost Tracker

Let us add transparency. Users should know how much each conversation costs.

class CostTracker:
    """Track API usage and estimate costs."""
    
    def __init__(self):
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        
        # Approximate cost per 1M tokens
        self.pricing = {
            "gpt-4o-mini": {"input": 0.15, "output": 0.60},
            "gpt-4o": {"input": 2.50, "output": 10.00},
            "gpt-3.5-turbo": {"input": 0.50, "output": 1.50}
        }
    
    def update(self, model, input_tokens, output_tokens):
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
    
    def get_cost(self, model):
        if model not in self.pricing:
            return "N/A"
        p = self.pricing[model]
        cost = (self.total_input_tokens / 1_000_000 * p["input"] +
                self.total_output_tokens / 1_000_000 * p["output"])
        return f"${cost:.4f}"
    
    def display(self, model):
        return (
            f"Input: {self.total_input_tokens:,} tokens | "
            f"Output: {self.total_output_tokens:,} tokens | "
            f"Est. cost: {self.get_cost(model)}"
        )
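
The arithmetic behind the estimate is easy to check by hand. The sketch below repeats the same formula as a standalone function (`estimate_cost` is an illustrative name) using the GPT-4o-mini prices quoted in this article; real prices may change.

```python
PRICING_PER_MILLION = {  # USD per 1M tokens, figures quoted in this article
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token totals."""
    p = PRICING_PER_MILLION[model]
    return (input_tokens / 1_000_000 * p["input"]
            + output_tokens / 1_000_000 * p["output"])


# 4,000 input tokens and 1,000 output tokens on gpt-4o-mini:
#   4,000 / 1,000,000 * 0.15 = 0.0006
#   1,000 / 1,000,000 * 0.60 = 0.0006
cost = estimate_cost("gpt-4o-mini", 4_000, 1_000)
print(f"${cost:.4f}")  # $0.0012
```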

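To get token counts from the API, add a variant of `get_response` that also yields usage data. With `stream_options={"include_usage": True}`, the API sends one extra final chunk whose `choices` list is empty and whose `usage` field holds the token counts.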

def get_response_with_cost(messages, model="gpt-4o-mini", temperature=0.7):
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        stream=True,
        stream_options={"include_usage": True}
    )
    
    full_text = ""
    usage_data = None
    
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_text += content
            yield content, None
        
        if chunk.usage:
            usage_data = chunk.usage
    
    yield full_text, usage_data
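
The generator above yields `(content, None)` for each chunk and then one final `(full_text, usage_data)` item, so the caller has to tell the two apart. Here is a sketch of that consuming loop, using a stand-in generator with made-up token numbers instead of a real API call. `fake_response_with_cost` and `consume` are illustrative names.

```python
from types import SimpleNamespace


def fake_response_with_cost():
    """Stand-in with the same yield shape as get_response_with_cost."""
    full_text = ""
    for piece in ["Stream", "lit ", "works."]:
        full_text += piece
        yield piece, None  # content chunk, no usage yet
    # Made-up numbers; the real object exposes the same two attributes
    usage = SimpleNamespace(prompt_tokens=42, completion_tokens=3)
    yield full_text, usage  # final item: whole text plus usage


def consume(stream):
    """Separate content chunks from the final (full_text, usage) item."""
    shown = ""
    usage = None
    for text, maybe_usage in stream:
        if maybe_usage is None:
            shown += text  # in the app: update the placeholder here
        else:
            usage = maybe_usage  # here `text` is the full reply; do not append it
    return shown, usage


reply, usage = consume(fake_response_with_cost())
print(reply)  # Streamlit works.
print(usage.prompt_tokens, usage.completion_tokens)  # 42 3
```

In the app you would then call `tracker.update(model, usage.prompt_tokens, usage.completion_tokens)`. One caveat with this yield shape: if no usage chunk ever arrives, the final item also carries `None` and would be treated as content, so a more defensive version might yield an explicit marker for the last item.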

Step 5: Add Conversation Export

Users might want to save their chat history. A short helper function and a Streamlit download button do the job.

import json
from datetime import datetime

def export_conversation(messages):
    """Export conversation as JSON, skipping the system prompt."""
    exported_at = datetime.now().isoformat()
    export_data = []
    for msg in messages:
        if msg["role"] != "system":
            export_data.append({
                "role": msg["role"],
                "content": msg["content"],
                # Time of export, not the time the message was sent
                "timestamp": exported_at
            })
    return json.dumps(export_data, indent=2)

# Add to sidebar. Rendering the download button directly avoids nesting it
# inside st.button, where it would disappear after the next interaction.
with st.sidebar:
    st.download_button(
        label="Export Conversation (JSON)",
        data=export_conversation(st.session_state.messages),
        file_name=f"chat-{datetime.now().strftime('%Y%m%d-%H%M%S')}.json",
        mime="application/json"
    )

Step 6: Run It

streamlit run app.py

The terminal will show a URL. Open `http://localhost:8501` in your browser. You will see the chat interface. Type a message. The response streams in real time.

Going Further: Beyond the Basic Clone

The architecture you just built is the foundation for more advanced applications. Here is how to extend it.

Add a knowledge base with RAG. Load documents, chunk them, embed them into a vector database, and prepend relevant chunks to each API call as context. This turns your generic assistant into an expert on your specific documents.
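
The "retrieve, then prepend" step can be sketched without any vector database. The example below swaps embeddings for a naive word-overlap score, which is only a stand-in to show the data flow; a real setup would use embedding similarity. All function names and the sample documents are made up for illustration.

```python
def score(query, chunk):
    """Count shared lowercase words; a crude stand-in for embedding similarity."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def retrieve(query, chunks, k=2):
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]


def build_messages(system_prompt, query, chunks):
    """Prepend the retrieved chunks to the user question as context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]


docs = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
    "Shipping to Europe takes 5 to 7 business days.",
]
messages = build_messages(
    "Answer using only the context.", "How long do refunds take?", docs
)
print(messages[1]["content"])
```

The resulting list has the same shape as the chat history used throughout this guide, so it can be passed to the same API call.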

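Switch models. The architecture works with any OpenAI-compatible API. To use a local model through Ollama or LM Studio, keep the `openai` package and change the client's base URL. Claude is not a drop-in swap: the `anthropic` package has its own client and message format, so `chat_engine.py` needs a small rewrite to use it.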
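
As a sketch of the base URL change, the helper below builds the keyword arguments for the `OpenAI(...)` client per provider. `PROVIDERS` and `client_kwargs` are hypothetical names; the Ollama URL assumes a default local install, and the dummy key is there because the client expects a non-empty string even when the server ignores it.

```python
import os

PROVIDERS = {
    # Default OpenAI endpoint; the key is read from the environment
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
    # Ollama's OpenAI-compatible endpoint on a default local install
    "ollama": {"base_url": "http://localhost:11434/v1", "key_env": None},
}


def client_kwargs(provider, env=os.environ):
    """Build keyword arguments for openai.OpenAI(...) for the given provider."""
    cfg = PROVIDERS[provider]
    api_key = env.get(cfg["key_env"], "") if cfg["key_env"] else "not-needed"
    return {"base_url": cfg["base_url"], "api_key": api_key}


print(client_kwargs("ollama"))
# In chat_engine.py you would then write:
#   client = OpenAI(**client_kwargs("ollama"))
```

The model name passed to the API call also has to match a model the local server has available.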

Add authentication. Wrap the Streamlit app with a login page using Streamlit Authenticator to prevent unauthorized access.

Deploy to the cloud. Streamlit Community Cloud, Railway, or Hugging Face Spaces all support one-click deployment. Your app will be accessible from any device.

Add image support. GPT-4o and GPT-4o-mini accept images. Modify the input to accept file uploads and include them in the API call.

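# Image support example
import base64

# Place the uploader where it renders on every run, for example in the sidebar
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "png"])

# Inside the chat input block, where `prompt` is defined:
if uploaded_file:
    image_data = base64.b64encode(uploaded_file.getvalue()).decode()
    
    st.session_state.messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {
                # Use the file's own MIME type so PNGs are not labelled as JPEG
                "url": f"data:{uploaded_file.type};base64,{image_data}"
            }}
        ]
    })
    # Note: the history display loop must handle list content for these messages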

Frequently Asked Questions

Q: How much does it cost to run?
A: For personal use, roughly $1 to $5 per month depending on usage. GPT-4o-mini costs $0.15 per million input tokens. A typical conversation with 10 exchanges uses about 5,000 tokens, costing less than $0.01.

Q: Can I use a local model instead of OpenAI?
A: Yes. Use Ollama or LM Studio to run a local model, then change the OpenAI base URL. Replace `https://api.openai.com/v1` with `http://localhost:11434/v1` for Ollama. Apart from the model name, the rest of the code stays the same.

Q: How long can a conversation get?
A: Each model has a context window. GPT-4o-mini handles 128,000 tokens, roughly 96,000 words. The conversation history is sent with every request, so long conversations use more tokens and cost more.

Q: Is this ready for multi-user production use?
A: Not with basic Streamlit. Each user gets their own session, but all sessions share the same server process. For multi-user production deployment, use a proper backend with WebSockets.

Q: Is my conversation data private?
A: The code keeps conversation history in Streamlit session state only, which lasts as long as your browser tab stays connected. When you close the tab, it is gone. If you add a database, that changes. OpenAI may retain API requests for up to 30 days for abuse monitoring; check OpenAI's current data policy for the retention options available to your account.

Q: Can users change the system prompt?
A: Yes. Add a text input in the sidebar for the system prompt. Changes take effect on the next message. This lets users switch between different assistant personalities without restarting.

Conclusion

Your custom ChatGPT clone is running. It has streaming responses, conversation memory, configurable settings, and cost tracking. You built it in an afternoon with two Python files and a few libraries.

The real value is what comes next. Add a knowledge base and it becomes a company assistant. Deploy it and your team has a private AI tool. Swap the model and you are no longer dependent on any single provider. The architecture you just built is the starting point for every serious AI application.

The API costs pennies. The learning is permanent.


Yitzkak Agu

AI & ML Writer

AI and machine learning writer at AI 'n Skills. I cover LLMs, AI tools, and developer workflows — breaking down complex concepts for developers and curious minds.
