Build a Local AI Doc Chat — Flutter + Python + Ollama

Python · Flutter · Local AI · 2025

Build a local AI chat app that runs on your laptop.

No API key. No cloud. No monthly bill. Just a Python backend, a Flutter app, and an open-source model running entirely on your machine. This is a complete walkthrough — every file, every decision.

FastAPI · FAISS · SentenceTransformers · Ollama + Gemma3 · Flutter

Watch first

4-minute overview on YouTube

The video gives a quick overview. This post goes deeper — full source code, explanations of every design decision, and things the video skips.


01 — concept

What is RAG, and why does it work?

RAG stands for Retrieval-Augmented Generation. Instead of asking an AI "what do you know about X?", you first pull the relevant text out of your own documents, hand it to the model as context, and then ask "based on this, what's the answer to X?"

The model never needs to be trained on your documents. It just reads what you give it in the prompt — the same way you'd paste an article into ChatGPT before asking a question. The difference here is that the retrieval step is automatic, semantic, and fast.
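To make the retrieval step concrete, here is a minimal sketch using made-up three-number "embeddings" instead of a real model. The sentences, the vectors, and the retrieve helper are all assumptions for illustration; the real app gets 384-number vectors from an embedding model and searches them with FAISS.

```python
import math

# Hypothetical toy embeddings: similar meanings get similar vectors.
TOY_EMBEDDINGS = {
    "The car needs an oil change.": [0.9, 0.1, 0.0],
    "Automobile insurance renews in May.": [0.8, 0.2, 0.1],
    "Bake the bread for 40 minutes.": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def retrieve(query_vec, k=2):
    """Return the k stored texts whose vectors are closest to the query."""
    ranked = sorted(
        TOY_EMBEDDINGS,
        key=lambda text: cosine(query_vec, TOY_EMBEDDINGS[text]),
        reverse=True,
    )
    return ranked[:k]


# Pretend this is the embedding of "What does my vehicle need?"
query_vec = [0.85, 0.15, 0.05]
top = retrieve(query_vec)
for text in top:
    print(text)
```

Both vehicle sentences come back and the bread sentence does not, even though the query never uses the words "car" or "automobile". Those retrieved texts are what gets pasted into the prompt.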

question → embed it → search FAISS → top 3 chunks → build prompt → stream answer

"Semantic search" means we're matching meaning, not keywords. The words "automobile" and "car" will match the same documents even though they look nothing alike as strings. That's what the embedding model gives us.

Why local? Your documents never leave your machine. No data goes to OpenAI, Anthropic, or anyone else. For PDFs with sensitive content — contracts, personal notes, work docs — that matters a lot.

The backend is three Python files that each do one thing:

ingest.py

reads PDF or raw text
splits into chunks
embeds each chunk
saves to vectorstore

vectorstore.py

FAISS index (dim 384)
add vectors
search by similarity
persist to disk

app.py

FastAPI server
/upload-pdf, /upload-text
/ask with streaming
calls local Ollama

02 — setup

Ollama — a local model server

Ollama is a tool that lets you run open-source LLMs locally. Think of it as a local inference server — once running, it exposes an HTTP API at localhost:11434 that any app can talk to.

Go to ollama.com, download the installer, run it. Then verify:

terminal

ollama --version

We're using Gemma 3 4B — Google's open model. Small enough to run on most laptops (needs ~5 GB RAM), fast enough for real-time Q&A.

terminal

# pull the model (~3.3 GB download, cached afterwards)
ollama pull gemma3:4b

# sanity check: chat with it before wiring up the app
ollama run gemma3:4b

Tight on RAM? Try gemma3:1b instead. It's smaller and faster. Change the LLM constant in app.py to match.

Once downloaded, Ollama runs as a background service. The Python backend calls http://127.0.0.1:11434/api/generate directly — no extra setup needed.
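If you want to poke at that endpoint before writing any app code, a small standard-library sketch like the one below is enough. The helper names build_generate_request and generate are made up for this example; the URL and the model, prompt, and stream fields are the ones the backend uses later in this post.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(prompt, model="gemma3:4b", stream=False):
    """Build (but do not send) a POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gemma3:4b"):
    """Send one non-streaming request. Requires Ollama to be running."""
    req = build_generate_request(prompt, model, stream=False)
    with urllib.request.urlopen(req, timeout=120) as resp:
        # With stream=False the reply is a single JSON object.
        return json.loads(resp.read())["response"]


# Uncomment once Ollama is up:
# print(generate("Say hello in five words."))
```

With stream set to False you get the whole answer in one JSON object, which is handy for a quick check. The app itself uses stream set to True so tokens can be shown as they arrive.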

03 — project

How the project is laid out

backend/
├── app.py              # server entry point
├── ingest.py           # document processing
├── vectorstore.py      # FAISS wrapper
├── requirements.txt
├── index.faiss         # created on first upload
└── meta.json           # created on first upload

docmind/                # Flutter app
└── lib/
    ├── main.dart
    ├── theme.dart      # colors + host URL
    ├── services/
    │   └── api.dart
    └── screens/
        ├── chat_screen.dart
        └── upload_screen.dart

The backend and Flutter app are completely independent. The Flutter app is just a client making HTTP requests — you could swap it for a web app or CLI and the backend wouldn't change at all.
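As a rough illustration of that independence, here is what a tiny command-line client could look like. It is a sketch, not part of the project: HOST, build_ask_request, and ask are names invented here, and it assumes the backend is listening on port 8000 on the same machine and exposes the /ask route described later.

```python
import json
import sys
import urllib.request

HOST = "http://127.0.0.1:8000"  # assumption: backend runs on this machine


def build_ask_request(question, host=HOST):
    """Build the same JSON POST the Flutter app sends to /ask."""
    return urllib.request.Request(
        f"{host}/ask",
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, host=HOST):
    """Print the streamed answer as it arrives. Needs the backend running."""
    with urllib.request.urlopen(build_ask_request(question, host), timeout=120) as resp:
        while True:
            chunk = resp.read1(1024)  # returns as soon as some bytes are available
            if not chunk:
                break
            # Write raw bytes so a character split across chunks is not mangled.
            sys.stdout.buffer.write(chunk)
            sys.stdout.buffer.flush()


# Uncomment once the backend is up:
# ask("What is this document about?")
```

Nothing on the server has to change for this to work, which is the point: the backend only sees HTTP requests.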

04 — python backend

vectorstore.py — where your documents live

Every chunk of a document you upload ends up as a list of 384 numbers: a vector that encodes its meaning. This file stores those vectors and lets you search through them using FAISS.

FAISS, developed by Meta, is a library for fast nearest-neighbour search over large sets of vectors. When you ask a question, we convert it to a vector too, then ask FAISS: "find me the 3 stored vectors closest to this one." The matching chunks become the context we pass to the LLM.

Why 384 dimensions?

That's the output size of all-MiniLM-L6-v2, the embedding model we're using. Every vector it produces is exactly 384 numbers. FAISS needs to know this upfront to set up the index — hence DIM = 384.

vectorstore.py

import json
import os

import faiss
import numpy as np

DIM = 384  # output size of all-MiniLM-L6-v2

index = faiss.IndexFlatL2(DIM)  # exact (brute-force) L2 index
metadata = []                   # metadata[i] describes vector i in the index

def save():
    # Persist the vectors and their metadata side by side
    faiss.write_index(index, "index.faiss")
    with open("meta.json", "w") as f:
        json.dump(metadata, f)

def load():
    global index, metadata
    if os.path.exists("index.faiss") and os.path.getsize("index.faiss") > 0:
        try:
            index = faiss.read_index("index.faiss")
        except Exception:
            # Unreadable index file: fall back to an empty index
            index = faiss.IndexFlatL2(DIM)
    if os.path.exists("meta.json") and os.path.getsize("meta.json") > 0:
        with open("meta.json", "r") as f:
            metadata = json.load(f)

def add(vector, meta):
    # FAISS expects a 2-D float32 array of shape (n_vectors, DIM)
    vector = np.array([vector]).astype("float32")
    index.add(vector)
    metadata.append(meta)

def search(vector, k=3):
    vector = np.array([vector]).astype("float32")
    k = min(k, len(metadata))  # never ask for more results than we have
    if k == 0:
        return []
    D, I = index.search(vector, k)  # D: distances, I: row indices
    return [metadata[i] for i in I[0] if 0 <= i < len(metadata)]

save(): Writes the FAISS index to index.faiss and all the original text chunks to meta.json. Called at the end of every ingest operation so nothing is lost between server restarts.
load(): Reads both files back on startup. If they don't exist yet, the empty index created at import time stays in place. The size check avoids a crash on zero-byte files.
add(): Inserts a single vector into FAISS and appends the corresponding metadata dict, which holds the original text and source filename, to the Python list.
search(): Finds the k nearest vectors using L2 distance. Returns the metadata dicts containing the raw text, which become the context for the LLM prompt. The bounds check on I[0] guards against index-out-of-range errors, for example when FAISS pads missing results with -1.
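If FAISS feels like a black box, it may help to see what a flat L2 index does conceptually: compare the query against every stored vector and keep the closest ones. The sketch below is a plain-Python stand-in written for this explanation, not FAISS code, and the two-number vectors are toy data.

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query, vectors, metadata, k=3):
    """Return the metadata of the k vectors closest to the query."""
    k = min(k, len(vectors))
    # Rank every stored vector by its distance to the query.
    order = sorted(range(len(vectors)), key=lambda i: l2_squared(query, vectors[i]))
    return [metadata[i] for i in order[:k]]


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
metadata = [{"text": "origin"}, {"text": "near"}, {"text": "far"}]

print(brute_force_search([0.9, 0.8], vectors, metadata, k=2))
```

The query lands closest to "near", then "origin", so those two come back in that order. FAISS does the same comparison far more efficiently, which matters once you have thousands of chunks.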

05 — python backend

ingest.py — preparing documents for search

This file does the preprocessing work. It takes a raw document — a PDF or a block of text — and turns it into a set of searchable vectors stored in FAISS. Two things happen here: chunking and embedding.

Chunking — and why the overlap matters

You can't embed an entire document as one vector. A 50-page PDF embedded as a single vector would be useless for retrieval — you'd get the whole document back regardless of what you asked. So we split the text into 500-character chunks.

The 100-character overlap is the detail most people miss. Without it, a sentence that falls right on a chunk boundary gets cut in half, and neither half makes sense in isolation. With it, any sentence of up to 100 characters appears complete in at least one chunk. Longer sentences can still be split, but the shared text keeps some context on both sides of the cut.
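To see what the overlap buys you, here is a small sketch that only computes chunk boundaries, using the same window logic with a size of 500 and an overlap of 100. The helpers chunk_spans and containing are illustrative names, and the sentence positions are made up.

```python
def chunk_spans(length, size=500, overlap=100):
    """Return (start, end) character offsets for each chunk of a text."""
    spans = []
    start = 0
    while start < length:
        spans.append((start, min(start + size, length)))
        start += size - overlap  # next chunk begins 400 characters later
    return spans


def containing(spans, begin, end):
    """Chunks that hold the character range [begin, end) in full."""
    return [(s, e) for (s, e) in spans if s <= begin and end <= e]


spans = chunk_spans(1200)
print(spans)

# A 60-character sentence crossing the first boundary at offset 500
print(containing(spans, 480, 540))

# A 160-character sentence: longer than the overlap, so it can still be split
print(containing(spans, 350, 510))
```

The short sentence is cut off by the first chunk but sits whole inside the second one. The long sentence fits in neither, which is the trade-off to keep in mind when picking the overlap.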

PDF / text → extract text → 500-char chunks → embed each → store + save

ingest.py

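from sentence_transformers import SentenceTransformer
from pypdf import PdfReader
import vectorstore as vs

# Downloaded on first use; produces 384-dim vectors
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_text(text, size=500, overlap=100):
    # Sliding window: each chunk starts (size - overlap) characters after the last
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def add_text(text, source="manual"):
    chunks = chunk_text(text)
    for c in chunks:
        emb = model.encode(c).astype("float32")
        vs.add(emb, {"text": c, "source": source})
    vs.save()  # persist once per document, not once per chunk

def add_pdf(path):
    reader = PdfReader(path)
    # Image-only pages yield no text; "or ''" keeps the join safe
    pages = [page.extract_text() or "" for page in reader.pages]
    # Join with a newline so words at page breaks are not glued together
    add_text("\n".join(pages), source=path)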

The embedding model: all-MiniLM-L6-v2 is about 80 MB and downloads automatically on first run. It runs CPU-only just fine and is accurate enough for document Q&A. Each call to model.encode() returns a numpy array of shape (384,).

06 — python backend

app.py — the FastAPI server

This is the server the Flutter app talks to. It wires everything together: receives uploads, triggers ingestion, handles questions, and streams answers back token by token from Ollama.

How streaming works

FastAPI's StreamingResponse lets a generator function yield data as it becomes available. Ollama's API supports streaming too — it sends back one JSON object per token. We parse each line, pull out the "response" field, and yield it immediately. The Flutter app receives these tokens as bytes over a persistent HTTP connection, decodes them, and appends each one to the message in real time.
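The parsing half of that loop is easy to try without a running model. The sketch below feeds hand-written lines, shaped like Ollama's streamed JSON objects with their response and done fields, through a small generator. iter_tokens and fake_stream are names invented for this example.

```python
import json


def iter_tokens(lines):
    """Yield the text fragment from each streamed JSON line."""
    for raw in lines:
        if not raw:
            continue  # keep-alive blank lines carry no data
        try:
            obj = json.loads(raw.decode("utf-8"))
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if "response" in obj:
            yield obj["response"]
        if obj.get("done"):
            break  # the final object marks the end of the answer


fake_stream = [
    b'{"model":"gemma3:4b","response":"Hel","done":false}',
    b"",
    b'{"model":"gemma3:4b","response":"lo","done":false}',
    b'{"model":"gemma3:4b","response":"","done":true}',
]

print("".join(iter_tokens(fake_stream)))
```

Because it is a generator, each fragment can be forwarded the moment it is parsed, which is what makes the typing effect in the app possible.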

app.py

import json
import os

import requests
from fastapi import FastAPI, File, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse

import ingest
import vectorstore as vs

app = FastAPI()
# Wide-open CORS is acceptable for local development only
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"], allow_credentials=True,
    allow_methods=["*"], allow_headers=["*"],
)

model = ingest.model  # reuse the embedder ingest.py already loaded
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
LLM = "gemma3:4b"

os.makedirs("docs", exist_ok=True)

@app.on_event("startup")
def startup():
    vs.load()  # reload persisted index on every restart

@app.post("/upload-pdf")
async def upload_pdf(file: UploadFile = File(...)):
    # basename() drops any directory parts a client put in the filename
    filename = os.path.basename(file.filename or "upload.pdf")
    file_path = os.path.join("docs", filename)
    with open(file_path, "wb") as f:
        f.write(await file.read())
    try:
        ingest.add_pdf(file_path)
        return {"status": "success", "message": f"{filename} processed"}
    except Exception as e:
        return {"status": "error", "message": str(e)}

@app.post("/upload-text")
def upload_text(data: dict):
    ingest.add_text(data["text"])
    return {"status": "stored"}

@app.post("/ask")
def ask(data: dict):
    question = data["question"]

    # 1. embed the question
    q_emb = model.encode(question).astype("float32")

    # 2. find the 3 closest chunks in FAISS
    results = vs.search(q_emb, k=3)
    context = "\n\n".join([r["text"] for r in results])

    # 3. build a strict context-only prompt
    prompt = f"""Use ONLY this context to answer.

Context:
{context}

Question:
{question}

If the answer is not in the context, say "not found in documents"."""

    # 4. stream tokens from Ollama
    def stream():
        with requests.post(
            OLLAMA_URL,
            json={"model": LLM, "prompt": prompt, "stream": True},
            stream=True,
        ) as response:
            for line in response.iter_lines():
                if not line:
                    continue
                try:
                    chunk = json.loads(line.decode("utf-8"))
                except json.JSONDecodeError:
                    continue  # skip malformed lines
                if "response" in chunk:
                    yield chunk["response"]

    return StreamingResponse(stream(), media_type="text/plain")

1. Encode the user's question into a 384-dim vector using the same SentenceTransformer model used during ingestion. This is key: documents and queries must be embedded with the same model, otherwise their vectors are not comparable.
2. Search FAISS for the 3 most semantically similar chunks across all uploaded documents.
3. Concatenate the chunks into a context string and inject it into the prompt. The prompt explicitly instructs the model to answer only from the context, which keeps it grounded and makes hallucination much less likely.
4. POST to Ollama with stream: True, iterate over the response line by line, and yield each token immediately via StreamingResponse.
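Step 3 is worth looking at in isolation, because the prompt is the only thing the model ever sees. Here is the same template pulled out into a standalone function so you can print it and inspect it. The function name build_prompt and the sample chunks are assumptions for this sketch.

```python
def build_prompt(question, results):
    """Assemble the context-only prompt from retrieved chunk dicts."""
    context = "\n\n".join(r["text"] for r in results)
    return (
        "Use ONLY this context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n\n"
        'If the answer is not in the context, say "not found in documents".'
    )


# Hypothetical search results, shaped like the metadata dicts in meta.json
results = [
    {"text": "The lease starts on 1 March.", "source": "docs/lease.pdf"},
    {"text": "Rent is due on the fifth of each month.", "source": "docs/lease.pdf"},
]

prompt = build_prompt("When is rent due?", results)
print(prompt)
```

Printing the prompt like this is a quick way to debug bad answers: if the right chunk is not in the context, the problem is retrieval, not the model.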

requirements.txt

fastapi==0.104.1
uvicorn==0.24.0
requests==2.31.0
sentence-transformers==5.5.1
pypdf==3.17.1
faiss-cpu==1.7.4
numpy==1.24.3
python-multipart==0.0.20

install & run

pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 8000 --reload

07 — flutter

Flutter app — two screens, one API class

The Flutter side is deliberately lean. No state management library, no generated code, no complex architecture. Just a StatefulWidget per screen, a static Api class for HTTP calls, and a theme file for shared constants.

pubspec.yaml — dependencies

dependencies:
  flutter:
    sdk: flutter
  http: ^1.6.0
  file_picker: ^11.0.2
  cupertino_icons: ^1.0.8

main.dart

import 'package:flutter/material.dart';
import 'package:flutter/services.dart';
import 'theme.dart';
import 'screens/chat_screen.dart';
import 'screens/upload_screen.dart';

void main() {
  WidgetsFlutterBinding.ensureInitialized();
  SystemChrome.setSystemUIOverlayStyle(const SystemUiOverlayStyle(
    statusBarColor: Colors.transparent,
    statusBarIconBrightness: Brightness.light,
  ));
  runApp(const App());
}

class App extends StatelessWidget {
  const App({super.key});
  @override
  Widget build(BuildContext context) => MaterialApp(
    title: 'DocMind',
    debugShowCheckedModeBanner: false,
    theme: ThemeData(
      brightness: Brightness.dark,
      scaffoldBackgroundColor: kBg,
      colorScheme: const ColorScheme.dark(primary: kAccent, surface: kSurface),
      useMaterial3: true,
    ),
    home: const Home(),
  );
}

class Home extends StatefulWidget {
  const Home({super.key});
  @override
  State<Home> createState() => _HomeState();
}

class _HomeState extends State<Home> {
  int _tab = 0;
  @override
  Widget build(BuildContext context) => Scaffold(
    backgroundColor: kBg,
    body: IndexedStack(
      index: _tab,
      children: const [ChatScreen(), UploadScreen()],
    ),
    bottomNavigationBar: NavigationBar(
      backgroundColor: kSurface,
      indicatorColor: kAccent.withOpacity(0.15),
      selectedIndex: _tab,
      onDestinationSelected: (i) => setState(() => _tab = i),
      destinations: const [
        NavigationDestination(
          icon: Icon(Icons.chat_bubble_outline_rounded, color: kTextSec),
          selectedIcon: Icon(Icons.chat_bubble_rounded, color: kAccent),
          label: 'Chat',
        ),
        NavigationDestination(
          icon: Icon(Icons.folder_outlined, color: kTextSec),
          selectedIcon: Icon(Icons.folder_rounded, color: kAccent),
          label: 'Documents',
        ),
      ],
    ),
  );
}

08 — flutter

theme.dart — colors and the host URL

All shared constants live here. The only thing you need to change before running is kHost.

Physical device? Replace kHost with your computer's local IP — something like http://192.168.1.42:8000. Find it with ifconfig on Mac/Linux or ipconfig on Windows. For the Android emulator, use http://10.0.2.2:8000.
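If you'd rather not dig through ifconfig output, a few lines of Python can usually report the address your machine uses on the local network. This is a common trick rather than anything specific to this project: local_ip is a name made up here, and 192.0.2.1 is a reserved documentation address used only to make the operating system pick an outgoing interface. No packet is sent.

```python
import socket


def local_ip(fallback="127.0.0.1"):
    """Best-effort guess at this machine's LAN address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends nothing; it only selects a route.
        s.connect(("192.0.2.1", 80))
        return s.getsockname()[0]
    except OSError:
        return fallback  # no network route available
    finally:
        s.close()


print(f"const kHost = 'http://{local_ip()}:8000';")
```

On machines with several network interfaces or a VPN, the result may not be the address your phone can reach, so treat it as a starting point and double-check against ifconfig or ipconfig.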

theme.dart

import 'package:flutter/material.dart';

const kBg      = Color(0xFF0D1117);
const kSurface = Color(0xFF161B22);
const kCard    = Color(0xFF1C2333);
const kBorder  = Color(0xFF30363D);
const kAccent  = Color(0xFF58A6FF);
const kGreen   = Color(0xFF3FB950);
const kRed     = Color(0xFFF85149);
const kTextPri = Color(0xFFE6EDF3);
const kTextSec = Color(0xFF8B949E);
const kTextDim = Color(0xFF30363D);

// Change to your machine's local IP when using a physical device
const kHost = 'http://192.168.1.10:8000';
// const kHost = 'http://10.0.2.2:8000'; // Android emulator

09 — flutter

api.dart — all HTTP calls in one place

One static class, three methods. The streaming one is the interesting bit: instead of waiting for the full response with http.post(), it uses http.Request and req.send() to get a streamed response, then listens to the byte stream and fires onChunk for every chunk of bytes that arrives. That callback drives the live typing animation in the chat screen.

services/api.dart

import 'dart:convert';
import 'dart:io';
import 'package:http/http.dart' as http;
import '../theme.dart';

class Api {
  static Future<void> ask({
    required String question,
    required void Function(String) onChunk,
    required void Function() onDone,
    required void Function(String) onError,
  }) async {
    try {
      final req = http.Request('POST', Uri.parse('$kHost/ask'));
      req.headers['Content-Type'] = 'application/json';
      req.body = jsonEncode({'question': question});
      final res = await req.send().timeout(const Duration(seconds: 120));
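      // Decode the stream as a whole: a multi-byte character split across
      // two network chunks would make a per-chunk utf8.decode() throw.
      res.stream.transform(utf8.decoder).listen(
        onChunk,
        onDone: onDone,
        onError: (e) => onError(e.toString()),
      );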
    } catch (e) {
      onError(e.toString());
    }
  }

  static Future<String> uploadPdf(File file) async {
    final req = http.MultipartRequest('POST', Uri.parse('$kHost/upload-pdf'));
    req.files.add(await http.MultipartFile.fromPath('file', file.path));
    final streamed = await req.send().timeout(const Duration(seconds: 60));
    final res = await http.Response.fromStream(streamed);
    final data = jsonDecode(res.body);
    if (data['status'] == 'success') return data['message'];
    throw Exception(data['message'] ?? 'Upload failed');
  }

  static Future<void> uploadText(String text) async {
    final res = await http.post(
      Uri.parse('$kHost/upload-text'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode({'text': text}),
    ).timeout(const Duration(seconds: 30));
    if (res.statusCode != 200) throw Exception('Failed');
  }
}
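One thing to watch with any byte stream: the network is free to cut a multi-byte UTF-8 character in half, so a single chunk is not always decodable on its own. The idea is easier to show in Python than to describe. In this sketch, decode_chunks is an illustrative helper and the chunk boundaries are chosen on purpose to split two characters.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks to text, carrying partial characters over."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # holds back incomplete trailing bytes
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # raises if bytes are left dangling
    if tail:
        yield tail


data = "café ☕".encode("utf-8")
# Deliberately split inside "é" (2 bytes) and inside "☕" (3 bytes)
chunks = [data[:4], data[4:7], data[7:]]

print(list(decode_chunks(chunks)))

try:
    chunks[0].decode("utf-8")  # naive per-chunk decoding
except UnicodeDecodeError as err:
    print("per-chunk decode failed:", err.reason)
```

Plain ASCII answers never trigger this, which is why the problem tends to show up late, usually the first time the model replies with an accent or an emoji.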

10 — flutter

chat_screen.dart

The chat screen keeps a flat list of ChatMessage objects. When a new AI response starts, it appends an empty message to the list, then mutates its text field inside setState as each chunk arrives. No stream controllers, no providers — just reactive state mutation.

The animated dots widget (_Dots) is shown when a message's text is still empty. As soon as the first token arrives, it gets replaced by the actual text widget automatically.

screens/chat_screen.dart

import 'package:flutter/material.dart';
import '../theme.dart';
import '../services/api.dart';

class ChatMessage {
  final bool isUser;
  String text;
  ChatMessage({required this.isUser, this.text = ''});
}

class ChatScreen extends StatefulWidget {
  const ChatScreen({super.key});
  @override
  State<ChatScreen> createState() => _ChatScreenState();
}

class _ChatScreenState extends State<ChatScreen> {
  final _ctrl   = TextEditingController();
  final _scroll = ScrollController();
  final _msgs   = <ChatMessage>[];
  bool _busy    = false;

  void _scrollDown() => WidgetsBinding.instance.addPostFrameCallback((_) {
    if (_scroll.hasClients) {
      _scroll.animateTo(_scroll.position.maxScrollExtent,
        duration: const Duration(milliseconds: 250), curve: Curves.easeOut);
    }
  });

  void _send() {
    final text = _ctrl.text.trim();
    if (text.isEmpty || _busy) return;
    _ctrl.clear();
    final ai = ChatMessage(isUser: false);
    setState(() {
      _msgs.add(ChatMessage(isUser: true, text: text));
      _msgs.add(ai);
      _busy = true;
    });
    _scrollDown();
    Api.ask(
      question: text,
      onChunk: (c) => setState(() => ai.text += c),
      onDone:  () { setState(() => _busy = false); _scrollDown(); },
      onError: (e) => setState(() { ai.text = 'Error: $e'; _busy = false; }),
    );
  }

  @override
  Widget build(BuildContext context) {
    return Column(children: [
      _header(),
      Expanded(
        child: _msgs.isEmpty ? _empty() : ListView.builder(
          controller: _scroll,
          padding: const EdgeInsets.all(16),
          itemCount: _msgs.length,
          itemBuilder: (_, i) => _bubble(_msgs[i]),
        ),
      ),
      _input(),
    ]);
  }

  Widget _header() => Container(
    color: kSurface,
    padding: EdgeInsets.only(
      top: MediaQuery.of(context).padding.top + 10,
      left: 16, right: 16, bottom: 12,
    ),
    child: Row(children: [
      const Text('DocMind', style: TextStyle(fontSize: 18,
        fontWeight: FontWeight.w700, color: kTextPri, letterSpacing: -0.5)),
      const Spacer(),
      if (_busy) const SizedBox(width: 10, height: 10,
        child: CircularProgressIndicator(strokeWidth: 1.5, color: kAccent)),
    ]),
  );

  Widget _empty() => Center(
    child: Column(mainAxisSize: MainAxisSize.min, children: [
      const Icon(Icons.chat_bubble_outline, size: 40, color: kTextDim),
      const SizedBox(height: 12),
      const Text('Ask about your documents',
        style: TextStyle(color: kTextSec, fontSize: 14)),
      const SizedBox(height: 20),
      Wrap(spacing: 8, runSpacing: 8, alignment: WrapAlignment.center,
        children: ['Summarize this', 'Key points?', 'What is this about?']
          .map((s) => GestureDetector(
            onTap: () { _ctrl.text = s; _send(); },
            child: Container(
              padding: const EdgeInsets.symmetric(horizontal: 14, vertical: 7),
              decoration: BoxDecoration(
                border: Border.all(color: kBorder),
                borderRadius: BorderRadius.circular(20),
              ),
              child: Text(s, style: const TextStyle(fontSize: 12, color: kTextSec)),
            ),
          )).toList(),
      ),
    ]),
  );

  Widget _bubble(ChatMessage msg) => Align(
    alignment: msg.isUser ? Alignment.centerRight : Alignment.centerLeft,
    child: Container(
      margin: EdgeInsets.only(
        bottom: 8,
        left: msg.isUser ? 60 : 0,
        right: msg.isUser ? 0 : 60,
      ),
      padding: const EdgeInsets.symmetric(horizontal: 14, vertical: 10),
      decoration: BoxDecoration(
        color: msg.isUser ? kAccent.withOpacity(0.15) : kCard,
        borderRadius: BorderRadius.only(
          topLeft:     const Radius.circular(16),
          topRight:    const Radius.circular(16),
          bottomLeft:  Radius.circular(msg.isUser ? 16 : 4),
          bottomRight: Radius.circular(msg.isUser ? 4 : 16),
        ),
        border: Border.all(color: kBorder, width: 0.5),
      ),
      child: msg.text.isEmpty
        ? const _Dots()
        : Text(msg.text, style: const TextStyle(
            fontSize: 14, color: kTextPri, height: 1.55)),
    ),
  );

  Widget _input() => Container(
    color: kSurface,
    padding: EdgeInsets.fromLTRB(12, 10, 12,
      MediaQuery.of(context).viewInsets.bottom + 12),
    child: Row(children: [
      Expanded(
        child: TextField(
          controller: _ctrl,
          enabled: !_busy,
          minLines: 1, maxLines: 4,
          onSubmitted: (_) => _send(),
          style: const TextStyle(fontSize: 14, color: kTextPri),
          decoration: InputDecoration(
            hintText: _busy ? 'Thinking...' : 'Ask something...',
            hintStyle: const TextStyle(color: kTextSec, fontSize: 13),
            filled: true, fillColor: kCard,
            contentPadding: const EdgeInsets.symmetric(horizontal: 16, vertical: 10),
            border: OutlineInputBorder(
              borderRadius: BorderRadius.circular(24),
              borderSide: const BorderSide(color: kBorder, width: 0.5),
            ),
            enabledBorder: OutlineInputBorder(
              borderRadius: BorderRadius.circular(24),
              borderSide: const BorderSide(color: kBorder, width: 0.5),
            ),
            focusedBorder: OutlineInputBorder(
              borderRadius: BorderRadius.circular(24),
              borderSide: const BorderSide(color: kAccent, width: 1),
            ),
          ),
        ),
      ),
      const SizedBox(width: 8),
      GestureDetector(
        onTap: _busy ? null : _send,
        child: Container(
          width: 42, height: 42,
          decoration: BoxDecoration(
            color: _busy ? kCard : kAccent,
            shape: BoxShape.circle,
          ),
          child: Icon(Icons.arrow_upward_rounded,
            color: _busy ? kTextSec : kBg, size: 20),
        ),
      ),
    ]),
  );
}

// Animated typing indicator — shown while waiting for the first token
class _Dots extends StatefulWidget {
  const _Dots();
  @override
  State<_Dots> createState() => _DotsState();
}
class _DotsState extends State<_Dots> with SingleTickerProviderStateMixin {
  late final AnimationController _c =
    AnimationController(vsync: this,
      duration: const Duration(milliseconds: 800))..repeat();
  @override void dispose() { _c.dispose(); super.dispose(); }

  @override
  Widget build(BuildContext context) => AnimatedBuilder(
    animation: _c,
    builder: (_, __) => Row(mainAxisSize: MainAxisSize.min,
      children: List.generate(3, (i) {
        final t = (_c.value - i * 0.2).clamp(0.0, 1.0);
        final op = (0.3 + 0.7 * (t < 0.5 ? t * 2 : (1 - t) * 2)).clamp(0.3, 1.0);
        return Container(
          margin: const EdgeInsets.symmetric(horizontal: 2),
          width: 6, height: 6,
          decoration: BoxDecoration(
            color: kAccent.withOpacity(op),
            shape: BoxShape.circle,
          ),
        );
      }),
    ),
  );
}

11 — flutter

upload_screen.dart

Handles both PDF uploads via file_picker and raw text pastes. Each uploaded file gets added to a local _docs list so the UI shows what's been indexed. The _setStatus helper keeps the busy/error/success state in one place.

screens/upload_screen.dart

import 'dart:io';
import 'package:flutter/material.dart';
import 'package:file_picker/file_picker.dart';
import '../theme.dart';
import '../services/api.dart';

class UploadScreen extends StatefulWidget {
  const UploadScreen({super.key});
  @override
  State<UploadScreen> createState() => _UploadScreenState();
}

class _UploadScreenState extends State<UploadScreen> {
  final _textCtrl = TextEditingController();
  bool    _busy    = false;
  String? _status;
  bool    _isError = false;
  final   _docs    = <String>[];

  void _setStatus(String msg, {bool error = false}) =>
    setState(() { _status = msg; _isError = error; _busy = false; });

  Future<void> _pickPdf() async {
    final result = await FilePicker.pickFiles(
      type: FileType.custom,
      allowedExtensions: ['pdf'],
      allowMultiple: true,
    );
    if (result == null) return;
    setState(() { _busy = true; _status = null; });
    for (final f in result.files) {
      if (f.path == null) continue;
      try {
        await Api.uploadPdf(File(f.path!));
        setState(() => _docs.add(f.name));
        _setStatus('✓ ${f.name} indexed');
      } catch (e) {
        _setStatus('$e', error: true);
      }
    }
  }

  Future<void> _submitText() async {
    final text = _textCtrl.text.trim();
    if (text.isEmpty) return;
    setState(() { _busy = true; _status = null; });
    try {
      await Api.uploadText(text);
      _textCtrl.clear();
      setState(() => _docs.add('Text snippet'));
      _setStatus('✓ Text indexed');
    } catch (e) {
      _setStatus('$e', error: true);
    }
  }

  @override
  Widget build(BuildContext context) {
    return Column(children: [
      Container(
        color: kSurface,
        padding: EdgeInsets.only(
          top: MediaQuery.of(context).padding.top + 10,
          left: 16, right: 16, bottom: 12,
        ),
        child: Row(children: [
          const Text('Documents', style: TextStyle(fontSize: 18,
            fontWeight: FontWeight.w700, color: kTextPri, letterSpacing: -0.5)),
          const Spacer(),
          if (_docs.isNotEmpty)
            Text('${_docs.length} indexed',
              style: const TextStyle(fontSize: 12, color: kTextSec)),
        ]),
      ),
      Expanded(child: ListView(
        padding: const EdgeInsets.all(16),
        children: [
          // PDF drop zone
          GestureDetector(
            onTap: _busy ? null : _pickPdf,
            child: Container(
              height: 120,
              decoration: BoxDecoration(
                color: kCard,
                borderRadius: BorderRadius.circular(12),
                border: Border.all(color: _busy ? kAccent : kBorder),
              ),
              child: _busy
                ? const Center(child: CircularProgressIndicator(
                    strokeWidth: 2, color: kAccent))
                : Column(mainAxisAlignment: MainAxisAlignment.center, children: [
                    Icon(Icons.upload_file_outlined, color: kAccent, size: 28),
                    const SizedBox(height: 8),
                    const Text('Tap to upload PDF', style: TextStyle(
                      color: kTextPri, fontSize: 14, fontWeight: FontWeight.w500)),
                    const SizedBox(height: 2),
                    const Text('Multiple files supported',
                      style: TextStyle(color: kTextSec, fontSize: 12)),
                  ]),
            ),
          ),
          const SizedBox(height: 16),
          // Text input
          Container(
            decoration: BoxDecoration(
              color: kCard,
              borderRadius: BorderRadius.circular(12),
              border: Border.all(color: kBorder),
            ),
            child: Column(children: [
              TextField(
                controller: _textCtrl,
                maxLines: 4, minLines: 3,
                style: const TextStyle(fontSize: 13, color: kTextPri, height: 1.5),
                decoration: const InputDecoration(
                  hintText: 'Or paste text here...',
                  hintStyle: TextStyle(color: kTextSec, fontSize: 13),
                  border: InputBorder.none,
                  contentPadding: EdgeInsets.all(14),
                ),
              ),
              Divider(height: 1, color: kBorder),
              Align(
                alignment: Alignment.centerRight,
                child: TextButton(
                  onPressed: _busy ? null : _submitText,
                  child: const Text('Add text',
                    style: TextStyle(color: kAccent, fontSize: 13)),
                ),
              ),
            ]),
          ),
          // Status banner
          if (_status != null) ...[
            const SizedBox(height: 12),
            Container(
              padding: const EdgeInsets.symmetric(horizontal: 12, vertical: 10),
              decoration: BoxDecoration(
                color: (_isError ? kRed : kGreen).withOpacity(0.1),
                borderRadius: BorderRadius.circular(8),
                border: Border.all(
                  color: (_isError ? kRed : kGreen).withOpacity(0.3)),
              ),
              child: Row(children: [
                Icon(_isError ? Icons.error_outline : Icons.check_circle_outline,
                  size: 14, color: _isError ? kRed : kGreen),
                const SizedBox(width: 8),
                Expanded(child: Text(_status!,
                  style: TextStyle(fontSize: 13,
                    color: _isError ? kRed : kGreen))),
              ]),
            ),
          ],
          // Indexed docs list
          if (_docs.isNotEmpty) ...[
            const SizedBox(height: 20),
            const Text('INDEXED', style: TextStyle(fontSize: 10,
              fontWeight: FontWeight.w700, color: kTextSec, letterSpacing: 1)),
            const SizedBox(height: 8),
            ..._docs.map((name) => Container(
              margin: const EdgeInsets.only(bottom: 6),
              padding: const EdgeInsets.symmetric(horizontal: 12, vertical: 10),
              decoration: BoxDecoration(
                color: kCard,
                borderRadius: BorderRadius.circular(8),
                border: Border.all(color: kBorder, width: 0.5),
              ),
              child: Row(children: [
                const Icon(Icons.description_outlined, size: 14, color: kAccent),
                const SizedBox(width: 10),
                Expanded(child: Text(name,
                  style: const TextStyle(fontSize: 13, color: kTextPri))),
                const Icon(Icons.check, size: 14, color: kGreen),
              ]),
            )),
          ],
        ],
      )),
    ]);
  }
}

12 — run it

Running the whole thing

Three processes need to be running concurrently: Ollama, the Python server, and the Flutter app.

01. Start Ollama. It usually runs automatically after install. If not: ollama serve
02. Start the backend. From inside the backend/ folder: uvicorn app:app --host 0.0.0.0 --port 8000
03. Set your IP. If running on a physical device, open theme.dart and update kHost to your machine's local IP address.
04. Run Flutter. Run flutter run from inside docmind/. You should see the empty chat screen.
05. Test with no documents. Ask anything. The model should answer "not found in documents" or something close to it. That's correct: the vector store is empty.
06. Upload a PDF. Go to the Documents tab, pick a file, wait for the indexed confirmation, then ask questions about its contents.

First run is slow. The SentenceTransformer model (~80 MB) downloads on first use, and Ollama loads Gemma3 into memory. Both are cached — subsequent starts are much faster.
