Build a local AI chat app
that runs on your laptop.
No API key. No cloud. No monthly bill. Just a Python backend, a Flutter app, and an open-source model running entirely on your machine. This is a complete walkthrough — every file, every decision.
Watch first
4-minute overview on YouTube
The video gives a quick overview. This post goes deeper — full source code, explanations of every design decision, and things the video skips.
01 — concept
What is RAG, and why does it work?
RAG stands for Retrieval-Augmented Generation. Instead of asking an AI "what do you know about X?", you first pull the relevant text out of your own documents, hand it to the model as context, and then ask "based on this, what's the answer to X?"
The model never needs to be trained on your documents. It just reads what you give it in the prompt, the same way you'd paste an article into ChatGPT before asking a question. The difference here is that the retrieval step is automatic, semantic, and fast.
"Semantic search" means we're not matching keywords; we're matching meaning. Queries about "automobile" and "car" tend to match the same documents even though a keyword search would treat them as unrelated strings. That's what the embedding model gives us.
Why local? Your documents never leave your machine. No data goes to OpenAI, Anthropic, or anyone else. For PDFs with sensitive content — contracts, personal notes, work docs — that matters a lot.
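To make "matching meaning" concrete, here is a minimal sketch of similarity search over toy vectors. The three-dimensional vectors are made up by hand purely for illustration (in the real app the embedding model produces 384-dimensional ones), and the names TOY_VECTORS, cosine, and retrieve are hypothetical, not part of the project.

```python
import math

# Hand-made 3-dimensional "embeddings" (illustrative values, not model output).
TOY_VECTORS = {
    "The car needs new tyres.": [0.9, 0.1, 0.0],
    "Automobile insurance renews in May.": [0.8, 0.2, 0.1],
    "The cake recipe uses two eggs.": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        TOY_VECTORS,
        key=lambda text: cosine(query_vec, TOY_VECTORS[text]),
        reverse=True,
    )
    return ranked[:k]


# Pretend this is the embedding of the query "vehicle".
top = retrieve([0.85, 0.15, 0.05])
for text in top:
    print(text)
```

Both vehicle sentences rank above the cake sentence even though only one of them contains the word "car". Nothing in the ranking looks at the words themselves, only at the vectors.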
The backend is three Python files that each do one thing:
ingest.py
- reads PDF or raw text
- splits into chunks
- embeds each chunk
- saves to vectorstore
vectorstore.py
- FAISS index (dim 384)
- add vectors
- search by similarity
- persist to disk
app.py
- FastAPI server
- /upload-pdf, /upload-text
- /ask with streaming
- calls local Ollama
02 — setup
Ollama — a local model server
Ollama is a tool that lets you run open-source LLMs locally. Think of it as a local inference server — once running, it exposes an HTTP API at localhost:11434 that any app can talk to.
Go to ollama.com, download the installer, run it. Then verify:
ollama --version
We're using Gemma 3 4B — Google's open model. Small enough to run on most laptops (needs ~5 GB RAM), fast enough for real-time Q&A.
# pull the model — ~2.5 GB download, cached after
ollama pull gemma3:4b
# sanity check — chat with it before wiring up the app
ollama run gemma3:4b
Tight on RAM? Try gemma3:1b instead. It's smaller and faster. Change the LLM constant in app.py to match.
Once downloaded, Ollama runs as a background service. The Python backend calls http://127.0.0.1:11434/api/generate directly — no extra setup needed.
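As a rough sketch of what "calls /api/generate directly" looks like, the snippet below builds a non-streaming request with only the standard library. The helper names build_generate_request and generate are hypothetical; the URL and model tag are the ones used in this post. Calling generate() assumes Ollama is already running and the model has been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # endpoint named in this post


def build_generate_request(prompt, model="gemma3:4b"):
    """Build (but do not send) a non-streaming generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gemma3:4b", timeout=120):
    """Send the request and return the generated text. Needs Ollama running."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


req = build_generate_request("Say hello in five words.")
print(req.get_method(), req.full_url)
```

With "stream" set to False the server replies with a single JSON object whose "response" field holds the whole answer, which is convenient for a quick check before wiring up streaming.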
03 — project
How the project is laid out
The backend and Flutter app are completely independent. The Flutter app is just a client making HTTP requests — you could swap it for a web app or CLI and the backend wouldn't change at all.
04 — python backend
vectorstore.py — where your documents live
Every chunk of every document you upload eventually becomes a list of 384 numbers: a vector that encodes its meaning. This file stores those vectors and lets you search through them using FAISS.
FAISS (built by Meta) is a library built for fast nearest-neighbour search over large sets of vectors. When you ask a question, we convert it to a vector too, then ask FAISS: "find me the 3 stored vectors closest to this one." The matching chunks become the context we pass to the LLM.
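Conceptually, a flat L2 index compares the query against every stored vector and keeps the closest ones. The pure-Python sketch below mimics that behaviour on two-dimensional toy data so the idea is visible without installing anything; flat_l2_search is a hypothetical stand-in, not the FAISS API, and FAISS does the same job far faster.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(stored, query, k=3):
    """Return (distances, indices) of the k stored vectors closest to query."""
    k = min(k, len(stored))
    ranked = sorted(range(len(stored)), key=lambda i: squared_l2(stored[i], query))
    nearest = ranked[:k]
    return [squared_l2(stored[i], query) for i in nearest], nearest


stored = [[0.0, 0.0], [2.0, 2.0], [5.0, 5.0], [1.0, 0.0]]
distances, indices = flat_l2_search(stored, [0.9, 0.1], k=2)
print(indices, distances)
```

The returned indices play the same role as the positions FAISS hands back: they are only useful because a parallel list (metadata in vectorstore.py) maps each position to the original text.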
Why 384 dimensions?
That's the output size of all-MiniLM-L6-v2, the embedding model we're using. Every vector it produces is exactly 384 numbers. FAISS needs to know this upfront to set up the index — hence DIM = 384.
import json
import os

import faiss
import numpy as np

DIM = 384  # output size of all-MiniLM-L6-v2
index = faiss.IndexFlatL2(DIM)
metadata = []  # metadata[i] describes the vector stored at FAISS position i


def save():
    faiss.write_index(index, "index.faiss")
    with open("meta.json", "w") as f:
        json.dump(metadata, f)


def load():
    global index, metadata
    if os.path.exists("index.faiss") and os.path.getsize("index.faiss") > 0:
        try:
            index = faiss.read_index("index.faiss")
        except Exception:
            # unreadable index file: fall back to a fresh empty index
            index = faiss.IndexFlatL2(DIM)
    if os.path.exists("meta.json") and os.path.getsize("meta.json") > 0:
        with open("meta.json", "r") as f:
            metadata = json.load(f)


def add(vector, meta):
    # FAISS expects a 2-D float32 array of shape (n, DIM)
    vector = np.array([vector]).astype("float32")
    index.add(vector)
    metadata.append(meta)


def search(vector, k=3):
    vector = np.array([vector]).astype("float32")
    k = min(k, len(metadata))
    if k == 0:
        return []
    D, I = index.search(vector, k)
    # skip any position that has no matching metadata entry
    return [metadata[i] for i in I[0] if 0 <= i < len(metadata)]
- save(): Writes the FAISS index to index.faiss and all the original text chunks to meta.json. Called at the end of every ingest operation so nothing is lost between server restarts.
- load(): Reads both files back on startup. If they don't exist yet, it initializes a fresh empty index. The size check avoids a crash on zero-byte files.
- add(): Inserts a single vector into FAISS and appends the corresponding metadata dict, which holds the original text and source filename, to the Python list.
- search(): Finds the k nearest vectors using L2 distance. Returns the metadata dicts containing the raw text, which become the context for the LLM prompt. The bounds check on I[0] prevents index-out-of-range errors.
05 — python backend
ingest.py — preparing documents for search
This file does the preprocessing work. It takes a raw document — a PDF or a block of text — and turns it into a set of searchable vectors stored in FAISS. Two things happen here: chunking and embedding.
Chunking — and why the overlap matters
You can't embed an entire document as one vector. A 50-page PDF embedded as a single vector would be useless for retrieval — you'd get the whole document back regardless of what you asked. So we split the text into 500-character chunks.
The 100-character overlap is the detail most people miss. Without it, a sentence that falls right on a chunk boundary gets cut in half, and neither half makes sense in isolation. With it, any sentence up to 100 characters long is guaranteed to appear whole in at least one chunk; longer sentences can still be split, but neighbouring chunks always share some context.
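The overlap is easier to see on a tiny input. The sketch below uses the same sliding-window loop as ingest.py, but with size=10 and overlap=3 (values chosen only to keep the output readable) on the 26-letter alphabet.

```python
def chunk_text(text, size=500, overlap=100):
    """Sliding window: each step advances by size - overlap characters."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
chunks = chunk_text(sample, size=10, overlap=3)
for c in chunks:
    print(repr(c))
# 'abcdefghij'
# 'hijklmnopq'
# 'opqrstuvwx'
# 'vwxyz'
```

Each chunk starts with the last three characters of the previous one. Two things to keep in mind: the final chunk can be much shorter than the rest, and overlap must stay smaller than size, otherwise start never advances and the loop does not terminate.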
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

import vectorstore as vs

model = SentenceTransformer("all-MiniLM-L6-v2")


def chunk_text(text, size=500, overlap=100):
    # slide a window of `size` characters, advancing by size - overlap each step
    chunks = []
    start = 0
    while start < len(text):
        end = start + size
        chunk = text[start:end]
        chunks.append(chunk)
        start += size - overlap
    return chunks


def add_text(text, source="manual"):
    chunks = chunk_text(text)
    for c in chunks:
        emb = model.encode(c).astype("float32")
        vs.add(emb, {"text": c, "source": source})
    vs.save()  # persist once, after the whole document is embedded


def add_pdf(path):
    reader = PdfReader(path)
    text = ""
    for page in reader.pages:
        # image-only pages may yield no text; the newline keeps pages from fusing
        text += (page.extract_text() or "") + "\n"
    add_text(text, source=path)
The embedding model: all-MiniLM-L6-v2 is about 80 MB and downloads automatically on first run. It runs CPU-only just fine and is accurate enough for document Q&A. Each call to model.encode() returns a numpy array of shape (384,).
06 — python backend
app.py — the FastAPI server
This is the server the Flutter app talks to. It wires everything together: receives uploads, triggers ingestion, handles questions, and streams answers back token by token from Ollama.
How streaming works
FastAPI's StreamingResponse lets a generator function yield data as it becomes available. Ollama's API supports streaming too — it sends back one JSON object per token. We parse each line, pull out the "response" field, and yield it immediately. The Flutter app receives these tokens as bytes over a persistent HTTP connection, decodes them, and appends each one to the message in real time.
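The parsing step can be tried in isolation. The sketch below feeds a few hand-written lines, shaped like Ollama's newline-delimited JSON stream, through a small generator; iter_tokens and sample_lines are hypothetical names, and the sample payloads are trimmed to the fields that matter here.

```python
import json


def iter_tokens(lines):
    """Yield the "response" fragment from each newline-delimited JSON line."""
    for raw in lines:
        if not raw:
            continue  # blank lines carry no data
        try:
            payload = json.loads(raw.decode("utf-8"))
        except json.JSONDecodeError:
            continue  # ignore anything that is not valid JSON
        if "response" in payload:
            yield payload["response"]


# Hand-written sample of what a streamed reply might look like.
sample_lines = [
    b'{"model":"gemma3:4b","response":"Hel","done":false}',
    b"",
    b'{"model":"gemma3:4b","response":"lo","done":false}',
    b'{"model":"gemma3:4b","response":"","done":true}',
]

answer = "".join(iter_tokens(sample_lines))
print(answer)  # Hello
```

Because the function is a generator, each fragment is available to the caller as soon as its line is parsed, which is what lets StreamingResponse forward it without waiting for the full answer.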
import json
import os

import requests
from fastapi import FastAPI, File, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
from sentence_transformers import SentenceTransformer

import ingest
import vectorstore as vs

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"], allow_credentials=True,
    allow_methods=["*"], allow_headers=["*"],
)

model = SentenceTransformer("all-MiniLM-L6-v2")
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
LLM = "gemma3:4b"
os.makedirs("docs", exist_ok=True)


@app.on_event("startup")
def startup():
    vs.load()  # reload persisted index on every restart


@app.post("/upload-pdf")
async def upload_pdf(file: UploadFile = File(...)):
    # basename() drops any directory parts a client put in the filename
    file_path = os.path.join("docs", os.path.basename(file.filename))
    with open(file_path, "wb") as f:
        content = await file.read()
        f.write(content)
    try:
        ingest.add_pdf(file_path)
        return {"status": "success", "message": f"{file.filename} processed"}
    except Exception as e:
        return {"status": "error", "message": str(e)}


@app.post("/upload-text")
def upload_text(data: dict):
    ingest.add_text(data["text"])
    return {"status": "stored"}


@app.post("/ask")
def ask(data: dict):
    question = data["question"]
    # 1. embed the question
    q_emb = model.encode(question).astype("float32")
    # 2. find the 3 closest chunks in FAISS
    results = vs.search(q_emb, k=3)
    context = "\n\n".join([r["text"] for r in results])
    # 3. build a strict context-only prompt
    prompt = f"""Use ONLY this context to answer.
Context:
{context}
Question:
{question}
If the answer is not in the context, say "not found in documents"."""

    # 4. stream tokens from Ollama
    def stream():
        response = requests.post(
            OLLAMA_URL,
            json={"model": LLM, "prompt": prompt, "stream": True},
            stream=True,
        )
        for line in response.iter_lines():
            if not line:
                continue
            try:
                payload = json.loads(line.decode("utf-8"))
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting the stream
            if "response" in payload:
                yield payload["response"]

    return StreamingResponse(stream(), media_type="text/plain")
1. Encode the user's question into a 384-dim vector using the same SentenceTransformer model used during ingestion. This is key: both documents and queries must be embedded with the same model.
2. Search FAISS for the 3 most semantically similar chunks across all uploaded documents.
3. Concatenate the chunks into a context string and inject it into the prompt. The prompt explicitly instructs the model to answer only from the context, which keeps it grounded and reduces (though does not eliminate) hallucination.
4. POST to Ollama with stream: True, iterate over the response line by line, and yield each token immediately via StreamingResponse.
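Step 3 is plain string assembly, so it can be sketched and tested without a model. The version below mirrors the prompt wording from app.py; build_prompt, answer_or_none, and NOT_FOUND are hypothetical names, and the short-circuit for an empty result list is an optional idea, not something the original server does.

```python
NOT_FOUND = "not found in documents"


def build_prompt(question, results):
    """Join retrieved chunks into a context block and wrap it in instructions."""
    context = "\n\n".join(r["text"] for r in results)
    return (
        "Use ONLY this context to answer.\n"
        f"Context:\n{context}\n"
        f"Question:\n{question}\n"
        f'If the answer is not in the context, say "{NOT_FOUND}".'
    )


def answer_or_none(results):
    """Optional shortcut: with no retrieved chunks there is nothing to ask about."""
    return NOT_FOUND if not results else None


results = [
    {"text": "The lease ends on 30 June.", "source": "lease.pdf"},
    {"text": "Rent is due on the first of each month.", "source": "lease.pdf"},
]
prompt = build_prompt("When does the lease end?", results)
print(prompt)
```

Returning early when the store is empty would save one model call per question, at the cost of bypassing the model's own phrasing of the fallback.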
requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
requests==2.31.0
sentence-transformers==5.5.1
pypdf==3.17.1
faiss-cpu==1.7.4
numpy==1.24.3
python-multipart==0.0.20
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
07 — flutter
Flutter app — two screens, one API class
The Flutter side is deliberately lean. No state management library, no generated code, no complex architecture. Just a StatefulWidget per screen, a static Api class for HTTP calls, and a theme file for shared constants.
dependencies:
  flutter:
    sdk: flutter
  http: ^1.6.0
  file_picker: ^11.0.2
  cupertino_icons: ^1.0.8
import 'package:flutter/material.dart';
import 'package:flutter/services.dart';
import 'theme.dart';
import 'screens/chat_screen.dart';
import 'screens/upload_screen.dart';
void main() {
WidgetsFlutterBinding.ensureInitialized();
SystemChrome.setSystemUIOverlayStyle(const SystemUiOverlayStyle(
statusBarColor: Colors.transparent,
statusBarIconBrightness: Brightness.light,
));
runApp(const App());
}
class App extends StatelessWidget {
const App({super.key});
@override
Widget build(BuildContext context) => MaterialApp(
title: 'DocMind',
debugShowCheckedModeBanner: false,
theme: ThemeData(
brightness: Brightness.dark,
scaffoldBackgroundColor: kBg,
colorScheme: const ColorScheme.dark(primary: kAccent, surface: kSurface),
useMaterial3: true,
),
home: const Home(),
);
}
class Home extends StatefulWidget {
const Home({super.key});
@override
State<Home> createState() => _HomeState();
}
class _HomeState extends State<Home> {
int _tab = 0;
@override
Widget build(BuildContext context) => Scaffold(
backgroundColor: kBg,
body: IndexedStack(
index: _tab,
children: const [ChatScreen(), UploadScreen()],
),
bottomNavigationBar: NavigationBar(
backgroundColor: kSurface,
indicatorColor: kAccent.withOpacity(0.15),
selectedIndex: _tab,
onDestinationSelected: (i) => setState(() => _tab = i),
destinations: const [
NavigationDestination(
icon: Icon(Icons.chat_bubble_outline_rounded, color: kTextSec),
selectedIcon: Icon(Icons.chat_bubble_rounded, color: kAccent),
label: 'Chat',
),
NavigationDestination(
icon: Icon(Icons.folder_outlined, color: kTextSec),
selectedIcon: Icon(Icons.folder_rounded, color: kAccent),
label: 'Documents',
),
],
),
);
}
08 — flutter
theme.dart — colors and the host URL
All shared constants live here. The only thing you need to change before running is kHost.
Physical device? Replace kHost with your computer's local IP — something like http://192.168.1.42:8000. Find it with ifconfig on Mac/Linux or ipconfig on Windows. For the Android emulator, use http://10.0.2.2:8000.
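If you would rather not read through ifconfig output, the machine running the backend can usually report its own LAN address. This is a small sketch using a common socket trick; local_ip is a hypothetical helper, the 8.8.8.8 address is only used to pick a route (a UDP connect sends no packets), and on machines with several network interfaces the result may not be the one your phone can reach.

```python
import socket


def local_ip(fallback="127.0.0.1"):
    """Best-effort guess at this machine's LAN IPv4 address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; nothing is sent
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        return fallback  # no usable route, e.g. offline
    finally:
        s.close()


ip = local_ip()
print(f"const kHost = 'http://{ip}:8000';")
```

The printed line can be pasted into theme.dart as the kHost value when testing on a physical device on the same network.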
import 'package:flutter/material.dart';
const kBg = Color(0xFF0D1117);
const kSurface = Color(0xFF161B22);
const kCard = Color(0xFF1C2333);
const kBorder = Color(0xFF30363D);
const kAccent = Color(0xFF58A6FF);
const kGreen = Color(0xFF3FB950);
const kRed = Color(0xFFF85149);
const kTextPri = Color(0xFFE6EDF3);
const kTextSec = Color(0xFF8B949E);
const kTextDim = Color(0xFF30363D);
// Change to your machine's local IP when using a physical device
const kHost = 'http://192.168.1.10:8000';
// const kHost = 'http://10.0.2.2:8000'; // Android emulator
09 — flutter
api.dart — all HTTP calls in one place
One static class, three methods. The streaming one is the interesting bit: instead of waiting for the full response with http.post(), it uses http.Request and req.send() to get a streamed response, then listens to the byte stream and fires onChunk for every chunk of bytes that arrives. That callback drives the live typing animation in the chat screen.
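One subtlety with any byte stream: a multi-byte UTF-8 character can be split across two network chunks, so decoding each chunk on its own can fail or garble text as soon as an answer contains accents or emoji. The sketch below shows the effect and the usual remedy, an incremental decoder that buffers incomplete sequences. It is written in Python for illustration only; decode_chunks is a hypothetical helper, and the chunk boundary is chosen by hand to land in the middle of a character.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks to text, carrying partial characters between chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # flush; raises if bytes are left over
    if tail:
        yield tail


data = "café ☕".encode("utf-8")
chunks = [data[:4], data[4:]]  # the split lands inside the two-byte "é"

try:
    chunks[0].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False  # decoding the first chunk alone fails

joined = "".join(decode_chunks(chunks))
print(naive_ok, joined)
```

Dart offers the same idea through a stream transform with utf8.decoder, which is worth considering in place of decoding each chunk separately.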
import 'dart:convert';
import 'dart:io';

import 'package:http/http.dart' as http;

import '../theme.dart';

class Api {
  // Streams the answer from /ask, calling onChunk for each decoded piece of text.
  static Future<void> ask({
    required String question,
    required void Function(String) onChunk,
    required void Function() onDone,
    required void Function(String) onError,
  }) async {
    try {
      final req = http.Request('POST', Uri.parse('$kHost/ask'));
      req.headers['Content-Type'] = 'application/json';
      req.body = jsonEncode({'question': question});
      final res = await req.send().timeout(const Duration(seconds: 120));
      if (res.statusCode != 200) {
        onError('Server returned ${res.statusCode}');
        return;
      }
      // utf8.decoder buffers incomplete multi-byte sequences, so a character
      // split across two network chunks still decodes correctly.
      res.stream.transform(utf8.decoder).listen(
        onChunk,
        onDone: onDone,
        onError: (e) => onError(e.toString()),
      );
    } catch (e) {
      onError(e.toString());
    }
  }

  // Uploads one PDF as multipart form data and returns the server's message.
  static Future<String> uploadPdf(File file) async {
    final req = http.MultipartRequest('POST', Uri.parse('$kHost/upload-pdf'));
    req.files.add(await http.MultipartFile.fromPath('file', file.path));
    final streamed = await req.send().timeout(const Duration(seconds: 60));
    final res = await http.Response.fromStream(streamed);
    final data = jsonDecode(res.body);
    if (data['status'] == 'success') return data['message'];
    throw Exception(data['message'] ?? 'Upload failed');
  }

  // Sends pasted text to /upload-text for indexing.
  static Future<void> uploadText(String text) async {
    final res = await http.post(
      Uri.parse('$kHost/upload-text'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode({'text': text}),
    ).timeout(const Duration(seconds: 30));
    if (res.statusCode != 200) throw Exception('Failed');
  }
}
10 — flutter
chat_screen.dart
The chat screen keeps a flat list of ChatMessage objects. When a new AI response starts, it appends an empty message to the list, then mutates its text field inside setState as each chunk arrives. No stream controllers, no providers — just reactive state mutation.
The animated dots widget (_Dots) is shown when a message's text is still empty. As soon as the first token arrives, it gets replaced by the actual text widget automatically.
import 'package:flutter/material.dart';
import '../theme.dart';
import '../services/api.dart';
class ChatMessage {
final bool isUser;
String text;
ChatMessage({required this.isUser, this.text = ''});
}
class ChatScreen extends StatefulWidget {
const ChatScreen({super.key});
@override
State<ChatScreen> createState() => _ChatScreenState();
}
class _ChatScreenState extends State<ChatScreen> {
final _ctrl = TextEditingController();
final _scroll = ScrollController();
final _msgs = <ChatMessage>[];
bool _busy = false;
void _scrollDown() => WidgetsBinding.instance.addPostFrameCallback((_) {
if (_scroll.hasClients) {
_scroll.animateTo(_scroll.position.maxScrollExtent,
duration: const Duration(milliseconds: 250), curve: Curves.easeOut);
}
});
void _send() {
final text = _ctrl.text.trim();
if (text.isEmpty || _busy) return;
_ctrl.clear();
final ai = ChatMessage(isUser: false);
setState(() {
_msgs.add(ChatMessage(isUser: true, text: text));
_msgs.add(ai);
_busy = true;
});
_scrollDown();
Api.ask(
question: text,
onChunk: (c) => setState(() => ai.text += c),
onDone: () { setState(() => _busy = false); _scrollDown(); },
onError: (e) => setState(() { ai.text = 'Error: $e'; _busy = false; }),
);
}
@override
Widget build(BuildContext context) {
return Column(children: [
_header(),
Expanded(
child: _msgs.isEmpty ? _empty() : ListView.builder(
controller: _scroll,
padding: const EdgeInsets.all(16),
itemCount: _msgs.length,
itemBuilder: (_, i) => _bubble(_msgs[i]),
),
),
_input(),
]);
}
Widget _header() => Container(
color: kSurface,
padding: EdgeInsets.only(
top: MediaQuery.of(context).padding.top + 10,
left: 16, right: 16, bottom: 12,
),
child: Row(children: [
const Text('DocMind', style: TextStyle(fontSize: 18,
fontWeight: FontWeight.w700, color: kTextPri, letterSpacing: -0.5)),
const Spacer(),
if (_busy) const SizedBox(width: 10, height: 10,
child: CircularProgressIndicator(strokeWidth: 1.5, color: kAccent)),
]),
);
Widget _empty() => Center(
child: Column(mainAxisSize: MainAxisSize.min, children: [
const Icon(Icons.chat_bubble_outline, size: 40, color: kTextDim),
const SizedBox(height: 12),
const Text('Ask about your documents',
style: TextStyle(color: kTextSec, fontSize: 14)),
const SizedBox(height: 20),
Wrap(spacing: 8, runSpacing: 8, alignment: WrapAlignment.center,
children: ['Summarize this', 'Key points?', 'What is this about?']
.map((s) => GestureDetector(
onTap: () { _ctrl.text = s; _send(); },
child: Container(
padding: const EdgeInsets.symmetric(horizontal: 14, vertical: 7),
decoration: BoxDecoration(
border: Border.all(color: kBorder),
borderRadius: BorderRadius.circular(20),
),
child: Text(s, style: const TextStyle(fontSize: 12, color: kTextSec)),
),
)).toList(),
),
]),
);
Widget _bubble(ChatMessage msg) => Align(
alignment: msg.isUser ? Alignment.centerRight : Alignment.centerLeft,
child: Container(
margin: EdgeInsets.only(
bottom: 8,
left: msg.isUser ? 60 : 0,
right: msg.isUser ? 0 : 60,
),
padding: const EdgeInsets.symmetric(horizontal: 14, vertical: 10),
decoration: BoxDecoration(
color: msg.isUser ? kAccent.withOpacity(0.15) : kCard,
borderRadius: BorderRadius.only(
topLeft: const Radius.circular(16),
topRight: const Radius.circular(16),
bottomLeft: Radius.circular(msg.isUser ? 16 : 4),
bottomRight: Radius.circular(msg.isUser ? 4 : 16),
),
border: Border.all(color: kBorder, width: 0.5),
),
child: msg.text.isEmpty
? const _Dots()
: Text(msg.text, style: const TextStyle(
fontSize: 14, color: kTextPri, height: 1.55)),
),
);
Widget _input() => Container(
color: kSurface,
padding: EdgeInsets.fromLTRB(12, 10, 12,
MediaQuery.of(context).viewInsets.bottom + 12),
child: Row(children: [
Expanded(
child: TextField(
controller: _ctrl,
enabled: !_busy,
minLines: 1, maxLines: 4,
onSubmitted: (_) => _send(),
style: const TextStyle(fontSize: 14, color: kTextPri),
decoration: InputDecoration(
hintText: _busy ? 'Thinking...' : 'Ask something...',
hintStyle: const TextStyle(color: kTextSec, fontSize: 13),
filled: true, fillColor: kCard,
contentPadding: const EdgeInsets.symmetric(horizontal: 16, vertical: 10),
border: OutlineInputBorder(
borderRadius: BorderRadius.circular(24),
borderSide: const BorderSide(color: kBorder, width: 0.5),
),
enabledBorder: OutlineInputBorder(
borderRadius: BorderRadius.circular(24),
borderSide: const BorderSide(color: kBorder, width: 0.5),
),
focusedBorder: OutlineInputBorder(
borderRadius: BorderRadius.circular(24),
borderSide: const BorderSide(color: kAccent, width: 1),
),
),
),
),
const SizedBox(width: 8),
GestureDetector(
onTap: _busy ? null : _send,
child: Container(
width: 42, height: 42,
decoration: BoxDecoration(
color: _busy ? kCard : kAccent,
shape: BoxShape.circle,
),
child: Icon(Icons.arrow_upward_rounded,
color: _busy ? kTextSec : kBg, size: 20),
),
),
]),
);
}
// Animated typing indicator — shown while waiting for the first token
class _Dots extends StatefulWidget {
const _Dots();
@override
State<_Dots> createState() => _DotsState();
}
class _DotsState extends State<_Dots> with SingleTickerProviderStateMixin {
late final AnimationController _c =
AnimationController(vsync: this,
duration: const Duration(milliseconds: 800))..repeat();
@override void dispose() { _c.dispose(); super.dispose(); }
@override
Widget build(BuildContext context) => AnimatedBuilder(
animation: _c,
builder: (_, __) => Row(mainAxisSize: MainAxisSize.min,
children: List.generate(3, (i) {
final t = (_c.value - i * 0.2).clamp(0.0, 1.0);
final op = (0.3 + 0.7 * (t < 0.5 ? t * 2 : (1 - t) * 2)).clamp(0.3, 1.0);
return Container(
margin: const EdgeInsets.symmetric(horizontal: 2),
width: 6, height: 6,
decoration: BoxDecoration(
color: kAccent.withOpacity(op),
shape: BoxShape.circle,
),
);
}),
),
);
}
11 — flutter
upload_screen.dart
Handles both PDF uploads via file_picker and raw text pastes. Each uploaded file gets added to a local _docs list so the UI shows what's been indexed. The _setStatus helper keeps the busy/error/success state in one place.
import 'dart:io';
import 'package:flutter/material.dart';
import 'package:file_picker/file_picker.dart';
import '../theme.dart';
import '../services/api.dart';
class UploadScreen extends StatefulWidget {
const UploadScreen({super.key});
@override
State<UploadScreen> createState() => _UploadScreenState();
}
class _UploadScreenState extends State<UploadScreen> {
final _textCtrl = TextEditingController();
bool _busy = false;
String? _status;
bool _isError = false;
final _docs = <String>[];
void _setStatus(String msg, {bool error = false}) =>
setState(() { _status = msg; _isError = error; _busy = false; });
Future<void> _pickPdf() async {
final result = await FilePicker.pickFiles(
type: FileType.custom,
allowedExtensions: ['pdf'],
allowMultiple: true,
);
if (result == null) return;
setState(() { _busy = true; _status = null; });
for (final f in result.files) {
if (f.path == null) continue;
try {
await Api.uploadPdf(File(f.path!));
setState(() => _docs.add(f.name));
_setStatus('✓ ${f.name} indexed');
} catch (e) {
_setStatus('$e', error: true);
}
}
}
Future<void> _submitText() async {
final text = _textCtrl.text.trim();
if (text.isEmpty) return;
setState(() { _busy = true; _status = null; });
try {
await Api.uploadText(text);
_textCtrl.clear();
setState(() => _docs.add('Text snippet'));
_setStatus('✓ Text indexed');
} catch (e) {
_setStatus('$e', error: true);
}
}
@override
Widget build(BuildContext context) {
return Column(children: [
Container(
color: kSurface,
padding: EdgeInsets.only(
top: MediaQuery.of(context).padding.top + 10,
left: 16, right: 16, bottom: 12,
),
child: Row(children: [
const Text('Documents', style: TextStyle(fontSize: 18,
fontWeight: FontWeight.w700, color: kTextPri, letterSpacing: -0.5)),
const Spacer(),
if (_docs.isNotEmpty)
Text('${_docs.length} indexed',
style: const TextStyle(fontSize: 12, color: kTextSec)),
]),
),
Expanded(child: ListView(
padding: const EdgeInsets.all(16),
children: [
// PDF drop zone
GestureDetector(
onTap: _busy ? null : _pickPdf,
child: Container(
height: 120,
decoration: BoxDecoration(
color: kCard,
borderRadius: BorderRadius.circular(12),
border: Border.all(color: _busy ? kAccent : kBorder),
),
child: _busy
? const Center(child: CircularProgressIndicator(
strokeWidth: 2, color: kAccent))
: Column(mainAxisAlignment: MainAxisAlignment.center, children: [
Icon(Icons.upload_file_outlined, color: kAccent, size: 28),
const SizedBox(height: 8),
const Text('Tap to upload PDF', style: TextStyle(
color: kTextPri, fontSize: 14, fontWeight: FontWeight.w500)),
const SizedBox(height: 2),
const Text('Multiple files supported',
style: TextStyle(color: kTextSec, fontSize: 12)),
]),
),
),
const SizedBox(height: 16),
// Text input
Container(
decoration: BoxDecoration(
color: kCard,
borderRadius: BorderRadius.circular(12),
border: Border.all(color: kBorder),
),
child: Column(children: [
TextField(
controller: _textCtrl,
maxLines: 4, minLines: 3,
style: const TextStyle(fontSize: 13, color: kTextPri, height: 1.5),
decoration: const InputDecoration(
hintText: 'Or paste text here...',
hintStyle: TextStyle(color: kTextSec, fontSize: 13),
border: InputBorder.none,
contentPadding: EdgeInsets.all(14),
),
),
Divider(height: 1, color: kBorder),
Align(
alignment: Alignment.centerRight,
child: TextButton(
onPressed: _busy ? null : _submitText,
child: const Text('Add text',
style: TextStyle(color: kAccent, fontSize: 13)),
),
),
]),
),
// Status banner
if (_status != null) ...[
const SizedBox(height: 12),
Container(
padding: const EdgeInsets.symmetric(horizontal: 12, vertical: 10),
decoration: BoxDecoration(
color: (_isError ? kRed : kGreen).withOpacity(0.1),
borderRadius: BorderRadius.circular(8),
border: Border.all(
color: (_isError ? kRed : kGreen).withOpacity(0.3)),
),
child: Row(children: [
Icon(_isError ? Icons.error_outline : Icons.check_circle_outline,
size: 14, color: _isError ? kRed : kGreen),
const SizedBox(width: 8),
Expanded(child: Text(_status!,
style: TextStyle(fontSize: 13,
color: _isError ? kRed : kGreen))),
]),
),
],
// Indexed docs list
if (_docs.isNotEmpty) ...[
const SizedBox(height: 20),
const Text('INDEXED', style: TextStyle(fontSize: 10,
fontWeight: FontWeight.w700, color: kTextSec, letterSpacing: 1)),
const SizedBox(height: 8),
..._docs.map((name) => Container(
margin: const EdgeInsets.only(bottom: 6),
padding: const EdgeInsets.symmetric(horizontal: 12, vertical: 10),
decoration: BoxDecoration(
color: kCard,
borderRadius: BorderRadius.circular(8),
border: Border.all(color: kBorder, width: 0.5),
),
child: Row(children: [
const Icon(Icons.description_outlined, size: 14, color: kAccent),
const SizedBox(width: 10),
Expanded(child: Text(name,
style: const TextStyle(fontSize: 13, color: kTextPri))),
const Icon(Icons.check, size: 14, color: kGreen),
]),
)),
],
],
)),
]);
}
}
12 — run it
Running the whole thing
Three processes need to be running concurrently: Ollama, the Python server, and the Flutter app.
01. Start Ollama. Usually runs automatically after install. If not: ollama serve
02. Start the backend. From inside the backend/ folder: uvicorn app:app --host 0.0.0.0 --port 8000
03. Set your IP. If running on a physical device, open theme.dart and update kHost to your machine's local IP address.
04. Run Flutter. flutter run from inside docmind/. You should see the empty chat screen.
05. Test with no documents. Ask anything. You should see "not found in documents". That's correct: the vector store is empty.
06. Upload a PDF. Go to the Documents tab, pick a file, wait for the indexed confirmation, then ask questions about its contents.
First run is slow. The SentenceTransformer model (~80 MB) downloads on first use, and Ollama loads Gemma3 into memory. Both are cached — subsequent starts are much faster.
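If the app shows a connection error, it helps to confirm that the two server processes are actually listening before debugging Flutter. The sketch below only checks whether the TCP ports used in this post accept connections; SERVICES, is_listening, and check_services are hypothetical names, and an open port does not prove the service behind it is healthy.

```python
import socket

# Ports used in this post: Ollama on 11434, the FastAPI backend on 8000.
SERVICES = {
    "Ollama": ("127.0.0.1", 11434),
    "FastAPI backend": ("127.0.0.1", 8000),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

Run it on the machine hosting the backend. If both report up and the phone still cannot connect, the usual suspects are the kHost value or a firewall blocking port 8000.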