Ollama Setup

Run AI models locally on your own hardware.

Used in: Arion, Cadenzium
About 10–20 minutes (depending on model download sizes)

? What is Ollama?

Ollama is a free, open-source application that runs large language models directly on your computer. Your documents, questions, and AI responses never leave your machine — Ollama is the engine that makes private AI possible in Chordalia products.

Chordalia apps use two kinds of models: chat models, which answer questions and generate text (e.g. llama3.1:8b), and embedding models, which turn text into numeric vectors so documents can be searched by meaning (e.g. nomic-embed-text).

Some products also use vision models that can describe images (e.g. gemma3:12b, llava).

How much will this cost? Ollama itself is free. Running it on your own hardware means you pay only for electricity. No subscriptions, no API fees.

1 Check your hardware

Ollama runs on most modern computers, but larger models need more memory (RAM). You don't need a graphics card to run Ollama — but if you have one, Ollama will use it automatically to run models much faster.

Recommended minimums

Not sure how much RAM you have? On Windows, press Ctrl+Shift+Esc to open Task Manager, then click the Performance tab → Memory. The total is shown in the upper right.
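On macOS or Linux you can get the same number from a terminal. A small Python sketch using only the standard library (it relies on POSIX `sysconf`, so it won't work on Windows — use Task Manager there as described above):

```python
import os

# Total physical RAM via POSIX sysconf (macOS/Linux only).
page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
num_pages = os.sysconf("SC_PHYS_PAGES")   # number of physical pages
total_gb = page_size * num_pages / (1024 ** 3)
print(f"Total RAM: {total_gb:.1f} GB")
```

Compare the printed number against the recommended minimums above before picking a model.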

2 Install Ollama

Download the installer from the official site:

https://ollama.com/download →

Choose your operating system (Windows, macOS, or Linux) and run the installer. On Windows, the installer registers Ollama as a startup app, so it launches in the background when you log in — you'll see an Ollama icon in the system tray (lower-right corner near the clock). It runs in your user session, not as a Windows service, so logging out will stop it; it'll start again the next time you log in. On macOS, Ollama appears in the menu bar with the same login-time autostart behaviour.

Ollama download page with download button for Windows, macOS, and Linux

Verify it's running

After installing, open a Command Prompt or terminal and run:

ollama --version

You should see something like ollama version is 0.12.0. If you get "command not found", close and reopen the terminal so it picks up the new PATH entry, then try again.

3 Download a chat model

Ollama doesn't come with any models pre-installed — you choose and download them yourself. For general-purpose Q&A over your documents, llama3.1:8b is a solid starting point: fast on 16 GB RAM, good answer quality, 4.7 GB download.

ollama pull llama3.1:8b

The first pull of a model downloads it to your computer (this can take anywhere from 2 to 20 minutes depending on your internet connection). Subsequent uses are instant.

Terminal showing `ollama pull` progress with the download bar

Other popular chat models

Browse the full catalog at ollama.com/library →

4 Download an embedding model

Embedding models are much smaller than chat models (typically under 300 MB). They convert text into numeric representations that let the app find documents by meaning instead of by exact word matches.
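The "find by meaning" part works by comparing those numeric vectors, usually with cosine similarity. A toy Python illustration with made-up 3-dimensional vectors (real embeddings from models like nomic-embed-text have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up "embeddings" for two documents and a query
doc_birthday = [0.9, 0.1, 0.2]
doc_invoice  = [0.1, 0.8, 0.3]
query        = [0.85, 0.15, 0.25]

# The query scores highest against the semantically closest document,
# even though no exact words were compared.
print(cosine(query, doc_birthday))  # high (similar meaning)
print(cosine(query, doc_invoice))   # low  (different meaning)
```

This is the comparison an app runs under the hood when it retrieves documents by meaning instead of keyword.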

The standard choice is nomic-embed-text:

ollama pull nomic-embed-text

Only needed for Arion (Tier 3). Cadenzium uses Ollama for text correction only, so you can skip this step if you're not setting up Arion.

5 Download a vision model (optional)

Vision models look at images (photographs, screenshots, scanned pages) and describe what they see so that images become searchable by content. For example, searching for "birthday cake" finds photos that contain one, even if no filename mentions it.

gemma3:12b is a good default — 7 GB, handles both text and image prompts:

ollama pull gemma3:12b

Only useful for Arion. Vision description is an optional Arion feature (see Settings → Vision). Cadenzium has its own OCR pipeline and doesn't currently use Ollama vision models.

Alternatives

6 Verify everything is working

List the models you've downloaded:

ollama list

You should see all the models you pulled, along with their sizes:

NAME               ID              SIZE      MODIFIED
llama3.1:8b        f66fc8dc39ea    4.7 GB    2 minutes ago
nomic-embed-text   0a109f422b47    274 MB    30 seconds ago
gemma3:12b         f4031aab637d    7.0 GB    5 minutes ago

Try a quick test of the chat model from the command line:

ollama run llama3.1:8b "What is 2 + 2?"

If the model answers (correctly!), Ollama is ready for Chordalia apps to use.
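Apps talk to Ollama over its local HTTP API rather than the command line. The same test can be run against the real /api/generate endpoint — a minimal Python sketch (it returns None instead of failing if the server isn't running):

```python
import json
import urllib.request
import urllib.error

def ask_ollama(prompt, model="llama3.1:8b", host="http://localhost:11434"):
    """Send one non-streaming prompt to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.loads(resp.read())["response"]
    except (urllib.error.URLError, OSError):
        return None  # server not running or unreachable

answer = ask_ollama("What is 2 + 2?")
print(answer or "Ollama not reachable")
```

Setting "stream": False asks for one complete JSON reply instead of a token-by-token stream, which keeps the example simple.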

Using Ollama in your Chordalia product

Arion — Tier 3 (Private AI)

Open Settings → Ollama (Tier 3). Set the host to http://localhost:11434, then click Refresh on the Chat Model and Embedding Model dropdowns — Arion queries Ollama and lists your installed models. Pick llama3.1:8b (or your preferred chat model) and nomic-embed-text.
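Discovering installed models programmatically is done through Ollama's /api/tags endpoint, which returns the local model list — presumably what a Refresh of those dropdowns boils down to (Arion's exact implementation is an assumption here). A Python sketch that degrades to an empty list when the server is unreachable:

```python
import json
import urllib.request
import urllib.error

def installed_models(host="http://localhost:11434", timeout=2):
    """Names of locally installed Ollama models, via GET /api/tags."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            data = json.loads(resp.read())
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError, ValueError):
        return []

print(installed_models() or "no models found / server not running")
```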

For vision: Settings → Vision → tick Enable image description, pick your vision model.

Cadenzium

Open Settings → AI. Set the Ollama host and pick a chat model — Cadenzium uses it to clean up OCR output from scanned journal pages.

Even a small model like llama3.2:3b works well for OCR correction, so you can get away with less RAM for Cadenzium than for Arion Q&A.
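As an illustration of the kind of request involved — the prompt wording below is hypothetical, not Cadenzium's actual prompt — an OCR-correction call is just a normal /api/generate request with the noisy text embedded in the prompt:

```python
import json

# Hypothetical OCR-correction request (illustrative only; not Cadenzium's real prompt)
ocr_text = "Tbe qu1ck brown f0x jumps ovcr the lazy dog."
payload = {
    "model": "llama3.2:3b",   # a small model is enough for this task
    "prompt": (
        "Fix the OCR errors in the text below and return only the corrected text.\n\n"
        + ocr_text
    ),
    "stream": False,          # one complete JSON response instead of a stream
}
body = json.dumps(payload)
```

Because the task is constrained (fix typos, don't rewrite), small models handle it well — hence the lower RAM requirement.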

! Troubleshooting

"Connection refused" or "Could not connect to Ollama"

Ollama isn't running. On Windows, check the system tray (lower-right corner near the clock) for the Ollama icon. If it's not there, search for Ollama in the Start menu and launch it — you'll see an "Ollama is running" notification. On macOS, look for the Ollama icon in the menu bar. On Linux, run ollama serve in a terminal.

The model is very slow

First-time queries of a model load it into memory, which can take 10–30 seconds. Subsequent queries are much faster. If it stays slow, the model may be too large for your RAM (Ollama will fall back to disk, which is dramatically slower). Try a smaller model.

Out of disk space

Models are stored in %USERPROFILE%\.ollama\models on Windows, ~/.ollama/models on macOS/Linux. To free space, remove models you no longer use:

ollama rm llama3.1:8b
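To see how much space the model store is using before deciding what to remove, a quick Python sketch (paths per the defaults above):

```python
from pathlib import Path

def dir_size_gb(path):
    """Total size of all files under path, in GB."""
    p = Path(path)
    total = sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
    return total / (1024 ** 3)

# Default model store (~/.ollama/models; on Windows, %USERPROFILE%\.ollama\models)
models_dir = Path.home() / ".ollama" / "models"
if models_dir.exists():
    print(f"{dir_size_gb(models_dir):.2f} GB in {models_dir}")
else:
    print(f"No model store found at {models_dir}")
```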

Ollama isn't using my GPU (Windows)

Ollama detects NVIDIA and AMD GPUs automatically. If it isn't using yours, make sure your graphics drivers are up to date. The Ollama server log at %LOCALAPPDATA%\Ollama\server.log shows whether a GPU was detected.