Set up local AI

Karja's AI features run on a model on your own machine, never in the cloud. You install a small local server (Ollama or oMLX), download a model, and paste the server's address into Karja. Setup takes about five minutes and is done once.

What is local AI, and why it matters

"Local AI" means the language model runs as a program on your own computer instead of on a company's servers. You download the model file once; from then on every answer is generated by your hardware.

Private

Your notes, documents and prompts are processed on your own machine. Nothing is sent to a cloud provider.

Free to run

No per-token fees, no subscription, no API bill. Once a model is on your disk, every query is free.

Works offline

No internet required once the model is downloaded — useful on a plane, in the field, or behind a firewall.

1. Pick a backend

Karja talks to a local AI server. Pick whichever suits your machine — you only need one.

Ollama

Runs on macOS, Windows and Linux. The simplest option, and the best choice on Windows, Linux, or an Intel Mac.

oMLX

Apple Silicon Macs only (M1–M4, macOS 15+). Built on Apple's MLX for top speed on a Mac, managed from the menu bar.

2. Install Ollama — macOS · Windows · Linux

1

Download and install

Get the installer for your OS from ollama.com/download. On macOS and Windows, the installer sets up a background server that starts automatically. On Linux, run:

curl -fsSL https://ollama.com/install.sh | sh
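To confirm the install worked, a quick check along these lines may help. It is a minimal sketch: `have_cmd` is a helper name made up for this example, and it only tests that the `ollama` command is on your PATH, not that the server is running.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd ollama; then
  # Print the installed version.
  ollama --version
else
  echo "ollama not found on PATH; re-run the installer"
fi
```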

2

Download a model

In a terminal, pull a model. A small 3B model is a good starting point and runs on modest hardware:

ollama pull llama3.2

Browse the full catalogue at ollama.com/library. Larger models generally give better answers but need more RAM. With 8 GB of RAM, pick a small one such as llama3.2:3b or qwen2.5:3b.
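To check that the download finished, you can look for the model in the output of `ollama list`. The sketch below assumes the usual table layout (a header row, then one model per line with the name in the first column); `has_model` is a hypothetical helper, not part of Ollama.

```shell
#!/bin/sh
# has_model is a hypothetical helper: succeeds if `ollama list` shows a
# model whose name starts with the given string (e.g. llama3.2).
has_model() {
  ollama list 2>/dev/null |
    awk -v m="$1" 'NR > 1 && index($1, m) == 1 { found = 1 } END { exit !found }'
}

if has_model "llama3.2"; then
  echo "llama3.2 is downloaded"
else
  echo "llama3.2 not found; run: ollama pull llama3.2"
fi
```

Once the model is present, `ollama run llama3.2 "Say hello"` sends a one-off prompt from the terminal, which is a quick way to see whether your machine handles the model comfortably.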

3

Note your API URL

Ollama serves at http://127.0.0.1:11434. Confirm it's running by opening that address in a browser — you should see "Ollama is running". That's the URL you'll paste into Karja.
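The same check can be done from a terminal. This is a small sketch using `curl`; `server_status` is a made-up helper name, and the `OLLAMA_URL` variable is only a convenience for this example.

```shell
#!/bin/sh
# Base URL from the step above; change it if your server listens elsewhere.
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# server_status is a hypothetical helper: prints "up" if something answers
# successfully at the given URL within 3 seconds, "down" otherwise.
server_status() {
  if curl -fsS --max-time 3 "$1" >/dev/null 2>&1; then
    echo "up"
  else
    echo "down"
  fi
}

echo "Ollama at $OLLAMA_URL is $(server_status "$OLLAMA_URL")"
```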

…or install oMLX — Apple Silicon Macs

1

Check requirements

oMLX needs an Apple Silicon Mac (M1–M4) running macOS 15 or newer. Learn more at omlx.ai.

2

Install the app

Download the .dmg from the oMLX releases page, open it, and drag oMLX to Applications. Or, with Homebrew:

brew tap jundot/omlx https://github.com/jundot/omlx
brew install omlx

3

Start the server and download a model

Launch oMLX from Applications — it lives in your menu bar. The Welcome screen walks you through three steps: choose a model folder, start the server, and download your first model. You can browse and download more models any time from the built-in admin dashboard.

4

Find your API URL

oMLX serves at http://127.0.0.1:8000 by default (you can change the host and port). To confirm the exact address, open the dashboard at http://127.0.0.1:8000/admin/dashboard and look under API endpoints. That's the URL to paste into Karja.
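If you changed the host or port, the address for Karja changes with them. The sketch below builds the base URL from the two values and probes the dashboard path mentioned above. `base_url`, `OMLX_HOST` and `OMLX_PORT` are names invented for this example, and the probe only shows that something answers HTTP at that address, not that it is oMLX.

```shell
#!/bin/sh
# Host and port as chosen in oMLX; these are the defaults from the step above.
OMLX_HOST="${OMLX_HOST:-127.0.0.1}"
OMLX_PORT="${OMLX_PORT:-8000}"

# base_url is a hypothetical helper that joins host and port into a URL.
base_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

URL="$(base_url "$OMLX_HOST" "$OMLX_PORT")"
echo "Base URL for Karja: $URL"

# Probe the admin dashboard path; any HTTP reply counts as reachable.
if curl -sS --max-time 3 "$URL/admin/dashboard" >/dev/null 2>&1; then
  echo "Something is answering at $URL/admin/dashboard"
else
  echo "Nothing is answering at $URL/admin/dashboard"
fi
```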

3. Connect it to Karja

1

Open AI settings

In Karja, click Settings (the gear at the bottom of the left sidebar), then open the AI tab.

2

Choose your backend and paste the URL

Select Ollama or oMLX, then enter your server's address in the Base URL field: http://127.0.0.1:11434 for Ollama, or http://127.0.0.1:8000 for oMLX. (If you set an API key on oMLX, paste it in the API key field.)
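As a reminder of which default address goes with which backend, the mapping can be written as a small lookup. This is only an illustration of the two defaults given in this guide; `default_base_url` is a hypothetical helper, and your own address differs if you changed the host or port.

```shell
#!/bin/sh
# default_base_url is a hypothetical helper: prints the default address
# for a backend, or fails for an unknown name.
default_base_url() {
  case "$1" in
    ollama) echo "http://127.0.0.1:11434" ;;
    omlx)   echo "http://127.0.0.1:8000" ;;
    *)      echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

default_base_url ollama
default_base_url omlx
```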

3

Pick a model

Click Refresh under Model and choose one of your downloaded models from the dropdown. If the list is empty, the server isn't running or you haven't downloaded a model yet.
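If the dropdown stays empty with Ollama, you can tell the two causes apart from a terminal. The sketch below asks Ollama's `/api/tags` endpoint for the list of downloaded models and counts them. `count_models` is a hypothetical helper that assumes the reply is compact JSON with one `"name":` field per model; it is a rough count, not a real JSON parser.

```shell
#!/bin/sh
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# count_models is a hypothetical helper: reads the JSON from /api/tags on
# stdin and counts the "name" fields, one per downloaded model.
count_models() {
  grep -o '"name":' | wc -l | tr -d ' '
}

if ! tags="$(curl -fsS --max-time 3 "$OLLAMA_URL/api/tags" 2>/dev/null)"; then
  echo "Server is not running at $OLLAMA_URL"
elif [ "$(printf '%s' "$tags" | count_models)" -eq 0 ]; then
  echo "Server is running but no models are downloaded"
else
  echo "Models found: $(printf '%s' "$tags" | count_models)"
fi
```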

4. Try it — ask AI about a note

Karja's AI Chat can read your records as context. Here's the full loop in Notepad:

1

Write a note

Open Notepad from the sidebar and type a few paragraphs — say, some meeting notes or a draft.

2

Open AI Chat

Click AI Chat at the bottom of the left sidebar. A chat pane opens at the bottom of the window.

3

Attach the note as context

Type @ in the chat box and pick your note from the list. It's now attached, so the model can read it.

4

Ask a question

Type something like "Summarise this note in three bullet points" and press Enter. The answer streams in — generated entirely on your machine. 🎉
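To see the same idea outside Karja, you can send a question by hand to Ollama's `/api/generate` endpoint. How Karja itself talks to the server is not described here, so treat this as a rough equivalent rather than what the app does. `build_payload` is a hypothetical helper, and the note text is a made-up sample.

```shell
#!/bin/sh
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# build_payload is a hypothetical helper: prints the JSON body for
# Ollama's /api/generate. It does no JSON escaping, so keep quotes,
# backslashes and newlines out of the prompt.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Sample note text, invented for this example.
NOTE="Met with the design team. Agreed to ship the beta on Friday."
PROMPT="Summarise this note in three bullet points: $NOTE"

# Send the request; prints the raw JSON reply, or a hint if it fails.
curl -fsS --max-time 120 "$OLLAMA_URL/api/generate" \
  -d "$(build_payload llama3.2 "$PROMPT")" 2>/dev/null ||
  echo "No reply from $OLLAMA_URL (is the server running and the model pulled?)"
```

With `"stream":false` the server returns one JSON object whose `response` field holds the answer, which is easier to read in a terminal than the streamed form.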

Good to know

Keep Ollama or oMLX running while you use Karja's AI features.
Larger models give better answers but need more memory — if responses are slow or the app struggles, switch to a smaller model.
Your prompts and documents never leave your computer.