How DijiFlow Dictate turns your voice into text entirely on your device, using Whisper, CoreML and Apple Silicon. Explained in plain language.
Most dictation feels like magic until you ask the obvious question: where does my voice actually go? With DijiFlow Dictate, the honest answer is nowhere. You speak, text appears at your cursor, and not one word travels to a server. No account, no upload, no telemetry. That is not a privacy promise bolted on at the end — it falls out of how the app is built.
Three well-understood pieces make it work: Whisper, the open speech model that does the listening; CoreML, the framework that runs it efficiently on a Mac; and Apple Silicon, the chip that makes it feel instant. No prior knowledge needed — here is each one in plain terms.
At the heart of DijiFlow Dictate is Whisper, a family of open-source speech recognition models from OpenAI. A speech model is, in plain terms, a very large pattern-matcher trained on enormous amounts of audio paired with its transcript. From that data it learns how the sounds people make line up with the words they mean — across accents, background noise, and the natural pauses of real speech.
When you dictate, Whisper predicts the most likely sequence of words from your microphone audio, and it is genuinely good at it. On clear speech it reaches around 98% word accuracy, and the most capable version, Whisper large-v3, handles more than 90 languages. Because it reads context rather than matching one word at a time, it copes with the messy way people actually talk.
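The article does not say how the 98% figure is measured. Speech-recognition accuracy is usually reported as one minus the word error rate (WER), which counts the substitutions, deletions, and insertions needed to turn a transcript into the reference text and divides by the number of reference words. A minimal sketch of that calculation, assuming the standard definition, shows that 98% accuracy means roughly 2 errors per 100 spoken words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy in the sense of 1 - WER (it can drop below 0 with many insertions)."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For example, `word_error_rate("send the report today", "send a report today")` returns `0.25`: one substitution out of four reference words, or 75% word accuracy.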
This is the part that surprises people: the app and the intelligence are two different files. DijiFlow Dictate itself is tiny — about 12 MB. The Whisper speech models are the heavy part, ranging from roughly 300 MB to 6 GB depending on which you pick. Larger models are generally more accurate on difficult audio but ask more of your hardware, so you choose the balance of speed and accuracy that suits you.
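The speed-versus-accuracy choice can be pictured as picking the biggest model your machine can comfortably hold. The sketch below is hypothetical: the catalog names are placeholders built from the two ends of the 300 MB to 6 GB range, the byte budget is whatever limit you care about (free disk space or memory), and treating size as a stand-in for accuracy is a simplification, not how DijiFlow Dictate itself chooses.

```python
from typing import Optional, Sequence, Tuple

MB = 1_000_000          # decimal megabyte
GB = 1_000_000_000      # decimal gigabyte


def pick_largest_model(
    models: Sequence[Tuple[str, int]],
    budget_bytes: int,
) -> Optional[str]:
    """Return the name of the largest model that fits within budget_bytes, or None.

    Assumption: bigger usually means more accurate on difficult audio,
    but also slower and hungrier for resources.
    """
    fitting = [(name, size) for name, size in models if size <= budget_bytes]
    if not fitting:
        return None
    name, _ = max(fitting, key=lambda m: m[1])
    return name


# Hypothetical catalog using only the two ends of the stated size range.
catalog = [("compact-model", 300 * MB), ("largest-model", 6 * GB)]
```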
You download a model once; after that, transcription needs no internet at all. That one-time step is exactly why your voice can stay on your machine.
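The download-once pattern amounts to a local cache check: fetch the model only when it is missing, then load it from disk on every later run. This is a minimal sketch with a hypothetical `ensure_model` helper and a caller-supplied `download` function; DijiFlow Dictate's real storage location and download mechanism are not described here.

```python
from pathlib import Path
from typing import Callable


def ensure_model(model_path: Path, download: Callable[[Path], None]) -> Path:
    """Download the model only if it is not already on disk, then return its path.

    After the first successful call, no network access is needed.
    """
    if not model_path.exists():
        model_path.parent.mkdir(parents=True, exist_ok=True)
        # Download under a temporary name first, so an interrupted download
        # is never mistaken for a complete model on the next launch.
        partial = model_path.with_name(model_path.name + ".partial")
        download(partial)
        partial.replace(model_path)
    return model_path
```

Writing to a `.partial` file and renaming it afterwards is a design choice: an interrupted download never looks like a finished model, so the next launch simply retries.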
A speech model is only useful if it runs quickly without draining your battery. That is the job of CoreML, Apple's framework for running machine-learning models on its devices. Think of it as a translator and traffic controller: it takes a model like Whisper and works out how to run it using the most suitable parts of your hardware.
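The "traffic controller" role can be illustrated with a toy partitioner: give each operation in a model to the most preferred compute unit that supports it, falling back to the GPU or CPU otherwise, and group neighboring operations that land on the same unit. This is a conceptual sketch only. CoreML's actual scheduling is internal to Apple and weighs more factors, such as the cost of moving data between units; the operation names and unit labels below are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Toy preference order: most efficient unit first, CPU as the last resort.
PREFERENCE = ("neural_engine", "gpu", "cpu")


def partition(ops: Sequence[Tuple[str, Set[str]]]) -> List[Tuple[str, List[str]]]:
    """Assign each op to its most preferred supported unit, then merge consecutive ops per unit."""
    segments: List[Tuple[str, List[str]]] = []
    for name, supported in ops:
        unit = next((u for u in PREFERENCE if u in supported), None)
        if unit is None:
            raise ValueError(f"no compute unit supports op {name!r}")
        if segments and segments[-1][0] == unit:
            # Same unit as the previous op: extend the current segment.
            segments[-1][1].append(name)
        else:
            # Unit changes: start a new segment (a hand-off between units).
            segments.append((unit, [name]))
    return segments
```

Each new segment represents a hand-off between units, which is why a real scheduler tries to keep long runs of work on one unit.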
DijiFlow Dictate uses WhisperKit, an open-source runtime that compiles Whisper to run through CoreML. That means the model is optimized specifically for Apple hardware instead of running as generic, slower code, so dictation keeps pace with natural speech while staying light on system resources. And it all happens locally: CoreML is not a cloud service but part of the operating system, built so apps can run intelligent features privately and offline.
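"Keeps pace with natural speech" is usually quantified as the real-time factor (RTF): time spent transcribing divided by the length of the audio. An RTF below 1.0 means the model finishes faster than you can speak. A minimal sketch, assuming a caller-supplied `transcribe` function (hypothetical) that processes a clip of known length:

```python
import time
from typing import Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration. Below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(transcribe: Callable[[], str], audio_seconds: float) -> Tuple[str, float]:
    """Run a transcription callable once; return its text and the measured RTF."""
    start = time.perf_counter()
    text = transcribe()
    elapsed = time.perf_counter() - start
    return text, real_time_factor(elapsed, audio_seconds)
```

For example, `real_time_factor(2.5, 10.0)` returns `0.25`: ten seconds of speech transcribed in two and a half. These numbers are illustrative, not measurements of DijiFlow Dictate.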
The last piece is the hardware. On modern Macs that means Apple Silicon: an M-series chip running macOS 14 or later. These chips include a dedicated Neural Engine, a block of silicon built specifically to run machine-learning models quickly and with very little power, and the GPU is available through Metal when extra horsepower helps.
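The hardware requirement reduces to two checks: an arm64 (Apple Silicon) machine and a macOS major version of 14 or higher. A hypothetical helper using Python's standard `platform` module might look like this; it illustrates the rule and is not how DijiFlow Dictate itself checks compatibility.

```python
import platform


def meets_requirements(machine: str, mac_release: str) -> bool:
    """True for an Apple Silicon (arm64) Mac running macOS 14 or later."""
    if machine != "arm64" or not mac_release:
        return False
    try:
        major = int(mac_release.split(".")[0])
    except ValueError:
        return False
    return major >= 14


def this_mac_meets_requirements() -> bool:
    # platform.mac_ver()[0] is the macOS release string (e.g. "14.2.1"),
    # or "" on other operating systems. A Python running under Rosetta 2
    # reports "x86_64" from platform.machine() even on Apple Silicon.
    return meets_requirements(platform.machine(), platform.mac_ver()[0])
```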
You configure none of this. CoreML spreads the work across the right hardware automatically; you just speak, and the chip handles it in real time. That is the quiet advantage of on-device design: the same silicon that makes your Mac feel responsive is what makes private dictation practical.
Put the three pieces in order and the round trip is short — and entirely local.
1. Audio from your microphone is captured on the device, never streamed anywhere.
2. The model turns sound into words right there on Apple Silicon, in real time.
3. Your words appear in whatever app you are already in. Nothing is sent away, so there is nothing to leak.
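The middle step of that round trip can be made concrete. Whisper models consume audio sampled at 16 kHz in fixed 30-second windows, so a longer recording is cut into windows and the last one is padded with silence. The sketch below assumes a simple batch approach; how DijiFlow Dictate buffers live microphone audio is not described here, and `transcribe_window` and `insert_text` are hypothetical stand-ins for the on-device model and the text-insertion step.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000                         # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30                          # Whisper processes 30-second windows
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def split_into_windows(samples: Sequence[float]) -> List[List[float]]:
    """Split mono 16 kHz samples into 30 s windows, zero-padding the last one."""
    windows: List[List[float]] = []
    for start in range(0, len(samples), WINDOW_SAMPLES):
        window = list(samples[start:start + WINDOW_SAMPLES])
        window.extend([0.0] * (WINDOW_SAMPLES - len(window)))  # pad with silence
        windows.append(window)
    return windows


def dictate(
    samples: Sequence[float],
    transcribe_window: Callable[[List[float]], str],
    insert_text: Callable[[str], None],
) -> str:
    """Transcribe each window locally, join the pieces, and hand the text to the inserter."""
    pieces = [transcribe_window(w) for w in split_into_windows(samples)]
    text = " ".join(p for p in pieces if p)
    insert_text(text)
    return text
```

Nothing in this flow touches the network: audio goes in, text comes out, and both stay in local memory.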
Key takeaway
The model lives on your machine, so transcription is just local computation — there is no server in the loop to store, intercept, or quietly retain your voice.
Most voice tools are cloud services wearing an app icon: they need a connection and an account every time, because the model that understands you lives on someone else's hardware. DijiFlow Dictate flips that — you install once, and the work moves to your chip.
| How it behaves | DijiFlow (on-device) | Cloud dictation |
|---|---|---|
| Works after a one-time download | ✓ | ✗ |
| Transcribes with no internet | ✓ | ✗ |
| No account required | ✓ | ✗ |
| Audio stays on your device | ✓ | ✗ |
The same on-device approach extends to Windows 10 and 11, where DijiFlow Dictate runs on AMD, Intel, and NVIDIA GPUs. NVIDIA hardware needs CUDA and a current driver, but the principle is identical: your speech is transcribed locally, and nothing is sent away.
There is nothing exotic happening here. DijiFlow Dictate is built on open, well-understood technology: Whisper for the speech model, WhisperKit and CoreML for the runtime, and Apple Silicon for the hardware. The decision that matters is keeping all of it on your device, so you get the convenience of modern dictation without ever handing your voice to anyone. That holds on every plan: Free, Trial, and Pro.
If you would rather feel it than read about it, you can try private, on-device dictation free for 30 days on the Pro plan.
Private, 100% on-device voice-to-text in 90+ languages — free forever, Pro when you need more.