Idiomatic Elixir interface to the Tesseract OCR engine. Implemented as a NIF over the Tesseract 5.x C++ API; accepts Vix.Vips.Image structs, file paths, or in-memory image binaries and returns recognised text.
- Tesseract ≥ 5.0 and Leptonica installed at build time, with both reachable via `pkg-config`.

**macOS (Homebrew):**

```shell
brew install tesseract leptonica pkg-config
```

Xcode Command Line Tools (for `clang++`) must be installed: `xcode-select --install`.

**Debian/Ubuntu:**

```shell
sudo apt-get install -y \
  build-essential pkg-config \
  libtesseract-dev libleptonica-dev tesseract-ocr
```
Ubuntu 24.04+ is required for Tesseract ≥ 5.0; 22.04 ships 4.x and will not build. On 22.04 either upgrade or install Tesseract 5 from a PPA / source.
**Fedora:**

```shell
sudo dnf install -y \
  gcc-c++ pkgconf-pkg-config \
  tesseract-devel leptonica-devel
```
**Arch Linux:**

```shell
sudo pacman -S base-devel pkgconf tesseract leptonica
```
**Alpine:**

```shell
apk add build-base pkgconf tesseract-ocr-dev leptonica-dev
```
**Windows:** Native Windows builds are not supported out of the box. Use WSL2 with Ubuntu 24.04 and follow the Debian/Ubuntu instructions above — this is the path of least resistance and is what we test against.

Building natively requires MSYS2 / MinGW-w64 with `g++`, `pkg-config`, `mingw-w64-x86_64-tesseract-ocr`, and `mingw-w64-x86_64-leptonica` available on `PATH`. Untested upstream — patches welcome.
- Elixir ≥ 1.17 and OTP ≥ 26.
- A working C++17 compiler (`g++` or `clang++`) and `pkg-config` on the build host. The NIF is built with `elixir_make` on first compile.
```elixir
def deps do
  [
    {:image_ocr, "~> 0.1.0"}
  ]
end
```

Build the NIF on first compile:

```shell
mix deps.get
mix compile
```

`image_ocr` ships the English (`eng`) tessdata_fast model in `priv/tessdata/`, so the package is usable out of the box.
```elixir
{:ok, ocr} = Image.OCR.new()  # defaults to locale: "en"
{:ok, text} = Image.OCR.read_text(ocr, "page.png")
```

`read_text/3` accepts:
- A `Vix.Vips.Image.t()` — used directly.
- A path to an image file — loaded via `Vix.Vips.Image.new_from_file/1`.
- An in-memory binary of encoded image data (PNG, JPEG, TIFF, …) — loaded via `Vix.Vips.Image.new_from_buffer/1`.
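Taken together, the three input forms look like this — a hedged sketch using only the calls described above, with `page.png` as a placeholder path:

```elixir
{:ok, ocr} = Image.OCR.new()

# 1. A file path
{:ok, text_from_path} = Image.OCR.read_text(ocr, "page.png")

# 2. An in-memory binary of encoded image data
binary = File.read!("page.png")
{:ok, text_from_binary} = Image.OCR.read_text(ocr, binary)

# 3. A Vix.Vips.Image.t(), e.g. after preprocessing with Vix
{:ok, image} = Vix.Vips.Image.new_from_file("page.png")
{:ok, text_from_image} = Image.OCR.read_text(ocr, image)
```

All three calls resolve to the same underlying image, so they should return the same text.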
For per-word output with confidence and bounding boxes, use `Image.OCR.recognize/3`:

```elixir
{:ok, words} = Image.OCR.recognize(ocr, image)
# => [%{text: "Hello", confidence: 96.4, bbox: {32, 18, 198, 64}}, …]
```

The `:locale` option (and the mix-task language arguments) accept:
- ISO 639-1 two-letter codes — `"en"`, `:en`, `"fr"`, `:de`, `"ja"`.
- BCP-47 tags for region- or script-specific variants — `"zh-Hans"` (Simplified Chinese), `"zh-Hant"` (Traditional), `"sr-Latn"` (Serbian in Latin script), `"az-Cyrl"`. The built-in table covers the common cases.
- Any BCP-47 locale — `"en-US"`, `"fr-CA"`, `"zh-Hans-CN"`, `"sr-Latn-RS"` — when the optional `:localize` dependency is installed. With Localize, the locale is parsed and the language + script subtags are used to pick the right Tesseract trained data; territory subtags are ignored (Tesseract doesn't differentiate by territory).
- Tesseract codes verbatim — `"frk"` (German Fraktur), `"osd"` (orientation/script detection), `"script/Latin"`.
- `+`-joined combinations — `"en+fr"`, `"chi_sim+eng"`, `"ja+en"`.
"zh" on its own is rejected as ambiguous — use "zh-Hans" or "zh-Hant". See Image.OCR.Languages for the full mapping table.
To enable BCP-47 parsing, add Localize to your project:

```elixir
def deps do
  [
    {:image_ocr, "~> 0.1.0"},
    {:localize, "~> 0.25"}
  ]
end
```

A single `Image.OCR` instance wraps one `tesseract::TessBaseAPI`, which is not safe for concurrent use. The NIF guards each instance with a mutex, so accidental sharing degrades to serialisation rather than undefined behaviour, but for real parallelism you want one instance per worker. The simplest way is the included pool:
```elixir
children = [
  {Image.OCR.Pool, name: MyOcr, locale: "en", pool_size: 4}
]

Supervisor.start_link(children, strategy: :one_for_one)

{:ok, text} = Image.OCR.Pool.read_text(MyOcr, "page.png")
```

`pool_size` defaults to `System.schedulers_online()`. Each worker holds the loaded language model in memory — typically 2–50 MB depending on the language and trained-data variant — so size deliberately if you also load multiple languages or run on small hosts.
Recognition runs on dirty CPU schedulers, so it does not block the normal schedulers regardless of pool size.
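For batch workloads, one way to drive the pool is `Task.async_stream` — a sketch assuming a pool started under the name `MyOcr` as shown above, with placeholder file names:

```elixir
pages = ["p1.png", "p2.png", "p3.png", "p4.png"]

texts =
  pages
  |> Task.async_stream(&Image.OCR.Pool.read_text(MyOcr, &1),
    max_concurrency: 4,
    timeout: :timer.seconds(60)
  )
  |> Enum.map(fn {:ok, {:ok, text}} -> text end)
```

Matching `max_concurrency` to `pool_size` keeps callers from queueing on worker checkout.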
The trained-data directory is resolved in this order:

1. The `:datapath` option passed to `Image.OCR.new/1`.
2. `Application.get_env(:image_ocr, :tessdata_path)`.
3. The `TESSDATA_PREFIX` environment variable.
4. The vendored fallback at `priv/tessdata/`.
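For a one-off override, the highest-priority option can be passed directly — a sketch, with the directory path as a placeholder:

```elixir
{:ok, ocr} = Image.OCR.new(locale: "de", datapath: "/opt/custom/tessdata")
```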
Configure a project-wide location once:

```elixir
# config/config.exs
config :image_ocr, tessdata_path: "/var/lib/image_ocr/tessdata"
```

Manage trained-data files without leaving your project:
```shell
# Install one or more languages (ISO 639-1 codes)
mix image.ocr.tessdata.add fr de

# BCP-47 for region/script-specific variants
mix image.ocr.tessdata.add zh-Hans zh-Hant sr-Latn

# Pick a variant: fast (default, ~2-4 MB), best (~10-15 MB), legacy (largest)
mix image.ocr.tessdata.add en --variant best

# Write to a specific directory (overrides config and TESSDATA_PREFIX)
mix image.ocr.tessdata.add ja --path /var/lib/tessdata

# Refresh every installed language to its latest upstream commit
mix image.ocr.tessdata.update

# Show what's installed
mix image.ocr.tessdata.list

# Remove a language
mix image.ocr.tessdata.remove de
```

The tasks read from and write to the same path that `Image.OCR.new/1` does, so there is one source of truth.
image_ocr requires Tesseract 5.x (currently 5.5+) and refuses to build against older versions. 5.x is actively maintained, ships in current LTS distros, and runs noticeably faster than 4.x on modern CPUs thanks to better SIMD use and float32 models. The C++ API surface we use is identical between 4.x and 5.x, so 4.1+ would likely work — but we keep the support matrix tight.
An interactive demonstration is at livebooks/demo.livemd. It covers one-shot OCR, reusable instances, per-word bounding boxes, the NimblePool, PSM/SetVariable tweaks, and uploading your own image.
Apache-2.0.