Question 1

Does my image get uploaded anywhere?

Accepted Answer

No. Your image stays in your browser the entire time. Both OCR engines run locally on your device using WebAssembly (Tesseract) or WebGPU (Florence-2). There is no server in the loop, no upload, and nothing for us to log.

Question 2

Which engine should I pick?

Accepted Answer

Use Tesseract for anything that is not English: Arabic, Chinese, Japanese, Korean, Hindi, Russian, Hebrew, Thai, or any of the 100+ supported languages. Tesseract is also the right pick on iPhone because it runs on the CPU of every device and has no GPU requirement. Use Florence-2 when you want the sharpest result on English text, especially photographed signs, slide screenshots, and tricky handwriting. Florence-2 needs WebGPU and downloads about 120 MB of model weights the first time.
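
The rule of thumb above can be written as a small decision function. This is a minimal sketch, not the tool's actual code: the `pickEngine` name, the `Engine` type, and the use of Tesseract's "eng" code to mean English are assumptions made for illustration.

```typescript
// Hypothetical helper: the names and the "eng" language code are
// illustrative assumptions, not the tool's real API.
type Engine = "tesseract" | "florence-2";

function pickEngine(languageCode: string, hasWebGPU: boolean): Engine {
  // Florence-2 is only suggested for English, and only when WebGPU is available.
  const isEnglish = languageCode.toLowerCase() === "eng";
  if (isEnglish && hasWebGPU) {
    return "florence-2";
  }
  // Other languages, and devices without WebGPU, fall back to Tesseract.
  return "tesseract";
}
```

Under these assumptions, `pickEngine("ara", true)` and `pickEngine("eng", false)` both return "tesseract", while `pickEngine("eng", true)` returns "florence-2".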

Question 3

What languages does Tesseract support?

Accepted Answer

Over 100 languages, with curated picks for English, Arabic, Spanish, French, German, Portuguese, Italian, Dutch, Russian, Ukrainian, Polish, Turkish, Simplified and Traditional Chinese, Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian, Hebrew, and Persian. Each language is a separate small download (about 3 to 10 MB) that is cached locally after the first run.

Question 4

How accurate is the text recognition?

Accepted Answer

On clean printed text (PDFs, screenshots, scans of typed documents) accuracy is typically above 95% for both engines. On photographed text or low-resolution images, Florence-2 is usually more accurate because it is a vision-language model that reads text in the context of the whole scene. Tesseract reports a per-image confidence score next to the result; if it falls below about 70%, you may want to retry with a higher-resolution capture.
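
The retry guidance can be expressed as a simple threshold check. This is a sketch under stated assumptions: the confidence is taken to be a percentage from 0 to 100, and `RETRY_THRESHOLD` and `shouldRetry` are hypothetical names, not part of the tool.

```typescript
// Assumption: confidence is a percentage in the range 0 to 100.
// 70 is the approximate cut-off quoted in the answer above.
const RETRY_THRESHOLD = 70;

// Hypothetical helper: true when a higher-resolution capture is worth trying.
function shouldRetry(confidence: number): boolean {
  return confidence < RETRY_THRESHOLD;
}
```

With this cut-off, a result reported at 64% would be flagged for a retry, while one at 88% would not.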

Question 5

Does it work on handwriting?

Accepted Answer

Florence-2 can sometimes read clear, well-formed handwriting, especially block letters; cursive or stylised handwriting is hit-or-miss. Tesseract is purpose-built for printed text and struggles with handwriting. For consistently good handwriting OCR you generally need a model trained specifically on handwriting data such as the IAM Handwriting Database, which we do not ship today; if there is demand we will add a TrOCR-handwritten option to the picker.

Question 6

Why does it ask to download files the first time?

Accepted Answer

Both engines bring their own assets. Tesseract downloads a small WebAssembly core (about 5 MB) plus one traineddata pack per language (about 3 to 10 MB each). Florence-2 downloads roughly 120 MB of quantised model weights. All of that is cached in your browser, so subsequent runs skip the download. You can review or delete Florence-2 weights from the Local Models chip in the page header; Tesseract uses its own internal IndexedDB cache.
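
To get a feel for the first-run cost, the figures above can be added up. This is a rough sketch using only the approximate sizes quoted in this answer; the function and constant names are hypothetical.

```typescript
// Approximate sizes in megabytes, taken from the answer above.
const TESSERACT_CORE_MB = 5;
const LANGUAGE_PACK_MB = { min: 3, max: 10 };
const FLORENCE2_WEIGHTS_MB = 120;

interface SizeRange {
  min: number;
  max: number;
}

// Hypothetical helper: estimates the first-run download for one engine.
function firstRunDownloadMB(
  engine: "tesseract" | "florence-2",
  languageCount: number = 1,
): SizeRange {
  if (engine === "florence-2") {
    // Florence-2 has a single fixed download, independent of language.
    return { min: FLORENCE2_WEIGHTS_MB, max: FLORENCE2_WEIGHTS_MB };
  }
  // Tesseract: one core download plus one pack per selected language.
  return {
    min: TESSERACT_CORE_MB + languageCount * LANGUAGE_PACK_MB.min,
    max: TESSERACT_CORE_MB + languageCount * LANGUAGE_PACK_MB.max,
  };
}
```

For example, Tesseract with two languages comes to roughly 11 to 25 MB (5 + 2 × 3 at the low end, 5 + 2 × 10 at the high end), compared with about 120 MB for Florence-2.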

Question 7

Does this work offline?

Accepted Answer

Yes, after the first visit. Once the engine and language pack are cached your browser does not need a network connection to run them again, so you can use the tool on a plane or in a tunnel.

Question 8

Can I OCR a PDF or multi-page document?

Accepted Answer

Not directly today. Both engines process a single image per call. For a multi-page PDF the workflow is to export the pages to images (PNG or JPEG) and run them through one at a time, or screenshot the pages you need. We are planning a paged PDF flow as a follow-up.
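
If you script the one-page-at-a-time workflow yourself, it amounts to a sequential loop. The sketch below assumes you already have the pages as images and a function that recognises a single image; `recognizePages` and the `RecognizePage` type are hypothetical names, and the recogniser is passed in rather than tied to either engine.

```typescript
// Hypothetical signature: any function that turns one page image into text.
type RecognizePage<Page> = (page: Page) => Promise<string>;

async function recognizePages<Page>(
  pages: Page[],
  recognize: RecognizePage<Page>,
): Promise<string[]> {
  const results: string[] = [];
  for (const page of pages) {
    // Wait for each page to finish before starting the next,
    // matching the one-image-per-call constraint described above.
    results.push(await recognize(page));
  }
  return results;
}
```

The results come back in page order, so joining them with a page separator reconstructs the document text.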

Question 9

Can I extract text from a specific region of the image?

Accepted Answer

Not yet in the UI. Florence-2 supports a region-aware OCR mode that returns bounding boxes for the text it detects, and we plan to add a select-region control in the next iteration. For now the tool returns the full transcription of the entire image.

Question 10

How does this compare to cloud OCR APIs like Google Document AI or Mistral OCR?

Accepted Answer

Cloud APIs are typically more accurate on edge cases (faded scans, complex tables, dense forms) because they run much larger models. They are also billed per page and require uploading your document to a server you do not control. We run smaller models locally, trading some peak accuracy for being free, unlimited, and private. For everyday text extraction the accuracy gap is small, and having no upload and no per-page cost makes the local workflow simpler.

Question 11

Can I use the result commercially?

Accepted Answer

Yes. The text is yours. Tesseract is Apache 2.0 licensed and Florence-2 is MIT licensed. Both are permissive open-source licenses that allow commercial use. We do not claim any rights over text you extract here.

Question 12

What is WebGPU and does my browser support it?

Accepted Answer

WebGPU is a modern browser API that runs the AI computation on your GPU. Florence-2 needs it to load at a reasonable speed and to run inference in roughly a second per image. Tesseract does not use WebGPU; it runs on the CPU through WebAssembly and works everywhere. WebGPU is available in Chrome, Edge, recent Safari, and recent Firefox on most desktop and Android devices.
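
To check support yourself, you can feature-detect WebGPU from a script or the browser console. The sketch below relies on the standard `navigator.gpu.requestAdapter()` call; the `NavigatorLike` type and the `hasUsableWebGPU` name are assumptions introduced here so the example compiles without WebGPU typings, and this is not necessarily how the tool performs its own check.

```typescript
// Minimal structural type so the sketch compiles without WebGPU typings.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Hypothetical helper: resolves to true only when a GPU adapter is obtainable.
async function hasUsableWebGPU(nav: NavigatorLike): Promise<boolean> {
  // Browsers without WebGPU do not define navigator.gpu at all.
  if (!nav.gpu) {
    return false;
  }
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while probing as "not supported".
    return false;
  }
}
```

In a browser you would pass the global `navigator` object (cast to `NavigatorLike`); a result of `false` means Tesseract is the engine to use.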

Image to Text (OCR)

Free image-to-text OCR that runs entirely in your browser
