Free image-to-text OCR that runs entirely in your browser
PocketWebTools' OCR extracts text from any image without uploading it. Drop a screenshot, a scan, or a photo of a sign and the tool returns the words inside, ready to copy or download as a text file. Two engines are available depending on the language you need and the quality of the input.
Tesseract is the multilingual workhorse: pure WebAssembly, no GPU required, and coverage of more than 100 languages, including Arabic, Chinese, Japanese, Korean, Hindi, Russian, Hebrew, and Thai. Pick a language from the picker, drop your image, and you have your text in a few seconds. Florence-2 is Microsoft's open-source vision-language model, running through WebGPU. It is sharper on photographed English text, slide screenshots, and clean handwriting, and you do not need to pick a language because it reads the image as a scene.
How to use it
- Drop, paste, or click to upload an image (PNG, JPEG, WebP, AVIF, BMP, or TIFF).
- Pick an engine: Tesseract for any non-English language, photos with mixed scripts, or iPhone; Florence-2 for sharper English on photographed text.
- If you picked Tesseract, choose the language in the image. The first run downloads a small language pack and caches it offline.
- Click Extract text. The first run for each engine downloads its model files; subsequent runs skip the download and start right away.
- Copy the result to your clipboard or download it as a .txt file.
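The engine guidance in the steps above boils down to a small decision rule. Here is a minimal sketch in TypeScript, assuming hypothetical names (`pickEngine`, `EngineChoice`, `EngineInput`) and illustrative language codes. It is not the tool's actual source, and in the UI the choice stays with you.

```typescript
// Hypothetical names throughout; this only illustrates the guidance above.
type EngineChoice = "tesseract" | "florence-2";

interface EngineInput {
  language: string;   // illustrative code such as "en", "ar", or "ja"
  hasWebGPU: boolean; // whether the device exposes a usable WebGPU adapter
}

function pickEngine({ language, hasWebGPU }: EngineInput): EngineChoice {
  // Florence-2 is described as the sharper option for English, but it needs WebGPU.
  if (language === "en" && hasWebGPU) {
    return "florence-2";
  }
  // Non-English text, or a device without WebGPU, goes to Tesseract.
  return "tesseract";
}
```

For example, `pickEngine({ language: "ja", hasWebGPU: true })` returns `"tesseract"`, because the guidance points every non-English language at Tesseract regardless of GPU support.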
What it works well on
- Screenshots of articles, social posts, tweets, or chat windows.
- Scanned receipts, invoices, contracts, and forms.
- Photos of street signs, menus, posters, and product labels.
- Slide decks and presentations captured as images.
- Multilingual content. Tesseract handles Arabic, Chinese, Japanese, Korean, Hindi, Hebrew, and Thai, plus Cyrillic- and Latin-script languages such as those of Europe and Latin America.
- Code screenshots you want to paste back into an editor without retyping.
When not to use it
Cursive handwriting, dense multi-column scientific PDFs, and heavily skewed or low-light photos can produce noisy output. If you find the result is unusable, try the other engine first; Florence-2 is usually sharper on photos and Tesseract on flat scans. Tables and complex layouts come back as plain text without structure today. Column boundaries are not preserved.
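The "try the other engine" advice can be written down as a tiny rule. The sketch below is an assumption-heavy illustration: `OcrAttempt` and `nextStep` are hypothetical names, and the default threshold of 70 borrows the rough Tesseract confidence figure quoted in the FAQ. Treating low confidence as a reason to switch engines is our reading of the advice, not documented tool behaviour.

```typescript
// Hypothetical types and helper; not part of the tool's code.
type EngineName = "tesseract" | "florence-2";

interface OcrAttempt {
  engine: EngineName;
  text: string;
  confidence?: number; // 0 to 100; only Tesseract is described as reporting one
}

function nextStep(attempt: OcrAttempt, minConfidence = 70): "accept" | EngineName {
  const isEmpty = attempt.text.trim().length === 0;
  const isLowConfidence =
    attempt.confidence !== undefined && attempt.confidence < minConfidence;

  // A non-empty result with acceptable (or unreported) confidence is kept.
  if (!isEmpty && !isLowConfidence) {
    return "accept";
  }
  // Otherwise suggest whichever engine was not used for this attempt.
  return attempt.engine === "tesseract" ? "florence-2" : "tesseract";
}
```

A Tesseract result with confidence 41 would yield `"florence-2"`, while a non-empty Florence-2 result, which carries no score here, is accepted as-is.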
For very large or multi-page PDFs we recommend exporting individual pages to images and running them through one at a time. A native paged-document flow is on the roadmap.
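If you script the page-by-page workaround yourself, it amounts to a sequential loop. This is a minimal sketch, assuming a hypothetical `recognize` callback that stands in for whichever OCR call you use. It is injected so the example does not presume any particular library API, and the page-separator format is our own choice.

```typescript
// `recognize` is a placeholder for an OCR call that takes one page image
// and resolves to its text. Page can be any image representation you use.
type Recognize<Page> = (page: Page) => Promise<string>;

async function ocrPages<Page>(
  pages: Page[],
  recognize: Recognize<Page>
): Promise<string> {
  const parts: string[] = [];
  let pageNumber = 0;

  // Sequential on purpose: each call handles a single image, and running
  // pages one at a time keeps memory use predictable on the device.
  for (const page of pages) {
    pageNumber += 1;
    const text = await recognize(page);
    parts.push(`--- Page ${pageNumber} ---\n${text.trim()}`);
  }

  return parts.join("\n\n");
}
```

The result is a single string with one labelled block per page, which you can then save as a .txt file.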
Private by design, free forever
Cloud OCR services like Google Document AI, AWS Textract, Azure Vision, and the subscription-only Mistral OCR API charge per page and require uploading your document to a server you do not control. We take the opposite path: the engine runs in your tab, your image never touches our servers, and we have no inference bill to pay, so the tool is free and unlimited. The trade-off is that you bring your own compute; the upside is privacy, speed on subsequent runs, and zero account creation.
Frequently asked questions
- Does my image get uploaded anywhere?
- No. Your image stays in your browser the entire time. Both OCR engines run locally on your device using WebAssembly (Tesseract) or WebGPU (Florence-2). There is no server in the loop, no upload, and nothing for us to log.
- Which engine should I pick?
- Use Tesseract for anything that is not English: Arabic, Chinese, Japanese, Korean, Hindi, Russian, Hebrew, Thai, or any of the 100+ supported languages. Tesseract is also the right pick on iPhone because it runs on every device with no GPU requirement. Use Florence-2 for sharp English text, especially photographed signs, slide screenshots, and tricky handwriting. Florence-2 needs WebGPU and downloads about 120 MB of model weights the first time.
- What languages does Tesseract support?
- Over 100 languages, with curated picks for English, Arabic, Spanish, French, German, Portuguese, Italian, Dutch, Russian, Ukrainian, Polish, Turkish, Simplified and Traditional Chinese, Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian, Hebrew, and Persian. Each language is a separate small download (about 3 to 10 MB) cached locally after the first run.
- How accurate is the text recognition?
- On clean printed text (PDFs, screenshots, scans of typed documents) accuracy is typically above 95% for both engines. On photographed text or low-resolution images, Florence-2 is usually sharper because it is a vision-language model that reads text in the context of the whole scene. Tesseract reports a per-image confidence score next to the result; below about 70% you may want to retry with a higher-resolution capture.
- Does it work on handwriting?
- Florence-2 can sometimes read clear, well-formed handwriting, especially block letters; cursive or stylised handwriting is hit-or-miss. Tesseract is purpose-built for printed text and struggles with handwriting. For consistently good handwriting OCR you generally need a model trained specifically on handwriting, for example one fine-tuned on the IAM Handwriting dataset, which we do not ship today; if there is demand we will add a TrOCR-handwritten option to the picker.
- Why does it ask to download files the first time?
- Both engines bring their own assets. Tesseract downloads a small WebAssembly core (about 5 MB) plus one traineddata pack per language (3 to 10 MB). Florence-2 downloads roughly 120 MB of quantised model weights. All of that caches in your browser, so subsequent runs skip the download. You can review or delete Florence-2 weights from the Local Models chip in the page header; Tesseract uses its own internal IndexedDB cache.
- Does this work offline?
- Yes, after the first visit. Once the engine and language pack are cached your browser does not need a network connection to run them again, so you can use the tool on a plane or in a tunnel.
- Can I OCR a PDF or multi-page document?
- Not directly today. Both engines process a single image per call. For a multi-page PDF the workflow is to export the pages to images (PNG or JPEG) and run them through one at a time, or screenshot the pages you need. We are planning a paged PDF flow as a follow-up.
- Can I extract text from a specific region of the image?
- Not yet in the UI. Florence-2 supports a region-aware OCR mode that returns a bounding box for each detected text region, and we plan to surface a select-region affordance in a later iteration. For now the tool returns the full transcription of the entire image.
- How does this compare to cloud OCR APIs like Google Document AI or Mistral OCR?
- Cloud APIs are typically more accurate on edge cases (faded scans, complex tables, dense forms) because they run much larger models. They are also paid per page and require uploading your document to a server you do not control. We run smaller models locally, trading some peak accuracy for being free, unlimited, and private. For everyday text extraction the gap is small and the workflow advantages add up.
- Can I use the result commercially?
- Yes. The text is yours. Tesseract is Apache 2.0 licensed and Florence-2 is MIT licensed. Both are permissive open-source licenses that allow commercial use. We do not claim any rights over text you extract here.
- What is WebGPU and does my browser support it?
- WebGPU is a modern browser API that runs the AI computation on your GPU. Florence-2 needs it to load at a reasonable speed and to run inference in roughly a second per image. Tesseract does not use WebGPU; it runs on the CPU through WebAssembly and works everywhere. WebGPU is available in Chrome, Edge, recent Safari, and recent Firefox on most desktop and Android devices.
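To check WebGPU support yourself, the standard route is to look for `navigator.gpu` and then request an adapter. Below is a minimal sketch, assuming hypothetical names (`hasWebGPU`, `NavigatorLike`, `GpuLike`). The navigator object is passed in as a parameter so the function can be exercised outside a browser, and nothing here claims to be how this tool performs its own check.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  // Browsers without WebGPU do not expose `gpu` on navigator at all.
  if (!nav.gpu) {
    return false;
  }
  try {
    // requestAdapter() resolves to null when the API exists but no
    // suitable GPU is available (for example, a blocklisted driver).
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while probing as "not supported".
    return false;
  }
}
```

In a browser you would pass the global `navigator` (cast to `NavigatorLike` if your typings lack WebGPU). A `false` result suggests sticking with Tesseract, which runs on the CPU through WebAssembly.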
Related tools
- Background remover: cut the subject out of any image, all in your browser.
- Image upscaler: sharpen and enlarge small images up to 4× with AI.
- Audio transcription: drop an audio or video file, get a timestamped transcript with subtitles.
- Word counter: count words, characters, and GPT tokens in the extracted text.