# Image OCR

Extract text from images and scans with Tesseract OCR.

## What it does

- Reads text from photos, screenshots, and scanned documents.
- Optional preprocessing (grayscale, threshold, deskew) for noisy images.
- Multi-language, including English and Chinese (`eng+chi_sim`).
- Reports low-confidence regions instead of guessing.

## Files

| File | Purpose |
|------|---------|
| `SKILL.md` | Instructions the agent receives on activation |
| `scripts/ocr.py` | Preprocess + OCR an image with language/confidence options |
| `references/accuracy.md` | Preprocessing and language-pack tips |

## Requirements

Installs `pytesseract` + `Pillow`; needs the `tesseract-ocr` binary and the relevant language packs (e.g. `tesseract-ocr-chi-sim`) available in the sandbox.

## License

Apache-2.0.

---
name: image-ocr
display_name: Image OCR
description: "Extract text from images and scanned documents using OCR. Use when the user sends a photo, screenshot, or scan and wants the text read out, transcribed, or extracted, including receipts, business cards, whiteboards, slides, or any picture containing text. Supports multiple languages including English and Chinese. Do NOT use for editable PDFs with a real text layer (extract directly instead)."
license: Apache-2.0
---

# Image OCR

Read text out of an image or scanned page using Tesseract OCR in the sandbox.

## When to use

The user provides an image (photo, screenshot, scan) and wants the text it contains: transcription, extraction, or "what does this say?".

## Execution steps

1. **Locate the image** under `/workspace`. Confirm the language(s) of the text; default to English + Simplified Chinese (`eng+chi_sim`) when unsure.
2. **Preprocess** for better accuracy with `python scripts/ocr.py <image> --preprocess`. It grayscales, thresholds, and deskews using Pillow before OCR (see `references/accuracy.md` for when each step helps).
3. **Run OCR**: `python scripts/ocr.py <image> --lang eng+chi_sim`. The script installs `pytesseract` + `Pillow` and ensures the Tesseract binary and the requested language packs are present.
4. **Return** the extracted text. For structured docs (receipts, tables), preserve line breaks and note low-confidence regions.

## Rules

- Pick the right `--lang`; the wrong language pack wrecks accuracy.
- Preprocess noisy/low-contrast images before OCR; skip it for clean screenshots.
- Never fabricate text you cannot actually read; mark unclear spans as `[?]`.

## Available resources

- `scripts/ocr.py`: preprocess + OCR an image, with language and confidence options.
- `references/accuracy.md`: preprocessing and language-pack tips for better results.
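
The preprocess, OCR, and `[?]` steps described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/ocr.py`: the function names `mark_low_confidence` and `ocr_image`, the confidence cutoff of 60, and the binarization threshold of 160 are all assumptions chosen for the example, and deskewing is left out. It assumes `pytesseract`, `Pillow`, and the Tesseract binary with the requested language packs are installed.

```python
from typing import Dict, List, Tuple


def mark_low_confidence(data: Dict[str, list], min_conf: float = 60.0) -> str:
    """Rebuild text from word-level OCR data, replacing unsure words with [?].

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    The 60.0 cutoff is an illustrative assumption, not a documented default.
    """
    lines: Dict[Tuple[int, int, int], List[str]] = {}
    for i, word in enumerate(data["text"]):
        if not str(word).strip():
            continue  # non-word rows (blocks, paragraphs, lines) carry no text
        conf = float(data["conf"][i])  # conf may arrive as int, float, or str
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines.setdefault(key, []).append(word if conf >= min_conf else "[?]")
    # One output line per recognized text line, so line breaks are preserved.
    return "\n".join(" ".join(words) for _, words in sorted(lines.items()))


def ocr_image(path: str, lang: str = "eng+chi_sim",
              preprocess: bool = False, threshold: int = 160) -> str:
    """Hypothetical helper: optionally grayscale + threshold, then OCR."""
    # Imported lazily so mark_low_confidence stays usable without these packages.
    from PIL import Image, ImageOps
    import pytesseract

    image = Image.open(path)
    if preprocess:
        gray = ImageOps.grayscale(image)
        # Simple global threshold: pixels brighter than `threshold` become white.
        image = gray.point(lambda p: 255 if p > threshold else 0)
    data = pytesseract.image_to_data(
        image, lang=lang, output_type=pytesseract.Output.DICT
    )
    return mark_low_confidence(data)
```

Calling `ocr_image("/workspace/receipt.png", preprocess=True)` (a hypothetical path) would return the recognized text with one output line per detected text line. Keeping the confidence filtering in its own function makes it possible to tune the cutoff, or check the `[?]` behavior, without running Tesseract.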
Image OCR by langbot-team