Make a scanned PDF text-searchable for court
A scanned document converted to PDF without OCR is an image-only file: it looks like a normal PDF, but it has no text layer. E-filing systems that require searchable text will reject these files or warn about them. PrepFile's pre-flight checker detects this by sampling the first five pages for extractable text characters. If it finds fewer than 20 characters in total on the sampled pages, it flags the PDF as potentially non-searchable.
The fix is OCR (Optical Character Recognition), which reads the image pixels and adds a text layer to the PDF. PrepFile does not perform OCR — that processing is computationally intensive and requires quality review. Options: Adobe Acrobat Pro (Tools → Scan & OCR → Recognize Text), ABBYY FineReader, Microsoft Lens (mobile), or a dedicated OCR service.
After OCR, re-run PrepFile's checker: the searchable check should pass. Then verify the recognized text is accurate on a few pages — OCR errors in key terms can affect later document searches.
OCR workflow for a scanned document
- Confirm the PDF is image-only: run PrepFile's E-Filing Checker — if searchability fails, proceed.
- Open the PDF in Acrobat Pro → Tools → Scan & OCR → Recognize Text → In This File → Run.
- Spot-check the recognized text on a few pages and correct any errors in critical terms.
- Save the OCR'd PDF.
- Re-run PrepFile's checker on the new file to confirm that the searchable check now passes.
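The spot-check step can be partly automated once you have the OCR'd text as a string (for example, copied out of the PDF). The sketch below is not part of PrepFile; `normalize` and `missingKeyTerms` are hypothetical helper names, and the sample text is made up for illustration. It simply reports which key terms you expected to find are absent from the recognized text, which is usually a sign of an OCR misread.

```typescript
// Hypothetical helper: lowercase and collapse whitespace so that line breaks
// and letter case in the OCR output do not cause false misses.
function normalize(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// Hypothetical helper: return the key terms that do NOT appear in the OCR text.
function missingKeyTerms(ocrText: string, keyTerms: string[]): string[] {
  const haystack = normalize(ocrText);
  return keyTerms.filter((term) => !haystack.includes(normalize(term)));
}

// Made-up sample: the OCR engine misread "COURT" as "C0URT" (zero for O).
const sample =
  "SUPERIOR C0URT OF EXAMPLE COUNTY\nCase No. 24-CV-01234\nPlaintiff: Jane Doe";

console.log(missingKeyTerms(sample, ["Superior Court", "24-CV-01234", "Jane Doe"]));
// prints [ 'Superior Court' ]
```

A missing term is only a prompt to look at that page; a term that is found can still sit next to other misreads, so this does not replace reading a few pages yourself.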
Questions
How does PrepFile know the PDF is image-only?
It uses pdfjs-dist to extract text content from the first five pages. If the total character count from those pages is under 20, the PDF is flagged. This heuristic catches pure image PDFs but may occasionally flag legitimate low-text documents like signature pages.
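A minimal sketch of that heuristic follows. It is an illustration under stated assumptions, not PrepFile's actual source: the function and constant names are hypothetical, whitespace is excluded from the count as an assumption, and the small interfaces are stand-ins meant to mirror the parts of pdfjs-dist involved (`numPages`, `getPage`, `getTextContent`, and text items carrying a `str` field) so the example runs without the library installed.

```typescript
// Stand-in types for the small part of pdfjs-dist this sketch relies on.
interface TextContentLike { items: object[]; }
interface PageLike { getTextContent(): Promise<TextContentLike>; }
interface DocLike { numPages: number; getPage(pageNumber: number): Promise<PageLike>; }

const SAMPLE_PAGES = 5; // pages sampled, per the answer above
const MIN_CHARS = 20;   // flag threshold, per the answer above

// Total extracted characters across the sampled pages.
// Assumption: whitespace is not counted.
function countTextChars(pageTexts: string[]): number {
  return pageTexts.reduce((sum, text) => sum + text.replace(/\s/g, "").length, 0);
}

// True when the sampled pages hold too little text to be considered searchable.
function isLikelyImageOnly(pageTexts: string[], minChars: number = MIN_CHARS): boolean {
  return countTextChars(pageTexts) < minChars;
}

// Collect the text of the first few pages (page numbers are 1-based).
async function samplePageTexts(doc: DocLike, maxPages: number = SAMPLE_PAGES): Promise<string[]> {
  const texts: string[] = [];
  const lastPage = Math.min(doc.numPages, maxPages);
  for (let n = 1; n <= lastPage; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    const pageText = content.items
      .map((item) => {
        // Some items are markers without text; treat those as empty.
        const s = (item as { str?: unknown }).str;
        return typeof s === "string" ? s : "";
      })
      .join("");
    texts.push(pageText);
  }
  return texts;
}

// With pdfjs-dist this might be wired up roughly as follows (not run here):
//   const doc = await getDocument({ data }).promise;
//   const flagged = isLikelyImageOnly(await samplePageTexts(doc));

// A scan whose only extractable text is a stray "Page 1" label is flagged:
console.log(isLikelyImageOnly(["", "  ", "Page 1"])); // true (5 non-whitespace characters)
```

The same arithmetic shows why a real signature page can be flagged: a page holding only a short name and a date can fall under 20 characters even though its text layer is intact.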
Can I OCR in the browser?
Browser-based OCR tools such as Tesseract.js exist, but they are typically slower than desktop OCR tools and often less accurate on court documents. For filings, use a dedicated OCR application.
Is OCR processing private?
OCR processing in Acrobat happens locally on your machine. Cloud-based OCR services transmit your document to their servers — check their data retention policy if the document is sensitive.