Thunderbit Docs

File/Image Scraping

Feature Overview

Thunderbit's File/Image Scraping uses OCR to extract information from PDFs and images, which makes it well suited to document digitization.

Prerequisites

  1. The Thunderbit extension is installed

  2. The PDF or image files are ready to upload

  3. The sidebar is open (click the toolbar icon to activate it)
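
Preparing the files can be partly automated before upload. The snippet below is a minimal sketch, not part of Thunderbit: it sorts a list of file names into those that look like PDFs or images and those that do not, based on extension only. The function name `split_by_extension` and the extension set are assumptions made for illustration; check which formats Thunderbit actually accepts before relying on the list.

```python
from pathlib import Path

# Assumed set of extensions, for illustration only.
# Thunderbit's actual list of supported formats may differ.
ASSUMED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def split_by_extension(paths, allowed=ASSUMED_EXTENSIONS):
    """Return (ready, skipped) lists of paths, judged by file extension only."""
    ready, skipped = [], []
    for p in map(Path, paths):
        # Compare case-insensitively so "scan.JPG" is treated like "scan.jpg".
        if p.suffix.lower() in allowed:
            ready.append(p)
        else:
            skipped.append(p)
    return ready, skipped


if __name__ == "__main__":
    ready, skipped = split_by_extension(["report.pdf", "scan.JPG", "notes.docx"])
    print("ready:", [p.name for p in ready])
    print("skipped:", [p.name for p in skipped])
```

The check looks at names only and does not open the files, so a mislabeled or corrupted file would still pass.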

Workflow

Step 1: File Upload

  1. Click the "File/Image" option

  2. Select "Upload Files"

Step 2: Content Selection

  • Use "AI Suggest" to auto-detect the content to extract

  • Or apply a "Custom Template"

Step 3: Data Extraction

  1. Click "Scrape" to begin

  2. Track progress via notifications

Case Study: PDF Report Extraction

Target File: Annual Financial Report PDF

Configuration:

Output:

Applications

  • Table data extraction from PDF reports

  • Text recognition in images

  • Key information retrieval from scanned documents


📌 Tip: Explore more file scraping methods at our Help Center.