A simple guide to extracting images (JPEG, PNG) from a PDF.
Objectives
- Extract images from a PDF

Required Tools
- Poppler for Windows: Poppler is a PDF rendering library that includes the pdftoppm utility.
- Poppler for Mac: if Homebrew is already installed, run brew install poppler.
- pdf2image: a Python module that wraps the pdftoppm utility to convert each PDF page into a PIL Image object.

Steps
- Install Poppler. On Windows, add “xxx/bin/” to the PATH environment variable.
- pip install pdf2image

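Before converting anything, it can help to confirm that Python can actually see pdftoppm, since a missing PATH entry is a common reason the conversion fails. The following is a minimal sketch using only the standard library; the function name `find_pdftoppm` is a hypothetical helper, not part of pdf2image.

```python
import shutil


def find_pdftoppm():
    """Return the full path to pdftoppm if it is on PATH, else None."""
    return shutil.which("pdftoppm")


if __name__ == "__main__":
    location = find_pdftoppm()
    if location is None:
        print("pdftoppm not found: check that Poppler's bin folder is on PATH")
    else:
        print(f"pdftoppm found at {location}")
```

If the check reports that pdftoppm is missing, reopen the terminal after editing PATH so the new value is picked up.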
Usage
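    import os
    import tempfile

    from pdf2image import convert_from_path

    filename = 'target.pdf'
    save_dir = 'your_saved_dir'
    first_page = 1  # page numbers start at 1
    last_page = 1
    base_name = os.path.splitext(os.path.basename(filename))[0]

    with tempfile.TemporaryDirectory() as path:
        # pdftoppm writes its intermediate files into the temporary folder
        images_from_path = convert_from_path(
            filename, output_folder=path, first_page=first_page, last_page=last_page)
        # Save while the temporary folder still exists, and number each file
        # so that pages do not overwrite one another
        for number, page in enumerate(images_from_path, start=first_page):
            page.save(os.path.join(save_dir, f'{base_name}_{number}.jpg'), 'JPEG')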
Further notes
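For repeated use, the conversion can be wrapped in a function. The sketch below is one possible arrangement, not the only one: `page_filename` and `save_pages` are hypothetical helper names, while `dpi`, `output_folder`, `first_page`, `last_page`, and `poppler_path` are parameters of pdf2image's `convert_from_path`. Passing `poppler_path` is an alternative to editing PATH on Windows.

```python
import os
import tempfile


def page_filename(pdf_path, page_number, ext="jpg"):
    """Build an output name such as 'target_3.jpg' from 'docs/target.pdf'."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return f"{stem}_{page_number}.{ext}"


def save_pages(pdf_path, save_dir, first_page=1, last_page=None,
               dpi=200, poppler_path=None):
    """Render the selected pages and save each one as a JPEG in save_dir."""
    # Imported here so the helper above works without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(save_dir, exist_ok=True)
    saved = []
    with tempfile.TemporaryDirectory() as tmp:
        pages = convert_from_path(
            pdf_path,
            dpi=dpi,
            output_folder=tmp,
            first_page=first_page,
            last_page=last_page,
            poppler_path=poppler_path,  # e.g. Poppler's bin folder on Windows
        )
        # Save inside the with-block, before the temporary folder is removed.
        for number, page in enumerate(pages, start=first_page):
            target = os.path.join(save_dir, page_filename(pdf_path, number))
            page.save(target, "JPEG")
            saved.append(target)
    return saved
```

Assuming target.pdf exists, a call such as save_pages('target.pdf', 'your_saved_dir', last_page=1) would write the first page to your_saved_dir/target_1.jpg. To produce PNG files instead, pass ext="png" to page_filename and "PNG" to page.save.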