Convert PDF pages to JPEG with Python

A simple guide to converting PDF pages to images (JPEG, PNG).

  1. Objectives:
      1. Convert PDF pages to images
  2. Required Tools:
      1. Poppler for Windows: Poppler is a PDF rendering library that includes the pdftoppm utility
      2. Poppler for Mac: if Homebrew is already installed, run brew install poppler
      3. pdf2image: a Python module that wraps the pdftoppm utility to convert PDF pages to PIL Image objects
  3. Steps:
      1. Install Poppler. On Windows, add “xxx/bin/” to the PATH environment variable
      2. pip install pdf2image
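
Before running the conversion, it can help to confirm that Python can actually find pdftoppm, since pdf2image depends on it. Here is a minimal sketch using only the standard library; the helper name poppler_available is made up for illustration and is not part of pdf2image:

```python
import shutil


def poppler_available(tool="pdftoppm"):
    """Return True if the given Poppler utility can be found on the PATH."""
    return shutil.which(tool) is not None


if __name__ == "__main__":
    if poppler_available():
        print("pdftoppm found at:", shutil.which("pdftoppm"))
    else:
        print("pdftoppm not found; check the Poppler install and the PATH")
```

If this reports that pdftoppm is missing on Windows, the most likely cause is that the Poppler bin folder has not been added to the PATH yet, or that the terminal was opened before the change.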

Usage

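import os
import tempfile
from pdf2image import convert_from_path

filename = 'target.pdf'
save_dir = 'your_saved_dir'

# Make sure the output directory exists before saving.
os.makedirs(save_dir, exist_ok=True)

# File name without directory or extension, e.g. 'target'.
base_name = os.path.splitext(os.path.basename(filename))[0]

with tempfile.TemporaryDirectory() as path:
    # Page numbers are 1-based; this converts the first page only.
    images_from_path = convert_from_path(filename, output_folder=path, first_page=1, last_page=1)

    # Save inside the with block, while the temporary files still exist.
    # The page number in the name keeps pages from overwriting each other.
    for page_number, page in enumerate(images_from_path, start=1):
        page.save(os.path.join(save_dir, f'{base_name}_{page_number}.jpg'), 'JPEG')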
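
The snippet above handles a single page. Below is a rough sketch of the same idea for a whole document, where every page gets its own numbered file name. The function names page_filename and pdf_to_jpegs and the naming pattern are assumptions made for this example; convert_from_path and its dpi parameter come from pdf2image.

```python
import os


def page_filename(pdf_path, page_number, ext=".jpg"):
    """Build a per-page file name such as 'target_page_003.jpg'."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return f"{stem}_page_{page_number:03d}{ext}"


def pdf_to_jpegs(pdf_path, save_dir, dpi=200):
    """Render every page of pdf_path to a JPEG in save_dir; return the saved paths."""
    # Imported here so page_filename can be used without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(save_dir, exist_ok=True)
    saved = []
    # Without output_folder, pdf2image returns the pages as in-memory PIL images.
    for number, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        out_path = os.path.join(save_dir, page_filename(pdf_path, number))
        page.save(out_path, "JPEG")
        saved.append(out_path)
    return saved
```

Calling pdf_to_jpegs('target.pdf', 'your_saved_dir') should leave one JPEG per page in the output folder. For very large PDFs, holding every page in memory may be too much, in which case passing output_folder as in the snippet above is probably the safer choice.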

Further notes 
