PDF manipulation with Python

This post covers basic PDF manipulation for daily tasks using simple Python modules.

  1. Merging multiple PDFs
  2. Extract text from PDF
  3. Extract image from PDF

Merging PDF

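# PyPDF2 3.0 and later removed this name; use PdfMerger there
from PyPDF2 import PdfFileMerger

# PDFs to combine, in the order they should appear in the output
pdfs = ['a.pdf', 'b.pdf']
merger = PdfFileMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("output.pdf")
merger.close()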
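
The file list above is typed by hand. As a minimal sketch, assuming the PDFs sit in one folder and should be merged in file-name order, the list can be built automatically. The helper names `collect_pdfs` and `merge_pdfs` are made up for this example and are not part of PyPDF2.

```python
import glob
import os


def collect_pdfs(folder):
    """Return the .pdf files in `folder`, sorted by file name."""
    return sorted(glob.glob(os.path.join(folder, "*.pdf")))


def merge_pdfs(paths, output_path):
    """Append each PDF in `paths` to a single output file, in order."""
    # Imported here so collect_pdfs() can be used without PyPDF2 installed.
    try:
        from PyPDF2 import PdfMerger  # name used by newer PyPDF2 releases
    except ImportError:
        from PyPDF2 import PdfFileMerger as PdfMerger  # older releases

    merger = PdfMerger()
    for path in paths:
        merger.append(path)
    merger.write(output_path)
    merger.close()
```

Calling `merge_pdfs(collect_pdfs("pdfs"), "output.pdf")` would then merge everything in a folder named `pdfs` (a placeholder name). Sorting by file name is only one possible ordering, so rename or reorder the list if the pages need a different sequence.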

Extract text from PDF

import pdftotext

# Load your PDF
with open("Target.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# Save all text to a txt file.
with open('output.txt', 'w') as f:
    f.write("\n\n".join(pdf))
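
Because a `pdftotext.PDF` object can be iterated page by page, the same text can also be searched instead of saved. The sketch below is one way to find which pages mention a keyword. `find_pages` and `search_pdf` are hypothetical helper names, and the case-insensitive match is an assumption about what is wanted.

```python
def find_pages(pages, keyword):
    """Return the 1-based numbers of pages whose text contains `keyword`.

    `pages` can be any iterable of strings, such as a pdftotext.PDF object.
    The comparison ignores case.
    """
    needle = keyword.lower()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.lower()
    ]


def search_pdf(pdf_path, keyword):
    """Load a PDF with pdftotext and report the pages that mention `keyword`."""
    import pdftotext  # imported here so find_pages() works without it

    with open(pdf_path, "rb") as f:
        pdf = pdftotext.PDF(f)
    return find_pages(pdf, keyword)
```

For example, `search_pdf("Target.pdf", "invoice")` would return the page numbers containing the word, if any.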

More information: “Convert PDF pages to text with Python”

Extract Image (JPEG) from PDF

import os
import tempfile
from pdf2image import convert_from_path

filename = 'target.pdf'
save_dir = 'your_saved_dir'

# Name the output after the PDF, e.g. target.pdf -> target.jpg
base_filename = os.path.splitext(os.path.basename(filename))[0] + '.jpg'

with tempfile.TemporaryDirectory() as path:
    # Page numbers start at 1; this renders only the first page
    images_from_path = convert_from_path(filename, output_folder=path, first_page=1, last_page=1)

    # Save while the temporary directory still exists
    for page in images_from_path:
        page.save(os.path.join(save_dir, base_filename), 'JPEG')
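
The snippet above writes a single page to a single file name. As a rough sketch, assuming every page should be kept, each page can get its own numbered file. `page_filename` and `save_all_pages` are illustrative names, and the `_page_001` naming pattern is just one possible choice.

```python
import os


def page_filename(pdf_path, page_number):
    """Build a name such as 'target_page_001.jpg' for the given page."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return "{}_page_{:03d}.jpg".format(stem, page_number)


def save_all_pages(pdf_path, save_dir):
    """Render every page of the PDF and save each one as its own JPEG."""
    from pdf2image import convert_from_path  # imported here; needs poppler

    os.makedirs(save_dir, exist_ok=True)
    saved = []
    # Without output_folder, pdf2image returns the pages as in-memory images.
    for number, page in enumerate(convert_from_path(pdf_path), start=1):
        target = os.path.join(save_dir, page_filename(pdf_path, number))
        page.save(target, "JPEG")
        saved.append(target)
    return saved
```

Holding every page in memory is fine for short documents. For long PDFs, passing `output_folder` as in the snippet above keeps memory use lower.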

More information: “Convert PDF pages to JPEG with Python”

How to Install Scrapy in Windows

scraper24x7

It took me a long time to install Scrapy on my Windows PC. I tried the official Scrapy installation guide and tutorials from YouTube, and always ended up with errors. For weeks I installed and uninstalled components and kept getting different errors. Finally, after a lot of research, I installed Scrapy successfully. This is how I did it.

Step 1: Install Python 2.7

You can download Python 2.7 from here. Please make sure that you download and install Python 2.7, because at the time of writing Scrapy doesn’t support Python 3. The Scrapy team is working on making it compatible with Python 3. If you have already installed Python 3, uninstall it before installing Python 2.7.

Now you need to add C:\Python27 and C:\Python27\Scripts to your Path environment variable. To do this, open your command prompt, type the following and hit enter:

c:\python27\python.exe c:\python27\tools\scripts\win_add2path.py
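
After running that command and opening a new command prompt, it can help to confirm that both folders really are on the Path. The following is a small sketch of such a check. `missing_from_path` is a made-up helper, and it assumes Windows rules: entries separated by `;` and compared without regard to case.

```python
import ntpath


def missing_from_path(required_dirs, path_value):
    """Return the entries of `required_dirs` that are absent from `path_value`.

    `path_value` is a Windows-style Path string with ';' between entries.
    """
    def normalise(entry):
        # Lower-case the entry and drop any trailing backslash before comparing.
        return ntpath.normcase(entry.strip()).rstrip("\\")

    present = {normalise(e) for e in path_value.split(";") if e.strip()}
    return [d for d in required_dirs if normalise(d) not in present]
```

On Windows, passing `os.environ["PATH"]` as the second argument, together with the two Python 2.7 folders as the first, returns an empty list when both are set up.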

To check whether Python has been installed properly, go to…
