pdftoimage

Running R on Jupyter Notebook with R Kernel (No Anaconda)

A simple guide to installing the R kernel for Jupyter Notebook on Windows. Anaconda is not required.

Objectives:
1. Install the R kernel for Jupyter Notebook (Windows)
Required Tools:
1. R for Windows
2. Jupyter Notebook
Steps:
1. Install R. Use the R terminal (not RStudio) to install the required R packages:
    - install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))
    - install.packages('IRkernel')
2. Make the kernel available to Jupyter:
    - IRkernel::installspec()
    - OR IRkernel::installspec(user = FALSE)  # install system-wide
3. Open Jupyter Notebook and create a new notebook that uses the R kernel.
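
To confirm that the kernel was registered after the steps above, one option is to inspect the output of `jupyter kernelspec list --json`. The sketch below is only illustrative: the helper names `has_r_kernel` and `list_kernelspecs` are made up for this example, and it assumes the kernel is registered under the name `ir`, which is the default name IRkernel uses unless a different one is passed to installspec().

```python
import json
import subprocess


def has_r_kernel(kernelspec_json: str, kernel_name: str = "ir") -> bool:
    """Return True if kernel_name is listed in `jupyter kernelspec list --json` output.

    kernel_name defaults to "ir" (assumed default name for IRkernel).
    """
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return kernel_name in specs


def list_kernelspecs() -> str:
    """Run `jupyter kernelspec list --json` and return its raw JSON output."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage, on a machine where the jupyter command is on PATH:
#   print("R kernel registered:", has_r_kernel(list_kernelspecs()))
```

Keeping the parsing separate from the subprocess call makes it possible to try the check on saved output without Jupyter installed.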

Further notes

Additional R libraries can be hard to install from inside the Notebook. As a workaround, install the desired library in the R terminal, then open the Notebook.
If you need to run R.exe from the Windows command terminal, make sure R.exe is on the PATH. [likely location: C:\R\R-2.15.1\bin]
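
A quick way to see whether R is reachable from the PATH is Python's shutil.which, which resolves a command name the same way the terminal would. This is a minimal sketch; `locate_tool` and `describe` are hypothetical helper names used only here.

```python
import shutil
from typing import Optional


def locate_tool(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable, or None if it cannot be found.

    search_path=None means "search the PATH environment variable".
    """
    return shutil.which(name, path=search_path)


def describe(name: str) -> str:
    """Build a one-line report on whether the tool is on PATH."""
    location = locate_tool(name)
    if location is None:
        return f"{name} is not on PATH"
    return f"{name} found at {location}"


# On Windows, "R" also matches R.exe because of the PATHEXT lookup.
print(describe("R"))
```

If the report says R is not on PATH, add the R bin directory to the PATH environment variable and open a new terminal.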
ggplot tutorial

Convert PDF pages to text with python

A simple guide to extracting text from a PDF. This is an extension of the Convert PDF pages to JPEG with python post.

Objectives:
1. Extract text from PDF
Required Tools:
1. Poppler for Windows: Poppler is a PDF rendering library. It includes the pdftoppm utility.
2. Poppler for Mac: if Homebrew is already installed, run brew install poppler
3. pdftotext: Python module that uses Poppler to convert PDF to text.
Steps:
1. Install Poppler. On Windows, add "xxx/bin/" to the PATH environment variable.
2. pip install pdftotext

Usage (sample code from the pdftotext GitHub page)

import pdftotext

# Load your PDF
with open("Target.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# Save all text to a txt file.
with open('output.txt', 'w') as f:
    f.write("\n\n".join(pdf))
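
The sample above joins every page into one file. When one text file per page is more useful, the pages can be written out individually, since a pdftotext.PDF object can be iterated page by page as strings. The following is a rough sketch: `write_pages` and its `prefix` parameter are hypothetical names, and a plain list of strings stands in for the PDF object so the example runs without Poppler installed.

```python
import os
import tempfile
from typing import Iterable, List


def write_pages(pages: Iterable[str], out_dir: str, prefix: str = "page") -> List[str]:
    """Write each page's text to its own file and return the file paths.

    pages can be any iterable of strings, such as a pdftotext.PDF object.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for number, text in enumerate(pages, start=1):
        out_path = os.path.join(out_dir, f"{prefix}_{number:03d}.txt")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(text)
        written.append(out_path)
    return written


# Stand-in for pdftotext.PDF(f): two pages of already-extracted text.
sample_pages = ["First page text", "Second page text"]

with tempfile.TemporaryDirectory() as demo_dir:
    demo_files = [os.path.basename(p) for p in write_pages(sample_pages, demo_dir)]

print(demo_files)  # ['page_001.txt', 'page_002.txt']
```

With a real PDF, pass the pdf object from the sample above in place of sample_pages.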

Further notes

https://github.com/jalan/pdftotext

See also:

Convert PDF pages to JPEG with python

Convert PDF pages to JPEG with python

A simple guide to extracting images (JPEG, PNG) from a PDF.

Objectives:
1. Extract images from PDF
Required Tools:
1. Poppler for Windows: Poppler is a PDF rendering library. It includes the pdftoppm utility.
2. Poppler for Mac: if Homebrew is already installed, run brew install poppler
3. pdf2image: Python module that wraps the pdftoppm utility to convert PDF pages to PIL Image objects.
Steps:
1. Install Poppler. On Windows, add "xxx/bin/" to the PATH environment variable.
2. pip install pdf2image
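
Conversion fails when the Poppler utilities cannot be found, so it can help to check the installation before running any conversion. The sketch below assumes pdf2image needs the pdftoppm and pdfinfo executables. `missing_poppler_tools` is a hypothetical helper, and its `poppler_path` argument is just a directory to search instead of the PATH.

```python
import shutil
from typing import List, Optional

# Utilities assumed to be required by pdf2image.
REQUIRED_TOOLS = ("pdftoppm", "pdfinfo")


def missing_poppler_tools(poppler_path: Optional[str] = None) -> List[str]:
    """Return the names of required Poppler utilities that cannot be found.

    poppler_path: directory holding the Poppler binaries;
    None means "search the PATH environment variable".
    """
    return [
        tool
        for tool in REQUIRED_TOOLS
        if shutil.which(tool, path=poppler_path) is None
    ]


missing = missing_poppler_tools()
if missing:
    print("Missing Poppler utilities:", ", ".join(missing))
else:
    print("Poppler utilities found on PATH")
```

If the binaries live in a directory that is not on the PATH, pdf2image's convert_from_path also accepts a poppler_path argument pointing at that directory.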

Usage

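import os
import tempfile
from pdf2image import convert_from_path

filename = 'target.pdf'
save_dir = 'your_saved_dir'

# Base name of the PDF without its extension, e.g. 'target'
base_name = os.path.splitext(os.path.basename(filename))[0]

# Create the output directory if it does not exist yet
os.makedirs(save_dir, exist_ok=True)

with tempfile.TemporaryDirectory() as path:
    # Page numbers start at 1; this converts the first page only
    images_from_path = convert_from_path(filename, output_folder=path, first_page=1, last_page=1)

    # Save inside the with block, while the temporary image files still exist.
    # The page number in the file name keeps pages from overwriting each other.
    for page_number, page in enumerate(images_from_path, start=1):
        page.save(os.path.join(save_dir, f'{base_name}_{page_number}.jpg'), 'JPEG')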

Further notes