Month: June 2019

Google Colab import data, Specs, link Gsheets & link with Kaggle

Importing data to colab

  1. Direct import
    • from google.colab import files
      uploaded = files.upload()
    • import io
      df = pd.read_csv(io.BytesIO(uploaded[‘target.csv’]))
  2. Setup to use file from google drives
    • from google.colab import drive
      drive.mount(‘/content/drive’)
    • View list of files:
    • !ls “/content/drive/My Drive”
    • Note: In the notebook, click on the charcoal > on the top left of the notebook and click on Files, select the file and right click to “copy path”. Note the path must begin with “/content/xxx”

Hardware Spec for Colab

See link.

Linking with Google Sheets (reference from source)

# Step 1
!pip install --upgrade --quiet gspread

# Step 2
from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())

# Step 3
sh = gc.create('My spreadsheet')

worksheet = gc.open('My spreadsheet').sheet1

cell_list = worksheet.range('A1:C2')

import random
for cell in cell_list:
  cell.value = random.randint(1, 10)

worksheet.update_cells(cell_list)

Note: The google sheets is at the starting page of google Drive. Still figuring out the way to specify target directory.

Linking with Kaggle (eg. direct download and import Kaggle dataset)

  1. Retrieve API token from Kaggle (Kaggle–> accounts –> under AP,  hit “Create New API Token.”
  2. Save the token.json in Google Drive
  3. Run the following on colab to link with Kaggle
!pip install kaggle
!mkdir .kaggle
from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

auth.authenticate_user()

drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])

filename = "/content/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)

request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
os.chmod(filename, 600)

source: https://colab.research.google.com/drive/1JG6d49pAWpn4kF92c0Ko16gQV6hptAro#scrollTo=CSKDTkuLuTY3

!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!kaggle config set -n path -v{/content}

Testing

!kaggle datasets list

Downloading particular data set from Kaggle

  1. Under particular Kaggle competition, look under Data and get the API commands
  2. Eg. Some competition data set from Kaggle
  3. Commands copied from API: kaggle competitions download -c ndsc-advanced
  4. Modify the Command and run in Colab:
    • !kaggle competitions download -c ndsc-advanced -p /content
  5. Unzip the files:
    • !unzip \*.zip
  6. Open file with pandas:
    • import pandas as pd
      d = pd.read_csv(‘beauty_data_info_val_competition.csv’)

References:

  1. Setting Up Kaggle in Google Colab
Advertisement

Running R on Jupyter Notebook with R Kernel (No Anaconda)

A simple guide to install R Kernel on Jupyter Notebook (Windows).  Do not need Anaconda.

  1. Objectives:
      1. Install R Kernel on Jupyter Notebook (Windows)
  2. Required Tools:
      1. R for windows— R for windows
      2. JupyterNotebook — Jupyter Notebook
  3. Steps:
      1. Install R. Use the R terminal (do not use R studio) to install R packages:
        • install.packages(c(‘repr’, ‘IRdisplay’, ‘evaluate’, ‘crayon’, ‘pbdZMQ’, ‘devtools’, ‘uuid’, ‘digest’))
        • install.packages(‘IRkernel’)
      2. Make Kernel available to Jupyter
        • IRkernel::installspec()
        • OR IRkernel::installspec(user = FALSE) #install system-wide
      3. Open a notebook and open new R script.

Further notes 

  • After getting Additional R library might be hard to install inside the Notebook. For workaround, install desired library in R terminal then open the Notebook.
  • If need to use R.exe on windows command terminal, ensure R.exe is on path. [likely location: C:\R\R-2.15.1\bin]
  • ggplot tutorial

References: