Importing data to Colab
- Direct import
- from google.colab import files
uploaded = files.upload()
- import io
import pandas as pd
df = pd.read_csv(io.BytesIO(uploaded['target.csv']))
- Set up access to files in Google Drive
- from google.colab import drive
drive.mount('/content/drive')
- View list of files:
- !ls "/content/drive/My Drive"
- Note: To get a file's path, click the charcoal > arrow at the top left of the notebook, open Files, then right-click the file and choose "Copy path". The path must begin with "/content/".
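As a rough sketch of both routes above: files.upload() returns a dict that maps each uploaded filename to its raw bytes, so the filename does not have to be hard-coded, and a Drive file is just a path under the mount point. The helper names uploaded_to_buffer and drive_path and the example file names are made up for illustration. The "My Drive" folder name follows the listing above and may differ in your runtime.

```python
import io
import posixpath

# Assumption: Drive is mounted at /content/drive, as in the steps above
DRIVE_ROOT = "/content/drive/My Drive"


def uploaded_to_buffer(uploaded, name=None):
    """Return (filename, BytesIO) for one entry of the dict from files.upload()."""
    if not uploaded:
        raise ValueError("nothing was uploaded")
    if name is None:
        name = next(iter(uploaded))  # fall back to the first uploaded file
    return name, io.BytesIO(uploaded[name])


def drive_path(relative, root=DRIVE_ROOT):
    """Build an absolute path to a file inside the mounted Drive."""
    path = posixpath.normpath(posixpath.join(root, relative))
    if not path.startswith("/content/"):
        raise ValueError("Colab paths must begin with /content/: " + path)
    return path


# In Colab (not runnable elsewhere):
#   from google.colab import files
#   import pandas as pd
#   name, buf = uploaded_to_buffer(files.upload())
#   df = pd.read_csv(buf)
#   df2 = pd.read_csv(drive_path("data/target.csv"))  # hypothetical file
```

Passing name explicitly picks a specific file when several were uploaded at once.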
Hardware Spec for Colab
See link.
Linking with Google Sheets (reference from source)
# Step 1
!pip install --upgrade --quiet gspread
# Step 2
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
# Step 3
sh = gc.create('My spreadsheet')
worksheet = gc.open('My spreadsheet').sheet1
cell_list = worksheet.range('A1:C2')
import random
for cell in cell_list:
    cell.value = random.randint(1, 10)
worksheet.update_cells(cell_list)
Note: The new spreadsheet is created in the root folder of Google Drive. I am still figuring out how to specify a target directory.
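On the target directory question, one option that may be worth trying: recent gspread releases document an optional folder_id argument on create(). The sketch below is untested against this notebook and assumes that argument is available in the installed version. The function names and 'FOLDER_ID' are placeholders. Separately, oauth2client is deprecated, so the commented lines use google.auth for the credentials instead.

```python
import random


def create_sheet_in_folder(gc, title, folder_id):
    """Create a spreadsheet inside a given Drive folder.

    Assumption: gc is an authorized gspread client whose create()
    accepts folder_id (documented in recent gspread versions).
    """
    return gc.create(title, folder_id=folder_id)


def fill_with_random(worksheet, a1_range, low=1, high=10, rng=random):
    """Same idea as Step 3: put a random integer in every cell of a range."""
    cells = worksheet.range(a1_range)
    for cell in cells:
        cell.value = rng.randint(low, high)
    worksheet.update_cells(cells)  # send all changes in one batched call
    return cells


# In Colab (not runnable elsewhere):
#   from google.colab import auth
#   from google.auth import default
#   import gspread
#   auth.authenticate_user()
#   creds, _ = default()
#   gc = gspread.authorize(creds)
#   sh = create_sheet_in_folder(gc, 'My spreadsheet', 'FOLDER_ID')  # placeholder ID
#   fill_with_random(sh.sheet1, 'A1:C2')
```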
Linking with Kaggle (e.g. download and import a Kaggle dataset directly)
- Retrieve an API token from Kaggle (Kaggle -> Account -> under API, click "Create New API Token").
- Save the downloaded token file, kaggle.json, in Google Drive
- Run the following on Colab to link with Kaggle
!pip install kaggle
!mkdir .kaggle
from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

auth.authenticate_user()
drive_service = build('drive', 'v3')
# Look up kaggle.json in Google Drive by name
results = drive_service.files().list(
    q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])
filename = "/content/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)
# Download the file in chunks
request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
# Octal 0o600 = read/write for the owner only (decimal 600 is a different mode)
os.chmod(filename, 0o600)
source: https://colab.research.google.com/drive/1JG6d49pAWpn4kF92c0Ko16gQV6hptAro#scrollTo=CSKDTkuLuTY3
!mkdir -p ~/.kaggle
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!kaggle config set -n path -v{/content}
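One detail worth spelling out is the file mode passed to os.chmod. The Kaggle client warns when kaggle.json is readable by other users, and the mode is normally written in octal (0o600, owner read/write only). The decimal literal 600 is a different bit pattern. A small sketch on a throwaway temp file, assuming a Linux runtime such as Colab; the helper name is made up:

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Make a file readable and writable by its owner only (rw-------)."""
    os.chmod(path, 0o600)
    # Return just the permission bits so they can be inspected
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a temporary stand-in for kaggle.json
with tempfile.TemporaryDirectory() as tmp:
    token = os.path.join(tmp, "kaggle.json")
    with open(token, "w") as fh:
        fh.write("{}")
    mode = restrict_to_owner(token)
    print("mode set on file:", oct(mode))
    print("decimal 600 in octal:", oct(600))
```

Comparing the two printed values shows why the 0o prefix matters.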
Testing
!kaggle datasets list
Downloading a particular dataset from Kaggle
- On the Kaggle competition page, open the Data tab and copy the API command
- E.g. for a competition dataset from Kaggle:
- Command copied from the API: kaggle competitions download -c ndsc-advanced
- Modify the command and run it in Colab:
- !kaggle competitions download -c ndsc-advanced -p /content
- Unzip the files:
- !unzip \*.zip
- Open file with pandas:
- import pandas as pd
d = pd.read_csv('beauty_data_info_val_competition.csv')
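The unzip and load steps can also be done from Python, which makes it easier to see which CSV files came out of the archives. A minimal sketch using only the standard library; the function name is made up, and /content is assumed as the download directory, as in the command above:

```python
import glob
import os
import zipfile


def extract_all_zips(directory):
    """Extract every .zip in `directory` in place.

    Returns the sorted list of CSV files at the top level of `directory`
    (files nested in subfolders of an archive are not listed).
    """
    for archive in sorted(glob.glob(os.path.join(directory, "*.zip"))):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
    return sorted(glob.glob(os.path.join(directory, "*.csv")))


# In Colab, after the download step:
#   import pandas as pd
#   csv_files = extract_all_zips("/content")
#   print(csv_files)               # check which files were extracted
#   d = pd.read_csv(csv_files[0])
```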