Mini Projects

Shorte.st Url Shortener API with Python: Create multiple shorteners at one go (& monetize your links)

A mini project that shortens URLs with Shorte.st using Python. Shorte.st documents its API only as a curl command. In this post, that command is translated into a Python requests call so that it can be integrated with other Python scripts and used to shorten multiple URLs in one go.

Please note that I have an account with Shorte.st.

Objectives:
1. Create a Python function to shorten a URL using Shorte.st.
Required Tools:
1. Requests: for handling HTTP requests. Install with pip install requests.
2. Shorte.st account: needed to obtain the API token used to shorten URLs.
Steps:
1. Retrieve the API token from Shorte.st: go to Link Tools -> Developer API and copy the API token.
2. Call requests.put with the following parameters:
    1. headers containing the API token and the user agent
    2. data containing the target URL to shorten
3. Read the shortened URL from response.text.
4. Done! Use the shortened URL on target sites, Twitter, other social media, etc.

Curl command as provided by Shorte.st

curl -H "public-api-token: your_api_token" -X PUT -d "urlToShorten=target_url_to_shortened.com" https://api.shorte.st/v1/data/url
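
As a rough illustration of how each curl flag maps onto a request, the sketch below builds the same PUT request as an object without sending it. It uses the standard-library urllib only so that the request can be inspected offline; the rest of this post uses requests. The function name build_shorten_request and the example.com URL are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://api.shorte.st/v1/data/url"  # endpoint from the curl command


def build_shorten_request(target_url, api_token):
    """Build (but do not send) the PUT request described by the curl command."""
    # -d "urlToShorten=..."  -> form-encoded request body
    body = urlencode({"urlToShorten": target_url}).encode("ascii")
    # -H "public-api-token: ..."  -> request header
    headers = {"public-api-token": api_token}
    # -X PUT  -> HTTP method
    return Request(API_URL, data=body, headers=headers, method="PUT")


req = build_shorten_request("https://example.com/page", "your_api_token")
print(req.get_method(), req.full_url)
print(req.data)
```

Note that urlencode percent-encodes the target URL, whereas curl -d sends the string exactly as typed.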

Python function to use as part of your code or as a standalone script

import re
import requests

USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"

def shorten_url(target_url, api_token):
    """
        Function to shorten url (With your shorte.st) account.
        Args:
            target_url (str): url to shorten
            api_token (str): api token str
        Returns:
            shortened_url (str)

    """

    # the HTTP header name is 'User-Agent' (hyphen, not underscore)
    headers = {'User-Agent': USER_AGENT, 'public-api-token': api_token}
    data = dict(urlToShorten=target_url)

    url = 'https://api.shorte.st/v1/data/url'

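    r = requests.put(url, data=data, headers=headers)
    r.raise_for_status()  # fail early on an HTTP error rather than on the regex below

    # non-greedy match so that any fields following shortenedUrl are not captured
    shortened_url = re.search('"shortenedUrl":"(.*?)"', r.text).group(1)
    shortened_url = shortened_url.replace('\\', '')  # drop the backslashes escaping "/"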

    return shortened_url

if __name__ == "__main__":

    api_token = 'your_api_token'

    urllist = [
                'https://simply-python.com/2018/07/20/fast-download-images-from-google-image-search-with-python-requests-grequests',
                'https://simply-python.com/2018/04/22/building-a-twitter-bot-with-python'

                ]

    for target_url in urllist:
        shortened_url = shorten_url(target_url, api_token)
        print('shortened_url: {}'.format(shortened_url))

Results

shortened_url: http://destyy.com/wKqD2s
shortened_url: http://destyy.com/wKqD17
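
The function above pulls the short link out of the response text with a regular expression. A possible alternative is to parse the response as JSON. The sketch below assumes the response body is a JSON object with a shortenedUrl field. That shape is inferred from the regex in the function, and the sample body (including its status field) is made up for illustration rather than captured from the API.

```python
import json


def extract_shortened_url(response_text):
    """Return the shortenedUrl field from a JSON response body."""
    data = json.loads(response_text)  # also turns the escaped "\/" back into "/"
    return data["shortenedUrl"]


# Hypothetical response body, modelled on what the regex in shorten_url expects.
sample_body = r'{"status":"ok","shortenedUrl":"http:\/\/destyy.com\/wKqD2s"}'
print(extract_shortened_url(sample_body))
```

With requests, the same parsing is available as r.json() on the response object.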

Further notes

If you have links worth sharing and hope to monetize them, Shorte.st is one option to explore.
The above script is not meant for spamming with large numbers of URLs. Shorte.st monitors the quality of the URLs being shortened.
For an ads-free shortener, consider bit.ly. Please see the post on using the bit.ly shortener with Python if you prefer an alternative.

Packaging your Python code made simple & fast

A mini project that creates the required Python packaging template folders, submits the project to GitHub and enables pip installation.

Objectives:
1. Upload a Python project to GitHub and make it pip-installable.
Required Tools:
1. Cookiecutter: for templating. Install with pip install cookiecutter.
2. GitHub account, GitHub Desktop, Git shell: for version control and the git command line.
3. PyPI account: for uploading to PyPI so that a user can just run “pip install your_project”.
Steps:
1. Use Cookiecutter to set up the template directory and required folders with the relevant docs and files (Readme.md, .gitignore, setup.py etc.) for uploading. -> See commands section 1 below.
    - Run the commands in the cmd prompt or Git shell on Windows (Git shell is preferred if you will be executing the additional git commands in step 3).
2. Create a folder with the same name as the directory created in step 1 and place the relevant Python code inside.
3. Use Git commands to upload the files to GitHub. The commands below only work if the repository has first been created in your GitHub account. -> See commands section 2 below.
4. Alternatively, use the GitHub GUI (GitHub Desktop) instead of the command line to submit your project to the repository.
5. Create a .pypirc file in the same directory as the setup.py file. It provides the information needed to upload to PyPI. -> See section 3 below.
6. Updates:
    1. Ensure setuptools and wheel are up to date, and install twine
        - pip install -U setuptools wheel; pip install twine
    2. Package the code
        - python setup.py sdist bdist_wheel
    3. Upload the package
        - twine upload --repository pypi dist/*

Windows Command prompt for step 1

pip install cookiecutter
cookiecutter https://github.com/wdm0006/cookiecutter-pipproject.git
cd projectname

Git Commands for step 3

git init
git add -A
git commit -m 'first commit'
git remote add origin http://repository_url # works only if the repository has already been created on GitHub; the repository page shows its url
git push origin master
git tag {{version}} -m 'adds the version you entered in cookiecutter as the first tag for release, change the version 0.0.1 etc'
git push --tags origin master

.pypirc contents for step 5

# this tells distutils what package indexes you can push to
[distutils]
index-servers =
    pypi

[pypi]
repository: https://upload.pypi.org/legacy/
username: {{your_username}}
password: {{your_password}}
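
Because .pypirc is an INI-style file, a quick way to catch layout mistakes before uploading is to parse it with Python's configparser. The sketch below is only an illustration: the check_pypirc helper and the sample text are made up, and the credentials are placeholders.

```python
import configparser

# Hypothetical .pypirc text; username and password are placeholders.
SAMPLE_PYPIRC = """\
[distutils]
index-servers =
    pypi

[pypi]
repository: https://upload.pypi.org/legacy/
username: your_username
password: your_password
"""


def check_pypirc(text):
    """Return the listed index servers; raise ValueError if one is incomplete."""
    parser = configparser.RawConfigParser()  # Raw: no % interpolation in passwords
    parser.read_string(text)
    servers = parser.get("distutils", "index-servers").split()
    for name in servers:
        if not parser.has_section(name):
            raise ValueError("no [{}] section".format(name))
        for key in ("username", "password"):
            if not parser.has_option(name, key):
                raise ValueError("[{}] is missing {}".format(name, key))
    return servers


print(check_pypirc(SAMPLE_PYPIRC))
```

The server name under index-servers has to be indented to count as a continuation of that value. Without the indentation, configparser rejects the file with a ParsingError.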

Further notes

Most of the commands above are from Will McGinnis’ post and the Python packaging tutorial.
To create an empty .pypirc file on Windows, use the cmd command type nul > .pypirc (echo >.pypirc would write the text “ECHO is on.” into the file).
Uploading to PyPI requires a verified email address, otherwise the upload fails with an error.
If you encounter “fatal: remote origin already exists.”, see link.
Basic Git commands: see link.
Updates: uploading packages to PyPI using twine (link).
Making changes to the code and uploading (link).

Update changes to GitHub

git add -A
git commit -m 'whatever'
git push origin master
git tag {{version}} -m 'adds the version you entered in cookiecutter as the first tag for release, change the version 0.0.1 etc'
git push --tags origin master

Update changes to PyPI

Simply upload your new code to GitHub, create a new release, then adapt the setup.py file (new download_url according to your new release tag, and new version), then run setup.py and the twine command again.

python setup.py sdist
twine upload dist/*

Fast Download Images from Google Image search with python requests/grequests

A mini project that highlights the usage of requests and grequests.

Objectives:
1. Download multiple images from Google Image search results.
Required Modules:
1. Requests: for HTTP requests.
2. grequests: for easy asynchronous HTTP requests.
3. Both can be installed with pip install requests grequests
Steps:
1. Retrieve the HTML source of the Google image search results.
2. Retrieve all image URLs from that HTML source (function: get_image_urls_fr_gs).
3. Feed the image URL list to grequests for multiple downloads (function: dl_imagelist_to_dir).
Breakdown: Steps on grequests implementation.
1. Very similar to the requests implementation, except that grequests.get() or grequests.post() is used instead of requests.get() or requests.post().
2. Create a list of GET or POST actions, one per URL. Specify a further action to run after each response is received, e.g. saving the image to a file after the GET request.
3. Map the list of requests with grequests to send them, e.g. grequests.map(do_stuff, size=x), where x is the number of concurrent HTTP requests. You can choose values for x such as 20, 50 or 100.
4. Done!
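
The “further action” in step 2 is attached as a response hook, and functools.partial is one way to bind per-URL arguments (such as the output filename) to that hook in advance. The sketch below shows only this binding pattern, with no network access. FakeResponse, save_body and the example.com URLs are made up for illustration, and a dict stands in for files on disk.

```python
from functools import partial


class FakeResponse(object):
    """Stand-in for a response object; only .content is provided (hypothetical)."""
    def __init__(self, content):
        self.content = content


saved = {}  # filename -> bytes, used here in place of writing to disk


def save_body(r, filename, *args, **kwargs):
    # A response hook receives the response first; any extra keyword
    # arguments passed along by the HTTP library are accepted and ignored.
    saved[filename] = r.content


urls = ["https://example.com/a.jpg", "https://example.com/b.png"]

# One pre-bound hook per URL: partial() fixes filename now, so the later
# call only needs to supply the response.
hooks = [partial(save_body, filename=url.rsplit("/", 1)[-1]) for url in urls]

# Simulate the library invoking each hook once its response has arrived.
for hook, url in zip(hooks, urls):
    hook(FakeResponse(url.encode("ascii")), stream=True)

print(sorted(saved))
```

With grequests, each pre-bound hook would be passed as hooks={'response': hook} when the request is created.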

Below is the complete code.


import os, re
import grequests  # import before requests so that gevent's monkey-patching is applied first
import requests
from functools import partial
import smallutils as su  #only use for creating folder

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
headers = { 'User-Agent': USER_AGENT }

def get_image_urls_fr_gs(query_key):
    """
        Get all image url from google image search
        Args:
            query_key: search term as of what is input to search box.
        Returns:
            (list): list of url for respective images.

    """

    query_key = query_key.replace(' ','+')#replace space in query space with +
    tgt_url = 'https://www.google.com.sg/search?q={}&tbm=isch&tbs=sbd:0'.format(query_key)#last part is the sort by relv

    r = requests.get(tgt_url, headers = headers)

    # raw string, with the dot before the extension escaped so it only matches a literal "."
    urllist = re.findall(r'"ou":"([a-zA-Z0-9_./:-]+\.(?:jpg|jpeg|png))",', r.text)

    return urllist

def dl_imagelist_to_dir(urllist, tgt_folder, job_size = 100):
    """
        Download all images from list of url link to tgt dir
        Args:
            urllist: list of the image url retrieved from the google image search
            tgt_folder: dir at which the image is stored
        Kwargs:
            job_size: (int) number of downloads to spawn.

    """
    if len(urllist) == 0:
        print("No links in urllist")
        return

    def dl_file(r, folder_dir, filename, *args, **kwargs):
        fname = os.path.join(folder_dir, filename)
        with open(fname, 'wb') as my_file:
            # Read in 10KB chunks
            for byte_chunk in r.iter_content(chunk_size=1024*10):
                if byte_chunk:
                    my_file.write(byte_chunk)
                    my_file.flush()
                    os.fsync(my_file)

        r.close()

    do_stuff = []
    su.create_folder(tgt_folder)

    for run_num, tgt_url in enumerate(urllist):
        print(tgt_url)
        # handle the tgt url to be use as basename
        basename = os.path.basename(tgt_url)
        file_name = re.sub('[^A-Za-z0-9.]+', '_', basename ) #prevent special characters in filename

        #handling grequest
        action_item =  grequests.get(tgt_url, hooks={'response': partial(dl_file, folder_dir = tgt_folder, filename=file_name)}, headers= headers,  stream=True)
        do_stuff.append(action_item)

    grequests.map(do_stuff, size=job_size)

def dl_images_fr_gs(query_key, tgt_folder):
    """
        Function to download images from google search

    """
    url_list = get_image_urls_fr_gs(query_key)
    dl_imagelist_to_dir(url_list, tgt_folder, job_size = 100)

if __name__ == "__main__":

    query_key= 'python symbol'
    tgt_folder = r'c:\data\temp\addon'
    dl_images_fr_gs(query_key, tgt_folder)
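
To see what the URL extraction and the filename clean-up do without calling Google, the sketch below runs both steps on a small made-up snippet. The "ou" key mirrors the pattern used in get_image_urls_fr_gs, but the sample text is hypothetical and the real page markup may differ or change over time. The to_filename helper is likewise only for illustration.

```python
import os
import re

# Same idea as in get_image_urls_fr_gs: capture the value of each "ou" key
# that ends in a known image extension.
IMAGE_URL_RE = re.compile(r'"ou":"([a-zA-Z0-9_./:-]+\.(?:jpg|jpeg|png))",')

# Hypothetical fragment of page source; not captured from a real search.
sample_html = (
    '{"ou":"https://example.com/img/python-logo.png","ow":600},'
    '{"ou":"https://example.com/a/b.gif","ow":10},'
    '{"ou":"http://example.org/snake_1.jpg","ow":5}'
)


def to_filename(url):
    """Turn the last path component into a filename without special characters."""
    return re.sub('[^A-Za-z0-9.]+', '_', os.path.basename(url))


urls = IMAGE_URL_RE.findall(sample_html)
print(urls)
print([to_filename(u) for u in urls])
```

The .gif entry is skipped because only jpg, jpeg and png are listed in the pattern. Two different URLs ending in the same basename would map to the same filename, so the later download would overwrite the earlier one.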

Further notes

Note that the images downloaded from Google search are only those displayed on the initial results page. Additional images that only appear after the “show more results” button is clicked will not be downloaded. To handle this case:
1. A user can keep clicking “show more results”, manually save the HTML source, extract the URL list from it and run the 2nd function (dl_imagelist_to_dir) on that list.
2. Use Python selenium to download the HTML source.
Instead of using grequests, the requests module can be used to download the images sequentially, one by one.
The file downloads are broken into chunks, which matters especially for very large files.
The code can be further extended for downloading other content.
Further parameters for the Google search URL are described here.