Search and download youtube videos using Python

The following python module allows users to search YouTube videos and download all the videos from the different playlists found within the search. Currently, it is able to search for playlists or collections of videos  and download individual videos from each of the playlists.

For example, searching for “Top English KTV” will scan for all the songs playlists found in the search results and collect the individual songs web link from each playlist to be downloaded locally. Users can choose either to download as video format or as audio format.

The script makes use of Python Pattern module for URL request and DOM object processing. For actual downloading of videos, it utilizes Pafy. Pafy is very comprehensive python module, allowing download in both video and audio format. There are other features of Pafy which is not used in this module.

The following are the main flow of the script.

  1. Form the YouTube search URL with the prefix “https://www.youtube.com/results?search_query=” and the search keyword
  2. Based on the above URL, scrape and get all the urls that are linked to a playlist. The Xpath for the playlist element can be easily obtained using any web browser developer options, inspecting the element and  retrieving the Xpath. The playlist url can be obtained using pattern dom object: ‘dom_object(div ul li a[class=”yt-uix-sessionlink”])’.
  3. Filter the list of extracted link to cater only for URL link starting with “/playlist?“. A typical url for playlist looks something like below:
  4. From the list of playlist, scrape the individual playlist webpage to retrieve the url link for each individual videos. The  playlist element can be retrieved using pattern dom object: ‘dom_object(div ul li a[class=”yt-uix-sessionlink”])’.
  5. Download each individual video/audio to local computer using Pafy module by passing in the video URL to Pafy.

Below is the sample code to download a series of videos.


from youtube_search_and_download import YouTubeHandler

search_key = 'chinese top ktv' #keywords
yy = YouTubeHandler(search_key)
yy.download_as_audio =1 # 1- download as audio format, 0 - download as video
yy.set_num_playlist_to_extract(5) # number of playlist to download

print 'Get all the playlist'
yy.get_playlist_url_list()
print yy.playlist_url_list

## Get all the individual video and title from each of the playlist
yy.get_video_link_fr_all_playlist()
for key in  yy.video_link_title_dict.keys():
    print key, '  ', yy.video_link_title_dict[key]
    print
print

print 'download video'
yy.download_all_videos(dl_limit =200) #number of videos to download.

This is the initial script. There are still work in progress such as option to download individual videos instead of playlist from the search page and catering for multiple search.

The full script can be found in the GitHub.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s