Auto search

YouTube videos download using Python (Part 2)

A continuation from the “Search and download YouTube videos using Python” post with more features added.

The initial project only allows searching of playlists within YouTube and downloading the videos for all the playlist found. The project is expanded with the following features:

  1. Multiple searches of different playlist can be inputted at one go (key in all search phrases in a text file) and automatically download for all videos found relating to the search phrases. Playlist search recommended for search such as songs playlist or online courses (eg.  “Top favorite English songs/Most popular English songs”, “Machine learning Coursera”)
  2. Non playlist search (normal video search); Both single and multiple search can be performed. For normal video search or general topic with less likely chance of being in a playlist. (eg. “Python Machine learning”)
  3. Single video download (directly use Pafy module). User just need to input the video link.
  4. Multiple options: users can limit the number of downloads, include filter count such as popularity, video length limit, download in video or audio format.

The script makes use of Python Pattern module for URL request and DOM object processing. For actual downloading of videos, it utilizes Pafy. Pafy is very comprehensive python module, allowing download in both video and audio format. There are other features of Pafy which is not used in this module.

The full script can be found in the GitHub.

Advertisements

Search and download youtube videos using Python

The following python module allows users to search YouTube videos and download all the videos from the different playlists found within the search. Currently, it is able to search for playlists or collections of videos  and download individual videos from each of the playlists.

For example, searching for “Top English KTV” will scan for all the songs playlists found in the search results and collect the individual songs web link from each playlist to be downloaded locally. Users can choose either to download as video format or as audio format.

The script makes use of Python Pattern module for URL request and DOM object processing. For actual downloading of videos, it utilizes Pafy. Pafy is very comprehensive python module, allowing download in both video and audio format. There are other features of Pafy which is not used in this module.

The following are the main flow of the script.

  1. Form the YouTube search URL with the prefix “https://www.youtube.com/results?search_query=” and the search keyword
  2. Based on the above URL, scrape and get all the urls that are linked to a playlist. The Xpath for the playlist element can be easily obtained using any web browser developer options, inspecting the element and  retrieving the Xpath. The playlist url can be obtained using pattern dom object: ‘dom_object(div ul li a[class=”yt-uix-sessionlink”])’.
  3. Filter the list of extracted link to cater only for URL link starting with “/playlist?“. A typical url for playlist looks something like below:
  4. From the list of playlist, scrape the individual playlist webpage to retrieve the url link for each individual videos. The  playlist element can be retrieved using pattern dom object: ‘dom_object(div ul li a[class=”yt-uix-sessionlink”])’.
  5. Download each individual video/audio to local computer using Pafy module by passing in the video URL to Pafy.

Below is the sample code to download a series of videos.


from youtube_search_and_download import YouTubeHandler

search_key = 'chinese top ktv' #keywords
yy = YouTubeHandler(search_key)
yy.download_as_audio =1 # 1- download as audio format, 0 - download as video
yy.set_num_playlist_to_extract(5) # number of playlist to download

print 'Get all the playlist'
yy.get_playlist_url_list()
print yy.playlist_url_list

## Get all the individual video and title from each of the playlist
yy.get_video_link_fr_all_playlist()
for key in  yy.video_link_title_dict.keys():
    print key, '  ', yy.video_link_title_dict[key]
    print
print

print 'download video'
yy.download_all_videos(dl_limit =200) #number of videos to download.

This is the initial script. There are still work in progress such as option to download individual videos instead of playlist from the search page and catering for multiple search.

The full script can be found in the GitHub.