Wednesday, 25 April 2012

How to automatically find and download videos from YouTube (or anywhere else)

Takeaway: Marco Fioretti shows you how to install two free software utilities and supplies a sample script that uses them to automatically search for and download online videos matching the criteria of your choosing.
Online video is good, but finding it and, even more, saving it for when you actually have the time to watch it can be a huge time waster. Wouldn’t it be cool if your computer could monitor YouTube or similar portals for you and download clips on whatever topics you declared interesting, all by itself?
Of course it would, and here’s a command-line-only method to do it that will work on any Linux distribution, including media servers without browsers or graphical interfaces! I will first explain how to install the two free software utilities on which it is based, and then show a sample shell script that uses them to find and grab just the videos you want.

Grake, a video finder

Before downloading video clips, you must know their URLs. For this task, I use Grake, a Perl script that depends on the perl-CGI module, which is available in the repositories of most distributions. Grake itself, instead, comes as a compressed tar archive. To install it, run these commands at the prompt (the last one as root):
  tar xf grake-0.1.0.tar.gz
  cd grake-0.1.0/
  perl Makefile.PL
  make
  make install

Movgrab, a video downloader

Movgrab is, according to its home page, a straight C downloader for more than 40 video portals that doesn’t require Perl, Ruby, Python or any other big library or software component. This isn’t strictly true: since movgrab is only available as source code, you must already have the gcc compiler installed, and several Linux desktops no longer install it by default. Don’t worry, however. Compiling and installing movgrab is not difficult! First of all, install gcc through your distribution’s package manager, which takes just a few clicks. Then, after you have downloaded the tar archive, unpack it, go into the movgrab directory, and type these commands:
  ./configure
  make
  make install
(as with Grake, at least the last one should be executed as root)
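
Before moving on, it may be worth confirming that both installations actually put their commands on your PATH. The following is only a minimal sketch: missing_tools is a hypothetical helper name, not part of either package, and it relies on nothing more than the shell's standard command -v lookup.

```shell
#!/bin/bash
# missing_tools is a hypothetical helper, not something shipped with
# grake or movgrab. It prints every command named in its arguments
# that cannot be found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v prints nothing and returns non-zero for unknown commands
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report what is still missing before trying to use the two utilities.
MISSING=$(missing_tools grake movgrab)
if [ -n "$MISSING" ]; then
    echo "Not installed yet:" $MISSING
else
    echo "grake and movgrab are both available"
fi
```

If either name shows up in the report, repeat the corresponding make install step as root before going on.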

…and how to use them together!

Enough preparation! Here’s a basic script, which you may run periodically as a cron job, that uses grake and movgrab to download the videos that YouTube currently returns for a given topic.
   1      #!/bin/bash
   2
   3      SEARCH="$*"
   4      SEARCH=$(echo "$SEARCH" | tr " " "+")
   5      grake --json "http://www.youtube.com/results?search_query=$SEARCH" > /tmp/videolist
   6
   7      for URL in $(grep '"url"' /tmp/videolist | cut -d: -f2- | tr -d '"')
   8      do
   9      FORMAT=$(movgrab -T "$URL" 2>&1 | grep ^Formats | perl -e 'while (<>) {chomp; @U = split /, */; } ; $U[-1] =~ s/ .*//; print $U[-1]')
  10      movgrab -f "$FORMAT" "$URL"
  11      done
  12      exit
Let’s say you want to collect and watch all videos about the Arch Linux distribution. You may search and download them manually on YouTube, of course, or (assuming you called the script videograbber) type:
  videograbber arch linux
And let the computer do it for you. Our videograbber puts all the terms received as arguments in the $SEARCH variable, joins them with + signs, and tells grake to save inside /tmp/videolist, in JSON format (which is easier to parse), all the video links it finds in the first page of YouTube results (lines 3-5). Here’s what the beginning of that list looks like (compare it with the YouTube screenshot in Figure A):
  [marco@avalon]$ head /tmp/videolist
  {
    "video": [
      {
        "title": "Arch Linux Review | Linux Action Show | s14e03",
        "url": "http://youtube.com/watch?v=Vm-C_grBuV0"
      },
      {
        "title": "Arch Linux 3.0 CLI",
        "url": "http://youtube.com/watch?v=UdgzM_LfwWw"
      },
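
To see how the URLs can be pulled out of a listing like this one, here is the grep, cut and tr pipeline from line 7 of the script, wrapped in a function so that you can try it without running grake. Treat it as a sketch: extract_urls is a hypothetical name, the sample input simply repeats the two entries shown above, and the comma added to the tr set is my own precaution for listings where the url field is not the last one in its entry (it would also damage a URL that contains a comma).

```shell
#!/bin/bash
# extract_urls is a hypothetical helper. It reads a grake-style JSON
# listing on standard input and prints one video URL per line.
extract_urls() {
    # grep keeps the "url" lines, cut drops the key up to the first colon,
    # tr strips quotes, spaces and any trailing comma
    grep '"url"' | cut -d: -f2- | tr -d '" ,'
}

# Example with two entries shaped like the listing above.
extract_urls <<'EOF'
{
  "video": [
    {
      "title": "Arch Linux Review | Linux Action Show | s14e03",
      "url": "http://youtube.com/watch?v=Vm-C_grBuV0"
    },
    {
      "title": "Arch Linux 3.0 CLI",
      "url": "http://youtube.com/watch?v=UdgzM_LfwWw"
    }
  ]
}
EOF
```

The example prints the two watch URLs, one per line. A line-oriented approach like this only works while each url field sits on a line of its own, as it does in the sample.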

Figure A


The for loop starting at line 7 extracts all the lines containing the url string and strips everything but the actual URL, which ends up in $URL. Here’s where the real fun starts. YouTube offers each video in several formats. To find out which ones are available, use movgrab with the -T option, which produces this kind of output:
  movgrab  -T http://youtube.com/watch?v=Vm-C_grBuV0
  Formats available for this Movie:webm:1280x720 (1.2G), mp4:1280x720 (734.8M), webm:854x480 (354.7M), flv-h264:854x480 (395.7M), webm:640x360 (152.4M), flv-h264:640x360 (268.7M), mp4:480x360 (268.3M), flv:400x240 (154.5M),
  Selected format item:webm:1280x720
As you can see, by default movgrab chooses the highest definition format available. This may or may not be what you want. The script above, instead, always goes for the last format in the list, which is the lowest-resolution one, to save time and disk space. It is usually among the smallest downloads, though not always the very smallest: in the sample above, webm:640x360 (152.4M) is slightly smaller than flv:400x240 (154.5M). Line 9 uses grep to extract the line of movgrab output that begins with Formats, and a Perl one-liner to split all the formats (separated by “, ” strings) into a temporary array (@U), of which only the last element is printed, without the file size. Line 10 launches movgrab again, to save the desired version of the video on your drive, where you can watch it at any time (Figure B). Done!
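
If you would rather not call Perl for this step, the same selection can be sketched with grep and sed alone. This is an alternative to the one-liner on line 9, not what the script itself does, and last_format is a hypothetical name. It assumes the Formats line looks like the sample output above; unlike the one-liner, it also copes with a list that contains a single format.

```shell
#!/bin/bash
# last_format is a hypothetical helper. It reads "movgrab -T" style output
# on standard input and prints the last format named on the Formats line.
last_format() {
    grep '^Formats' |
        sed -e 's/^[^:]*://' \
            -e 's/,[[:space:]]*$//' \
            -e 's/.*, *//' \
            -e 's/ .*//'
    # 1st expression: drop the "Formats available for this Movie:" label
    # 2nd expression: drop the trailing comma
    # 3rd expression: keep only what follows the last ", " separator
    # 4th expression: drop the file size in parentheses
}

# Example using the sample output shown above.
last_format <<'EOF'
Formats available for this Movie:webm:1280x720 (1.2G), mp4:1280x720 (734.8M), webm:854x480 (354.7M), flv-h264:854x480 (395.7M), webm:640x360 (152.4M), flv-h264:640x360 (268.7M), mp4:480x360 (268.3M), flv:400x240 (154.5M),
Selected format item:webm:1280x720
EOF
```

For the sample output this prints flv:400x240, the same value that line 9 of the script stores in $FORMAT.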

Figure B


Final notes

As is, the script is rough: it works, but it should check for errors and save the videos in a configurable folder, with meaningful file names derived from their titles. That’s easy to add, however, once you know how to install and use grake and movgrab. That’s why I preferred to explain the installations instead of presenting a full-blown script. Finally, please note that this method is much more flexible than it may seem. Movgrab supports many more websites than grake, but getting the URLs it needs with a bit of web scraping is not a big deal. Try it!
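
As a starting point for those improvements, here is a minimal sketch of two building blocks: one that turns a title from the grake listing into a conservative file name, and one that creates the download folder and complains if it cannot. Both safe_name and ensure_dir are hypothetical helpers of mine. How the resulting name is then applied to the downloaded file depends on the downloader and is deliberately left out.

```shell
#!/bin/bash
# safe_name is a hypothetical helper. It turns a video title into a
# conservative file name: lower case, with every run of characters other
# than a-z and 0-9 collapsed into a single "-".
safe_name() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# ensure_dir is a hypothetical helper. It creates the download folder if
# needed and reports a failure instead of carrying on silently.
ensure_dir() {
    mkdir -p "$1" || { echo "cannot create $1" >&2; return 1; }
}

# Example with a title taken from the listing shown earlier.
safe_name "Arch Linux Review | Linux Action Show | s14e03"
```

The example prints arch-linux-review-linux-action-show-s14e03. In a fuller script you would call ensure_dir once at the start, with a folder of your choosing, and skip the downloads if it fails.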