Have you ever stumbled on a page that you would like to copy-paste all the links from and, darn, there were just too many? For instance, when you get that directory listing and you want to download all the files? Well, I faced that problem the other day. I wanted to copy more than 30 consecutive links from a directory listing, and I thought it was plain stupid to do it by hand. Now, the first thing that popped into my mind was a Firefox plugin. I started looking here and there, and every one of them had something I didn’t like. So then it came to me: I would copy the page source and use a little Perl script to extract the links. Sounds hard? Well, it’s not, since Perl is the right tool for this job. So, after a little tinkering here and there, this is what I came up with.

#!/usr/local/bin/perl
package MyParser;
use base qw(HTML::Parser);

# prefix glued in front of relative links (e.g. an Apache directory listing)
$prefix = "http://a_site_to_use.com";

# HTML::Parser calls this for every opening tag; we only care about <a href="...">
sub start {
        my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
        if ($tagname eq 'a' && defined $attr->{href}) {
                print $prefix . $attr->{href} . "\n";
        }
}
package main;
# slurp the saved page source and feed it to the parser
$file = "urls.txt";
open(URLS, $file) or die "can't open $file: $!";
@lines = <URLS>;
close(URLS);
$html = "";
foreach $line (@lines) {
        $html .= $line;
}
$parser = MyParser->new;
$parser->parse($html);
$parser->eof;

A quick explanation: we use the HTML::Parser module, which is a pretty nifty tool. If you want to use this as is, you need to check two things. One, the source of the page you want to extract the links from should be saved in a file called “urls.txt”. Two, if the URLs are relative (just like the ones Apache produces in a directory listing), you need to put the full prefix in the “$prefix” variable. If you want to tweak it, be my guest. It’s roughly written anyway. If you don’t feel comfortable with code, then go for those plugins. They are pretty good. Just not for me.
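
By the way, if you’d rather skip the copy-paste step altogether, something along these lines should also do the trick. It’s just a rough sketch: it assumes you have LWP::Simple installed, the URL is a placeholder for your own listing, and it uses HTML::Parser’s handler-style API instead of the subclass above.

#!/usr/local/bin/perl
use LWP::Simple;
use HTML::Parser;

# placeholder URL, put your own directory listing here
$url = "http://a_site_to_use.com/files/";

# fetch the page straight off the web instead of reading urls.txt
$html = get($url);
die "couldn't fetch $url\n" unless defined $html;

# same idea as above, only with a handler sub instead of a subclass
$parser = HTML::Parser->new(
        start_h => [ sub {
                my ($tagname, $attr) = @_;
                print $url . $attr->{href} . "\n"
                        if $tagname eq 'a' && defined $attr->{href};
        }, "tagname, attr" ],
);
$parser->parse($html);
$parser->eof;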

So, I hope this helps you guys out as it surely did me!

PS: I know it could be written more effectively, but it works, so I’m done tweaking 🙂