Have you ever stumbled on a page whose links you wanted to copy and paste, and, darn, there were lots of them? For instance, when you land on a directory listing and want to download all the files? Well, I faced this problem recently. I wanted to copy more than 30 consecutive links from a directory listing, and I thought it was plain stupid to do it by hand. The first thing that popped into my mind was a Firefox plugin. I looked here and there, and every one of them had something I didn’t like. Then it came to me: I would copy the page source and use a little Perl script to extract the links. Sounds hard? Well, it’s not, since Perl is the right tool for this job. So, after a little tinkering here and there, this is what I came up with.

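#!/usr/local/bin/perl
use strict;
use warnings;

package MyParser;
use base qw(HTML::Parser);

# Prepended to every href as plain text, so for relative links include the
# full directory path and its trailing slash
our $prefix = "http://a_site_do_use.com";

# HTML::Parser calls start() for every opening tag (version 2 compatible callback)
sub start {
        my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
        # Only <a> tags that actually carry an href (skips <a name="..."> anchors)
        if ($tagname eq 'a' && defined $attr->{href}) {
                print $prefix . $attr->{href} . "\n";
        }
}

package main;

# urls.txt holds the saved page source
my $file = "urls.txt";
open(my $urls, '<', $file) or die "Cannot open $file: $!";
my @lines = <$urls>;
close($urls);
my $html = join('', @lines);

my $parser = MyParser->new;
$parser->parse($html);
$parser->eof;    # signal end of document so nothing is left buffered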

Here is a quick explanation. We use HTML::Parser, a Perl module from CPAN, and it’s a pretty nifty tool. MyParser subclasses it, and the parser calls our start method for every opening tag it finds; whenever that tag is an anchor (a) tag, we print its href with $prefix in front of it. If you want to use this as is, you need to check two things. First, the source of the page you want to extract the links from must be saved in a file named “urls.txt”. Second, if the URLs are relative (just like the ones Apache produces in a directory listing), you need to put the full prefix in the “$prefix” variable. If you want to tweak it, be my guest; it’s a quick draft anyway. If you don’t feel comfortable with code, go for those plugins. They are pretty good, just not for me.
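Gluing $prefix onto each href only works when every link is relative to the same directory. A slightly sturdier variation, sketched here under a few assumptions, lets the URI module (also from CPAN) resolve each href against a base URL with URI->new_abs, and uses HTML::Parser’s version 3 handler interface instead of a subclass. It also skips two kinds of links an Apache listing usually contains but that you rarely want to download: the column-sort links (such as “?C=N;O=D”) and anything outside the base directory, like the parent-directory link. The helper name extract_links, the command-line interface and the example.com URLs are made up for illustration, and the “outside the base” rule assumes the base URL is the directory URL with a trailing slash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();   # CPAN module
use URI ();            # CPAN module that provides URI->new_abs

# extract_links($html, $base): hypothetical helper that returns the href of
# every <a> tag in $html, resolved to an absolute URL against $base.
# Assumes $base is a directory URL ending with a slash.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # The start handler receives the tag name and an attribute hash ref
        start_h => [
            sub {
                my ($tagname, $attr) = @_;
                return unless $tagname eq 'a' && defined $attr->{href};
                my $href = $attr->{href};
                # Skip Apache's column-sort links such as "?C=N;O=D"
                return if $href =~ /^\?/;
                my $abs = URI->new_abs($href, $base)->as_string;
                # Skip anything outside $base (parent directory, other sites)
                return unless index($abs, $base) == 0;
                push @links, $abs;
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Usage: perl extract_links.pl BASE_URL FILE
# (does nothing unless both arguments are given)
if (@ARGV == 2) {
    my ($base, $file) = @ARGV;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $html = do { local $/; <$fh> };   # slurp the whole file
    close($fh);
    print "$_\n" for extract_links($html, $base);
}
```

For the wget scenario mentioned in the comments, the output can go straight into a file that wget reads with its -i (--input-file) option, for example: perl extract_links.pl http://example.com/files/ urls.txt > links.txt, followed by wget -i links.txt.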

So, I hope this helps you guys out as much as it helped me!

PS: I know it could be written more efficiently, but it works, so I’m done tweaking 🙂

3 Comments

Raju on January 11, 2009 at 11:23 am

This is cool! But doesn’t it require Perl to be installed? Why don’t you consider turning this into a plugin? Maybe for Chrome? 😉

Shirley on January 13, 2009 at 3:17 am

Why Perl? 🙄

I’m not much of a Perl buff, and my O’Reilly book, which used to be my bible, has now grown dusty. 🙂

stratosg on January 13, 2009 at 8:09 pm

@Both: I get why you are both asking 🙂 But I didn’t describe the whole scenario. What I wanted was to use wget on Linux to download a list of links, so I used Perl to extract those links within the Linux environment 🙂 I guess you are right to ask…

Quick URL parsing using Perl
