dp.cx blog


Filed under computers, software, perl, programming, and perl-module

Parallel::Iterator is a Perl module that can make your scripts much faster if you perform long-running tasks within a loop. It does this by forking and managing worker processes in the background, so your long-running tasks execute in parallel instead of one at a time. Your existing code probably looks something like this:

 
foreach my $key (@key_list) {
    ### do something very important here
}
 

Now, if your code is like mine, that @key_list array has about 500 entries, and your "important" stuff takes about 10 seconds each. That means 5,000 seconds (about 83 minutes) to finish that loop. It's silly. Instead, Parallel::Iterator can help your code run in under a minute. OK, maybe not a minute; with 10 tasks running at once it's closer to 500 seconds, a bit over 8 minutes. But you get the idea. Like this:

 
use Parallel::Iterator qw/iterate_as_array/;
 
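my @done = iterate_as_array( sub {
    my ( $index, $key ) = @_;    # position in @key_list, then the entry itself
    ### do something very important here
    ### (whatever the sub returns ends up in @done)
}, \@key_list );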
 

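Essentially, you're taking what's in the foreach loop and wrapping it in an anonymous subroutine. That subroutine is passed to Parallel::Iterator, which runs it once for each value in @key_list (passed by reference). Each call receives the entry's index and the entry itself, and whatever the subroutine returns is collected into @done.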
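Here's a small, complete version of that idea you can actually run. It's only a sketch: the three-entry @key_list and the uc() call are stand-ins I made up for your real data and your real "important" work, and the workers option (which, as far as I know from the module's documentation, sets how many processes run at once) is set to 4 purely for illustration.

```perl
use strict;
use warnings;
use Parallel::Iterator qw/iterate_as_array/;

# Hypothetical input; stands in for the real @key_list.
my @key_list = qw(alpha beta gamma);

my @done = iterate_as_array(
    { workers => 4 },    # optional; how many worker processes to run at once
    sub {
        my ( $index, $key ) = @_;    # position in @key_list, then the value
        # Placeholder for the "important" work.
        return uc $key;
    },
    \@key_list
);

# Results are stored by index, so they line up with @key_list.
print "$_\n" for @done;
```

The return value has to be something that can be handed back from the worker process to the parent, so stick to plain scalars or references to plain data.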

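Obviously, Parallel::Iterator isn't for everyone. If your machine doesn't have multiple CPUs, CPU-heavy work probably won't see much improvement, since the workers just take turns on the same processor. If your "important" bit of code never waits for something else to happen (a network call, a disk read, a database), you probably won't see much improvement beyond the number of CPUs you have. Use it wisely.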
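If you want a back-of-the-envelope number before you bother, something like this works. It's a rough sketch, not a benchmark: estimate_seconds is a helper I made up, and it assumes every job takes the same time and spends that time waiting rather than competing for CPU.

```perl
use strict;
use warnings;
use POSIX qw/ceil/;

# Rough wall-clock estimate: jobs run in "waves" of $workers at a time.
# Assumes equal-length jobs that mostly wait (I/O), so workers don't
# fight over the CPU. Real numbers will be worse.
sub estimate_seconds {
    my ( $jobs, $seconds_per_job, $workers ) = @_;
    return ceil( $jobs / $workers ) * $seconds_per_job;
}

# 500 jobs at 10 seconds each, with 1, 10, and 50 workers.
printf "%2d workers: %4d seconds\n", $_, estimate_seconds( 500, 10, $_ )
    for ( 1, 10, 50 );
```

For my 500-entry loop that works out to 5,000 seconds with one worker, 500 with ten, and 100 with fifty. If your jobs are CPU-bound, cap the worker count at your CPU count when you plug numbers in.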

EDIT: Using Parallel::Iterator will make it difficult to profile your code. You may not see any of the code in your anonymous sub as being executed, because it runs elsewhere: in the forked worker processes, not in the process your profiler is watching (remember, what you passed in is just a reference to the sub). Keep that in mind.
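One possible workaround, as far as I can tell from the module's documentation: setting the workers option to 0 disables forking, so the sub runs in the current process where a profiler can see it. A minimal sketch, assuming that behavior; the PROFILE environment variable is a switch I invented for this example, and the doubling sub is a placeholder.

```perl
use strict;
use warnings;
use Parallel::Iterator qw/iterate_as_array/;

# Hypothetical switch: run with PROFILE=1 in the environment to keep
# everything in one process (slow, but visible to the profiler).
my %options = $ENV{PROFILE} ? ( workers => 0 ) : ();

my @key_list = ( 1 .. 5 );    # stand-in data

my @done = iterate_as_array(
    \%options,
    sub {
        my ( $index, $key ) = @_;
        return $key * 2;      # placeholder for the real work
    },
    \@key_list
);

print "@done\n";
```

You lose the speedup while profiling, of course, but the code inside the sub is the same either way, so the hot spots you find should still be the right ones.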