Wednesday, January 26, 2011

How can I read in an entire file all at once?

Are you sure you want to read the entire file and store it in memory?

If you map the file, you can virtually load the entire file into a string without actually storing it in memory:

        use File::Map qw(map_file);
        map_file my $string, $filename;

Once mapped, you can treat $string as you would any other string. Since you don't necessarily have to load the data, mmap-ing can be very fast and may not increase your memory footprint.

   If you want to load the entire file, you can use the "File::Slurp" module to do it in one simple and efficient step:

           use File::Slurp;

           my $all_of_it = read_file($filename); # entire file in scalar
           my @all_lines = read_file($filename); # one line per element

   The customary Perl approach for processing all the lines in a file is to
   do so one line at a time:

           open my $input, '<', $file or die "can't open $file: $!";
           while (<$input>) {
                   chomp;
                   # do something with $_
                   }
           close $input or die "can't close $file: $!";

   This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:

           my @lines = <INPUT>;

   you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard "Tie::File" module, or the "DB_File" module's $DB_RECNO bindings, which allow you to tie an array to a file so that
   accessing an element of the array actually accesses the corresponding line in the file.
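
As a rough illustration of the tied-array idea, here is a minimal sketch using "Tie::File". The scratch file and its three lines are made up for the example, and "File::Temp" is used only to keep it self-contained. It relies on Tie::File's default behavior of stripping the record separator from each element.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small scratch file so the sketch is self-contained
# (the contents are illustrative only).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\ngamma\n";
close $tmp_fh or die "can't close $tmp_name: $!";

# Tie the array to the file; records are fetched from disk on demand
# rather than being loaded all at once.
tie my @lines, 'Tie::File', $tmp_name
    or die "can't tie $tmp_name: $!";

my $line_count  = scalar @lines;   # number of records in the file
my $second_line = $lines[1];       # second line, newline already removed

$lines[1] = 'BETA';                # assigning to an element rewrites that line on disk

untie @lines;                      # flush pending writes and release the file
```

Because every element access can turn into file I/O, this trades speed for a small memory footprint. It suits occasional random access to lines more than a tight loop over the whole file.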

   You can read the entire filehandle contents into a scalar.

           my $var;
           {
           local $/;
           open my $fh, '<', $file or die "can't open $file: $!";
           $var = <$fh>;
           }

   That temporarily undefines your record separator, and the file is closed automatically at block exit because the lexical $fh goes out of scope. If the file is already open, just use this:

           my $var = do { local $/; <$fh> };

   You can do that one better by using a localized @ARGV so you can eliminate the "open":

           my $var = do { local( @ARGV, $/ ) = $file; <> };
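
If you use these idioms in more than one place, it may be clearer to wrap the "local $/" version in a small subroutine. The sketch below assumes a helper named slurp_file, which is a hypothetical name and not a built-in. The scratch file is created only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file is a hypothetical helper name used for illustration.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    local $/;                 # undefined record separator: one read returns everything
    my $content = <$fh>;
    close $fh or die "can't close $path: $!";
    return $content;
}

# Write a scratch file, then read it back in one step.
my ($out, $name) = tempfile(UNLINK => 1);
print {$out} "line one\nline two\n";
close $out or die "can't close $name: $!";

my $text = slurp_file($name);
```

Keeping "local $/" inside the subroutine confines the change to that call, so code elsewhere that reads line by line is unaffected.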

   For ordinary files you can also use the "read" function.

           read( $fh, $var, -s $fh );

   That third argument tests the byte size of the data on the $fh filehandle and reads that many bytes into the buffer $var.
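
The one-liner above ignores the return value of "read". A slightly more defensive sketch, assuming an ordinary file on disk (the scratch file here is made up for the example), checks for errors and short reads:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so the sketch is self-contained (contents are illustrative).
my ($out, $name) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "0123456789";
close $out or die "can't close $name: $!";

open my $fh, '<', $name or die "can't open $name: $!";
binmode $fh;                       # count raw bytes, no newline translation

my $size = -s $fh;                 # file size in bytes
my $var;
my $got = read( $fh, $var, $size );

die "read failed on $name: $!" unless defined $got;        # undef signals an error
warn "short read: wanted $size bytes, got $got\n" if $got != $size;

close $fh or die "can't close $name: $!";
```

"read" returns the number of bytes actually read, 0 at end of file, and undef on error, so comparing it against the size catches files that change while you read them.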
