Wednesday, December 29, 2010

Using the Perl split() function---Splitting into a hash

Introduction

The split() function is used to split a string into smaller sections. You can split a string on a single character, a group of characers or a regular expression (a pattern).
You can also specify how many pieces to split the string into. This is better explained in the examples below.

Example 1. Splitting on a character

A common use of split() is when parsing data from a file or from another program. In this example, we will split the string on the comma ','. Note that you typically should not use split() to parse CSV (comma separated value) files in case there are commas in your data: use Text::CSV instead.
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'Becky Alcorn,25,female,Melbourne';

  my @values = split(',', $data);

  foreach my $val (@values) {
    print "$val\n";
  }

  exit 0;
This program produces the following output:
Becky Alcorn
  25
  female
  Melbourne

Example 2. Splitting on a string

In the same way you use a character to split, you can use a string. In this example, the data is separated by three tildas '~~~'.
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'Bob the Builder~~~10:30am~~~1,6~~~ABC';

  my @values = split('~~~', $data);

  foreach my $val (@values) {
    print "$val\n";
  }

  exit 0;
This outputs:
Bob the Builder
  10:30am
  1,6
  ABC

Example 3. Splitting on a pattern

In some cases, you may want to split the string on a pattern (regular expression) or a type of character. We'll assume here that you know a little about regular expressions. In this example we will split on any integer:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'Home1Work2Cafe3Work4Home';

  # \d+ matches one or more integer numbers
  my @values = split(/\d+/, $data);

  foreach my $val (@values) {
    print "$val\n";
  }

  exit 0;
The output of this program is:
Home
  Work
  Cafe
  Work
  Home

Example 4. Splitting on an undefined value

If you split on an undefined value, the string will be split on every character:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'Becky Alcorn';

  my @values = split(undef,$data);

  foreach my $val (@values) {
    print "$val\n";
  }

  exit 0;
The results of this program are:
B
  e
  c
  k
  y

  A
  l
  c
  o
  r
  n

Example 5. Splitting on a space

If you use a space ' ' to split on, it will actually split on any kind of space including newlines and tabs (regular expression /\s+/) rather than just a space. In this example we print 'aa' either side of the values so we can see where the split took place:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = "Becky\n\nAlcorn";

  my @values = split(' ',$data);

  # Print 'aa' either side of the value, so we can see 
# where it split 
  foreach my $val (@values) {
    print "aa${val}aa\n";
  }

  exit 0;
This produces:
aaBeckyaa
  aaAlcornaa
As you can see, it has split on the newlines that were in our data. If you really want to split on a space, use regular expressions:
my @values = split(/ /,$data);

Example 6. Delimiter at the start of the string

If the delimiter is at the start of the string then the first element in the array of results will be empty. We'll print fixed text with each line so that you can see the blank one:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = ',test,data';

  my @values = split(',',$data);

  # We print "Val: " with each line so that you can 
# see the blank one
  foreach my $val (@values) {
    print "Val: $val\n";
  }

  exit 0;
The output of this program is:
Val: 
  Val: test
  Val: data

Example 7. Split and context

If you do not pass in a string to split, then split() will use $_. If you do not pass an expression or string to split on, then split() will use ' ':
#!/usr/bin/perl

  use strict;
  use warnings;

  foreach ('Bob the Builder', 'Thomas the TankEngine', 
'B1 and B2') {
    my @values = split;
    print "Split $_:\n";
    foreach my $val (@values) {
      print "  $val\n";
    }
  }

  exit 0;
This produces:
Split Bob the Builder:
    Bob
    the
    Builder
  Split Thomas the TankEngine:
    Thomas
    the
    TankEngine
  Split B1 and B2:
    B1
    and
    B2

Example 8. Limiting the split

You can limit the number of sections the string will be split into. You can do this by passing in a positive integer as the third argument. In this example, we're splitting our data into 3 fields - even though there are 4 occurrances of the delimiter:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'Becky Alcorn,25,female,Melbourne';

  my @values = split(',', $data, 3);

  foreach my $val (@values) {
    print "$val\n";
  }

  exit 0;
This program produces:
Becky Alcorn
  25
  female,Melbourne

Example 9. Keeping the delimiter

Sometimes, when splitting on a pattern, you want the delimiter in the result of the split. You can do this by capturing the characters you want to keep inside parenthesis. Let's do our regular expression example again, but this time we'll keep the numbers in the result:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'Home1Work2Cafe3Work4Home';

  # \d+ matches one or more integer numbers
  # The parenthesis () mean we keep the 
# digits we match
  my @values = split(/(\d+)/, $data);

  foreach my $val (@values) {
    print "$val\n";
  }

  exit 0;
The output is:
Home
  1
  Work
  2
  Cafe
  3
  Work
  4
  Home

Example 10. Splitting into a hash

If you know a bit about your data, you could split it directly into a hash instead of an array:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'FIRSTFIELD=1;SECONDFIELD=2;THIRDFIELD=3';

  my %values =  split(/[=;]/, $data);

  foreach my $k (keys %values) {
    print "$k: $values{$k}\n";
  }

  exit 0;
The output of this program is:
FIRSTFIELD: 1
  THIRDFIELD: 3
  SECONDFIELD: 2
The problem is that if the data does not contain exactly what you think, for example FIRSTFIELD=1;SECONDFIELD=2;THIRDFIELD= then you will get an 'Odd number of elements in hash assignment' warning. Here is the output of the same program but with this new data:
Odd number of elements in hash assignment 
at ./test.pl line 8.
  FIRSTFIELD: 1
  Use of uninitialized value in concatenation (.) 
or string at ./test.pl line 11.
  THIRDFIELD:
  SECONDFIELD: 2
SOURCE: http://perlmeme.org/howtos/perlfunc/split_function.html

No comments: