Reference in Perl

Reference in Perl

Introduction

References are the most important feature of Perl which helps in creating complex data structures. This tutorial is aimed for a complete newbie in Perl. We will start with What references are? What are their benefits? How to use them? Their syntax and examples. Wasting no more time just dive into it!

What is a reference?

If you are familiar with C language then you can consider a reference as a pointer (not literally, because there are no pointers in Perl like C language). A variable can store a reference to some variable/value/array or hash.

Reference to a variable

+----------\\       +----------\\
|     x     >      |     y     >
+----------/       +----------/
      |                  |
+-----------+      +-----------+
|     O-----+----->|   54321   |
+-----------+      +-----------+

Here the variable $x has a reference which points to the value of variable $y (which is 54321).

To create a reference to a value we use the backslash operator (\\). For example:

#!/usr/bin/perl
use warnings;
my $y = 54321;
my $x = \$y;     # $x is a reference to the value of variable $y.
print "$y\n";    # 54321
print "$x\n";    # will print the address where the value 54321 is stored like SCALAR(0x13dcac0)
print "${$x}\n"; # 54321 (dereferencing a scalar variable)
print "$$x";     # 54321 (same as above)

Reference to an array

@a		@{$aref}		# An array
reverse @a	reverse @{$aref}	# Reverse the array
$a[3]		${$aref}[3]		# An element of the array
$a[3] = 17;	${$aref}[3] = 17	# Assigning an element
+----------\\
|     x     >      +----------\\
+----------/       |     y     >
      |            +----------/
+-----------+            |
|     O-----+----->+###########+
+-----------+      +"""""""""""+
                   |    Foo    |
+----------\\       +-----------+
|     z     >      |    ABC    |
+----------/       +-----------+
      |            |    BUZ    |
+-----------+      +-----------+
|     O-----+----->|    VIZ    |
+-----------+      +-----------+

Here the refernce $x is pointing to the array @y, reference $z is pointing to the 4th element of array @y i.e. $y[4].

#!/usr/bin/perl
use warnings;
my @y = ("Foo", "ABC", "BUZ", "VIZ");
my $x = \\@y;         # creating reference of an array
print "$x\\n";        # ARRAY(0x189faf0)
print "@y\\n";        # Foo ABC BUZ VIZ
print "@{$x}\\n";     # Foo ABC BUZ VIZ (dereferencing)
print "@$x\\n";       # Foo ABC BUZ VIZ (You may omit curly brace here)
print "${$x}[1]\\n";  # ABC  (Accessing element at position 1 of array using reference variable)
print "$$x[1]\\n";    # ABC  (Another way of doing the same)
print "$x->[1]";     # ABC  (You can use -> operator too)

Reference to a hash

You can make a reference to hash in the similar way as you created from scalar variables and array.

%h		%{$href}	      # A hash
keys %h		keys %{$href}	      # Get the keys from the hash
$h{'red'}	${$href}{'red'}	      # An element of the hash
$h{'red'} = 17	${$href}{'red'} = 17  # Assigning an element

Loop over an array or hash using reference

If you want to loop over an array then

for my $element (@array) {
           ...
}

If you want to loop over an array using reference then

for my $element (@{$aref}) {
           ...
}

If you want to loop over a hash then

for my $key (keys %hash) {
          print "$key -> $hash{$key}\\n";
}

If you want to loop over a hash using reference then

for my $key (keys %{$href}) {
          print "$key => ${$href}{$key}\\n";
}

Tip

${$aref}[3] is too hard to read, so you can write $aref->[3] instead.
${$href}{red} is too hard to read, so you can write $href->{red} instead.

If $aref holds a reference to an array, then $aref->[3] is the fourth element of the array. Don’t confuse this with $aref[3], which is the fourth element of a totally different array, one deceptively named @aref. $aref and @aref are unrelated the same way that $item and @item are.

Similarly, $href->{‘red’} is part of the hash referred to by the scalar variable $href, perhaps even one with no name. $href{‘red’} is part of the deceptively named %href hash.

It’s easy to forget to leave out the ->, and if you do, you’ll get bizarre results when your program gets array and hash elements out of totally unexpected hashes and arrays that weren’t the ones you wanted to use.

 
With the help of references you can construct complex datastructures as:

1) A reference to a list containing references to several lists

+----------\\
|     x     >                      +-->+###########+
+----------/                       |   +"""""""""""+
      |                            |   |    Foo    |
+-----------+                      |   +-----------+
|     O-----+----->+###########+   |   |    ABC    |
+-----------+      +"""""""""""+   |   +-----------+
                   |     O-----+---+
                   +-----------+
                   |     O-----+------>+###########+
                   +-----------+       +"""""""""""+
                   |     O-----+---+   |    BUZ    |
                   +-----------+   |   +-----------+
                                   |   |    VIZ    |
                                   |   +-----------+
                                   |
                                   |
                                   +-->+###########+
                                       +"""""""""""+
                                       |    BUZ    |
                                       +-----------+

2) A hash containing references to lists

+----------\\
|     x     >                 +-->+###########+
+----------/                  |   +"""""""""""+
      |                       |   |    Foo    |
+#########################+   |   +-----------+
+"""""""""""""""""""""""""+   |   |    ABC    |
+-----------\\ +-----------+   |   +-----------+
|    aaa     >|     O-----+---+
+-----------< +-----------+
|    bbb     >|     O-----+------>+###########+
+-----------< +-----------+       +"""""""""""+
|    ccc     >|     O-----+---+   |    BUZ    |
+-----------/ +-----------+   |   +-----------+
                              |   |    VIZ    |
                              |   +-----------+
                              |
                              |
                              +-->+###########+
                                  +"""""""""""+
                                  |    BUZ    |
                                  +-----------+

Creating reference to an anonymous array

To create an anonymous array [ ] brackets are used. For example:

$x = [1, 2, 3];

Here $x is a reference to an anonymous array.

Creating reference to an anonymous hash

To create an anonymous hash curly braces { } are used. For example:

$x = {Name => 'Chankey', Email => '[email protected]'};

Creating reference to an anonymous subroutine

For this just assign subroutine to a scalar variable (reference variable in our case).

$x = sub { print "Welcome to Linux Stall\n" };

Some notations

Consider the example given below

@a = ( [1, 2, 3],
               [4, 5, 6],
	       [7, 8, 9]
);

@a is an array with three elements, and each one is a reference to another array. $a[1] is one of these references. It refers to an array, the array containing (4, 5, 6), and because it is a reference to an array, we can write $a[1]->[2] to get the third element from that array. $a[1]->[2] is the 6. Similarly, $a[0]->[1] is the 2. What we have here is like a two-dimensional array; you can write $a[ROW]->[COLUMN] to get or set the element in any row and any column of the array.

The notation still looks a little cumbersome, so there’s one more abbreviation:

In between two subscripts, the arrow is optional i.e. instead of $a[1]->[2], we can write $a[1][2]; it means the same thing. Instead of $a[0]->[1] = 23, we can write $a[0][1] = 23 ; it means the same thing. Now it really looks like two-dimensional arrays! You can see why the arrows are important. Without them, we would have had to write ${$a[1]}[2] instead of $a[1][2]. For three-dimensional arrays, they let us write $x[2][3][5] instead of the unreadable ${${$x[2]}[3]}[5].

This is all what a newbie should know about reference in Perl.

How to read a CSV file in Perl?

read csv file perl

Perl is used for manipulating text files and in this article I will teach how to read a CSV file in Perl? First of all let’s start with

What is a CSV file?

A comma-separated values (CSV) file stores tabular data (numbers and text) in plain-text form. As a result, such a file is easily human-readable (e.g., in a text editor).

CSV is a simple file format that is widely supported by consumer, business, and scientific applications. Among its most common uses is to move tabular data between programs that naturally operate on a more efficient or complete proprietary format. For example: a CSV file might be used to transfer information from a database program to a spreadsheet. -Wikipedia

Using Perl to read a CSV file

Now suppose we have a CSV file which has 4 rows and each row has fields which are separated by comma as given below:

Linux,Unix,1,Ubuntu
RHEL,Kubuntu,2,Fedora
Mint,Puppy,3,ArchLinux
Mandriva,OpenSUSE,4,CentOS

If you want to extract out the data from 3rd column and calculate the sum of all the numbers then the approach to do this will be as follows:

Read the first line of the CSV file and extract out the 3rd column. Assign the extracted value to a scalar variable and then read the next line, extract out the value of 3rd column, add it to the scalar variable which contains the first value and store result in the same variable. Then read the 3rd line and so on…

Below is the Perl code for this purpose using split function.

  #!/usr/bin/perl
  use strict;
  use warnings;
 
  my $filename = $ARGV[0];
  my $total = 0;
 
  open(my $FH, '<', $filename) or die "Couldn't open $filename $!";
 
  while (my $fetchline = <$FH>){
    chomp $fetchline;
    my @elements = split "," , $fetchline;
    $total += $elements[2];
  }
 
  print "$total\n";

Save the file as text.plx and if the CSV file name is filename.csv then you may run the code as

$ perl text.plx filename.csv

In this case we get 10 (1+2+3+4) as output.

In this article we used split function to extract out the columns from CSV file using comma (,) as separator. But this code is error prone because of 2 reasons.

Reason 1: If the filename.csv contains something like

Linux,Unix,1,Ubuntu
RHEL,Kubuntu,2,Fedora
Mint,"Puppy,Linux",3,ArchLinux
Mandriva,OpenSUSE,4,CentOS

Notice the 3rd row. The 2nd field of it has a comma “Puppy,Linux”. If we use the same approach of splitting fields by comma separator then it will throw an error because split function will split out Puppy and Linux since they are separated by a comma. Then it will do something like 1+2+Linux+4 which is wrong.

Reason 2: If the filename.csv contains something like

Linux,Unix,1,Ubuntu
RHEL,Kubuntu,2,Fedora
Mint,"Puppy,
Linux",3,ArchLinux
Mandriva,OpenSUSE,4,CentOS

Here the 3rd row contains multi-line fields. When the above code will be used it will treat the CSV file as it contains 5 lines instead of 4.

So we can conclude that if the CSV file is in good format AND if it does not contain field separators within quote AND if there are no multi-line fields in CSV file then our code (which we have written above) will work perfectly and will give correct result. Take use of Text::CSV_XS module which is used for reading and writing CSV files, it will deal with those error prone situations very efficiently.

Types of Filesystem, differences and dealing with them using Perl

If you are looking to write code in Perl which will work on multiple platforms then you must have the knowledge of filesystem of each platform. Filesystems are categoriesed in 3 parts.

1. Filesystem of Unix

FFS (Berkeley Fast File System) is the ancestor of all current filesystems of Unix variants. The filesystem has been extended by different vendors according to their needs. Like for better security they have been extended to provide support for POSIX ACL (access control list).

In Unix filesystem the root is denoted by / (forward slash). The path of any directory in Unix filesystem starts from / and goes deeper in the system. The directory names or file names in Unix filesystem are case-sensitive.

Path Example: /root/home/user/Desktop/file.txt

2. Filesystem of Windows

Operating systems which are based on Windows supports 3 filesystems: FAT (File Allocation Table), NTFS (NT FileSystem), FAT32 (Advance version of FAT).

FAT filesystem is not case-sensitive. It uses \\ (backward slash) as path separator. In the FAT systems the direcotries and files contains some flags, these flags are known as attributes like “Read Only”. NTFS supports unicode feature.

Path Example: C:\\home\\user\\Desktop\\file.txt

3. Filesystem of Mac OS

Classic Mac OS used HFS (Hierarchical File System). The version 8.1 of Mac OS uses HFS+ which is an advance version of HFS. The current filesystem looks similar to Unix filesystem. The method of showing paths is same in both OS. The difference between these is that Unix file system is case-sensitive whereas HFS+ is not case-sensitive.

Summary

OS and FileSystem Path Separator Filename length Absolute path format Relative path format Unique features
Unix (Berkeley FFS and others) / OS-dependent number of chars /dir/file dir/file OS-variant dependent additions
Mac OS (HFS+) / 255 Unicode chars /dir/file dir/file Mac OS legacy support, BSD extended attributes
Windows based OS (NTFS) \\ 255 Unicode chars Drive:\\dir\\file dir\\file File encryption and compression
DOS (basic FAT) \\ 8.3 Drive:\\dir\\file dir\\file Attributes

 

Using Perl to deal with different Filesystems

By now we know all the filesystems differ from each other and to deal with these in Perl we use File::Spec module. This module is used to hide the differences between these filesystems. Let us learn how we use this module. We will use the catfile method to get the path of the file.

Example:

use File::Spec;
my $path= File::Spec->catfile(qw{home user Desktop file.txt});

In this way the scalar variable $path get sets to home\user\desktop\file.txt in a Windows operating system. and in Unix based system it gets set to home/user/Desktop/file.txt. The File::Spec module also contains methods like curdir and updir to get the current directory (.) and up-directory (..).

You may also use the moudle Path::Class which is an another good wrapper. You will have to install it before using.

Below is an example of how it works:

use Path::Class;
my $file = file (qw{home user Desktop file.txt});
my $dir = file (qw{home user Desktop});

The $file and $dir are scalar variable which contains the path to the file.txt file and Desktop directory respectively.

Here’s the path

print $file;
print $dir;

The output will depend on operating system. If it is Unix then it will yield

home/user/desktop/file.txt and home//user//desktop

and in Windows based systems it will give

home\\user\\desktop\\file.txt and home\\user\\desktop

Here $file and $dir are objects which have several methods which can be applied on them for some use. Like absoulte method gives the absolute path and slurp method slurps through the file’s contents.

my $abspath = $file->absolute;
my $content = $file->slurp;
$file->remove(); #delete the file

When you want to write a code for a particular system and you want to make this system understand the path of other operating system then you may use foreign_file() and foreign_dir() methods. These will return the path based the argument which you have to specify explicitly.

use Path::Class qw(foreign_file foreign_dir);
my $foreignfile = foreign_file('Win32', qw{home user Desktop file.txt});
my $foreigndir = foreign_dir('Win32', qw{home user});

Now $foreignfile will contain home\\user\\desktop\\file.txt even if the code is run from a Unix based system. This approach is very handy and useful. In the next article we will continue and expand our discussion.