LINUX GAZETTE


"Linux Gazette...making Linux just a little more fun!"


Learning Perl, part 1

By Ben Okopnik


If you've been using Linux for any length of time, you've surely heard of Perl, and you've probably even run a number of Perl scripts, perhaps without knowing it. Programs like "inews", "mirror", "debconf", "majordomo", "sirc", and a host of others are written purely in Perl. Taking a quick "zgrep" through the "Packages.gz" file in the Debian distro tells me that 382 of the packages depend on Perl (meaning that a critical part of that package is written in it), and 28 other packages either suggest or recommend it.
 

So, What's It Good For?

"Perl is great at text-processing, and it's great at tying and integrating things together. To a scripting language, all those different elements look the same."
 -- John Ousterhout, author of the Tcl scripting language

"Perl" is supposed to stand for the "Practical Extraction and Report Language". Right: bo-oring, but I guess that's what you've got to have if you're going to convince $HUMONGOUS_CORP to use it. Actually, Larry Wall <larry@wall.org> (the author of Perl) says in the Perl man page: "Perl actually stands for Pathologically Eclectic Rubbish Lister, but don't tell anyone I said that." Umm... OK, Larry. Not a word out of me.

Perl has been variously referred to as "A scripting language with delusions of full language-hood", "The Swiss Army Chainsaw of Unix", "The duct tape of the Web", and other equally, umm, complimentary names. It has been used to write single-line scripts, fast-executing programs, large projects (Amazon.com's entire editorial production and control system, Netscape's content management and delivery system, the Human Genome Project's DNA sequencing and project management, etc.), and millions of quick programs that do an amazing variety of things.  Perl can also emulate a number of common Unix system utilities (hint: if you're looking at having to learn 'awk', 'sed', 'grep', and 'tr', I'd suggest starting in on Perl, instead. All the functionality, much faster, and you'll never outgrow the capabilities. Sure wish I'd known that, way back when...)

As you would expect of any modern language, Perl allows you to do object-oriented programming. It also handles networking (sockets, etc.), is highly portable (a well-written script will run on Linux, BSD, Solaris, DOS, Win9x, NT, MacOS, OS/2, AmigaOS, VMS, etc. without modification), and has a very short write/debug cycle - since there's no compilation required, you just write the changes and run the script. There's a tremendous wealth of modules (pre-built Perl routines) available to perform just about any task; the Comprehensive Perl Archive Network (CPAN) is one of the best resources a Perl programmer can have.
 

Yeah, But What Is It Really?

Good question. I hope that, after using it for a year or so, you can tell me. A description of anything is a container... and I'm still trying to find one big enough to fit Perl (preferably one with a strong lockable lid.)
 

What Kind Of Things Isn't Perl Good For?

Hmmm. I wouldn't set out to write a GUI word processor, a video game, or a graphic browser in it. Perl can indeed do glitzy front ends via its interfaces to many other languages, so you could do all of those things - but in my opinion, there are more efficient ways to do them in other languages. "To a man with a hammer, every problem looks like a nail" - let the programmer beware!

Note, also, that Perl itself isn't written in Perl; neither is the Linux kernel. Low-level stuff of that sort is best left to C/C++ with some assembler thrown in; 'the right tools for the right job' should be every programmer's motto.
 

One Final Warning Before Pulling The Ripcord

If you know a bit of Perl, and see something in this series that 'Ain't The Way I Larned It', just remember Perl's motto: There's More Than One Way To Do It. Usually abbreviated TMTOWTDI, and pronounced "tim-toady", it is a core philosophy of Perl. Of course, corrections of any obvious errors are welcome.

Those of you who read my earlier series on shell scripting may remember that a script starts out with the so-called "hash-bang" or "shebang" line:

#! /bin/bash

This tells the system to start the specified program and hand it the script, so that the code which follows is executed by that program. This is also what's needed for Perl scripts - the first line must be

#! /usr/bin/perl

or whatever is the correct path to your "perl" executable.

Note the requirements for a hash-bang:

1) It must be the first line in the script.
2) The hash (#) must be the first character on the line, and there
   cannot be anything between it and the bang (!).
3) You must use the absolute path, not just the executable name.

So, let's try writing our first Perl script:


#!/usr/bin/perl
# "goodbye" - a modern, high-angst replacement for "Hello World"

print "Goodbye, cruel world!\n";
unlink $0;

Well... at least it says "goodbye" before going away; Miss Manners would be proud. What did we do here? Several things that are rather obvious: first, the "hash-bang"; next, a line that tells us what the script does - another thing that carries over from shell scripting, and is an excellent idea (there's no such thing as too many comments in the code!) Next, we print the message using the `print' function. Note the "\n" at the end of the string: Perl does not automatically provide you with a line-feed, so you get to decide whether you want one or not. Also, note the semicolon at the end of the statement: just like C, Perl demands those, and Woe Betide The Hapless Programmer Who Forgets! Actually, Perl's error checking is pretty good stuff, with relatively readable messages; it's just that semicolons are statement separators, so Perl doesn't notice a missing one until it reads on, and the error is often reported as being on the next line down. If you're aware of that quirk, it's not a big deal. Better yet, just remember to use the semicolons.

The last line is what does the evil deed of erasing the file - "goodbye cruel world" indeed. The "$0" is simply a reference to the name of the script being executed, and "unlink" does the same thing as "rm". Note that "$0" is a lot more useful than "goodbye" or even "./goodbye" - no matter what the file has been named, "$0" returns that name.
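If you'd rather not sacrifice a script to see "unlink" and "$0" at work, here's a gentler sketch. It assumes you can write to the current directory; the "scratch" file name is made up for the example, and the "open" and "close" calls are only there to give us something to delete.

```perl
#!/usr/bin/perl -w
# "scratch" - create a throwaway file, then remove it with unlink
# (the file name is invented for this example)

$file = "scratch." . $$;          # $$ is the current process ID

open(OUT, ">$file") or die "Can't create $file: $!\n";
print OUT "Temporary data\n";
close(OUT);

$count = unlink $file;            # returns the number of files removed
print "Removed $count file(s); $0 itself is untouched.\n";
```

Checking the value that "unlink" returns is a cheap way to find out whether the file actually went away.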
 

Oh, By The Way: Some Code Guidelines

Far be it from me to claim perfection in writing code: on past occasion, I've done "write-only" code that would make anyone trying to read it turn various colors. The thing is, I'm constantly trying to improve - and I'd surely like to see that idea take hold.

Perl treats "white space" - tabs and spaces - with the contempt it deserves, i.e., it's ignored. Because of this, you can structure your Perl code to convey the idea of what it is you're doing. Just to give one quick example:

@boats = ("Aloa", "Cheoy Lee", "Pearson", "Mason", "Swan", "Westsail", "S2", "Petersen", "Hereshoff");  # List of sailboats

Here, we've filled an array called `@boats' with sailboats. OK, that works - but it could be more understandable:

@boats = ("Aloa",       # French OSTAR/IOR boat
         "Cheoy Lee",   # Comfortable but expensive
         "Pearson",     # Strong but rather heavy
         "Mason",       # Well designed, but a bit of a pig
         "Swan",        # Classy boat - if you've got the money
         "Westsail",    # Wetsnails are OK, for double-enders
         "S2",          # Nice bay boats - not for offshore use
         "Petersen",    # Steel world cruiser, roomy but slow
         "Hereshoff");  # Fast and gorgeous; cramped and expensive

These habits apply to more than just Perl. Most modern languages allow additional whitespace in order to make the code human-readable. As I go through this series, I'll do my best to demonstrate at least my own version of good coding style; I'd like to encourage everyone to make it a consideration in writing their own code.
 

Variables

In Perl, the focus is "ease of use". It is a so-called "loosely-typed" language, where the variable definitions are not rigidly forced into straitjackets; in fact, there's no way to define a variable that will only hold a positive 32-bit floating point number.

The three types of variables in Perl are scalars, arrays, and hashes. Despite the scary names, they're all rather simple: just variables that contain different arrangements of data.

scalars - numbers, strings, or references
A scalar variable is denoted by a `$' sign, e.g. $num, $joe, $pointer
Example values:
   "0.0421", "Joe's glove", memory location "0xA000"

arrays - sequentially-numbered lists of scalars
An array variable is denoted by an `@' sign, e.g. @v, @list, @variable
Example:
   0 - "Sunday"
   1 - "Monday"
   2 - "Tuesday"
   3 - "Wednesday"
   ...

hashes - key-referenced lists of scalars
A hash variable is denoted by a `%' sign, e.g. %people, %x, %this_is_a_hash
Example:
   resident - "Sherlock Holmes"
   addr     - "221B Baker Street"
   code     - "NW1"
   city     - "London"
   country  - "GB"
   job      - "sleuth"
   ...

Note that, while arrays are stored in numerical order, hashes are not - retrieving the first element of a hash will often have nothing to do with the first element you loaded into it. Hash elements are referred to by their keys, not their position in the structure.

Within these three data types, you can contain (or point to) anything you want - and access it easily.
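To make that a bit more concrete, here's a minimal sketch that builds one variable of each type from the sample data above and pulls a value back out of each. The variable names ($num, @days, %holmes) are my own picks for the example; we'll look at arrays and hashes properly later in the series.

```perl
#!/usr/bin/perl -w
# One scalar, one array, and one hash

$num    = 0.0421;
@days   = ("Sunday", "Monday", "Tuesday", "Wednesday");
%holmes = ("resident", "Sherlock Holmes",
           "addr",     "221B Baker Street",
           "city",     "London");

print "Scalar: $num\n";
print "Day number 1: $days[1]\n";     # arrays count from 0, so this is Monday
print "City: $holmes{'city'}\n";      # hash values are looked up by key
```

Note that a single element of an array or a hash is itself a scalar, which is why it's written with a `$' rather than an `@' or a `%'.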

Another important note: $a, @a, and %a are completely unrelated to each other. They are in different name spaces. I am careful not to use visually conflicting names like these in my programs, especially since things like $a[0] (a reference to the 1st element of the @a array) exist - but this is something you should be aware of.
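Here's a short sketch of that, using the deliberately confusing names $x, @x, and %x (chosen only for this demonstration - please don't do this in real code):

```perl
#!/usr/bin/perl -w
# Three unrelated variables that happen to share a name

$x = "scalar";
@x = ("array", "of", "things");
%x = ("key", "hash value");

print "$x, $x[0], $x{'key'}\n";    # one value from each variable
```

Each of the three keeps its own contents; assigning to one does nothing to the other two.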
 

Given that variables can contain different types of data - numeric and string - we're going to need operators that work for both types. Perl provides these, and you should remember which type goes with what:

   Operator                   Num     Str
   --------------------------------------
   Equal                      ==      eq
   Not equal                  !=      ne
   Less than                   <      lt
   Greater than                >      gt
   Less than or equal to      <=      le
   Greater than or equal to   >=      ge

Easy mnemonic - when comparing letters (strings), use letters.

Since I like to give concrete examples, here's a way to give yourself gray hair and a nervous breakdown:

#!/usr/bin/perl
# A political evaluation script

$a = "Al";
$b = "George";

if ( $a > $b)   { print "$a would make a better President.\n"; }
if ( $a < $b)   { print "$b would make a better President.\n"; }
if ( $a == $b)  { print "$a or $b, there's no difference...\n"; }

Hm. The output says that there's no difference. This may reflect political reality, but what about our comparisons?... oh yeah. We should have used string operators, huh?


#!/usr/bin/perl
# A political evaluation script

$a = "Al";
$b = "George";

if ( $a gt $b)   { print "$a would make a better President.\n"; }
if ( $a lt $b)   { print "$b would make a better President.\n"; }
if ( $a eq $b)   { print "$a or $b, there's no difference...\n"; }

Now the comparison operators work properly (and the real-world logic is backwards... but I digress.)
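The reverse mistake - using string operators on numbers - is just as easy to make. A small sketch, with $x and $y as stand-in names:

```perl
#!/usr/bin/perl -w
# Comparing the same two values numerically and as strings

$x = 10;
$y = 9;

if ( $x >  $y)  { print "Numerically, $x is greater than $y.\n"; }
if ( $x lt $y)  { print "As strings, \"$x\" sorts before \"$y\".\n"; }
```

Both lines get printed: as numbers, 10 is larger than 9, but as strings "10" comes first, because the character "1" sorts before the character "9".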

By the way - why is it that Perl decided that "Al" was the same as "George" in the first example? Since when do programs have political opinions?

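The reason is actually an important one - it has to do with the way that Perl converts values, and with the way that it separates "true" from "false". In a numeric comparison, a string that doesn't start with a number is treated as 0, so the first script ended up asking whether 0 equals 0 - which is true. Given that all of our tests - "if", "while", "until", etc. depend on that distinction, we need to understand it.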

"0" is false, whether it is a number or a string.
All undefined variables (those that have not had a value assigned to them) are false.
An empty string - "" or '' - is false.
Everything else is true.
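You don't have to take my word for these rules; a few lines of Perl will check them for you. This is only a sketch - the list of sample values is my own, and the "foreach" loop is something we haven't covered yet (it just runs the block once for each item in the list).

```perl
#!/usr/bin/perl -w
# Report whether a few sample values count as true or false

@values = (0, "0", "", 1, "hello", "0.0");

foreach $v (@values) {
    if ($v) { print "Value ($v) is true\n";  }
    else    { print "Value ($v) is false\n"; }
}
```

The first three values come out false and the last three come out true - including the string "0.0", which is neither "0" nor empty.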

All right, here's some tricky stuff - look at these values and decide whether they're true or false:

"00"    "-1"    " "    "5 - 5"

See note [1] at the end of this article for the answers.
 

Another issue that is important is variable interpolation, which is a way of determining whether something in quotes gets interpreted or not. Here's an example:

$name =  'Bessie';
print    'Our cow is named $name.';

Oops. The output reads

Our cow is named $name.

I don't think any self-respecting cow would come if called something like that (I won't even mention the difficulty of pronouncing it.) So, how do we get Bessie to come to us?

# Note the double quotes where the singles used to be!
$name =  'Bessie';
print    "Our cow is named $name.";

Successful animal husbandry (and those of you thinking dirty thoughts, stop it) via Perl. I told you you could do anything.

What if we wanted to print the variable name? Perl makes that easy, too.

$joe =   "Joe";
print    "The variable \$joe contains the value $joe.";

We can print any metacharacter, that is, characters that have a special meaning in Perl, by escaping them - that is, preceding them with a backslash. Take a look at this:

$joe =   "Joe";
print    "The variable \"\$joe\" contains the value \"$joe\".";

Uh... TMTOWTDI:

print    'The variable "$joe" contains the value "', $joe, '".';

Take your pick; just be sure you understand the difference. Note that separate values in the "print" statement take a comma as a separator - without one, it has a completely different meaning, which we'll discuss in a future article.
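One more thing worth seeing for yourself: single quotes switch off more than just variables. In this little sketch (same cow, same variable), the "\n" inside the single quotes is printed as a backslash and an "n", not as a line-feed:

```perl
#!/usr/bin/perl -w
# Single quotes vs. double quotes

$name = 'Bessie';
print 'Single quotes: $name\n';      # prints the text exactly as typed
print "\n";                          # now we really do get a line-feed
print "Double quotes: $name\n";      # $name and \n are both interpreted
```

The first line of output reads "Single quotes: $name\n", character for character; the second reads "Double quotes: Bessie".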

Before we wrap this up, one important consideration: when creating your script file, always use the "-w" parameter as part of your hash-bang -

#! /usr/bin/perl -w

This will generate warnings and tell you where the problems are in your script. Be sure to use it if you're a Perl beginner... and be doubly sure to use it if you're a Perl expert. The errors don't go away as you progress; they just grow smarter. :)
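As a quick demonstration, here's a sketch of a script with a deliberate typo in it ($totl instead of $total; both names are invented for the example). The exact wording of the messages depends on your version of Perl, so I won't quote them here.

```perl
#!/usr/bin/perl -w
# Deliberate typo: $totl instead of $total

$total = 5;
$total = $total + 1;

print "Total is $totl\n";    # prints "Total is " followed by nothing
```

Without "-w", this runs silently and prints a blank where the number should be. With it, Perl complains about the misspelled name and about printing a variable that has no value - and tells you which line to look at.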
 

Wrap-up

This time around, we took a bit of a journey, skipping lightly over the rocks and shoals of a simple intro. Next month, we'll get a little deeper into it; perhaps explore arrays and hashes, and maybe dive head-first into the incredibly powerful "regular expressions", or regexes of Perl. My suggestion, meanwhile, is to try a few of the things we've talked about, maybe do a little experimentation on your own - I've found that the best way to learn a language is to hack until I hit the limits of my knowledge, then bring my frustrations to someone who knows more than I do. You can't get any good answers if you don't even know the questions.

Happy Perl hacking!
 

Ben Okopnik
perl -we '$@="\145\143\150\157\040\042\112\165\163\164\040\141\156".
"\157\164\150\145\162\040\120\145\162\154\040\110\141\143\153\145\162".
"\042\040\076\040\057\144\145\166\057\164\164\171";`$@`'

Note [1]: All true. None of them fit the "false" categories: "00" is not the same as "0"; neither is "-1". A space, " ", is not the same as an empty string (""), and "5 - 5", unless evaluated, is not "0".

References: "Perl: The Complete Reference", Martin C. Brown

Relevant Perl man pages (available on any pro-Perl-y configured system):

perl      - overview               perlfaq   - frequently asked questions
perltoc   - doc TOC                perldata  - data structures
perlsyn   - syntax                 perlop    - operators and precedence
perlrun   - execution              perlfunc  - builtin functions
perltrap  - traps for the unwary   perlstyle - style guide

"perldoc" and "perldoc -f"


Copyright © 2000, Ben Okopnik.
Copying license http://www.linuxgazette.com/copying.html
Published in Issue 61 of Linux Gazette, January 2001
