Diffstat (limited to 'contrib/perl5/pod/perlsub.pod')
| -rw-r--r-- | contrib/perl5/pod/perlsub.pod | 1149 | 
1 files changed, 1149 insertions, 0 deletions
| diff --git a/contrib/perl5/pod/perlsub.pod b/contrib/perl5/pod/perlsub.pod new file mode 100644 index 0000000000000..957b3d8ad8131 --- /dev/null +++ b/contrib/perl5/pod/perlsub.pod @@ -0,0 +1,1149 @@ +=head1 NAME + +perlsub - Perl subroutines + +=head1 SYNOPSIS + +To declare subroutines: + +    sub NAME;	      	  # A "forward" declaration. +    sub NAME(PROTO);  	  #  ditto, but with prototypes + +    sub NAME BLOCK    	  # A declaration and a definition. +    sub NAME(PROTO) BLOCK #  ditto, but with prototypes + +To define an anonymous subroutine at runtime: + +    $subref = sub BLOCK;	    # no proto +    $subref = sub (PROTO) BLOCK;    # with proto + +To import subroutines: + +    use PACKAGE qw(NAME1 NAME2 NAME3); + +To call subroutines: + +    NAME(LIST);	   # & is optional with parentheses. +    NAME LIST;	   # Parentheses optional if predeclared/imported. +    &NAME;	   # Makes current @_ visible to called subroutine. + +=head1 DESCRIPTION + +Like many languages, Perl provides for user-defined subroutines.  These +may be located anywhere in the main program, loaded in from other files +via the C<do>, C<require>, or C<use> keywords, or even generated on the +fly using C<eval> or anonymous subroutines (closures).  You can even call +a function indirectly using a variable containing its name or a CODE reference +to it. + +The Perl model for function call and return values is simple: all +functions are passed as parameters one single flat list of scalars, and +all functions likewise return to their caller one single flat list of +scalars.  Any arrays or hashes in these call and return lists will +collapse, losing their identities--but you may always use +pass-by-reference instead to avoid this.  Both call and return lists may +contain as many or as few scalar elements as you'd like.  (Often a +function without an explicit return statement is called a subroutine, but +there's really no difference from the language's perspective.) 
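The flat-list model described above can be sketched in a few lines. This is only an illustration: C<count_args> and C<count_refs> are hypothetical names that do not appear in the original text.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars arrived in @_.
sub count_args { return scalar @_ }

# Hypothetical helper: takes array references, so each array
# keeps its identity inside the function.
sub count_refs { return map { scalar @$_ } @_ }

my @x = (1, 2, 3);
my @y = (4, 5);

my $flat = count_args(@x, @y);      # both arrays collapse into one flat list
my @per  = count_refs(\@x, \@y);    # two references, one per array

print "flat: $flat\n";
print "per array: @per\n";
```

Passing the arrays directly gives the function five scalars with no record of where one array ended; passing references keeps the two sizes apart.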
+ +Any arguments passed to the routine come in as the array C<@_>.  Thus if you +called a function with two arguments, those would be stored in C<$_[0]> +and C<$_[1]>.  The array C<@_> is a local array, but its elements are +aliases for the actual scalar parameters.  In particular, if an element +C<$_[0]> is updated, the corresponding argument is updated (or an error +occurs if it is not updatable).  If an argument is an array or hash +element which did not exist when the function was called, that element is +created only when (and if) it is modified or if a reference to it is +taken.  (Some earlier versions of Perl created the element whether or not +it was assigned to.)  Note that assigning to the whole array C<@_> removes +the aliasing, and does not update any arguments. + +The return value of the subroutine is the value of the last expression +evaluated.  Alternatively, a C<return> statement may be used to exit the +subroutine, optionally specifying the returned value, which will be +evaluated in the appropriate context (list, scalar, or void) depending +on the context of the subroutine call.  If you specify no return value, +the subroutine will return an empty list in a list context, an undefined +value in a scalar context, or nothing in a void context.  If you return +one or more arrays and/or hashes, these will be flattened together into +one large indistinguishable list. + +Perl does not have named formal parameters, but in practice all you do is +assign to a C<my()> list of these.  Any variables you use in the function +that aren't declared private are global variables.  For the gory details +on creating private variables, see +L<"Private Variables via my()"> and L<"Temporary Values via local()">. +To create protected environments for a set of functions in a separate +package (and probably a separate file), see L<perlmod/"Packages">. 
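A small sketch of the C<@_> aliasing and bare C<return> behavior described above. The subroutine names (C<bump>, C<no_bump>, C<nothing>) are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical: modifies its first argument through the alias in @_.
sub bump { $_[0]++ }

# Hypothetical: assigning to the whole of @_ removes the aliasing,
# so the caller's variable is left alone afterwards.
sub no_bump { @_ = (99); $_[0]++ }

# Hypothetical: a bare return gives an empty list in list context
# and undef in scalar context.
sub nothing { return }

my $n = 1;
bump($n);       # changes the caller's $n
no_bump($n);    # does not change the caller's $n

my @list   = nothing();
my $scalar = nothing();

print "n = $n\n";
print "list has ", scalar(@list), " elements\n";
print "scalar is ", (defined $scalar ? "defined" : "undef"), "\n";
```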
+ +Example: + +    sub max { +	my $max = shift(@_); +	foreach $foo (@_) { +	    $max = $foo if $max < $foo; +	} +	return $max; +    } +    $bestday = max($mon,$tue,$wed,$thu,$fri); + +Example: + +    # get a line, combining continuation lines +    #  that start with whitespace + +    sub get_line { +	$thisline = $lookahead;  # GLOBAL VARIABLES!! +	LINE: while (defined($lookahead = <STDIN>)) { +	    if ($lookahead =~ /^[ \t]/) { +		$thisline .= $lookahead; +	    } +	    else { +		last LINE; +	    } +	} +	$thisline; +    } + +    $lookahead = <STDIN>;	# get first line +    while ($_ = get_line()) { +	... +    } + +Use array assignment to a local list to name your formal arguments: + +    sub maybeset { +	my($key, $value) = @_; +	$Foo{$key} = $value unless $Foo{$key}; +    } + +This also has the effect of turning call-by-reference into call-by-value, +because the assignment copies the values.  Otherwise a function is free to +do in-place modifications of C<@_> and change its caller's values. + +    upcase_in($v1, $v2);  # this changes $v1 and $v2 +    sub upcase_in { +	for (@_) { tr/a-z/A-Z/ } +    } + +You aren't allowed to modify constants in this way, of course.  If an +argument were actually literal and you tried to change it, you'd take a +(presumably fatal) exception.   For example, this won't work: + +    upcase_in("frederick"); + +It would be much safer if the C<upcase_in()> function +were written to return a copy of its parameters instead +of changing them in place: + +    ($v3, $v4) = upcase($v1, $v2);  # this doesn't +    sub upcase { +	return unless defined wantarray;  # void context, do nothing +	my @parms = @_; +	for (@parms) { tr/a-z/A-Z/ } +  	return wantarray ? @parms : $parms[0]; +    } + +Notice how this (unprototyped) function doesn't care whether it was passed +real scalars or arrays.  Perl will see everything as one big long flat C<@_> +parameter list.  This is one of the ways where Perl's simple +argument-passing style shines.  
The C<upcase()> function would work perfectly +well without changing the C<upcase()> definition even if we fed it things +like this: + +    @newlist   = upcase(@list1, @list2); +    @newlist   = upcase( split /:/, $var ); + +Do not, however, be tempted to do this: + +    (@a, @b)   = upcase(@list1, @list2); + +Like its flat incoming parameter list, the return list is also +flat.  So all you have managed to do here is store everything in C<@a> and +make C<@b> an empty list.  See L<Pass by Reference> for alternatives. + +A subroutine may be called using the "C<&>" prefix.  The "C<&>" is optional +in modern Perls, and so are the parentheses if the subroutine has been +predeclared.  (Note, however, that the "C<&>" is I<NOT> optional when +you're just naming the subroutine, such as when it's used as an +argument to C<defined()> or C<undef()>.  Nor is it optional when you want to +do an indirect subroutine call with a subroutine name or reference +using the C<&$subref()> or C<&{$subref}()> constructs.  See L<perlref> +for more on that.) + +Subroutines may be called recursively.  If a subroutine is called using +the "C<&>" form, the argument list is optional, and if omitted, no C<@_> array is +set up for the subroutine: the C<@_> array at the time of the call is +visible to the subroutine instead.  This is an efficiency mechanism that +new users may wish to avoid. + +    &foo(1,2,3);	# pass three arguments +    foo(1,2,3);		# the same + +    foo();		# pass a null list +    &foo();		# the same + +    &foo;		# foo() gets current args, like foo(@_) !! +    foo;		# like foo() IFF sub foo predeclared, else "foo" + +Not only does the "C<&>" form make the argument list optional, but it also +disables any prototype checking on the arguments you do provide.  This +is partly for historical reasons, and partly for having a convenient way +to cheat if you know what you're doing.  See the section on Prototypes below. 
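The difference between C<&foo;> and C<&foo()> is easy to miss, so here is a minimal sketch of it. C<show_count> and C<wrapper> are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Hypothetical callee: reports how many arguments it can see.
sub show_count { return scalar @_ }

# Hypothetical wrapper comparing three call styles.
sub wrapper {
    my $shared = &show_count;      # no parentheses: callee sees wrapper's @_
    my $empty  = &show_count();    # explicit empty list: callee gets an empty @_
    my $plain  = show_count();     # the same as the previous line
    return ($shared, $empty, $plain);
}

my @counts = wrapper('a', 'b', 'c');
print "@counts\n";
```

Only the first call style lets the callee see the three arguments given to C<wrapper>; the other two start with an empty argument list.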
+ +Functions whose names are in all upper case are reserved to the Perl core, +just as are modules whose names are in all lower case.  A function in +all capitals is a loosely-held convention meaning it will be called +indirectly by the run-time system itself.  Functions that do special, +pre-defined things are C<BEGIN>, C<END>, C<AUTOLOAD>, and C<DESTROY>--plus all the +functions mentioned in L<perltie>.  The 5.005 release adds C<INIT> +to this list. + +=head2 Private Variables via C<my()> + +Synopsis: + +    my $foo;	    	# declare $foo lexically local +    my (@wid, %get); 	# declare list of variables local +    my $foo = "flurp";	# declare $foo lexical, and init it +    my @oof = @bar;	# declare @oof lexical, and init it + +A "C<my>" declares the listed variables to be confined (lexically) to the +enclosing block, conditional (C<if/unless/elsif/else>), loop +(C<for/foreach/while/until/continue>), subroutine, C<eval>, or +C<do/require/use>'d file.  If more than one value is listed, the list +must be placed in parentheses.  All listed elements must be legal lvalues. +Only alphanumeric identifiers may be lexically scoped--magical +builtins like C<$/> must currently be C<local>ized with "C<local>" instead. + +Unlike dynamic variables created by the "C<local>" operator, lexical +variables declared with "C<my>" are totally hidden from the outside world, +including any called subroutines (even if it's the same subroutine called +from itself or elsewhere--every call gets its own copy). + +This doesn't mean that a C<my()> variable declared in a statically +I<enclosing> lexical scope would be invisible.  Only the dynamic scopes +are cut off.   For example, the C<bumpx()> function below has access to the +lexical C<$x> variable because both the my and the sub occurred at the same +scope, presumably the file scope. 
+ +    my $x = 10; +    sub bumpx { $x++ }  + +(An C<eval()>, however, can see the lexical variables of the scope it is +being evaluated in so long as the names aren't hidden by declarations within +the C<eval()> itself.  See L<perlref>.) + +The parameter list to C<my()> may be assigned to if desired, which allows you +to initialize your variables.  (If no initializer is given for a +particular variable, it is created with the undefined value.)  Commonly +this is used to name the parameters to a subroutine.  Examples: + +    $arg = "fred";	  # "global" variable +    $n = cube_root(27); +    print "$arg thinks the root is $n\n"; + fred thinks the root is 3 + +    sub cube_root { +	my $arg = shift;  # name doesn't matter +	$arg **= 1/3; +	return $arg; +    } + +The "C<my>" is simply a modifier on something you might assign to.  So when +you do assign to the variables in its argument list, the "C<my>" doesn't +change whether those variables are viewed as a scalar or an array.  So + +    my ($foo) = <STDIN>;		# WRONG? +    my @FOO = <STDIN>; + +both supply a list context to the right-hand side, while + +    my $foo = <STDIN>; + +supplies a scalar context.  But the following declares only one variable: + +    my $foo, $bar = 1;			# WRONG + +That has the same effect as + +    my $foo; +    $bar = 1; + +The declared variable is not introduced (is not visible) until after +the current statement.  Thus, + +    my $x = $x; + +can be used to initialize the new $x with the value of the old C<$x>, and +the expression + +    my $x = 123 and $x == 123 + +is false unless the old C<$x> happened to have the value C<123>. + +Lexical scopes of control structures are not bounded precisely by the +braces that delimit their controlled blocks; control expressions are +part of the scope, too.  
Thus in the loop + +    while (defined(my $line = <>)) { +        $line = lc $line; +    } continue { +        print $line; +    } + +the scope of C<$line> extends from its declaration throughout the rest of +the loop construct (including the C<continue> clause), but not beyond +it.  Similarly, in the conditional + +    if ((my $answer = <STDIN>) =~ /^yes$/i) { +        user_agrees(); +    } elsif ($answer =~ /^no$/i) { +        user_disagrees(); +    } else { +	chomp $answer; +        die "'$answer' is neither 'yes' nor 'no'"; +    } + +the scope of C<$answer> extends from its declaration throughout the rest +of the conditional (including C<elsif> and C<else> clauses, if any), +but not beyond it. + +(None of the foregoing applies to C<if/unless> or C<while/until> +modifiers appended to simple statements.  Such modifiers are not +control structures and have no effect on scoping.) + +The C<foreach> loop defaults to scoping its index variable dynamically +(in the manner of C<local>; see below).  However, if the index +variable is prefixed with the keyword "C<my>", then it is lexically +scoped instead.  Thus in the loop + +    for my $i (1, 2, 3) { +        some_function(); +    } + +the scope of C<$i> extends to the end of the loop, but not beyond it, and +so the value of C<$i> is unavailable in C<some_function()>. + +Some users may wish to encourage the use of lexically scoped variables. +As an aid to catching implicit references to package variables, +if you say + +    use strict 'vars'; + +then any variable reference from there to the end of the enclosing +block must either refer to a lexical variable, or must be fully +qualified with the package name.  A compilation error results +otherwise.  An inner block may countermand this with S<"C<no strict 'vars'>">. + +A C<my()> has both a compile-time and a run-time effect.  At compile time, +the compiler takes notice of it; the principal usefulness of this is to +quiet S<"C<use strict 'vars'>">.  
The actual initialization is delayed until +run time, so it gets executed appropriately; every time through a loop, +for example. + +Variables declared with "C<my>" are not part of any package and are therefore +never fully qualified with the package name.  In particular, you're not +allowed to try to make a package variable (or other global) lexical: + +    my $pack::var;	# ERROR!  Illegal syntax +    my $_;		# also illegal (currently) + +In fact, dynamic variables (also known as package or global variables) +are still accessible using the fully qualified C<::> notation even while a +lexical of the same name is also visible: + +    package main; +    local $x = 10; +    my    $x = 20; +    print "$x and $::x\n"; + +That will print out C<20> and C<10>. + +You may declare "C<my>" variables at the outermost scope of a file to hide +any such identifiers totally from the outside world.  This is similar +to C's static variables at the file level.  To do this with a subroutine +requires the use of a closure (anonymous function with lexical access). +If a block (such as an C<eval()>, function, or C<package>) wants to create +a private subroutine that cannot be called from outside that block, +it can declare a lexical variable containing an anonymous sub reference: + +    my $secret_version = '1.001-beta'; +    my $secret_sub = sub { print $secret_version }; +    &$secret_sub(); + +As long as the reference is never returned by any function within the +module, no outside module can see the subroutine, because its name is not in +any package's symbol table.  Remember that it's not I<REALLY> called +C<$some_pack::secret_version> or anything; it's just C<$secret_version>, +unqualified and unqualifiable. + +This does not work with object methods, however; all object methods have +to be in the symbol table of some package to be found. 
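As a rough sketch of the private-subroutine idea above, the following keeps a counter and a helper in file-scoped lexicals and exposes a single named entry point. The package name C<Counter> and the names C<$count>, C<$step>, and C<advance> are all hypothetical, invented for this illustration.

```perl
use strict;
use warnings;

package Counter;    # hypothetical package name, for illustration only

# File-scoped lexicals: they live in no symbol table, so code
# outside this file has no name by which to reach them.
my $count = 0;
my $step  = sub { return $count += $_[0] };    # private "subroutine"

# Public entry point; it can see $step because it is compiled
# inside the same lexical scope.
sub advance { return $step->(2) }

package main;

Counter::advance();
my $total = Counter::advance();
print "total: $total\n";

# The named sub has a symbol-table entry; the lexicals do not.
my $sub_visible = exists $Counter::{advance} ? 1 : 0;
my $var_visible = exists $Counter::{count}   ? 1 : 0;
print "advance in symbol table: $sub_visible\n";
print "count in symbol table: $var_visible\n";
```

Looking in the package's symbol table hash (C<%Counter::>) is just one way to show that the lexicals have no package-qualified name.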
+ +=head2 Persistent Private Variables + +Just because a lexical variable is lexically (also called statically) +scoped to its enclosing block, C<eval>, or C<do> FILE, this doesn't mean that +within a function it works like a C static.  It normally works more +like a C auto, but with implicit garbage collection.   + +Unlike local variables in C or C++, Perl's lexical variables don't +necessarily get recycled just because their scope has exited. +If something more permanent is still aware of the lexical, it will +stick around.  So long as something else references a lexical, that +lexical won't be freed--which is as it should be.  You wouldn't want +memory being freed until you were done using it, or kept around once you +were done.  Automatic garbage collection takes care of this for you. + +This means that you can pass back or save away references to lexical +variables, whereas to return a pointer to a C auto is a grave error. +It also gives us a way to simulate C's function statics.  Here's a +mechanism for giving a function private variables with both lexical +scoping and a static lifetime.  If you do want to create something like +C's static variables, just enclose the whole function in an extra block, +and put the static variable outside the function but in the block. + +    { +	my $secret_val = 0; +	sub gimme_another { +	    return ++$secret_val; +	} +    } +    # $secret_val now becomes unreachable by the outside +    # world, but retains its value between calls to gimme_another + +If this function is being sourced in from a separate file +via C<require> or C<use>, then this is probably just fine.  
If it's +all in the main program, you'll need to arrange for the C<my()> +to be executed early, either by putting the whole block above +your main program, or more likely, placing merely a C<BEGIN> +sub around it to make sure it gets executed before your program +starts to run: + +    sub BEGIN { +	my $secret_val = 0; +	sub gimme_another { +	    return ++$secret_val; +	} +    } + +See L<perlmod/"Package Constructors and Destructors"> about the C<BEGIN> function. + +If declared at the outermost scope, the file scope, then lexicals work +somewhat like C's file statics.  They are available to all functions in +that same file declared below them, but are inaccessible from outside of +the file.  This is sometimes used in modules to create private variables +for the whole module. + +=head2 Temporary Values via local() + +B<NOTE>: In general, you should be using "C<my>" instead of "C<local>", because +it's faster and safer.  Exceptions to this include the global punctuation +variables, filehandles and formats, and direct manipulation of the Perl +symbol table itself.  Format variables often use "C<local>" though, as do +other variables whose current value must be visible to called +subroutines. + +Synopsis: + +    local $foo;	    		# declare $foo dynamically local +    local (@wid, %get); 	# declare list of variables local +    local $foo = "flurp";	# declare $foo dynamic, and init it +    local @oof = @bar;		# declare @oof dynamic, and init it + +    local *FH;			# localize $FH, @FH, %FH, &FH  ... +    local *merlyn = *randal;	# now $merlyn is really $randal, plus +                                #     @merlyn is really @randal, etc +    local *merlyn = 'randal';	# SAME THING: promote 'randal' to *randal +    local *merlyn = \$randal;   # just alias $merlyn, not @merlyn etc + +A C<local()> modifies its listed variables to be "local" to the enclosing +block, C<eval>, or C<do FILE>--and to I<any subroutine called from within that block>. 
+A C<local()> just gives temporary values to global (meaning package) +variables.  It does B<not> create a local variable.  This is known as +dynamic scoping.  Lexical scoping is done with "C<my>", which works more +like C's auto declarations. + +If more than one variable is given to C<local()>, they must be placed in +parentheses.  All listed elements must be legal lvalues.  This operator works +by saving the current values of those variables in its argument list on a +hidden stack and restoring them upon exiting the block, subroutine, or +eval.  This means that called subroutines can also reference the local +variable, but not the global one.  The argument list may be assigned to if +desired, which allows you to initialize your local variables.  (If no +initializer is given for a particular variable, it is created with an +undefined value.)  Commonly this is used to name the parameters to a +subroutine.  Examples: + +    for $i ( 0 .. 9 ) { +	$digits{$i} = $i; +    } +    # assume this function uses global %digits hash +    parse_num(); + +    # now temporarily add to %digits hash +    if ($base12) { +	# (NOTE: not claiming this is efficient!) +	local %digits  = (%digits, 't' => 10, 'e' => 11); +	parse_num();  # parse_num gets this new %digits! +    } +    # old %digits restored here + +Because C<local()> is a run-time command, it gets executed every time +through a loop.  In releases of Perl previous to 5.0, this used more stack +storage each time until the loop was exited.  Perl now reclaims the space +each time through, but it's still more efficient to declare your variables +outside the loop. + +A C<local> is simply a modifier on an lvalue expression.  When you assign to +a C<local>ized variable, the C<local> doesn't change whether its list is viewed +as a scalar or an array.  So + +    local($foo) = <STDIN>; +    local @FOO = <STDIN>; + +both supply a list context to the right-hand side, while + +    local $foo = <STDIN>; + +supplies a scalar context. 
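To make the contrast between dynamic and lexical scoping concrete, here is a short sketch. The names C<$level>, C<report>, C<with_local>, and C<with_my> are hypothetical, and the C<our> declaration assumes a Perl newer than the release this text documents (C<use vars> would be the older equivalent).

```perl
use strict;
use warnings;

our $level = 'global';    # a package variable (hypothetical name)

# Hypothetical helper that reads the package variable.
sub report { return $level }

sub with_local {
    local $level = 'temporary';    # old value saved, restored on scope exit
    return report();               # report() sees the temporary value
}

sub with_my {
    my $level = 'lexical';         # a new, unrelated lexical variable
    return report();               # report() still sees the package variable
}

my @seen = (with_local(), with_my(), report());
print join(', ', @seen), "\n";
```

The called subroutine observes the C<local> value but never the C<my> one, and the package variable is back to its original value once C<with_local> returns.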
+ +A note about C<local()> and composite types is in order.  Something +like C<local(%foo)> works by temporarily placing a brand new hash in +the symbol table.  The old hash is left alone, but is hidden "behind" +the new one. + +This means the old variable is completely invisible via the symbol +table (i.e. the hash entry in the C<*foo> typeglob) for the duration +of the dynamic scope within which the C<local()> was seen.  This +has the effect of allowing one to temporarily occlude any magic on +composite types.  For instance, this will briefly alter a tied +hash to some other implementation: + +    tie %ahash, 'APackage'; +    [...] +    { +       local %ahash; +       tie %ahash, 'BPackage'; +       [..called code will see %ahash tied to 'BPackage'..] +       { +          local %ahash; +          [..%ahash is a normal (untied) hash here..] +       } +    } +    [..%ahash back to its initial tied self again..] + +As another example, a custom implementation of C<%ENV> might look +like this: + +    { +        local %ENV; +        tie %ENV, 'MyOwnEnv'; +        [..do your own fancy %ENV manipulation here..] +    } +    [..normal %ENV behavior here..] + +It's also worth taking a moment to explain what happens when you +C<local>ize a member of a composite type (i.e. an array or hash element). +In this case, the element is C<local>ized I<by name>. This means that +when the scope of the C<local()> ends, the saved value will be +restored to the hash element whose key was named in the C<local()>, or +the array element whose index was named in the C<local()>.  If that +element was deleted while the C<local()> was in effect (e.g. by a +C<delete()> from a hash or a C<shift()> of an array), it will spring +back into existence, possibly extending an array and filling in the +skipped elements with C<undef>.  
For instance, if you say + +    %hash = ( 'This' => 'is', 'a' => 'test' ); +    @ary  = ( 0..5 ); +    { +         local($ary[5]) = 6; +         local($hash{'a'}) = 'drill'; +         while (my $e = pop(@ary)) { +             print "$e . . .\n"; +             last unless $e > 3; +         } +         if (@ary) { +             $hash{'only a'} = 'test'; +             delete $hash{'a'}; +         } +    } +    print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n"; +    print "The array has ",scalar(@ary)," elements: ", +          join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n"; + +Perl will print + +    6 . . . +    4 . . . +    3 . . . +    This is a test only a test. +    The array has 6 elements: 0, 1, 2, undef, undef, 5 + +=head2 Passing Symbol Table Entries (typeglobs) + +[Note:  The mechanism described in this section was originally the only +way to simulate pass-by-reference in older versions of Perl.  While it +still works fine in modern versions, the new reference mechanism is +generally easier to work with.  See below.] + +Sometimes you don't want to pass the value of an array to a subroutine +but rather the name of it, so that the subroutine can modify the global +copy of it rather than working with a local copy.  In perl you can +refer to all objects of a particular name by prefixing the name +with a star: C<*foo>.  This is often known as a "typeglob", because the +star on the front can be thought of as a wildcard match for all the +funny prefix characters on variables and subroutines and such. + +When evaluated, the typeglob produces a scalar value that represents +all the objects of that name, including any filehandle, format, or +subroutine.  When assigned to, it causes the name mentioned to refer to +whatever "C<*>" value was assigned to it.  
Example: + +    sub doubleary { +	local(*someary) = @_; +	foreach $elem (@someary) { +	    $elem *= 2; +	} +    } +    doubleary(*foo); +    doubleary(*bar); + +Note that scalars are already passed by reference, so you can modify +scalar arguments without using this mechanism by referring explicitly +to C<$_[0]> etc.  You can modify all the elements of an array by passing +all the elements as scalars, but you have to use the C<*> mechanism (or +the equivalent reference mechanism) to C<push>, C<pop>, or change the size of +an array.  It will certainly be faster to pass the typeglob (or reference). + +Even if you don't want to modify an array, this mechanism is useful for +passing multiple arrays in a single LIST, because normally the LIST +mechanism will merge all the array values so that you can't extract out +the individual arrays.  For more on typeglobs, see +L<perldata/"Typeglobs and Filehandles">. + +=head2 When to Still Use local() + +Despite the existence of C<my()>, there are three places where the +C<local()> operator still shines.  In fact, in these three places, you +I<must> use C<local> instead of C<my>. + +=over + +=item 1. You need to give a global variable a temporary value, especially C<$_>. + +The global variables, like C<@ARGV> or the punctuation variables, must be  +C<local>ized with C<local()>.  This block reads in F</etc/motd>, and splits +it up into chunks separated by lines of equal signs, which are placed +in C<@Fields>. + +    { +	local @ARGV = ("/etc/motd"); +        local $/ = undef; +        local $_ = <>;	 +	@Fields = split /^\s*=+\s*$/; +    }  + +In particular, it's important to C<local>ize C<$_> in any routine that assigns +to it.  Look out for implicit assignments in C<while> conditionals. + +=item 2. You need to create a local file or directory handle or a local function. + +A function that needs a filehandle of its own must use +C<local()> on a complete typeglob.   
This can be used to create new symbol +table entries: + +    sub ioqueue { +        local  (*READER, *WRITER);    # not my! +        pipe    (READER,  WRITER)     or die "pipe: $!"; +        return (*READER, *WRITER); +    } +    ($head, $tail) = ioqueue(); + +See the Symbol module for a way to create anonymous symbol table +entries. + +Because assignment of a reference to a typeglob creates an alias, this +can be used to create what is effectively a local function, or at least, +a local alias. + +    { +        local *grow = \&shrink; # only until this block exits +        grow();                 # really calls shrink() +	move();			# if move() grow()s, it shrink()s too +    } +    grow();			# get the real grow() again + +See L<perlref/"Function Templates"> for more about manipulating +functions by name in this way. + +=item 3. You want to temporarily change just one element of an array or hash. + +You can C<local>ize just one element of an aggregate.  Usually this +is done on dynamics: + +    { +	local $SIG{INT} = 'IGNORE'; +	funct();			    # uninterruptible +    }  +    # interruptibility automatically restored here + +But it also works on lexically declared aggregates.  Prior to 5.005, +this operation could on occasion misbehave. + +=back + +=head2 Pass by Reference + +If you want to pass more than one array or hash into a function--or +return them from it--and have them maintain their integrity, then +you're going to have to use an explicit pass-by-reference.  Before you +do that, you need to understand references as detailed in L<perlref>. +This section may not make much sense to you otherwise. + +Here are a few simple examples.  
First, let's pass in several +arrays to a function and have it C<pop> all of them, returning a new +list of all their former last elements: + +    @tailings = popmany ( \@a, \@b, \@c, \@d ); + +    sub popmany { +	my $aref; +	my @retlist = (); +	foreach $aref ( @_ ) { +	    push @retlist, pop @$aref; +	} +	return @retlist; +    } + +Here's how you might write a function that returns a +list of keys occurring in all the hashes passed to it: + +    @common = inter( \%foo, \%bar, \%joe ); +    sub inter { +	my ($k, $href, %seen); # locals +	foreach $href (@_) { +	    while ( $k = each %$href ) { +		$seen{$k}++; +	    } +	} +	return grep { $seen{$_} == @_ } keys %seen; +    } + +So far, we're using just the normal list return mechanism. +What happens if you want to pass or return a hash?  Well, +if you're using only one of them, or you don't mind them +concatenating, then the normal calling convention is ok, although +a little expensive. + +Where people get into trouble is here: + +    (@a, @b) = func(@c, @d); +or +    (%a, %b) = func(%c, %d); + +That syntax simply won't work.  It sets just C<@a> or C<%a> and clears the C<@b> or +C<%b>.  Plus the function didn't get passed two separate arrays or +hashes: it got one long list in C<@_>, as always. + +If you can arrange for everyone to deal with this through references, it's +cleaner code, although not so nice to look at.  
Here's a function that +takes two array references as arguments, returning the two array references +in order of how many elements they have in them: + +    ($aref, $bref) = func(\@c, \@d); +    print "@$aref has more than @$bref\n"; +    sub func { +	my ($cref, $dref) = @_; +	if (@$cref > @$dref) { +	    return ($cref, $dref); +	} else { +	    return ($dref, $cref); +	} +    } + +It turns out that you can actually do this also: + +    (*a, *b) = func(\@c, \@d); +    print "@a has more than @b\n"; +    sub func { +	local (*c, *d) = @_; +	if (@c > @d) { +	    return (\@c, \@d); +	} else { +	    return (\@d, \@c); +	} +    } + +Here we're using the typeglobs to do symbol table aliasing.  It's +a tad subtle, though, and also won't work if you're using C<my()> +variables, because only globals (well, and C<local()>s) are in the symbol table. + +If you're passing around filehandles, you could usually just use the bare +typeglob, like C<*STDOUT>, but typeglob references would be better because +they'll still work properly under S<C<use strict 'refs'>>.  For example: + +    splutter(\*STDOUT); +    sub splutter { +	my $fh = shift; +	print $fh "her um well a hmmm\n"; +    } + +    $rec = get_rec(\*STDIN); +    sub get_rec { +	my $fh = shift; +	return scalar <$fh>; +    } + +Another way to do this is using C<*HANDLE{IO}>; see L<perlref> for usage +and caveats. + +If you're planning on generating new filehandles, you could do this: + +    sub openit { +	my $path = shift; +	local *FH; +	return open (FH, $path) ? *FH : undef; +    } + +Although that will actually produce a small memory leak.  See the bottom +of L<perlfunc/open()> for a somewhat cleaner way using the C<IO::Handle> +package. + +=head2 Prototypes + +As of the 5.002 release of perl, if you declare + +    sub mypush (\@@) + +then C<mypush()> takes arguments exactly like C<push()> does.  The declaration +of the function to be called must be visible at compile time.  
The prototype +affects only the interpretation of new-style calls to the function, where +new-style is defined as not using the C<&> character.  In other words, +if you call it like a builtin function, then it behaves like a builtin +function.  If you call it like an old-fashioned subroutine, then it +behaves like an old-fashioned subroutine.  It naturally falls out from +this rule that prototypes have no influence on subroutine references +like C<\&foo> or on indirect subroutine calls like C<&{$subref}>. + +Method calls are not influenced by prototypes either, because the +function to be called is indeterminate at compile time, because it depends +on inheritance. + +Because the intent is primarily to let you define subroutines that work +like builtin commands, here are the prototypes for some other functions +that parse almost exactly like the corresponding builtins. + +    Declared as			Called as + +    sub mylink ($$)	     mylink $old, $new +    sub myvec ($$$)	     myvec $var, $offset, 1 +    sub myindex ($$;$)	     myindex &getstring, "substr" +    sub mysyswrite ($$$;$)   mysyswrite $buf, 0, length($buf) - $off, $off +    sub myreverse (@)	     myreverse $a, $b, $c +    sub myjoin ($@)	     myjoin ":", $a, $b, $c +    sub mypop (\@)	     mypop @array +    sub mysplice (\@$$@)     mysplice @array, @array, 0, @pushme +    sub mykeys (\%)	     mykeys %{$hashref} +    sub myopen (*;$)	     myopen HANDLE, $name +    sub mypipe (**)	     mypipe READHANDLE, WRITEHANDLE +    sub mygrep (&@)	     mygrep { /foo/ } $a, $b, $c +    sub myrand ($)	     myrand 42 +    sub mytime ()	     mytime + +Any backslashed prototype character represents an actual argument +that absolutely must start with that character.  The value passed +to the subroutine (as part of C<@_>) will be a reference to the +actual argument given in the subroutine call, obtained by applying +C<\> to that argument. + +Unbackslashed prototype characters have special meanings.  
Any +unbackslashed C<@> or C<%> eats all the rest of the arguments, and forces +list context.  An argument represented by C<$> forces scalar context.  An +C<&> requires an anonymous subroutine, which, if passed as the first +argument, does not require the "C<sub>" keyword or a subsequent comma.  A +C<*> does whatever it has to do to turn the argument into a reference to a +symbol table entry. + +A semicolon separates mandatory arguments from optional arguments. +(It is redundant before C<@> or C<%>.) + +Note how the last three examples above are treated specially by the parser. +C<mygrep()> is parsed as a true list operator, C<myrand()> is parsed as a +true unary operator with unary precedence the same as C<rand()>, and +C<mytime()> is truly without arguments, just like C<time()>.  That is, if you +say + +    mytime +2; + +you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed +without the prototype. + +The interesting thing about C<&> is that you can generate new syntax with it: + +    sub try (&@) { +	my($try,$catch) = @_; +	eval { &$try }; +	if ($@) { +	    local $_ = $@; +	    &$catch; +	} +    } +    sub catch (&) { $_[0] } + +    try { +	die "phooey"; +    } catch { +	/phooey/ and print "unphooey\n"; +    }; + +That prints C<"unphooey">.  (Yes, there are still unresolved +issues having to do with the visibility of C<@_>.  I'm ignoring that +question for the moment.  (But note that if we make C<@_> lexically +scoped, those anonymous subroutines can act like closures... (Gee, +is this sounding a little Lispish?  (Never mind.)))) + +And here's a reimplementation of C<grep>: + +    sub mygrep (&@) { +	my $code = shift; +	my @result; +	foreach $_ (@_) { +	    push(@result, $_) if &$code; +	} +	@result; +    } + +Some folks would prefer full alphanumeric prototypes.  Alphanumerics have +been intentionally left out of prototypes for the express purpose of +someday in the future adding named, formal parameters.  
The current +mechanism's main goal is to let module writers provide better diagnostics +for module users.  Larry feels the notation quite understandable to Perl +programmers, and that it will not intrude greatly upon the meat of the +module, nor make it harder to read.  The line noise is visually +encapsulated into a small pill that's easy to swallow. + +It's probably best to prototype new functions, not retrofit prototyping +into older ones.  That's because you must be especially careful about +silent impositions of differing list versus scalar contexts.  For example, +if you decide that a function should take just one parameter, like this: + +    sub func ($) { +	my $n = shift; +	print "you gave me $n\n"; +    } + +and someone has been calling it with an array or expression +returning a list: + +    func(@foo); +    func( split /:/ ); + +Then you've just supplied an automatic C<scalar()> in front of their +argument, which can be more than a bit surprising.  The old C<@foo> +which used to hold one thing doesn't get passed in.  Instead, +the C<func()> now gets passed in C<1>, that is, the number of elements +in C<@foo>.  And the C<split()> gets called in a scalar context and +starts scribbling on your C<@_> parameter list. + +This is all very powerful, of course, and should be used only in moderation +to make the world a better place. + +=head2 Constant Functions + +Functions with a prototype of C<()> are potential candidates for +inlining.  If the result after optimization and constant folding is +either a constant or a lexically-scoped scalar which has no other +references, then it will be used in place of function calls made +without C<&> or C<do>. Calls made using C<&> or C<do> are never +inlined.  (See F<constant.pm> for an easy way to declare most +constants.) + +The following functions would all be inlined: + +    sub pi ()		{ 3.14159 }		# Not exact, but close. +    sub PI ()		{ 4 * atan2 1, 1 }	# As good as it gets, +						# and it's inlined, too! 
+    sub ST_DEV ()	{ 0 } +    sub ST_INO ()	{ 1 } + +    sub FLAG_FOO ()	{ 1 << 8 } +    sub FLAG_BAR ()	{ 1 << 9 } +    sub FLAG_MASK ()	{ FLAG_FOO | FLAG_BAR } + +    sub OPT_BAZ ()	{ not (0x1B58 & FLAG_MASK) } +    sub BAZ_VAL () { +	if (OPT_BAZ) { +	    return 23; +	} +	else { +	    return 42; +	} +    } + +    sub N () { int(BAZ_VAL) / 3 } +    BEGIN { +	my $prod = 1; +	for (1..N) { $prod *= $_ } +	sub N_FACTORIAL () { $prod } +    } + +If you redefine a subroutine that was eligible for inlining, you'll get +a mandatory warning.  (You can use this warning to tell whether or not a +particular subroutine is considered constant.)  The warning is +considered severe enough not to be optional because previously compiled +invocations of the function will still be using the old value of the +function.  If you need to be able to redefine the subroutine you need to +ensure that it isn't inlined, either by dropping the C<()> prototype +(which changes the calling semantics, so beware) or by thwarting the +inlining mechanism in some other way, such as + +    sub not_inlined () { +    	23 if $]; +    } + +=head2 Overriding Builtin Functions + +Many builtin functions may be overridden, though this should be tried +only occasionally and for good reason.  Typically this might be +done by a package attempting to emulate missing builtin functionality +on a non-Unix system. + +Overriding may be done only by importing the name from a +module--ordinary predeclaration isn't good enough.  However, the +C<subs> pragma (compiler directive) lets you, in effect, predeclare subs +via the import syntax, and these names may then override the builtin ones: + +    use subs 'chdir', 'chroot', 'chmod', 'chown'; +    chdir $somewhere; +    sub chdir { ... } + +To unambiguously refer to the builtin form, one may precede the +builtin name with the special package qualifier C<CORE::>.  
For example, +saying C<CORE::open()> will always refer to the builtin C<open()>, even +if the current package has imported some other subroutine called +C<&open()> from elsewhere. + +Library modules should not in general export builtin names like "C<open>" +or "C<chdir>" as part of their default C<@EXPORT> list, because these may +sneak into someone else's namespace and change the semantics unexpectedly. +Instead, if the module adds the name to the C<@EXPORT_OK> list, then it's +possible for a user to import the name explicitly, but not implicitly. +That is, they could say + +    use Module 'open'; + +and it would import the C<open> override, but if they said + +    use Module; + +they would get the default imports without the overrides. + +The foregoing mechanism for overriding builtins is restricted, quite +deliberately, to the package that requests the import.  There is a second +method that is sometimes applicable when you wish to override a builtin +everywhere, without regard to namespace boundaries.  This is achieved by +importing a sub into the special namespace C<CORE::GLOBAL::>.  Here is an +example that quite brazenly replaces the C<glob> operator with something +that understands regular expressions. + +    package REGlob; +    require Exporter; +    @ISA = 'Exporter'; +    @EXPORT_OK = 'glob'; + +    sub import { +	my $pkg = shift; +	return unless @_; +	my $sym = shift; +	my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0)); +	$pkg->export($where, $sym, @_); +    } + +    sub glob { +	my $pat = shift; +	my @got; +	local(*D); +	if (opendir D, '.') { @got = grep /$pat/, readdir D; closedir D; } +	@got; +    } +    1; + +And here's how it could be (ab)used: + +    #use REGlob 'GLOBAL_glob';	    # override glob() in ALL namespaces +    package Foo; +    use REGlob 'glob';		    # override glob() in Foo:: only +    print for <^[a-z_]+\.pm\$>;	    # show all pragmatic modules + +Note that the initial comment shows a contrived, even dangerous example. 
+By overriding C<glob> globally, you would be forcing the new (and +subversive) behavior for the C<glob> operator for B<every> namespace, +without the complete cognizance or cooperation of the modules that own +those namespaces.  Naturally, this should be done with extreme caution--if +it must be done at all. + +The C<REGlob> example above does not implement all the support needed to +cleanly override perl's C<glob> operator.  The builtin C<glob> has +different behaviors depending on whether it appears in a scalar or list +context, but our C<REGlob> doesn't.  Indeed, many perl builtins have such +context-sensitive behaviors, and these must be adequately supported by +a properly written override.  For a fully functional example of overriding +C<glob>, study the implementation of C<File::DosGlob> in the standard +library. + + +=head2 Autoloading + +If you call a subroutine that is undefined, you would ordinarily get an +immediate fatal error complaining that the subroutine doesn't exist. +(Likewise for subroutines being used as methods, when the method +doesn't exist in any base class of the class package.) If, +however, there is an C<AUTOLOAD> subroutine defined in the package or +packages that were searched for the original subroutine, then that +C<AUTOLOAD> subroutine is called with the arguments that would have been +passed to the original subroutine.  The fully qualified name of the +original subroutine magically appears in the C<$AUTOLOAD> variable in the +same package as the C<AUTOLOAD> routine.  The name is not passed as an +ordinary argument because, er, well, just because, that's why... + +Most C<AUTOLOAD> routines will load in a definition for the subroutine in +question using C<eval>, and then execute that subroutine using a special +form of C<goto> that erases the stack frame of the C<AUTOLOAD> routine +without a trace.  (See the standard C<AutoLoader> module, for example.) +But an C<AUTOLOAD> routine can also just emulate the routine and never +define it.  
For example, let's pretend that a function that wasn't defined +should just call C<system()> with those arguments.  All you'd do is this: + +    sub AUTOLOAD { +	my $program = $AUTOLOAD; +	$program =~ s/.*:://; +	system($program, @_); +    } +    date(); +    who('am', 'i'); +    ls('-l'); + +In fact, if you predeclare the functions you want to call that way, you don't +even need the parentheses: + +    use subs qw(date who ls); +    date; +    who "am", "i"; +    ls -l; + +A more complete example of this is the standard Shell module, which +can treat undefined subroutine calls as calls to Unix programs. + +Mechanisms are available for module writers to help split their modules +up into autoloadable files.  See the standard AutoLoader module +described in L<AutoLoader> and in L<AutoSplit>, the standard +SelfLoader module in L<SelfLoader>, and the document on adding C +functions to perl code in L<perlxs>. + +=head1 SEE ALSO + +See L<perlref> for more about references and closures.  See L<perlxs> if +you'd like to learn about calling C subroutines from perl.  See L<perlmod> +to learn about bundling up your functions in separate files. | 
