Diffstat (limited to 'contrib/perl5/pod/perldata.pod')
-rw-r--r-- | contrib/perl5/pod/perldata.pod | 603 |
1 files changed, 603 insertions, 0 deletions
diff --git a/contrib/perl5/pod/perldata.pod b/contrib/perl5/pod/perldata.pod new file mode 100644 index 000000000000..58c11234b421 --- /dev/null +++ b/contrib/perl5/pod/perldata.pod @@ -0,0 +1,603 @@ +=head1 NAME + +perldata - Perl data types + +=head1 DESCRIPTION + +=head2 Variable names + +Perl has three data structures: scalars, arrays of scalars, and +associative arrays of scalars, known as "hashes". Normal arrays are +indexed by number, starting with 0. (Negative subscripts count from +the end.) Hash arrays are indexed by string. + +Values are usually referred to by name (or through a named reference). +The first character of the name tells you to what sort of data +structure it refers. The rest of the name tells you the particular +value to which it refers. Most often, it consists of a single +I<identifier>, that is, a string beginning with a letter or underscore, +and containing letters, underscores, and digits. In some cases, it +may be a chain of identifiers, separated by C<::> (or by C<'>, but +that's deprecated); all but the last are interpreted as names of +packages, to locate the namespace in which to look +up the final identifier (see L<perlmod/Packages> for details). +It's possible to substitute for a simple identifier an expression +that produces a reference to the value at runtime; this is +described in more detail below, and in L<perlref>. + +There are also special variables whose names don't follow these +rules, so that they don't accidentally collide with one of your +normal variables. Strings that match parenthesized parts of a +regular expression are saved under names containing only digits after +the C<$> (see L<perlop> and L<perlre>). In addition, several special +variables that provide windows into the inner working of Perl have names +containing punctuation characters (see L<perlvar>). + +Scalar values are always named with '$', even when referring to a scalar +that is part of an array. It works like the English word "the". 
Thus +we have: + + $days # the simple scalar value "days" + $days[28] # the 29th element of array @days + $days{'Feb'} # the 'Feb' value from hash %days + $#days # the last index of array @days + +but entire arrays or array slices are denoted by '@', which works much like +the word "these" or "those": + + @days # ($days[0], $days[1],... $days[n]) + @days[3,4,5] # same as @days[3..5] + @days{'a','c'} # same as ($days{'a'},$days{'c'}) + +and entire hashes are denoted by '%': + + %days # (key1, val1, key2, val2 ...) + +In addition, subroutines are named with an initial '&', though this is +optional when it's otherwise unambiguous (just as "do" is often +redundant in English). Symbol table entries can be named with an +initial '*', but you don't really care about that yet. + +Every variable type has its own namespace. You can, without fear of +conflict, use the same name for a scalar variable, an array, or a hash +(or, for that matter, a filehandle, a subroutine name, or a label). +This means that $foo and @foo are two different variables. It also +means that C<$foo[1]> is a part of @foo, not a part of $foo. This may +seem a bit weird, but that's okay, because it is weird. + +Because variable and array references always start with '$', '@', or '%', +the "reserved" words aren't in fact reserved with respect to variable +names. (They ARE reserved with respect to labels and filehandles, +however, which don't have an initial special character. You can't have +a filehandle named "log", for instance. Hint: you could say +C<open(LOG,'logfile')> rather than C<open(log,'logfile')>. Using uppercase +filehandles also improves readability and protects you from conflict +with future reserved words.) Case I<IS> significant--"FOO", "Foo", and +"foo" are all different names. Names that start with a letter or +underscore may also contain digits and underscores. + +It is possible to replace such an alphanumeric name with an expression +that returns a reference to an object of that type. 
For a description +of this, see L<perlref>. + +Names that start with a digit may contain only more digits. Names +that do not start with a letter, underscore, or digit are limited to +one character, e.g., C<$%> or C<$$>. (Most of these one character names +have a predefined significance to Perl. For instance, C<$$> is the +current process id.) + +=head2 Context + +The interpretation of operations and values in Perl sometimes depends +on the requirements of the context around the operation or value. +There are two major contexts: scalar and list. Certain operations +return list values in contexts wanting a list, and scalar values +otherwise. (If this is true of an operation it will be mentioned in +the documentation for that operation.) In other words, Perl overloads +certain operations based on whether the expected return value is +singular or plural. (Some words in English work this way, like "fish" +and "sheep".) + +In a reciprocal fashion, an operation provides either a scalar or a +list context to each of its arguments. For example, if you say + + int( <STDIN> ) + +the integer operation provides a scalar context for the E<lt>STDINE<gt> +operator, which responds by reading one line from STDIN and passing it +back to the integer operation, which will then find the integer value +of that line and return that. If, on the other hand, you say + + sort( <STDIN> ) + +then the sort operation provides a list context for E<lt>STDINE<gt>, which +will proceed to read every line available up to the end of file, and +pass that list of lines back to the sort routine, which will then +sort those lines and return them as a list to whatever the context +of the sort was. + +Assignment is a little bit special in that it uses its left argument to +determine the context for the right argument. Assignment to a scalar +evaluates the righthand side in a scalar context, while assignment to +an array or array slice evaluates the righthand side in a list +context. 
Assignment to a list also evaluates the righthand side in a +list context. + +User defined subroutines may choose to care whether they are being +called in a scalar or list context, but most subroutines do not +need to care, because scalars are automatically interpolated into +lists. See L<perlfunc/wantarray>. + +=head2 Scalar values + +All data in Perl is a scalar or an array of scalars or a hash of scalars. +Scalar variables may contain various kinds of singular data, such as +numbers, strings, and references. In general, conversion from one form to +another is transparent. (A scalar may not contain multiple values, but +may contain a reference to an array or hash containing multiple values.) +Because of the automatic conversion of scalars, operations and functions +that return scalars don't need to care (and, in fact, can't care) whether +the context is looking for a string or a number. + +Scalars aren't necessarily one thing or another. There's no place to +declare a scalar variable to be of type "string", or of type "number", or +type "filehandle", or anything else. Perl is a contextually polymorphic +language whose scalars can be strings, numbers, or references (which +includes objects). While strings and numbers are considered pretty +much the same thing for nearly all purposes, references are strongly-typed +uncastable pointers with builtin reference-counting and destructor +invocation. + +A scalar value is interpreted as TRUE in the Boolean sense if it is not +the null string or the number 0 (or its string equivalent, "0"). The +Boolean context is just a special kind of scalar context. + +There are actually two varieties of null scalars: defined and +undefined. Undefined null scalars are returned when there is no real +value for something, such as when there was an error, or at end of +file, or when you refer to an uninitialized variable or element of an +array. 
An undefined null scalar may become defined the first time you +use it as if it were defined, but prior to that you can use the +defined() operator to determine whether the value is defined or not. + +To find out whether a given string is a valid nonzero number, it's usually +enough to test it against both numeric 0 and also lexical "0" (although +this will cause B<-w> noises). That's because strings that aren't +numbers count as 0, just as they do in B<awk>: + + if ($str == 0 && $str ne "0") { + warn "That doesn't look like a number"; + } + +That's usually preferable because otherwise you won't treat IEEE notations +like C<NaN> or C<Infinity> properly. At other times you might prefer to +use the POSIX::strtod function or a regular expression to check whether +data is numeric. See L<perlre> for details on regular expressions. + + warn "has nondigits" if /\D/; + warn "not a natural number" unless /^\d+$/; # rejects -3 + warn "not an integer" unless /^-?\d+$/; # rejects +3 + warn "not an integer" unless /^[+-]?\d+$/; + warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2 + warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/; + warn "not a C float" + unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/; + +The length of an array is a scalar value. You may find the length of +array @days by evaluating C<$#days>, as in B<csh>. (Actually, it's not +the length of the array, it's the subscript of the last element, because +there is (ordinarily) a 0th element.) Assigning to C<$#days> changes the +length of the array. Shortening an array by this method destroys +intervening values. Lengthening an array that was previously shortened +I<NO LONGER> recovers the values that were in those elements. (It used to +in Perl 4, but we had to break this to make sure destructors were +called when expected.) You can also gain some minuscule measure of efficiency by +pre-extending an array that is going to get big. 
(You can also extend +an array by assigning to an element that is off the end of the array.) +You can truncate an array down to nothing by assigning the null list () +to it. The following are equivalent: + + @whatever = (); + $#whatever = -1; + +If you evaluate a named array in a scalar context, it returns the length of +the array. (Note that this is not true of lists, which return the +last value, like the C comma operator, nor of built-in functions, which return +whatever they feel like returning.) The following is always true: + + scalar(@whatever) == $#whatever - $[ + 1; + +Version 5 of Perl changed the semantics of C<$[>: files that don't set +the value of C<$[> no longer need to worry about whether another +file changed its value. (In other words, use of C<$[> is deprecated.) +So in general you can assume that + + scalar(@whatever) == $#whatever + 1; + +Some programmers choose to use an explicit conversion so nothing's +left to doubt: + + $element_count = scalar(@whatever); + +If you evaluate a hash in a scalar context, it returns a value that is +true if and only if the hash contains any key/value pairs. (If there +are any key/value pairs, the value returned is a string consisting of +the number of used buckets and the number of allocated buckets, separated +by a slash. This is pretty much useful only to find out whether Perl's +(compiled in) hashing algorithm is performing poorly on your data set. +For example, you stick 10,000 things in a hash, but evaluating %HASH in +scalar context reveals "1/16", which means only one out of sixteen buckets +has been touched, and presumably contains all 10,000 of your items. This +isn't supposed to happen.) + +You can preallocate space for a hash by assigning to the keys() function. 
+This rounds up the allocated buckets to the next power of two: + + keys(%users) = 1000; # allocate 1024 buckets + +=head2 Scalar value constructors + +Numeric literals are specified in any of the customary floating point or +integer formats: + + 12345 + 12345.67 + .23E-10 + 0xffff # hex + 0377 # octal + 4_294_967_296 # underline for legibility + +String literals are usually delimited by either single or double +quotes. They work much like shell quotes: double-quoted string +literals are subject to backslash and variable substitution; +single-quoted strings are not (except for "C<\'>" and "C<\\>"). +The usual Unix backslash rules apply for making characters such as +newline, tab, etc., as well as some more exotic forms. See +L<perlop/Quote and Quotelike Operators> for a list. + +Octal or hex representations in string literals (e.g. '0xffff') are not +automatically converted to their integer representation. The hex() and +oct() functions make these conversions for you. See L<perlfunc/hex> and +L<perlfunc/oct> for more details. + +You can also embed newlines directly in your strings, i.e., they can end +on a different line than they begin. This is nice, but if you forget +your trailing quote, the error will not be reported until Perl finds +another line containing the quote character, which may be much further +on in the script. Variable substitution inside strings is limited to +scalar variables, arrays, and array slices. (In other words, +names beginning with $ or @, followed by an optional bracketed +expression as a subscript.) The following code segment prints out "The +price is $Z<>100." + + $Price = '$100'; # not interpreted + print "The price is $Price.\n"; # interpreted + +As in some shells, you can put curly brackets around the name to +delimit it from following alphanumerics. In fact, an identifier +within such curlies is forced to be a string, as is any single +identifier within a hash subscript. 
Our earlier example, + + $days{'Feb'} + +can be written as + + $days{Feb} + +and the quotes will be assumed automatically. But anything more complicated +in the subscript will be interpreted as an expression. + +Note that a +single-quoted string must be separated from a preceding word by a +space, because single quote is a valid (though deprecated) character in +a variable name (see L<perlmod/Packages>). + +Three special literals are __FILE__, __LINE__, and __PACKAGE__, which +represent the current filename, line number, and package name at that +point in your program. They may be used only as separate tokens; they +will not be interpolated into strings. If there is no current package +(due to an empty C<package;> directive), __PACKAGE__ is the undefined value. + +The tokens __END__ and __DATA__ may be used to indicate the logical end +of the script before the actual end of file. Any following text is +ignored, but may be read via a DATA filehandle: main::DATA for __END__, +or PACKNAME::DATA (where PACKNAME is the current package) for __DATA__. +The two control characters ^D and ^Z are synonyms for __END__ (or +__DATA__ in a module). See L<SelfLoader> for more description of +__DATA__, and an example of its use. Note that you cannot read from the +DATA filehandle in a BEGIN block: the BEGIN block is executed as soon as +it is seen (during compilation), at which point the corresponding +__DATA__ (or __END__) token has not yet been seen. + +A word that has no other interpretation in the grammar will +be treated as if it were a quoted string. These are known as +"barewords". As with filehandles and labels, a bareword that consists +entirely of lowercase letters risks conflict with future reserved +words, and if you use the B<-w> switch, Perl will warn you about any +such words. Some people may wish to outlaw barewords entirely. 
If you +say + + use strict 'subs'; + +then any bareword that would NOT be interpreted as a subroutine call +produces a compile-time error instead. The restriction lasts to the +end of the enclosing block. An inner block may countermand this +by saying C<no strict 'subs'>. + +Array variables are interpolated into double-quoted strings by joining all +the elements of the array with the delimiter specified in the C<$"> +variable (C<$LIST_SEPARATOR> in English), space by default. The following +are equivalent: + + $temp = join($",@ARGV); + system "echo $temp"; + + system "echo @ARGV"; + +Within search patterns (which also undergo double-quotish substitution) +there is a bad ambiguity: Is C</$foo[bar]/> to be interpreted as +C</${foo}[bar]/> (where C<[bar]> is a character class for the regular +expression) or as C</${foo[bar]}/> (where C<[bar]> is the subscript to array +@foo)? If @foo doesn't otherwise exist, then it's obviously a +character class. If @foo exists, Perl takes a good guess about C<[bar]>, +and is almost always right. If it does guess wrong, or if you're just +plain paranoid, you can force the correct interpretation with curly +brackets as above. + +A line-oriented form of quoting is based on the shell "here-doc" +syntax. Following a C<E<lt>E<lt>> you specify a string to terminate +the quoted material, and all lines following the current line down to +the terminating string are the value of the item. The terminating +string may be either an identifier (a word), or some quoted text. If +quoted, the type of quotes you use determines the treatment of the +text, just as in regular quoting. An unquoted identifier works like +double quotes. There must be no space between the C<E<lt>E<lt>> and +the identifier. (If you put a space it will be treated as a null +identifier, which is valid, and matches the first empty line.) The +terminating string must appear by itself (unquoted and with no +surrounding whitespace) on the terminating line. 
+ + print <<EOF; + The price is $Price. + EOF + + print <<"EOF"; # same as above + The price is $Price. + EOF + + print <<`EOC`; # execute commands + echo hi there + echo lo there + EOC + + print <<"foo", <<"bar"; # you can stack them + I said foo. + foo + I said bar. + bar + + myfunc(<<"THIS", 23, <<'THAT'); + Here's a line + or two. + THIS + and here's another. + THAT + +Just don't forget that you have to put a semicolon on the end +to finish the statement, as Perl doesn't know you're not going to +try to do this: + + print <<ABC + 179231 + ABC + + 20; + + +=head2 List value constructors + +List values are denoted by separating individual values by commas +(and enclosing the list in parentheses where precedence requires it): + + (LIST) + +In a context not requiring a list value, the value of the list +literal is the value of the final element, as with the C comma operator. +For example, + + @foo = ('cc', '-E', $bar); + +assigns the entire list value to array foo, but + + $foo = ('cc', '-E', $bar); + +assigns the value of variable bar to variable foo. Note that the value +of an actual array in a scalar context is the length of the array; the +following assigns the value 3 to $foo: + + @foo = ('cc', '-E', $bar); + $foo = @foo; # $foo gets 3 + +You may have an optional comma before the closing parenthesis of a +list literal, so that you can say: + + @foo = ( + 1, + 2, + 3, + ); + +LISTs do automatic interpolation of sublists. That is, when a LIST is +evaluated, each element of the list is evaluated in a list context, and +the resulting list value is interpolated into LIST just as if each +individual element were a member of LIST. Thus arrays and hashes lose their +identity in a LIST--the list + + (@foo,@bar,&SomeSub,%glarch) + +contains all the elements of @foo followed by all the elements of @bar, +followed by all the elements returned by the subroutine named SomeSub +called in a list context, followed by the key/value pairs of %glarch. 
+To make a list reference that does I<NOT> interpolate, see L<perlref>. + +The null list is represented by (). Interpolating it in a list +has no effect. Thus ((),(),()) is equivalent to (). Similarly, +interpolating an array with no elements is the same as if no +array had been interpolated at that point. + +A list value may also be subscripted like a normal array. You must +put the list in parentheses to avoid ambiguity. For example: + + # Stat returns list value. + $time = (stat($file))[8]; + + # SYNTAX ERROR HERE. + $time = stat($file)[8]; # OOPS, FORGOT PARENTHESES + + # Find a hex digit. + $hexdigit = ('a','b','c','d','e','f')[$digit-10]; + + # A "reverse comma operator". + return (pop(@foo),pop(@foo))[0]; + +You may assign to C<undef> in a list. This is useful for throwing +away some of the return values of a function: + + ($dev, $ino, undef, undef, $uid, $gid) = stat($file); + +Lists may be assigned to if and only if each element of the list +is legal to assign to: + + ($a, $b, $c) = (1, 2, 3); + + ($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00); + +Array assignment in a scalar context returns the number of elements +produced by the expression on the right side of the assignment: + + $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2 + $x = (($foo,$bar) = f()); # set $x to f()'s return count + +This is very handy when you want to do a list assignment in a Boolean +context, because most list functions return a null list when finished, +which when assigned produces a 0, which is interpreted as FALSE. + +The final element may be an array or a hash: + + ($a, $b, @rest) = split; + my($a, $b, %rest) = @_; + +You can actually put an array or hash anywhere in the list, but the first one +in the list will soak up all the values, and anything after it will get +a null value. This may be useful in a local() or my(). 
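The two rules above (a list assignment in scalar context yields the count of right-hand values, and the first array in the target list soaks up everything that remains) can be seen in a small sketch. The variable names C<$count>, C<$first>, C<@rest>, and C<$last> are illustrative assumptions, not names used elsewhere in this document.

```perl
use strict;
use warnings;

# List assignment in scalar context yields the number of
# right-hand-side elements, not the number of targets.
my ($foo, $bar);
my $count = (($foo, $bar) = (3, 2, 1));

# The array in the middle of the target list takes everything
# that is left, so the scalar after it stays undefined.
my ($first, @rest, $last) = (10, 20, 30, 40);

print "count = $count\n";
print "first = $first, rest = @rest\n";
print "last is ", (defined $last ? "defined" : "undefined"), "\n";
```

This is why an array or hash is normally placed last in such a list: anything placed after it never receives a value.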
+ +A hash literal contains pairs of values to be interpreted +as a key and a value: + + # same as map assignment above + %map = ('red',0x00f,'blue',0x0f0,'green',0xf00); + +While literal lists and named arrays are usually interchangeable, that's +not the case for hashes. Just because you can subscript a list value like +a normal array does not mean that you can subscript a list value as a +hash. Likewise, hashes included as parts of other lists (including +parameters lists and return lists from functions) always flatten out into +key/value pairs. That's why it's good to use references sometimes. + +It is often more readable to use the C<=E<gt>> operator between key/value +pairs. The C<=E<gt>> operator is mostly just a more visually distinctive +synonym for a comma, but it also arranges for its left-hand operand to be +interpreted as a string--if it's a bareword that would be a legal identifier. +This makes it nice for initializing hashes: + + %map = ( + red => 0x00f, + blue => 0x0f0, + green => 0xf00, + ); + +or for initializing hash references to be used as records: + + $rec = { + witch => 'Mable the Merciless', + cat => 'Fluffy the Ferocious', + date => '10/31/1776', + }; + +or for using call-by-named-parameter to complicated functions: + + $field = $query->radio_group( + name => 'group_name', + values => ['eenie','meenie','minie'], + default => 'meenie', + linebreak => 'true', + labels => \%labels + ); + +Note that just because a hash is initialized in that order doesn't +mean that it comes out in that order. See L<perlfunc/sort> for examples +of how to arrange for an output ordering. + +=head2 Typeglobs and Filehandles + +Perl uses an internal type called a I<typeglob> to hold an entire +symbol table entry. The type prefix of a typeglob is a C<*>, because +it represents all types. This used to be the preferred way to +pass arrays and hashes by reference into a function, but now that +we have real references, this is seldom needed. 
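As a rough illustration of the "real references" mentioned above, the following sketch passes an array and a hash into a function without flattening them and without typeglobs. The helper C<summarize> and its arguments are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: receives an array reference and a hash
# reference, so neither argument is flattened into @_.
sub summarize {
    my ($list_ref, $map_ref) = @_;
    my $total = 0;
    $total += $_ for @$list_ref;               # dereference the array
    my $keys = join ',', sort keys %$map_ref;  # dereference the hash
    return "$total/$keys";
}

my @nums = (1, 2, 3);
my %map  = (red => 0x00f, blue => 0x0f0);

my $summary = summarize(\@nums, \%map);
print "$summary\n";
```

See L<perlref> for the full reference syntax; this sketch uses only the backslash operator and the C<@$ref> and C<%$ref> dereferencing forms.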
+ +The main use of typeglobs in modern Perl is to create symbol table aliases. +This assignment: + + *this = *that; + +makes $this an alias for $that, @this an alias for @that, %this an alias +for %that, &this an alias for &that, etc. Much safer is to use a reference. +This: + + local *Here::blue = \$There::green; + +temporarily makes $Here::blue an alias for $There::green, but doesn't +make @Here::blue an alias for @There::green, or %Here::blue an alias for +%There::green, etc. See L<perlmod/"Symbol Tables"> for more examples +of this. Strange though this may seem, this is the basis for the whole +module import/export system. + +Another use for typeglobs is to pass filehandles into a function or +to create new filehandles. If you need to use a typeglob to save away +a filehandle, do it this way: + + $fh = *STDOUT; + +or perhaps as a real reference, like this: + + $fh = \*STDOUT; + +See L<perlsub> for examples of using these as indirect filehandles +in functions. + +Typeglobs are also a way to create a local filehandle using the local() +operator. These last until their block is exited, but may be passed back. +For example: + + sub newopen { + my $path = shift; + local *FH; # not my! + open (FH, $path) or return undef; + return *FH; + } + $fh = newopen('/etc/passwd'); + +Now that we have the *foo{THING} notation, typeglobs aren't used as much +for filehandle manipulations, although they're still needed to pass brand +new file and directory handles into or out of functions. That's because +*HANDLE{IO} only works if HANDLE has already been used as a handle. +In other words, *FH can be used to create new symbol table entries, +but *foo{THING} cannot. + +Another way to create anonymous filehandles is with the IO::Handle +module and its ilk. These modules have the advantage of not hiding +different types of the same name during the local(). See the bottom of +L<perlfunc/open()> for an example. 
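A minimal sketch of the IO::Handle approach, assuming the IO::File subclass is available (it is one of the modules "of that ilk"). The helper name C<newopen_tmp> is hypothetical; it uses a temporary file so the example does not depend on any particular path existing.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper: returns an anonymous handle object instead
# of a localized typeglob, so no package symbol is hidden.
sub newopen_tmp {
    my $fh = IO::File->new_tmpfile
        or die "cannot create temporary file: $!";
    return $fh;
}

my $fh = newopen_tmp();
print $fh "hello from an anonymous handle\n";
$fh->seek(0, 0);        # rewind to the start (whence 0 is SEEK_SET)
my $line = <$fh>;       # read the line back
$fh->close;

print $line;
```

Because the handle lives in an ordinary scalar, it can be passed to and returned from functions like any other value.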
+ +See L<perlref>, L<perlsub>, and L<perlmod/"Symbol Tables"> for more +discussion on typeglobs and the *foo{THING} syntax.