diff options
Diffstat (limited to 'contrib/file/magic.man')
| -rw-r--r-- | contrib/file/magic.man | 400 |
1 files changed, 0 insertions, 400 deletions
diff --git a/contrib/file/magic.man b/contrib/file/magic.man deleted file mode 100644 index 73b6a3dd7d95..000000000000 --- a/contrib/file/magic.man +++ /dev/null @@ -1,400 +0,0 @@ -.TH MAGIC __FSECTION__ "Public Domain" -.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems. -.SH NAME -magic \- file command's magic number file -.SH DESCRIPTION -This manual page documents the format of the magic file as -used by the -.BR file (__CSECTION__) -command, version __VERSION__. -The -.BR file -command identifies the type of a file using, -among other tests, -a test for whether the file begins with a certain -.IR "magic number" . -The file -.I __MAGIC__ -specifies what magic numbers are to be tested for, -what message to print if a particular magic number is found, -and additional information to extract from the file. -.PP -Each line of the file specifies a test to be performed. -A test compares the data starting at a particular offset -in the file with a 1-byte, 2-byte, or 4-byte numeric value or -a string. -If the test succeeds, a message is printed. -The line consists of the following fields: -.IP offset \w'message'u+2n -A number specifying the offset, in bytes, into the file of the data -which is to be tested. -.IP type -The type of the data to be tested. -The possible values are: -.RS -.IP byte \w'message'u+2n -A one-byte value. -.IP short -A two-byte value (on most systems) in this machine's native byte order. -.IP long -A four-byte value (on most systems) in this machine's native byte order. -.IP string -A string of bytes. -The string type specification can be optionally followed -by /[Bbc]*. -The ``B'' flag compacts whitespace in the target, which must -contain at least one whitespace character. -If the magic has -.I n -consecutive blanks, the target needs at least -.I n -consecutive blanks to match. -The ``b'' flag treats every blank in the target as an optional blank. -Finally the ``c'' flag, specifies case insensitive matching: lowercase -characters in the magic match both lower and upper case characters in the -targer, whereas upper case characters in the magic, only much uppercase -characters in the target. -.IP pstring -A pascal style string where the first byte is interpreted as the an -unsigned length. The string is not NUL terminated. -.IP date -A four-byte value interpreted as a UNIX date. -.IP ldate -A four-byte value interpreted as a UNIX-style date, but interpreted as -local time rather than UTC. -.IP beshort -A two-byte value (on most systems) in big-endian byte order. -.IP belong -A four-byte value (on most systems) in big-endian byte order. -.IP bedate -A four-byte value (on most systems) in big-endian byte order, -interpreted as a Unix date. -.IP beldate -A four-byte value (on most systems) in big-endian byte order, -interpreted as a UNIX-style date, but interpreted as local time rather -than UTC. -.IP bestring16 -A two-byte unicode (UCS16) string in big-endian byte order. -.IP leshort -A two-byte value (on most systems) in little-endian byte order. -.IP lelong -A four-byte value (on most systems) in little-endian byte order. -.IP ledate -A four-byte value (on most systems) in little-endian byte order, -interpreted as a UNIX date. -.IP leldate -A four-byte value (on most systems) in little-endian byte order, -interpreted as a UNIX-style date, but interpreted as local time rather -than UTC. -.IP lestring16 -A two-byte unicode (UCS16) string in little-endian byte order. -.IP melong -A four-byte value (on most systems) in middle-endian (PDP-11) byte order. -.IP medate -A four-byte value (on most systems) in middle-endian (PDP-11) byte order, -interpreted as a UNIX date. -.IP meldate -A four-byte value (on most systems) in middle-endian (PDP-11) byte order, -interpreted as a UNIX-style date, but interpreted as local time rather -than UTC. -.IP regex -A regular expression match in extended POSIX regular expression syntax -(much like egrep). -The type specification can be optionally followed by -.B /c -for case-insensitive matches. -The regular expression is always -tested against the first -.B N -lines, where -.B N -is the given offset, thus it -is only useful for (single-byte encoded) text. -.B ^ -and -.B $ -will match the beginning and end of individual lines, respectively, -not beginning and end of file. -.IP search -A literal string search starting at the given offset. It must be followed by -.B /<number> -which specifies how many matches shall be attempted (the range). -This is suitable for searching larger binary expressions with variable -offsets, using -.B \e -escapes for special characters. -.RE -.PP -The numeric types may optionally be followed by -.B & -and a numeric value, -to specify that the value is to be AND'ed with the -numeric value before any comparisons are done. -Prepending a -.B u -to the type indicates that ordered comparisons should be unsigned. -.IP test -The value to be compared with the value from the file. -If the type is -numeric, this value -is specified in C form; if it is a string, it is specified as a C string -with the usual escapes permitted (e.g. \en for new-line). -.IP -Numeric values -may be preceded by a character indicating the operation to be performed. -It may be -.BR = , -to specify that the value from the file must equal the specified value, -.BR < , -to specify that the value from the file must be less than the specified -value, -.BR > , -to specify that the value from the file must be greater than the specified -value, -.BR & , -to specify that the value from the file must have set all of the bits -that are set in the specified value, -.BR ^ , -to specify that the value from the file must have clear any of the bits -that are set in the specified value, or -.BR ~ , -the value specified after is negated before tested. -.BR x , -to specify that any value will match. -If the character is omitted, it is assumed to be -.BR = . -For all tests except -.B string -and -.B regex, -operation -.BR ! -specifies that the line matches if the test does -.B not -succeed. -.IP -Numeric values are specified in C form; e.g. -.B 13 -is decimal, -.B 013 -is octal, and -.B 0x13 -is hexadecimal. -.IP -For string values, the byte string from the -file must match the specified byte string. -The operators -.BR = , -.B < -and -.B > -(but not -.BR & ) -can be applied to strings. -The length used for matching is that of the string argument -in the magic file. -This means that a line can match any string, and -then presumably print that string, by doing -.B >\e0 -(because all strings are greater than the null string). -.IP message -The message to be printed if the comparison succeeds. If the string -contains a -.BR printf (3) -format specification, the value from the file (with any specified masking -performed) is printed using the message as the format string. -.PP -Some file formats contain additional information which is to be printed -along with the file type or need additional tests to determine the true -file type. -These additional tests are introduced by one or more -.B > -characters preceding the offset. -The number of -.B > -on the line indicates the level of the test; a line with no -.B > -at the beginning is considered to be at level 0. -Tests are arranged in a tree-like hierarchy: -If a the test on a line at level -.IB n -succeeds, all following tests at level -.IB n+1 -are performed, and the messages printed if the tests succeed, untile a line -with level -.IB n -(or less) appears. -For more complex files, one can use empty messages to get just the -"if/then" effect, in the following way: -.sp -.nf - 0 string MZ - >0x18 leshort <0x40 MS-DOS executable - >0x18 leshort >0x3f extended PC executable (e.g., MS Windows) -.fi -.PP -Offsets do not need to be constant, but can also be read from the file -being examined. -If the first character following the last -.B > -is a -.B ( -then the string after the parenthesis is interpreted as an indirect offset. -That means that the number after the parenthesis is used as an offset in -the file. -The value at that offset is read, and is used again as an offset -in the file. -Indirect offsets are of the form: -.BI (( x [.[bslBSL]][+\-][ y ]). -The value of -.I x -is used as an offset in the file. A byte, short or long is read at that offset -depending on the -.B [bslBSLm] -type specifier. -The capitalized types interpret the number as a big endian -value, whereas the small letter versions interpret the number as a little -endian value; -the -.B m -type interprets the number as a middle endian (PDP-11) value. -To that number the value of -.I y -is added and the result is used as an offset in the file. -The default type if one is not specified is long. -.PP -That way variable length structures can be examined: -.sp -.nf - # MS Windows executables are also valid MS-DOS executables - 0 string MZ - >0x18 leshort <0x40 MZ executable (MS-DOS) - # skip the whole block below if it is not an extended executable - >0x18 leshort >0x3f - >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) - >>(0x3c.l) string LX\e0\e0 LX executable (OS/2) -.fi -.PP -This strategy of examining has one drawback: You must make sure that -you eventually print something, or users may get empty output (like, when -there is neither PE\e0\e0 nor LE\e0\e0 in the above example) -.PP -If this indirect offset cannot be used as-is, there are simple calculations -possible: appending -.BI [+-*/%&|^]<number> -inside parentheses allows one to modify -the value read from the file before it is used as an offset: -.sp -.nf - # MS Windows executables are also valid MS-DOS executables - 0 string MZ - # sometimes, the value at 0x18 is less that 0x40 but there's still an - # extended executable, simply appended to the file - >0x18 leshort <0x40 - >>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP) - >>(4.s*512) leshort !0x014c MZ executable (MS-DOS) -.fi -.PP -Sometimes you do not know the exact offset as this depends on the length or -position (when indirection was used before) of preceding fields. You can -specify an offset relative to the end of the last uplevel field using -.BI & -as a prefix to the offset: -.sp -.nf - 0 string MZ - >0x18 leshort >0x3f - >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) - # immediately following the PE signature is the CPU type - >>>&0 leshort 0x14c for Intel 80386 - >>>&0 leshort 0x184 for DEC Alpha -.fi -.PP -Indirect and relative offsets can be combined: -.sp -.nf - 0 string MZ - >0x18 leshort <0x40 - >>(4.s*512) leshort !0x014c MZ executable (MS-DOS) - # if it's not COFF, go back 512 bytes and add the offset taken - # from byte 2/3, which is yet another way of finding the start - # of the extended executable - >>>&(2.s-514) string LE LE executable (MS Windows VxD driver) -.fi -.PP -Or the other way around: -.sp -.nf - 0 string MZ - >0x18 leshort >0x3f - >>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows) - # at offset 0x80 (-4, since relative offsets start at the end - # of the uplevel match) inside the LE header, we find the absolute - # offset to the code area, where we look for a specific signature - >>>(&0x7c.l+0x26) string UPX \eb, UPX compressed -.fi -.PP -Or even both! -.sp -.nf - 0 string MZ - >0x18 leshort >0x3f - >>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows) - # at offset 0x58 inside the LE header, we find the relative offset - # to a data area where we look for a specific signature - >>>&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive -.fi -.PP -Finally, if you have to deal with offset/length pairs in your file, even the -second value in a parenthesed expression can be taken from the file itself, -using another set of parentheses. Note that this additional indirect offset -is always relative to the start of the main indirect offset. -.sp -.nf - 0 string MZ - >0x18 leshort >0x3f - >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) - # search for the PE section called ".idata"... - >>>&0xf4 search/0x140 .idata - # ...and go to the end of it, calculated from start+length; - # these are located 14 and 10 bytes after the section name - >>>>(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive -.fi -.SH BUGS -The formats -.IR long , -.IR belong , -.IR lelong , -.IR melong , -.IR short , -.IR beshort , -.IR leshort , -.IR date , -.IR bedate , -.IR medate , -.IR ledate , -.IR beldate , -.IR leldate , -and -.I meldate -are system-dependent; perhaps they should be specified as a number -of bytes (2B, 4B, etc), -since the files being recognized typically come from -a system on which the lengths are invariant. -.SH SEE ALSO -.BR file (__CSECTION__) -\- the command that reads this file. -.\" -.\" From: guy@sun.uucp (Guy Harris) -.\" Newsgroups: net.bugs.usg -.\" Subject: /etc/magic's format isn't well documented -.\" Message-ID: <2752@sun.uucp> -.\" Date: 3 Sep 85 08:19:07 GMT -.\" Organization: Sun Microsystems, Inc. -.\" Lines: 136 -.\" -.\" Here's a manual page for the format accepted by the "file" made by adding -.\" the changes I posted to the S5R2 version. -.\" -.\" Modified for Ian Darwin's version of the file command. -.\" @(#)$Id: magic.man,v 1.30 2006/02/19 18:16:03 christos Exp $ |
