Diffstat (limited to 'doc/magic.man')
-rw-r--r-- doc/magic.man 112
1 files changed, 73 insertions, 39 deletions
diff --git a/doc/magic.man b/doc/magic.man
index af4bfa89c6bd..6916b7b211d7 100644
--- a/doc/magic.man
+++ b/doc/magic.man
@@ -1,5 +1,5 @@
-.\" $File: magic.man,v 1.103 2023/07/20 14:32:07 christos Exp $
-.Dd Arpil 18, 2023
+.\" $File: magic.man,v 1.110 2024/11/27 15:37:00 christos Exp $
+.Dd November 27, 2024
.Dt MAGIC __FSECTION__
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
@@ -50,6 +50,10 @@ is a regular file.
A continuation offset relative to the end of the last up-level field
.Dv ( \*[Am] ) .
.El
+If the offset starts with the symbol
+.Dq + ,
+then all offsets are interpreted as from the beginning of the file (the
+default).
.It Dv type
The type of the data to be tested.
The possible values are:
@@ -146,6 +150,10 @@ An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qwdate
An eight-byte value interpreted as a Windows-style date.
+.It Dv msdosdate
+A two-byte value interpreted as a FAT/DOS-style date.
+.It Dv msdostime
+A two-byte value interpreted as a FAT/DOS-style time.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
@@ -175,6 +183,12 @@ than UTC.
.It Dv beqwdate
An eight-byte value in big-endian byte order,
interpreted as a Windows-style date.
+.It Dv bemsdosdate
+A two-byte value in big-endian byte order,
+interpreted as a FAT/DOS-style date.
+.It Dv bemsdostime
+A two-byte value in big-endian byte order,
+interpreted as a FAT/DOS-style time.
.It Dv bestring16
A two-byte unicode (UCS16) string in big-endian byte order.
.It Dv leid3
@@ -206,6 +220,12 @@ than UTC.
.It Dv leqwdate
An eight-byte value in little-endian byte order,
interpreted as a Windows-style date.
+.It Dv lemsdosdate
+A two-byte value in little-endian byte order,
+interpreted as a FAT/DOS-style date.
+.It Dv lemsdostime
+A two-byte value in little-endian byte order,
+interpreted as a FAT/DOS-style time.
.It Dv lestring16
A two-byte unicode (UCS16) string in little-endian byte order.
.It Dv melong
@@ -360,7 +380,6 @@ For example the magic entries:
.It Dv octal
A string representing an octal number.
.El
-.El
.Pp
For compatibility with the Single
.Ux
@@ -610,9 +629,9 @@ with level
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable
-\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Lt]0x40 MS-DOS executable
+\*[Gt]0x18 uleshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
@@ -627,17 +646,17 @@ the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
-.Em (( x [[.,][bBcCeEfFgGhHiIlmsSqQ]][+\-][ y ]) .
+.Em ( x [[.,][bBcCeEfFgGhHiIlLmosSqQ]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
-.Em [bBcCeEfFgGhHiIlmsSqQ]
+.Em [bBcCeEfFgGhHiIlLmosSqQ]
type specifier.
The value is treated as signed if
-.Dq ,
+.Dq \&,
is specified or unsigned if
-.Dq .
+.Dq \&.
is specified.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
@@ -652,13 +671,15 @@ The default type if one is not specified is long.
The following types are recognized:
.Bl -column -offset indent "Type" "Half/Short" "Little" "Size"
.It Sy Type Sy Mnemonic Sy Endian Sy Size
-.It bcBc Byte/Char N/A 1
+.It bcBC Byte/Char N/A 1
.It efg Double Little 8
.It EFG Double Big 8
.It hs Half/Short Little 2
.It HS Half/Short Big 2
.It i ID3 Little 4
.It I ID3 Big 4
+.It l Long Little 4
+.It L Long Big 4
.It m Middle Middle 4
.It o Octal Textual Variable
.It q Quad Little 8
@@ -668,12 +689,12 @@ The following types are recognized:
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
-0 string MZ
-\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Lt]0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
-\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that you
@@ -687,12 +708,12 @@ inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
-0 string MZ
+0 string MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
-\*[Gt]0x18 leshort \*[Lt]0x40
-\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
-\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
+\*[Gt]0x18 uleshort \*[Lt]0x40
+\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
+\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
@@ -702,44 +723,45 @@ field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
-\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
-\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x8664 for x86-64
+\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Lt]0x40
-\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Lt]0x40
+\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
-\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
+\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
-\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
+\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
-0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
-\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+0 string MZ
+\*[Gt]0x18 uleshort \*[Gt]0x3f
+\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
-\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
+\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
.Ed
.Pp
If you have to deal with offset/length pairs in your file, even the
@@ -749,7 +771,7 @@ Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0 string MZ
-\*[Gt]0x18 leshort \*[Gt]0x3f
+\*[Gt]0x18 uleshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4 search/0x140 .idata
@@ -762,7 +784,7 @@ If you have a list of known values at a particular continuation level,
and you want to provide a switch-like default case:
.Bd -literal -offset indent
# clear that continuation level match
-\*[Gt]18 clear
+\*[Gt]18 clear x
\*[Gt]18 lelong 1 one
\*[Gt]18 lelong 2 two
\*[Gt]18 default x
@@ -828,3 +850,15 @@ to make it clearer that those types have specified widths.
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.
+.\"
+.\" For emacs editor
+.\" Local Variables:
+.\" eval: (add-hook 'before-save-hook 'time-stamp)
+.\" time-stamp-start: ".Dd "
+.\" time-stamp-end: "$"
+.\" time-stamp-format: "%:B %02d, %:Y"
+.\" time-stamp-time-zone: "UTC0"
+.\" system-time-locale: "C"
+.\" eval:(setq compile-command (concat "groff -Tlatin1 -m man " (buffer-file-name)) )
+.\" End:
+.\"
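
The msdosdate and msdostime types added above describe two-byte packed FAT/DOS values. As a rough illustration of what such a value contains, the following Python sketch decodes one, assuming the conventional FAT layout (date: bits 15-9 years since 1980, bits 8-5 month, bits 4-0 day; time: bits 15-11 hours, bits 10-5 minutes, bits 4-0 seconds divided by two). The function names and the byteorder parameter are hypothetical and are not part of file(1) or libmagic; they only mirror the "le" and "be" prefixes of the magic types.

```python
from datetime import date, time


def decode_msdos_date(raw: bytes, byteorder: str = "little") -> date:
    """Decode a two-byte FAT/DOS packed date (hypothetical helper).

    byteorder="little" corresponds to lemsdosdate, "big" to bemsdosdate.
    """
    v = int.from_bytes(raw[:2], byteorder)
    day = v & 0x1F              # bits 4-0
    month = (v >> 5) & 0x0F     # bits 8-5
    year = 1980 + (v >> 9)      # bits 15-9, counted from 1980
    # date() raises ValueError for out-of-range fields such as month 0
    return date(year, month, day)


def decode_msdos_time(raw: bytes, byteorder: str = "little") -> time:
    """Decode a two-byte FAT/DOS packed time (hypothetical helper)."""
    v = int.from_bytes(raw[:2], byteorder)
    second = (v & 0x1F) * 2     # bits 4-0 store seconds / 2
    minute = (v >> 5) & 0x3F    # bits 10-5
    hour = v >> 11              # bits 15-11
    # time() raises ValueError for out-of-range fields such as hour 24
    return time(hour, minute, second)


if __name__ == "__main__":
    # 0x597B packs 2024-11-27; 0x7CA0 packs 15:37:00
    print(decode_msdos_date(b"\x7b\x59"))          # little-endian bytes
    print(decode_msdos_date(b"\x59\x7b", "big"))   # same value, big-endian
    print(decode_msdos_time(b"\xa0\x7c"))
```

The two-second resolution of the time field follows from only five bits being available for seconds, so odd second counts cannot be represented.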