author: Warren Block <wblock@FreeBSD.org> 2014-01-26 02:30:34 +0000
committer: Warren Block <wblock@FreeBSD.org> 2014-01-26 02:30:34 +0000
commit: 31d08ba8b68f57e6289b544a4c28e01cc73be9a3
tree: 08651d6e017ff305ce3f6fba8d276dc3580f7252
parent: 9d06fb107b9c5a204abaaf669641e2c4c7737b3c
-rw-r--r-- en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml 1884
1 files changed, 1614 insertions, 270 deletions
diff --git a/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml b/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml
index e58c126978..53ab4b8579 100644
--- a/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml
+++ b/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml
@@ -4,6 +4,8 @@ The FreeBSD Documentation Project
Copyright (c) 2002 Sergey Lyubka <devnull@uptsoft.com>
All rights reserved
+Copyright (c) 2014 Sergio Andrés Gómez del Real <Sergio.G.delReal@gmail.com>
+All rights reserved
$FreeBSD$
-->
@@ -25,6 +27,18 @@ $FreeBSD$
</author>
<!-- devnull@uptsoft.com 12 Jun 2002 -->
</authorgroup>
+
+ <authorgroup>
+ <author>
+ <personname>
+ <firstname>Sergio Andr&eacute;s</firstname>
+ <surname> G&oacute;mez del Real</surname>
+ </personname>
+
+ <contrib>Updated and enhanced by </contrib>
+ </author>
+ <!-- Sergio.G.DelReal@gmail.com Jan 2014 -->
+ </authorgroup>
</info>
<sect1 xml:id="boot-synopsis">
@@ -37,88 +51,103 @@ $FreeBSD$
<indexterm><primary>booting</primary></indexterm>
<indexterm><primary>system initialization</primary></indexterm>
<para>This chapter is an overview of the boot and system
- initialization process, starting from the BIOS (firmware) POST,
- to the first user process creation. Since the initial steps of
- system startup are very architecture dependent, the IA-32
- architecture is used as an example.</para>
+ initialization processes, starting from the <acronym>BIOS</acronym> (firmware)
+ <acronym>POST</acronym>, to the first user process creation. Since the initial
+ steps of system startup are very architecture dependent, the
+ IA-32 architecture is used as an example.</para>
+
+ <para>The &os; boot process can be surprisingly complex. After
+ control is passed from the <acronym>BIOS</acronym>, a considerable amount of
+ low-level configuration must be done before the kernel can be
+ loaded and executed. This setup must be done in a simple and
+ flexible manner, allowing the user a great deal of customization
+ possibilities.</para>
</sect1>
<sect1 xml:id="boot-overview">
<title>Overview</title>
- <para>A computer running FreeBSD can boot by several methods,
- although the most common method, booting from a harddisk where
- the OS is installed, will be discussed here. The boot process
- is divided into several steps:</para>
-
- <itemizedlist>
- <listitem><para>BIOS POST</para></listitem>
- <listitem><para><literal>boot0</literal> stage</para></listitem>
- <listitem><para><literal>boot2</literal> stage</para></listitem>
- <listitem><para>loader stage</para></listitem>
- <listitem><para>kernel initialization</para></listitem>
- </itemizedlist>
-
- <indexterm><primary>BIOS POST</primary></indexterm>
- <indexterm><primary>boot0</primary></indexterm>
- <indexterm><primary>boot2</primary></indexterm>
- <indexterm><primary>loader</primary></indexterm>
- <para>The <literal>boot0</literal> and <literal>boot2</literal>
- stages are also referred to as <emphasis>bootstrap stages 1 and
- 2</emphasis> in &man.boot.8; as the first steps in FreeBSD's
- 3-stage bootstrapping procedure. Various information is printed
- on the screen at each stage, so you may visually recognize them
- using the table that follows. Please note that the actual data
+ <para>The boot process is an extremely machine-dependent
+ activity. Not only must code be written for every computer
+ architecture, but there may also be multiple types of booting on
+ the same architecture. For example, looking at
+ <filename class="directory">/usr/src/sys/boot</filename>
+ reveals a great amount of architecture-dependent code. There is
+ a directory for each of the various supported architectures. In
+ the x86-specific <filename class="directory">i386</filename>
+ directory, there are subdirectories for different boot standards
+ like <filename>mbr</filename> (Master Boot Record),
+ <filename>gpt</filename> (<acronym>GUID</acronym> Partition
+ Table), and <filename>efi</filename> (Extensible Firmware
+ Interface). Each boot standard has its own conventions and data
+ structures. The example that follows shows booting an x86
+ computer from an <acronym>MBR</acronym> hard drive with the &os;
+ <filename>boot0</filename> multi-boot loader stored in the very
+ first sector. That boot code starts the &os; three-stage boot
+ process.</para>
+
+ <para>The key to understanding this process is that it is a series
+ of stages of increasing complexity. These stages are
+ <filename>boot1</filename>, <filename>boot2</filename>, and
+ <filename>loader</filename> (see &man.boot.8; for more detail).
+ The boot system executes each stage in sequence. The last
+ stage, <filename>loader</filename>, is responsible for loading
+ the &os; kernel. Each stage is examined in the following
+ sections.</para>
+
+ <para>Here is an example of the output generated by the
+ different boot stages. Actual output
may differ from machine to machine:</para>
<informaltable frame="none" pgwide="0">
<tgroup cols="2">
<tbody>
<row>
- <entry><para>Output (may vary)</para></entry>
- <entry><para>BIOS (firmware) messages</para></entry>
+ <entry>&os; Component</entry>
+ <entry>Output (may vary)</entry>
</row>
<row>
- <entry><para><screen>F1 FreeBSD
+ <entry><literal>boot0</literal></entry>
+ <entry><screen>F1 FreeBSD
F2 BSD
-F5 Disk 2</screen></para></entry>
- <entry><para><literal>boot0</literal></para></entry>
+F5 Disk 2</screen></entry>
</row>
<row>
- <entry><para><screen>&gt;&gt;FreeBSD/i386 BOOT
-Default: 1:ad(1,a)/boot/loader
-boot:</screen></para></entry>
- <entry><para><literal>boot2</literal>
+ <entry><literal>boot2</literal>
<footnote><para>This prompt will appear if the user
presses a key just after selecting an OS to boot
at the <literal>boot0</literal>
- stage.</para></footnote></para></entry>
+ stage.</para></footnote></entry>
+ <entry><screen>&gt;&gt;FreeBSD/i386 BOOT
+Default: 1:ad(1,a)/boot/loader
+boot:</screen></entry>
</row>
<row>
- <entry><para><screen>BTX loader 1.0 BTX version is 1.01
-BIOS drive A: is disk0
-BIOS drive C: is disk1
-BIOS 639kB/64512kB available memory
-FreeBSD/i386 bootstrap loader, Revision 0.8
+ <entry><filename>loader</filename></entry>
+ <entry><screen>BTX loader 1.00 BTX version is 1.02
+Consoles: internal video/keyboard
+BIOS drive C: is disk0
+BIOS 639kB/2096064kB available memory
+
+FreeBSD/x86 bootstrap loader, Revision 1.1
Console internal video/keyboard
-(jkh@bento.freebsd.org, Mon Nov 20 11:41:23 GMT 2000)
-/kernel text=0x1234 data=0x2345 syms=[0x4+0x3456]
-Hit [Enter] to boot immediately, or any other key for command prompt
-Booting [kernel] in 9 seconds..._</screen></para></entry>
- <entry><para>loader</para></entry>
+(root@snap.freebsd.org, Thu Jan 16 22:18:05 UTC 2014)
+Loading /boot/defaults/loader.conf
+/boot/kernel/kernel text=0xed9008 data=0x117d28+0x176650 syms=[0x8+0x137988+0x8+0x1515f8]</screen></entry>
</row>
<row>
- <entry><para><screen>Copyright (c) 1992-2002 The FreeBSD Project.
+ <entry>kernel</entry>
+ <entry><screen>Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
-FreeBSD 4.6-RC #0: Sat May 4 22:49:02 GMT 2002
- devnull@kukas:/usr/obj/usr/src/sys/DEVNULL
-Timecounter "i8254" frequency 1193182 Hz</screen></para></entry>
- <entry><para>kernel</para></entry>
+FreeBSD is a registered trademark of The FreeBSD Foundation.
+FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014
+ root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
+FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610</screen></entry>
</row>
</tbody>
</tgroup>
@@ -126,84 +155,114 @@ Timecounter "i8254" frequency 1193182 Hz</screen></para></entry>
</sect1>
<sect1 xml:id="boot-bios">
- <title>BIOS POST</title>
+ <title>The <acronym>BIOS</acronym></title>
- <para>When the PC powers on, the processor's registers are set
- to some predefined values. One of the registers is the
+ <para>When the computer powers on, the processor's registers are
+ set to some predefined values. One of the registers is the
<emphasis>instruction pointer</emphasis> register, and its value
after a power on is well defined: it is a 32-bit value of
- 0xfffffff0. The instruction pointer register points to code to
- be executed by the processor. One of the registers is the
+ <literal>0xfffffff0</literal>. The instruction pointer register
+ (also known as the Program Counter) points to code to be
+ executed by the processor. Another important register is the
<literal>cr0</literal> 32-bit control register, and its value
- just after the reboot is 0. One of the cr0's bits, the bit PE
- (Protection Enabled) indicates whether the processor is running
- in protected or real mode. Since at boot time this bit is
- cleared, the processor boots in real mode. Real mode means,
+ just after a reboot is <literal>0</literal>. One of
+ <literal>cr0</literal>'s bits, the PE (Protection Enabled) bit,
+ indicates whether the processor is running in 32-bit protected
+ mode or 16-bit real mode. Since this bit is cleared at boot
+ time, the processor boots in 16-bit real mode. Real mode means,
among other things, that linear and physical addresses are
- identical.</para>
-
- <para>The value of 0xfffffff0 is slightly less then 4Gb, so unless
- the machine has 4Gb physical memory, it cannot point to a valid
- memory address. The computer's hardware translates this address
- so that it points to a BIOS memory block.</para>
-
- <para>BIOS stands for <emphasis>Basic Input Output
- System</emphasis>, and it is a chip on the motherboard that
- has a relatively small amount of read-only memory (ROM). This
+ identical. The processor does not start immediately in
+ 32-bit protected mode for the sake of backwards compatibility.
+ In particular, the boot process relies on the services provided
+ by the <acronym>BIOS</acronym>, and the <acronym>BIOS</acronym>
+ itself works in legacy, 16-bit code.</para>
+
+ <para>The value of <literal>0xfffffff0</literal> is slightly less
+ than 4&nbsp;GB, so unless the machine has 4&nbsp;GB of physical
+ memory, it cannot point to a valid memory address. The
+ computer's hardware translates this address so that it points to
+ a <acronym>BIOS</acronym> memory block.</para>
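<para>As a quick check of the arithmetic above, the following sketch computes how far the reset address lies below the 4 GB boundary. The names RESET_VECTOR and FOUR_GB are chosen here for illustration only.</para>

```python
# Distance between the power-on instruction pointer value and 4 GB.
RESET_VECTOR = 0xFFFFFFF0   # value of the instruction pointer after power on
FOUR_GB = 1 << 32           # first address that no longer fits in 32 bits

bytes_below_4gb = FOUR_GB - RESET_VECTOR
print(f"{RESET_VECTOR:#x} is {bytes_below_4gb} bytes below 4 GB")
```

<para>The small gap leaves room for little more than a jump instruction, which is consistent with the address usually holding a jump into the firmware routines.</para>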
+
+ <para>The <acronym>BIOS</acronym> (Basic Input Output
+ System) is a chip on the motherboard that has a relatively small
+ amount of read-only memory (<acronym>ROM</acronym>). This
memory contains various low-level routines that are specific to
- the hardware supplied with the motherboard. So, the processor
- will first jump to the address 0xfffffff0, which really resides
- in the BIOS's memory. Usually this address contains a jump
- instruction to the BIOS's POST routines.</para>
-
- <para>POST stands for <emphasis>Power On Self Test</emphasis>.
- This is a set of routines including the memory check, system bus
- check and other low-level stuff so that the CPU can initialize
- the computer properly. The important step on this stage is
- determining the boot device. All modern BIOS's allow the boot
- device to be set manually, so you can boot from a floppy,
- CD-ROM, harddisk etc.</para>
-
- <para>The very last thing in the POST is the <literal>INT
- 0x19</literal> instruction. That instruction reads 512 bytes
- from the first sector of boot device into the memory at address
- 0x7c00. The term <emphasis>first sector</emphasis> originates
- from harddrive architecture, where the magnetic plate is divided
- to a number of cylindrical tracks. Tracks are numbered, and
- every track is divided by a number (usually 64) sectors. Track
- number 0 is the outermost on the magnetic plate, and sector 1,
- the first sector (tracks, or, cylinders, are numbered starting
- from 0, but sectors - starting from 1), has a special meaning.
- It is also called Master Boot Record, or MBR. The remaining
- sectors on the first track are never used <footnote><para>Some
- utilities such as &man.disklabel.8; may store the
- information in this area, mostly in the second
- sector.</para></footnote>.</para>
+ the hardware supplied with the motherboard. The processor will
+ first jump to the address <literal>0xfffffff0</literal>, which really resides in
+ the <acronym>BIOS</acronym>'s memory. Usually this address
+ contains a jump instruction to the <acronym>BIOS</acronym>'s
+ <acronym>POST</acronym> routines.</para>
+
+ <para>The <acronym>POST</acronym> (Power On Self Test)
+ is a set of routines including the memory check, system bus
+ check, and other low-level initialization so the
+ <acronym>CPU</acronym> can set up the computer properly. The
+ important step of this stage is determining the boot device.
+ Modern <acronym>BIOS</acronym> implementations permit the
+ selection of a boot device, allowing booting from a floppy,
+ <acronym>CD-ROM</acronym>, hard disk, or other devices.</para>
+
+ <para>The very last thing in the <acronym>POST</acronym> is the
+ <literal>INT 0x19</literal> instruction. The
+ <literal>INT 0x19</literal> handler reads 512 bytes from the
+ first sector of the boot device into memory at address
+ <literal>0x7c00</literal>. The term
+ <emphasis>first sector</emphasis> originates from hard drive
+ architecture, where the magnetic plate is divided into a number
+ of cylindrical tracks. Tracks are numbered, and every track is
+ divided into a number (usually 63) of sectors. Track numbers
+ start at 0, but sector numbers start from 1. Track 0 is the
+ outermost on the magnetic plate, and sector 1, the first sector,
+ has a special purpose. It is also called the
+ <acronym>MBR</acronym>, or Master Boot Record. The remaining
+ sectors on the first track are never used.</para>
+
+ <para>This sector is our boot-sequence starting point. As we will
+ see, this sector contains a copy of our
+ <filename>boot0</filename> program. The
+ <acronym>BIOS</acronym> jumps to address <literal>0x7c00</literal>,
+ where the loaded code starts executing.</para>
</sect1>
<sect1 xml:id="boot-boot0">
- <title><literal>boot0</literal> Stage</title>
+ <title>The Master Boot Record (<literal>boot0</literal>)</title>
<indexterm><primary>MBR</primary></indexterm>
- <para>Take a look at the file <filename>/boot/boot0</filename>.
- This is a small 512-byte file, and it is exactly what FreeBSD's
- installation procedure wrote to your harddisk's MBR if you chose
- the <quote>bootmanager</quote> option at installation
- time.</para>
+
+ <para>After control is received from the <acronym>BIOS</acronym>
+ at memory address <literal>0x7c00</literal>,
+ <filename>boot0</filename> starts executing. It is the first
+ piece of code under &os; control. The task of
+ <filename>boot0</filename> is quite simple: scan the partition
+ table and let the user choose which partition to boot from. The
+ Partition Table is a special, standard data structure embedded
+ in the <acronym>MBR</acronym> (hence embedded in
+ <filename>boot0</filename>) describing the four standard PC
+ <quote>partitions</quote>
+ <footnote>
+ <para><link
+ xlink:href="http://en.wikipedia.org/wiki/Master_boot_record"></link></para></footnote>.
+ <filename>boot0</filename> resides in the filesystem as
+ <filename>/boot/boot0</filename>. It is a small 512-byte file,
+ and it is exactly what &os;'s installation procedure wrote to
+ the hard disk's <acronym>MBR</acronym> if you chose the <quote>bootmanager</quote>
+ option at installation time. Indeed,
+ <filename>boot0</filename> <emphasis>is</emphasis> the
+ <acronym>MBR</acronym>.</para>
<para>As mentioned previously, the <literal>INT 0x19</literal>
- instruction loads an MBR, i.e., the <filename>boot0</filename>
- content, into the memory at address 0x7c00. Taking a look at
- the file <filename>sys/boot/i386/boot0/boot0.S</filename> can
- give a guess at what is happening there - this is the boot
- manager, which is an awesome piece of code written by Robert
- Nordier.</para>
-
- <para>The MBR, or, <filename>boot0</filename>, has a special
- structure starting from offset 0x1be, called the
- <emphasis>partition table</emphasis>. It has 4 records of 16
- bytes each, called <emphasis>partition records</emphasis>, which
- represent how the harddisk(s) are partitioned, or, in FreeBSD's
+ instruction causes the <literal>INT 0x19</literal> handler to
+ load an <acronym>MBR</acronym> (<filename>boot0</filename>) into
+ memory at address <literal>0x7c00</literal>. The source file
+ for <filename>boot0</filename> can be found in
+ <filename>sys/boot/i386/boot0/boot0.S</filename> - which is an
+ awesome piece of code written by Robert Nordier.</para>
+
+ <para>A special structure starting from offset
+ <literal>0x1be</literal> in the <acronym>MBR</acronym> is called
+ the <emphasis>partition table</emphasis>. It has four records
+ of 16 bytes each, called <emphasis>partition records</emphasis>,
+ which represent how the hard disk is partitioned, or, in &os;'s
terminology, sliced. One byte of those 16 says whether a
partition (slice) is bootable or not. Exactly one record must
have that flag set, otherwise <filename>boot0</filename>'s code
@@ -229,186 +288,1471 @@ Timecounter "i8254" frequency 1193182 Hz</screen></para></entry>
</listitem>
</itemizedlist>
- <para>A partition record descriptor has the information about
+ <para>A partition record descriptor contains information about
where exactly the partition resides on the drive. Both
- descriptors, LBA and CHS, describe the same information, but in
- different ways: LBA (Logical Block Addressing) has the starting
- sector for the partition and the partition's length, while CHS
- (Cylinder Head Sector) has coordinates for the first and last
- sectors of the partition.</para>
-
- <para>The boot manager scans the partition table and prints the
- menu on the screen so the user can select what disk and what
- slice to boot. By pressing an appropriate key,
- <filename>boot0</filename> performs the following
- actions:</para>
+ descriptors, <acronym>LBA</acronym> and <acronym>CHS</acronym>,
+ describe the same information, but in different ways:
+ <acronym>LBA</acronym> (Logical Block Addressing) has the
+ starting sector for the partition and the partition's length,
+ while <acronym>CHS</acronym> (Cylinder Head Sector) has
+ coordinates for the first and last sectors of the partition.
+ The partition table ends with the special signature
+ <literal>0xaa55</literal>.</para>
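<para>The layout described above can be sketched in a few lines of Python. This is only an illustration, not code from the FreeBSD tree: the field order inside each 16-byte record (flag, first CHS, type, last CHS, LBA start, length) follows the conventional MBR layout and is an assumption here, as are the names parse_partition_table and the sample values.</para>

```python
import struct

PT_OFFSET = 0x1BE    # start of the partition table inside the MBR
ENTRY_SIZE = 16      # four records of 16 bytes each
SIG_OFFSET = 0x1FE   # 0x1be + 4 * 16: the signature follows the table


def parse_partition_table(sector):
    """Return the four partition records of a 512-byte MBR as dicts."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly one 512-byte sector")
    (signature,) = struct.unpack_from("<H", sector, SIG_OFFSET)
    if signature != 0xAA55:
        raise ValueError("missing 0xaa55 signature")
    entries = []
    for i in range(4):
        # Assumed conventional record layout, little-endian, no padding.
        flag, chs_first, ptype, chs_last, lba_start, sectors = struct.unpack_from(
            "<B3sB3sII", sector, PT_OFFSET + i * ENTRY_SIZE)
        entries.append({
            "active": flag == 0x80,   # the one-byte bootable flag
            "type": ptype,
            "chs_first": chs_first,   # CHS coordinates of the first sector
            "chs_last": chs_last,     # CHS coordinates of the last sector
            "lba_start": lba_start,   # LBA starting sector
            "sectors": sectors,       # LBA length
        })
    return entries


# Hypothetical sample: one active slice of type 0xa5 starting at sector 63.
sample = bytearray(512)
struct.pack_into("<B3sB3sII", sample, PT_OFFSET,
                 0x80, b"\x01\x01\x00", 0xA5, b"\xfe\xff\xff", 63, 1000000)
struct.pack_into("<H", sample, SIG_OFFSET, 0xAA55)
table = parse_partition_table(bytes(sample))
print(table[0])
```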
+
+ <para>The <acronym>MBR</acronym> must fit into 512 bytes, a single
+ disk sector. This program uses low-level <quote>tricks</quote>
+ like taking advantage of the side effects of certain
+ instructions and reusing register values from previous
+ operations to make the most out of the fewest possible
+ instructions. Care must also be taken when handling the
+ partition table, which is embedded in the <acronym>MBR</acronym>
+ itself. For these reasons, be very careful when modifying
+ <filename>boot0.S</filename>.</para>
+
+ <para>Note that the <filename>boot0.S</filename> source file
+ is assembled <quote>as is</quote>: instructions are translated
+ one by one to binary, with no additional information (no
+ <acronym>ELF</acronym> file format, for example). This kind of
+ low-level control is achieved at link time through special
+ control flags passed to the linker. For example, the text
+ section of the program is set to be located at address
+ <literal>0x600</literal>. In practice this means that
+ <filename>boot0</filename> must be loaded to memory address
+ <literal>0x600</literal> in order to function properly.</para>
+
+ <para>It is worth looking at the <filename>Makefile</filename> for
+ <filename>boot0</filename>
+ (<filename>sys/boot/i386/boot0/Makefile</filename>), as it
+ defines some of the run-time behavior of
+ <filename>boot0</filename>. For instance, if a terminal
+ connected to the serial port (COM1) is used for I/O, the macro
+ <literal>SIO</literal> must be defined
+ (<literal>-DSIO</literal>). <literal>-DPXE</literal> enables
+ boot through <acronym>PXE</acronym> by pressing
+ <keycap>F6</keycap>. Additionally, the program defines a set of
+ <emphasis>flags</emphasis> that allow further modification of
+ its behavior. All of this is illustrated in the
+ <filename>Makefile</filename>. For example, look at the
+ linker directives which command the linker to start the text
+ section at address <literal>0x600</literal>, and to build the
+ output file <quote>as is</quote> (strip out any file
+ formatting):</para>
+
+ <figure xml:id="boot-boot0-makefile-as-is">
+ <title><filename>sys/boot/i386/boot0/Makefile</filename></title>
+
+ <programlisting> BOOT_BOOT0_ORG?=0x600
+ LDFLAGS=-e start -Ttext ${BOOT_BOOT0_ORG} \
+ -Wl,-N,-S,--oformat,binary</programlisting>
+ </figure>
+
+ <para>Let us now start our study of the <acronym>MBR</acronym>, or
+ <filename>boot0</filename>, starting where execution
+ begins.</para>
+
+ <note>
+ <para>Some modifications have been made to some instructions in
+ favor of better exposition. For example, some macros are
+ expanded, and some macro tests are omitted when the result of
+ the test is known. This applies to all of the code examples
+ shown.</para>
+ </note>
+
+ <figure xml:id="boot-boot0-entrypoint">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting>start:
+ cld # String ops inc
+ xorw %ax,%ax # Zero
+ movw %ax,%es # Address
+ movw %ax,%ds # data
+ movw %ax,%ss # Set up
+ movw $0x7c00,%sp # stack
+ </figure>
+
+ <para>This first block of code is the entry point of the program.
+ It is where the <acronym>BIOS</acronym> transfers control.
+ First, it makes sure that the string operations autoincrement
+ their pointer operands (the <literal>cld</literal> instruction)
+ <footnote>
+ <para>When in doubt, we refer the reader to the official Intel
+ manuals, which describe the exact semantics for each
+ instruction: <link
+ xlink:href="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html"></link>.</para></footnote>.
+ Then, as it makes no assumption about the state of the segment
+ registers, it initializes them. Finally, it sets the stack
+ pointer register (<literal>%sp</literal>) to address
+ <literal>0x7c00</literal>, so we have a working stack.</para>
+
+ <para>The next block is responsible for the relocation and
+ subsequent jump to the relocated code.</para>
+
+ <figure xml:id="boot-boot0-relocation">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting> movw $0x7c00,%si # Source
+ movw $0x600,%di # Destination
+ movw $512,%cx # Byte count
+ rep # Relocate
+ movsb # code
+ movw %di,%bp # Address variables
+ movb $16,%cl # Bytes to clear
+ rep # Zero
+ stosb # them
+ incb -0xe(%di) # Set the S field to 1
+ jmp main-0x7c00+0x600 # Jump to relocated code</programlisting>
+ </figure>
+
+ <para>Because <filename>boot0</filename> is loaded by the
+ <acronym>BIOS</acronym> to address <literal>0x7C00</literal>, it
+ copies itself to address <literal>0x600</literal> and then
+ transfers control there (recall that it was linked to execute at
+ address <literal>0x600</literal>). The source address,
+ <literal>0x7c00</literal>, is copied to register
+ <literal>%si</literal>. The destination address,
+ <literal>0x600</literal>, is copied to register <literal>%di</literal>.
+ The number of bytes to copy, <literal>512</literal> (the
+ program's size), is copied to register <literal>%cx</literal>.
+ Next, the <literal>rep</literal> instruction repeats the
+ instruction that follows, that is, <literal>movsb</literal>, the
+ number of times dictated by the <literal>%cx</literal> register.
+ The <literal>movsb</literal> instruction copies the byte pointed
+ to by <literal>%si</literal> to the address pointed to by
+ <literal>%di</literal>. This is repeated another 511 times. On
+ each repetition, both the source and destination registers,
+ <literal>%si</literal> and <literal>%di</literal>, are
+ incremented by one. Thus, upon completion of the 512-byte copy,
+ <literal>%di</literal> has the value
+ <literal>0x600</literal>+<literal>512</literal>=
+ <literal>0x800</literal>, and <literal>%si</literal> has the
+ value <literal>0x7c00</literal>+<literal>512</literal>=
+ <literal>0x7e00</literal>; we have thus completed the code
+ <emphasis>relocation</emphasis>.</para>
+
+ <para>Next, the destination register
+ <literal>%di</literal> is copied to <literal>%bp</literal>.
+ <literal>%bp</literal> gets the value <literal>0x800</literal>.
+ The value <literal>16</literal> is copied to
+ <literal>%cl</literal> in preparation for a new string operation
+ (like our previous <literal>movsb</literal>). Now,
+ <literal>stosb</literal> is executed 16 times. This instruction
+ copies a <literal>0</literal> value to the address pointed to by
+ the destination register (<literal>%di</literal>, which is
+ <literal>0x800</literal>), and increments it. This is repeated
+ another 15 times, so <literal>%di</literal> ends up with value
+ <literal>0x810</literal>. Effectively, this clears the address
+ range <literal>0x800</literal>-<literal>0x80f</literal>. This
+ range is used as a (fake) partition table for writing the
+ <acronym>MBR</acronym> back to disk. Finally, the sector field
+ for the <acronym>CHS</acronym> addressing of this fake partition
+ is given the value 1 and a jump is made to the main function
+ from the relocated code. Note that until this jump to the
+ relocated code, any reference to an absolute address was
+ avoided.</para>
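<para>The relocation and clearing steps just described can be modeled in Python to confirm the register values quoted in the text. This is a rough simulation under stated assumptions: memory is a flat 64 KB bytearray, the helper names rep_movsb and rep_stosb are invented for the sketch, and the bytes standing in for the boot0 image are arbitrary.</para>

```python
LOAD = 0x7C00     # where the BIOS loads boot0
ORIGIN = 0x600    # where boot0 was linked to run


def rep_movsb(mem, si, di, cx):
    """Copy cx bytes from si to di, incrementing both (direction flag clear)."""
    while cx:
        mem[di] = mem[si]
        si += 1
        di += 1
        cx -= 1
    return si, di, cx


def rep_stosb(mem, di, cx, al=0):
    """Store al at di cx times, incrementing di."""
    while cx:
        mem[di] = al
        di += 1
        cx -= 1
    return di, cx


memory = bytearray(0x10000)
memory[LOAD:LOAD + 512] = bytes((i * 7) & 0xFF for i in range(512))  # stand-in image
memory[0x800:0x810] = b"\xff" * 16   # garbage that the clearing step must erase

si, di, cx = rep_movsb(memory, LOAD, ORIGIN, 512)   # relocate the code
bp = di                                             # movw %di,%bp
di, cx = rep_stosb(memory, di, 16)                  # zero the fake partition entry
memory[di - 0xE] = (memory[di - 0xE] + 1) & 0xFF    # incb -0xe(%di): sector field = 1

print(hex(si), hex(bp), hex(di))
```

<para>Running the sketch shows the source pointer ending at 0x7e00, the saved base at 0x800, and the destination pointer at 0x810, matching the walk-through.</para>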
+
+ <para>The following code block tests whether the drive number
+ provided by the <acronym>BIOS</acronym> should be used, or
+ the one stored in <filename>boot0</filename>.</para>
+
+ <figure xml:id="boot-boot0-drivenumber">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting>main:
+ testb $SETDRV,-69(%bp) # Set drive number?
+ jnz disable_update # Yes
+ testb %dl,%dl # Drive number valid?
+ js save_curdrive # Possibly (0x80 set)</programlisting>
+ </figure>
+
+ <para>This code tests the <literal>SETDRV</literal> bit
+ (<literal>0x20</literal>) in the <emphasis>flags</emphasis>
+ variable. Recall that register <literal>%bp</literal> points to
+ address location <literal>0x800</literal>, so the test is performed
+ on the <emphasis>flags</emphasis> variable at address
+ <literal>0x800</literal>-<literal>69</literal>=
+ <literal>0x7bb</literal>. This is an example of the type of
+ modifications that can be done to <filename>boot0</filename>.
+ The <literal>SETDRV</literal> flag is not set by default, but it
+ can be set in the <filename>Makefile</filename>. When set, the
+ drive number stored in the <acronym>MBR</acronym> is used
+ instead of the one provided by the <acronym>BIOS</acronym>. We
+ assume the defaults, and that the <acronym>BIOS</acronym>
+ provided a valid drive number, so we jump to
+ <literal>save_curdrive</literal>.</para>
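<para>The two tests in this block reduce to simple bit checks. The sketch below restates them in Python; the function names are hypothetical, and the sample flag values are arbitrary rather than the defaults from the Makefile.</para>

```python
SETDRV = 0x20          # flag bit tested by testb $SETDRV,-69(%bp)
BP = 0x800             # value held in %bp after relocation
FLAGS_ADDR = BP - 69   # address of the flags variable


def use_stored_drive(flags):
    """True when SETDRV is set, so the drive number stored in the MBR wins."""
    return bool(flags & SETDRV)


def bios_drive_looks_valid(dl):
    """True when bit 7 is set, the condition the js instruction checks."""
    return bool(dl & 0x80)


print(hex(FLAGS_ADDR))
print(use_stored_drive(0x00), bios_drive_looks_valid(0x80))
```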
+
+ <para>The next block saves the drive number provided by the
+ <acronym>BIOS</acronym>, and calls <literal>putn</literal> to
+ print a new line on the screen.</para>
+
+ <figure xml:id="boot-boot0-savedrivenumber">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting>save_curdrive:
+ movb %dl, (%bp) # Save drive number
+ pushw %dx # Also in the stack
+#ifdef TEST /* test code, print internal bios drive */
+ rolb $1, %dl
+ movw $drive, %si
+ call putkey
+#endif
+ callw putn # Print a newline</programlisting>
+ </figure>
+
+ <para>Note that we assume <varname>TEST</varname> is not defined,
+ so the conditional code in it is not assembled and will not
+ appear in our executable <filename>boot0</filename>.</para>
+
+ <para>Our next block implements the actual scanning of the
+ partition table. It prints to the screen the partition type for
+ each of the four entries in the partition table. It compares
+ each type with a list of well-known operating system file
+ systems. Examples of recognized partition types are
+ <acronym>NTFS</acronym> (&windows;, ID 0x7),
+ <literal>ext2fs</literal> (&linux;, ID 0x83), and, of course,
+ <literal>ffs</literal>/<literal>ufs2</literal> (&os;, ID 0xa5).
+ The implementation is fairly simple.</para>
+
+ <figure xml:id="boot-boot0-partition-scan">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting> movw $(partbl+0x4),%bx # Partition table (+4)
+ xorw %dx,%dx # Item number
+
+read_entry:
+ movb %ch,-0x4(%bx) # Zero active flag (ch == 0)
+ btw %dx,_FLAGS(%bp) # Entry enabled?
+ jnc next_entry # No
+ movb (%bx),%al # Load type
+ test %al, %al # skip empty partition
+ jz next_entry
+ movw $bootable_ids,%di # Lookup tables
+ movb $(TLEN+1),%cl # Number of entries
+ repne # Locate
+ scasb # type
+ addw $(TLEN-1), %di # Adjust
+ movb (%di),%cl # Partition
+ addw %cx,%di # description
+ callw putx # Display it
+
+next_entry:
+ incw %dx # Next item
+ addb $0x10,%bl # Next entry
+ jnc read_entry # Till done</programlisting>
+ </figure>
+
+ <para>It is important to note that the active flag for each entry
+ is cleared, so after the scanning, <emphasis>no</emphasis>
+ partition entry is active in our memory copy of
+ <filename>boot0</filename>. Later, the active flag will be set
+ for the selected partition. This ensures that only one active
+ partition exists if the user chooses to write the changes back
+ to disk.</para>
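<para>As a rough model of the scan loop, the Python sketch below walks four partition records, clears each active flag, skips disabled or empty entries, and looks up a description for the rest. The lookup table holds only the three types named in the text, and the names scan, KNOWN_TYPES, and enabled_mask, the 1-based menu numbering, and the "unknown" label are assumptions made for this illustration.</para>

```python
# Illustrative subset of partition type IDs taken from the text.
KNOWN_TYPES = {0x07: "NTFS", 0x83: "ext2fs", 0xA5: "FreeBSD"}


def scan(entries, enabled_mask=0b1111):
    """Return (menu number, description) pairs for the usable entries."""
    menu = []
    for index, entry in enumerate(entries):
        entry["active"] = False                  # movb %ch,-0x4(%bx)
        if not (enabled_mask >> index) & 1:      # btw %dx,_FLAGS(%bp)
            continue
        if entry["type"] == 0:                   # skip empty partition
            continue
        description = KNOWN_TYPES.get(entry["type"], "unknown")
        menu.append((index + 1, description))
    return menu


entries = [
    {"type": 0xA5, "active": True},
    {"type": 0x00, "active": False},
    {"type": 0x83, "active": False},
    {"type": 0x42, "active": False},
]
menu = scan(entries)
print(menu)
```

<para>Note that after the call no entry is marked active, which mirrors the point made above about the in-memory copy.</para>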
+
+ <para>The next block tests for other drives. At startup,
+ the <acronym>BIOS</acronym> writes the number of drives present
+ in the computer to address <literal>0x475</literal>. If there
+ are any other drives present, <filename>boot0</filename> prints
+ the current drive to screen. The user may command
+ <filename>boot0</filename> to scan partitions on another drive
+ later.</para>
+
+ <figure xml:id="boot-boot0-test-drives">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting> popw %ax # Drive number
+ subb $0x7f,%al # Does next
+ cmpb 0x475,%al # drive exist? (from BIOS?)
+ jb print_drive # Yes
+ decw %ax # Already drive 0?
+ jz print_prompt # Yes</programlisting>
+ </figure>
+
+ <para>We make the assumption that a single drive is present, so
+ the jump to <literal>print_drive</literal> is not performed. We
+ also assume nothing strange happened, so we jump to
+ <literal>print_prompt</literal>.</para>
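<para>The branch taken here can be checked with a small model. The sketch assumes the constant subtracted from the drive number is 0x7f (that is, 0x80 - 1), which is the value for which the outcome described above holds, and it assumes the upper byte of the popped word is zero. The function name and the "fall_through" label are invented for the illustration.</para>

```python
def after_scan(drive, num_drives):
    """Return the label reached for a BIOS drive number and drive count."""
    al = (drive - 0x7F) & 0xFF    # subb: index of the next drive, counted from 1
    if al < num_drives:           # cmpb 0x475,%al ; jb print_drive
        return "print_drive"
    al = (al - 1) & 0xFF          # decw %ax (upper byte assumed zero)
    if al == 0:                   # jz print_prompt
        return "print_prompt"
    return "fall_through"


print(after_scan(0x80, 1))   # single drive, booted from the first one
print(after_scan(0x80, 2))   # a second drive exists
```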
+
+ <para>This next block just prints out a prompt followed by the
+ default option:</para>
+
+ <figure xml:id="boot-boot0-prompt">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting>print_prompt:
+ movw $prompt,%si # Display
+ callw putstr # prompt
+ movb _OPT(%bp),%dl # Display
+ decw %si # default
+ callw putkey # key
+ jmp start_input # Skip beep</programlisting>
+ </figure>
+
+ <para>Finally, a jump is performed to
+ <literal>start_input</literal>, where the
+ <acronym>BIOS</acronym> services are used to start a timer and
+ for reading user input from the keyboard; if the timer expires,
+ the default option will be selected:</para>
+
+ <figure xml:id="boot-boot0-start-input">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting>start_input:
+ xorb %ah,%ah # BIOS: Get
+ int $0x1a # system time
+ movw %dx,%di # Ticks when
+ addw _TICKS(%bp),%di # timeout
+read_key:
+ movb $0x1,%ah # BIOS: Check
+ int $0x16 # for keypress
+ jnz got_key # Have input
+ xorb %ah,%ah # BIOS: int 0x1a, 00
+ int $0x1a # get system time
+ cmpw %di,%dx # Timeout?
+ jb read_key # No</programlisting>
+ </figure>
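<para>The polling loop in the listing above can be sketched as follows. This is a simplified model, not the actual implementation: read_ticks and key_pressed are hypothetical callables standing in for the int 0x1a and int 0x16 services, only the low 16 bits of the tick count are modeled, and the constants are the 18.2 ticks per second and 0xb6 default quoted in this section.</para>

```python
TICKS_PER_SECOND = 18.2   # tick rate quoted in the text
DEFAULT_TICKS = 0xB6      # default timeout, 182 ticks


def wait_for_key(read_ticks, key_pressed, ticks=DEFAULT_TICKS):
    """Poll for a key until the tick counter reaches the deadline."""
    deadline = (read_ticks() + ticks) & 0xFFFF   # addw _TICKS(%bp),%di
    while True:
        if key_pressed():                        # int 0x16, %ah = 1
            return "got_key"
        if read_ticks() >= deadline:             # cmpw %di,%dx ; jb read_key
            return "timeout"


# A fake clock that advances one tick per read, and a keyboard nobody touches.
fake_clock = iter(range(1000, 5000))
result = wait_for_key(lambda: next(fake_clock), lambda: False)
print(result, round(DEFAULT_TICKS / TICKS_PER_SECOND), "seconds")
```

<para>Because the comparison is unsigned and 16 bits wide, a deadline that wraps past 0xffff would expire early in this model; the sketch makes no claim about how often that matters in practice.</para>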
+
+ <para>An interrupt is requested with number
+ <literal>0x1a</literal> and argument <literal>0</literal> in
+ register <literal>%ah</literal>. The <acronym>BIOS</acronym>
+ has a predefined set of services, requested by applications as
+ software-generated interrupts through the <literal>int</literal>
+ instruction and receiving arguments in registers (in this case,
+ <literal>%ah</literal>). Here, particularly, we are requesting
+ the number of clock ticks since last midnight; this value is
+ computed by the <acronym>BIOS</acronym> through the
+ <acronym>RTC</acronym> (Real Time Clock). This clock can be
+ programmed to work at frequencies ranging from 2&nbsp;Hz to
+ 8192&nbsp;Hz. The <acronym>BIOS</acronym> sets it to
+ 18.2&nbsp;Hz at startup. When the request is satisfied, a
+ 32-bit result is returned by the <acronym>BIOS</acronym> in
+ registers <literal>%cx</literal> and <literal>%dx</literal>
+ (lower bytes in <literal>%dx</literal>). This result (the
+ <literal>%dx</literal> part) is copied to register
+ <literal>%di</literal>, and the value of the
+ <varname>TICKS</varname> variable is added to
+ <literal>%di</literal>. This variable resides in
+ <filename>boot0</filename> at offset <literal>_TICKS</literal>
+ (a negative value) from register <literal>%bp</literal> (which,
+ recall, points to <literal>0x800</literal>). The default value
+ of this variable is <literal>0xb6</literal> (182 in decimal).
+ Now, the idea is that <filename>boot0</filename> constantly
+ requests the time from the <acronym>BIOS</acronym>, and when the
+      value returned in register <literal>%dx</literal> is no longer
+      below the value stored in <literal>%di</literal>, the time is up
+      and the default selection will be made.  Since the
+      <acronym>RTC</acronym> ticks
+ 18.2 times per second, this condition will be met after 10
+ seconds (this default behaviour can be changed in the
+ <filename>Makefile</filename>). Until this time has passed,
+ <filename>boot0</filename> continually asks the
+ <acronym>BIOS</acronym> for any user input; this is done through
+ <literal>int 0x16</literal>, argument <literal>1</literal> in
+ <literal>%ah</literal>.</para>
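+
+    <para>The timeout arithmetic described above can be restated as
+      a short Python sketch.  It is an illustration only, not code
+      from <filename>boot0</filename>: the function names are
+      invented here, and the 16-bit mask simply models the width of
+      <literal>%dx</literal> and <literal>%di</literal>.</para>
+
+    <programlisting><![CDATA[
```python
TICKS_PER_SECOND = 18.2  # tick rate quoted in the text
DEFAULT_TICKS = 0xb6     # default TICKS value (182 decimal)


def deadline(now_low_word, ticks=DEFAULT_TICKS):
    """Model movw %dx,%di / addw _TICKS(%bp),%di in 16 bits."""
    return (now_low_word + ticks) & 0xffff


def still_waiting(now_low_word, limit):
    """Model cmpw %di,%dx / jb read_key: poll while %dx is below %di."""
    return now_low_word < limit


def timeout_seconds(ticks=DEFAULT_TICKS):
    """Convert a tick count into seconds at the quoted tick rate."""
    return ticks / TICKS_PER_SECOND


if __name__ == "__main__":
    limit = deadline(1000)
    print(limit, still_waiting(1100, limit), still_waiting(limit, limit))
    print(round(timeout_seconds(), 1))
```
+]]></programlisting>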
+
+ <para>Whether a key was pressed or the time expired, subsequent
+ code validates the selection. Based on the selection, the
+ register <literal>%si</literal> is set to point to the
+ appropriate partition entry in the partition table. This new
+ selection overrides the previous default one. Indeed, it
+ becomes the new default. Finally, the ACTIVE flag of the
+ selected partition is set. If it was enabled at compile time,
+ the in-memory version of <filename>boot0</filename> with these
+ modified values is written back to the <acronym>MBR</acronym> on
+ disk. We leave the details of this implementation to the
+ reader.</para>
+
+ <para>We now end our study with the last code block from the
+ <filename>boot0</filename> program:</para>
+
+ <figure xml:id="boot-boot0-check-bootable">
+ <title><filename>sys/boot/i386/boot0/boot0.S</filename></title>
+
+ <programlisting> movw $0x7c00,%bx # Address for read
+ movb $0x2,%ah # Read sector
+ callw intx13 # from disk
+ jc beep # If error
+ cmpw $0xaa55,0x1fe(%bx) # Bootable?
+ jne beep # No
+ pushw %si # Save ptr to selected part.
+ callw putn # Leave some space
+ popw %si # Restore, next stage uses it
+ jmp *%bx # Invoke bootstrap</programlisting>
+ </figure>
+
+ <para>Recall that <literal>%si</literal> points to the selected
+ partition entry. This entry tells us where the partition begins
+ on disk. We assume, of course, that the partition selected is
+ actually a &os; slice.</para>
+
+ <note>
+ <para>From now on, we will favor the use of the technically
+ more accurate term <quote>slice</quote> rather than
+ <quote>partition</quote>.</para>
+ </note>
+
+ <para>The transfer buffer is set to <literal>0x7c00</literal>
+ (register <literal>%bx</literal>), and a read for the first
+ sector of the &os; slice is requested by calling
+ <literal>intx13</literal>. We assume that everything went okay,
+ so a jump to <literal>beep</literal> is not performed. In
+ particular, the new sector read must end with the magic sequence
+      <literal>0xaa55</literal>.  Finally, the value in
+      <literal>%si</literal> (the pointer to the selected partition
+      table entry) is preserved for use by the next stage, and a jump is
+ performed to address <literal>0x7c00</literal>, where execution
+ of our next stage (the just-read block) is started.</para>
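+
+    <para>The bootable check can be sketched in Python as follows.
+      This is a hedged illustration: the function name is
+      hypothetical, and the sector is assumed to be available as a
+      512-byte buffer.  Because the word is little-endian, the bytes
+      on disk are <literal>0x55</literal> followed by
+      <literal>0xaa</literal>.</para>
+
+    <programlisting><![CDATA[
```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = 0xaa55


def is_bootable(sector):
    """Return True if the little-endian word at offset 0x1fe is 0xaa55."""
    if len(sector) < SECTOR_SIZE:
        return False
    word = int.from_bytes(sector[0x1fe:0x200], "little")
    return word == BOOT_SIGNATURE


if __name__ == "__main__":
    print(is_bootable(bytes(510) + b"\x55\xaa"))
    print(is_bootable(bytes(512)))
```
+]]></programlisting>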
+ </sect1>
+
+ <sect1 xml:id="boot-boot1">
+ <title><literal>boot1</literal> Stage</title>
+
+ <para>So far we have gone through the following sequence:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>The <acronym>BIOS</acronym> did some early hardware
+ initialization, including the <acronym>POST</acronym>. The
+ <acronym>MBR</acronym> (<filename>boot0</filename>) was
+ loaded from absolute disk sector one to address
+ <literal>0x7c00</literal>. Execution control was passed to
+ that location.</para>
+ </listitem>
+
+ <listitem>
+ <para><filename>boot0</filename> relocated itself to the
+ location it was linked to execute
+ (<literal>0x600</literal>), followed by a jump to continue
+ execution at the appropriate place. Finally,
+ <filename>boot0</filename> loaded the first disk sector from
+ the &os; slice to address <literal>0x7c00</literal>.
+ Execution control was passed to that location.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para><filename>boot1</filename> is the next step in the
+ boot-loading sequence. It is the first of three boot stages.
+ Note that we have been dealing exclusively
+ with disk sectors. Indeed, the <acronym>BIOS</acronym> loads
+ the absolute first sector, while <filename>boot0</filename>
+ loads the first sector of the &os; slice. Both loads are to
+ address <literal>0x7c00</literal>. We can conceptually think of
+ these disk sectors as containing the files
+ <filename>boot0</filename> and <filename>boot1</filename>,
+ respectively, but in reality this is not entirely true for
+ <filename>boot1</filename>. Strictly speaking, unlike
+ <filename>boot0</filename>, <filename>boot1</filename> is not
+ part of the boot blocks
+ <footnote>
+ <para>There is a file <filename>/boot/boot1</filename>, but it
+	  is not written to the beginning of the &os; slice.
+ Instead, it is concatenated with <filename>boot2</filename>
+ to form <filename>boot</filename>, which
+ <emphasis>is</emphasis> written to the beginning of the &os;
+ slice and read at boot time.</para></footnote>.
+ Instead, a single, full-blown file, <filename>boot</filename>
+ (<filename>/boot/boot</filename>), is what ultimately is
+ written to disk. This file is a combination of
+ <filename>boot1</filename>, <filename>boot2</filename> and the
+ <literal>Boot Extender</literal> (or <acronym>BTX</acronym>).
+ This single file is greater in size than a single sector
+ (greater than 512 bytes). Fortunately,
+ <filename>boot1</filename> occupies <emphasis>exactly</emphasis>
+ the first 512 bytes of this single file, so when
+ <filename>boot0</filename> loads the first sector of the &os;
+ slice (512 bytes), it is actually loading
+ <filename>boot1</filename> and transferring control to
+ it.</para>
+
+ <para>The main task of <filename>boot1</filename> is to load the
+ next boot stage. This next stage is somewhat more complex. It
+ is composed of a server called the <quote>Boot Extender</quote>,
+ or <acronym>BTX</acronym>, and a client, called
+ <filename>boot2</filename>. As we will see, the last boot
+ stage, <filename>loader</filename>, is also a client of the
+ <acronym>BTX</acronym> server.</para>
+
+ <para>Let us now look in detail at what exactly is done by
+ <filename>boot1</filename>, starting like we did for
+ <filename>boot0</filename>, at its entry point:</para>
+
+ <figure xml:id="boot-boot1-entry">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting>start:
+ jmp main</programlisting>
+ </figure>
+
+ <para>The entry point at <literal>start</literal> simply jumps
+ past a special data area to the label <literal>main</literal>,
+ which in turn looks like this:</para>
+
+ <figure xml:id="boot-boot1-main">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting>main:
+ cld # String ops inc
+ xor %cx,%cx # Zero
+ mov %cx,%es # Address
+ mov %cx,%ds # data
+ mov %cx,%ss # Set up
+ mov $start,%sp # stack
+ mov %sp,%si # Source
+ mov $0x700,%di # Destination
+ incb %ch # Word count
+ rep # Copy
+ movsw # code</programlisting>
+ </figure>
+
+ <para>Just like <filename>boot0</filename>, this
+ code relocates <filename>boot1</filename>,
+ this time to memory address <literal>0x700</literal>. However,
+ unlike <filename>boot0</filename>, it does not jump there.
+ <filename>boot1</filename> is linked to execute at
+ address <literal>0x7c00</literal>, effectively where it was
+ loaded in the first place. The reason for this relocation will
+ be discussed shortly.</para>
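+
+    <para>To see why a single <literal>incb %ch</literal> yields the
+      right copy length, the relocation can be modelled in Python.
+      This is only a sketch: memory is assumed to be a flat byte
+      array, and the helper names are invented for illustration.
+      Incrementing the high byte of a zeroed <literal>%cx</literal>
+      gives <literal>0x100</literal> words, that is, 512
+      bytes.</para>
+
+    <programlisting><![CDATA[
```python
def incb_ch(cx):
    """Model incb %ch: bump the high byte of the 16-bit %cx register."""
    high = ((cx >> 8) + 1) & 0xff
    return (high << 8) | (cx & 0xff)


def rep_movsw(memory, si, di, cx):
    """Model rep movsw with the direction flag clear (after cld)."""
    while cx:
        memory[di:di + 2] = memory[si:si + 2]
        si, di, cx = si + 2, di + 2, cx - 1
    return si, di


def relocate_boot1(memory):
    """Copy 512 bytes from 0x7c00 to 0x700, as the listing does."""
    cx = incb_ch(0)  # %cx was zeroed earlier by xor %cx,%cx
    return cx, rep_movsw(memory, 0x7c00, 0x700, cx)


def demo():
    """Run the copy on a scratch 64 KB memory image and report the result."""
    memory = bytearray(0x10000)
    memory[0x7c00:0x7e00] = bytes(range(256)) * 2
    cx, (si, di) = relocate_boot1(memory)
    return cx, si, di, memory[0x700:0x900] == memory[0x7c00:0x7e00]


if __name__ == "__main__":
    print(demo())
```
+]]></programlisting>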
+
+ <para>Next comes a loop that looks for the &os; slice. Although
+ <filename>boot0</filename> loaded <filename>boot1</filename>
+ from the &os; slice, no information was passed to it about this
+ <footnote>
+ <para>Actually we did pass a pointer to the slice entry in
+ register <literal>%si</literal>. However,
+ <filename>boot1</filename> does not assume that it was
+ loaded by <filename>boot0</filename> (perhaps some other
+ <acronym>MBR</acronym> loaded it, and did not pass this
+ information), so it assumes nothing.</para></footnote>,
+ so <filename>boot1</filename> must rescan the
+ partition table to find where the &os; slice starts. Therefore
+ it rereads the <acronym>MBR</acronym>:</para>
+
+ <figure xml:id="boot-boot1-find-freebsd">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting> mov $part4,%si # Partition
+ cmpb $0x80,%dl # Hard drive?
+ jb main.4 # No
+ movb $0x1,%dh # Block count
+ callw nread # Read MBR</programlisting>
+ </figure>
+
+ <para>In the code above, register <literal>%dl</literal>
+ maintains information about the boot device. This is passed on
+ by the <acronym>BIOS</acronym> and preserved by the
+ <acronym>MBR</acronym>. Numbers <literal>0x80</literal> and
+      greater tell us that we are dealing with a hard drive, so a
+ call is made to <literal>nread</literal>, where the
+ <acronym>MBR</acronym> is read. Arguments to
+ <literal>nread</literal> are passed through
+ <literal>%si</literal> and <literal>%dh</literal>. The memory
+ address at label <literal>part4</literal> is copied to
+ <literal>%si</literal>. This memory address holds a
+ <quote>fake partition</quote> to be used by
+ <literal>nread</literal>. The following is the data in the fake
+ partition:</para>
+
+ <figure xml:id="boot-boot2-make-fake-partition">
+      <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting> part4:
+ .byte 0x80, 0x00, 0x01, 0x00
+ .byte 0xa5, 0xfe, 0xff, 0xff
+ .byte 0x00, 0x00, 0x00, 0x00
+ .byte 0x50, 0xc3, 0x00, 0x00</programlisting>
+ </figure>
+
+ <para>In particular, the <acronym>LBA</acronym> for this fake
+ partition is hardcoded to zero. This is used as an argument to
+ the <acronym>BIOS</acronym> for reading absolute sector one from
+ the hard drive. Alternatively, CHS addressing could be used.
+ In this case, the fake partition holds cylinder 0, head 0 and
+ sector 1, which is equivalent to absolute sector one.</para>
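+
+    <para>The sixteen bytes of the fake partition can be decoded
+      with a few lines of Python.  This is a sketch that assumes the
+      conventional <acronym>MBR</acronym> partition entry layout
+      (flag, starting CHS, type, ending CHS, starting
+      <acronym>LBA</acronym>, sector count); the field names are
+      chosen here for illustration.</para>
+
+    <programlisting><![CDATA[
```python
import struct

# The bytes at label part4, copied from the listing above.
PART4 = bytes([
    0x80, 0x00, 0x01, 0x00,
    0xa5, 0xfe, 0xff, 0xff,
    0x00, 0x00, 0x00, 0x00,
    0x50, 0xc3, 0x00, 0x00,
])


def decode_entry(entry):
    """Split a 16-byte MBR partition entry into named fields."""
    flag, head, sec_cyl, cyl_low, ptype = struct.unpack_from("<5B", entry, 0)
    lba_start, sector_count = struct.unpack_from("<II", entry, 8)
    return {
        "flag": flag,
        "head": head,
        "sector": sec_cyl & 0x3f,
        "cylinder": ((sec_cyl & 0xc0) << 2) | cyl_low,
        "type": ptype,
        "lba_start": lba_start,
        "sector_count": sector_count,
    }


if __name__ == "__main__":
    for name, value in decode_entry(PART4).items():
        print(name, hex(value))
```
+]]></programlisting>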
+
+ <para>Let us now proceed to take a look at
+ <literal>nread</literal>:</para>
+
+ <figure xml:id="boot-boot1-nread">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting>nread:
+ mov $0x8c00,%bx # Transfer buffer
+ mov 0x8(%si),%ax # Get
+ mov 0xa(%si),%cx # LBA
+ push %cs # Read from
+ callw xread.1 # disk
+ jnc return # If success, return</programlisting>
+ </figure>
+
+ <para>Recall that <literal>%si</literal> points to the fake
+ partition. The word
+ <footnote>
+ <para>In the context of 16-bit real mode, a word is 2
+ bytes.</para></footnote>
+ at offset <literal>0x8</literal> is copied to register
+ <literal>%ax</literal> and word at offset <literal>0xa</literal>
+ to <literal>%cx</literal>. They are interpreted by the
+ <acronym>BIOS</acronym> as the lower 4-byte value denoting the
+ LBA to be read (the upper four bytes are assumed to be zero).
+ Register <literal>%bx</literal> holds the memory address where
+ the <acronym>MBR</acronym> will be loaded. The instruction
+ pushing <literal>%cs</literal> onto the stack is very
+ interesting. In this context, it accomplishes nothing. However, as
+ we will see shortly, <filename>boot2</filename>, in conjunction
+ with the <acronym>BTX</acronym> server, also uses
+ <literal>xread.1</literal>. This mechanism will be discussed in
+ the next section.</para>
+
+ <para>The code at <literal>xread.1</literal> further calls
+ the <literal>read</literal> function, which actually calls the
+ <acronym>BIOS</acronym> asking for the disk sector:</para>
+
+ <figure xml:id="boot-boot1-xread1">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting>xread.1:
+ pushl $0x0 # absolute
+ push %cx # block
+ push %ax # number
+ push %es # Address of
+ push %bx # transfer buffer
+ xor %ax,%ax # Number of
+ movb %dh,%al # blocks to
+ push %ax # transfer
+ push $0x10 # Size of packet
+ mov %sp,%bp # Packet pointer
+ callw read # Read from disk
+ lea 0x10(%bp),%sp # Clear stack
+ lret # To far caller</programlisting>
+ </figure>
+
+ <para>Note the long return instruction at the end of this block.
+ This instruction pops out the <literal>%cs</literal> register
+ pushed by <literal>nread</literal>, and returns. Finally,
+ <literal>nread</literal> also returns.</para>
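+
+    <para>The sequence of pushes in <literal>xread.1</literal>
+      leaves a 16-byte packet on the stack.  The following Python
+      sketch reconstructs those bytes in memory order, lowest address
+      first.  It assumes the usual disk address packet layout (size,
+      reserved byte, block count, buffer offset, buffer segment,
+      64-bit block number); the function name is hypothetical.</para>
+
+    <programlisting><![CDATA[
```python
import struct


def build_packet(ax, cx, es, bx, dh):
    """Lay out the 16 bytes pushed by xread.1, lowest address first.

    ax/cx hold the low and high words of the LBA, es:bx is the
    transfer buffer and dh is the number of blocks to read.
    """
    lba = (cx << 16) | ax  # the upper 32 bits are pushed as zero
    return struct.pack("<BBHHHQ", 0x10, 0, dh, bx, es, lba)


if __name__ == "__main__":
    # Values used when nread is called to read the MBR.
    print(build_packet(ax=0, cx=0, es=0, bx=0x8c00, dh=1).hex())
```
+]]></programlisting>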
+
+ <para>With the <acronym>MBR</acronym> loaded to memory, the actual
+ loop for searching the &os; slice begins:</para>
+
+ <figure xml:id="boot-boot1-find-part">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting> mov $0x1,%cx # Two passes
+main.1:
+ mov $0x8dbe,%si # Partition table
+ movb $0x1,%dh # Partition
+main.2:
+ cmpb $0xa5,0x4(%si) # Our partition type?
+ jne main.3 # No
+ jcxz main.5 # If second pass
+ testb $0x80,(%si) # Active?
+ jnz main.5 # Yes
+main.3:
+ add $0x10,%si # Next entry
+ incb %dh # Partition
+ cmpb $0x5,%dh # In table?
+ jb main.2 # Yes
+ dec %cx # Do two
+ jcxz main.1 # passes</programlisting>
+ </figure>
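+
+    <para>The two-pass search in the listing above can be
+      paraphrased in Python.  This is a sketch under the assumption
+      that the partition table is available as four 16-byte entries;
+      the helper names are invented here.  The first pass accepts
+      only an active &os; slice, the second pass accepts any &os;
+      slice.</para>
+
+    <programlisting><![CDATA[
```python
FREEBSD_TYPE = 0xa5
ACTIVE_FLAG = 0x80


def make_entry(flag, ptype):
    """Build a 16-byte partition entry with only flag and type filled in."""
    return bytes([flag, 0, 0, 0, ptype]) + bytes(11)


def find_freebsd_slice(table):
    """Return the 1-based number of the chosen slice, or None."""
    for require_active in (True, False):  # the two passes
        for number, entry in enumerate(table, start=1):
            if entry[4] != FREEBSD_TYPE:  # cmpb $0xa5,0x4(%si)
                continue
            if not require_active or entry[0] & ACTIVE_FLAG:
                return number
    return None


if __name__ == "__main__":
    table = [make_entry(0x00, 0xa5), make_entry(0x00, 0x83),
             make_entry(0x80, 0xa5), make_entry(0x00, 0x00)]
    print(find_freebsd_slice(table))
```
+]]></programlisting>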
+
+ <para>If a &os; slice is identified, execution continues at
+ <literal>main.5</literal>. Note that when a &os; slice is found
+ <literal>%si</literal> points to the appropriate entry in the
+ partition table, and <literal>%dh</literal> holds the partition
+ number. We assume that a &os; slice is found, so we continue
+ execution at <literal>main.5</literal>:</para>
+
+ <figure xml:id="boot-boot1-main5">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting>main.5:
+ mov %dx,0x900 # Save args
+ movb $0x10,%dh # Sector count
+ callw nread # Read disk
+ mov $0x9000,%bx # BTX
+ mov 0xa(%bx),%si # Get BTX length and set
+ add %bx,%si # %si to start of boot2.bin
+ mov $0xc000,%di # Client page 2
+ mov $0xa200,%cx # Byte
+ sub %si,%cx # count
+ rep # Relocate
+ movsb # client</programlisting>
+ </figure>
+
+ <para>Recall that at this point, register <literal>%si</literal>
+ points to the &os; slice entry in the <acronym>MBR</acronym>
+ partition table, so a call to <literal>nread</literal> will
+ effectively read sectors at the beginning of this partition.
+ The argument passed on register <literal>%dh</literal> tells
+ <literal>nread</literal> to read 16 disk sectors. Recall that
+ the first 512 bytes, or the first sector of the &os; slice,
+ coincides with the <filename>boot1</filename> program. Also
+ recall that the file written to the beginning of the &os;
+ slice is not <filename>/boot/boot1</filename>, but
+ <filename>/boot/boot</filename>. Let us look at the size of
+ these files in the filesystem:</para>
+
+ <screen xml:id="boot-boot1-filesize">-r--r--r-- 1 root wheel 512B Jan 8 00:15 /boot/boot0
+-r--r--r-- 1 root wheel 512B Jan 8 00:15 /boot/boot1
+-r--r--r-- 1 root wheel 7.5K Jan 8 00:15 /boot/boot2
+-r--r--r-- 1 root wheel 8.0K Jan 8 00:15 /boot/boot</screen>
+
+ <para>Both <filename>boot0</filename> and
+ <filename>boot1</filename> are 512 bytes each, so they fit
+ <emphasis>exactly</emphasis> in one disk sector.
+ <filename>boot2</filename> is much bigger, holding both
+ the <acronym>BTX</acronym> server and the <filename>boot2</filename> client.
+ Finally, a file called simply <filename>boot</filename> is 512
+ bytes larger than <filename>boot2</filename>. This file is a
+ concatenation of <filename>boot1</filename> and
+ <filename>boot2</filename>. As already noted,
+ <filename>boot0</filename> is the file written to the absolute
+ first disk sector (the <acronym>MBR</acronym>), and
+      <filename>boot</filename> is the file written starting at the
+      first sector of the &os; slice; <filename>boot1</filename> and
+      <filename>boot2</filename> are <emphasis>not</emphasis> written
+      to disk on their own.  The command used to concatenate
+ <filename>boot1</filename> and <filename>boot2</filename> into a
+ single <filename>boot</filename> is merely
+ <command>cat boot1 boot2 &gt; boot</command>.</para>
+
+ <para>So <filename>boot1</filename> occupies exactly the first 512
+ bytes of <filename>boot</filename> and, because
+ <filename>boot</filename> is written to the first sector of the
+ &os; slice, <filename>boot1</filename> fits exactly in this
+ first sector. Because <literal>nread</literal> reads the first
+ 16 sectors of the &os; slice, it effectively reads the entire
+ <filename>boot</filename> file
+ <footnote>
+ <para>512*16=8192 bytes, exactly the size of
+ <filename>boot</filename></para></footnote>.
+ We will see more details about how <filename>boot</filename> is
+ formed from <filename>boot1</filename> and
+ <filename>boot2</filename> in the next section.</para>
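+
+    <para>The size arithmetic can be checked with a small Python
+      sketch.  The byte counts are taken from the file listing above
+      (7.5K read as 7680 bytes); the function names are
+      illustrative.</para>
+
+    <programlisting><![CDATA[
```python
SECTOR_SIZE = 512
BOOT1_SIZE = 512   # /boot/boot1
BOOT2_SIZE = 7680  # /boot/boot2, listed as 7.5K


def boot_size(boot1=BOOT1_SIZE, boot2=BOOT2_SIZE):
    """Size of the concatenation produced by cat boot1 boot2."""
    return boot1 + boot2


def sectors_needed(nbytes, sector_size=SECTOR_SIZE):
    """Number of whole sectors required to hold nbytes."""
    return -(-nbytes // sector_size)


if __name__ == "__main__":
    print(boot_size(), sectors_needed(boot_size()))
```
+]]></programlisting>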
+
+ <para>Recall that <literal>nread</literal> uses memory address
+ <literal>0x8c00</literal> as the transfer buffer to hold the
+ sectors read. This address is conveniently chosen. Indeed,
+ because <filename>boot1</filename> belongs to the first 512
+ bytes, it ends up in the address range
+      <literal>0x8c00</literal>-<literal>0x8dff</literal>.  The 512
+      bytes that follow (range
+      <literal>0x8e00</literal>-<literal>0x8fff</literal>) are used to
+ store the <emphasis>bsdlabel</emphasis>
+ <footnote>
+ <para>Historically known as <quote>disklabel</quote>. If you
+ ever wondered where &os; stored this information, it is in
+	  this region.  See &man.bsdlabel.8;.</para></footnote>.</para>
+
+ <para>Starting at address <literal>0x9000</literal> is the
+ beginning of the <acronym>BTX</acronym> server, and immediately
+ following is the <filename>boot2</filename> client. The
+ <acronym>BTX</acronym> server acts as a kernel, and executes in
+ protected mode in the most privileged level. In contrast, the
+ <acronym>BTX</acronym> clients (<filename>boot2</filename>, for
+ example), execute in user mode. We will see how this is
+ accomplished in the next section. The code after the call to
+ <literal>nread</literal> locates the beginning of
+ <filename>boot2</filename> in the memory buffer, and copies it
+ to memory address <literal>0xc000</literal>. This is because
+ the <acronym>BTX</acronym> server arranges
+ <filename>boot2</filename> to execute in a segment starting at
+ <literal>0xa000</literal>. We explore this in detail in the
+ following section.</para>
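+
+    <para>The address ranges quoted above follow directly from the
+      transfer buffer address and the sector size, as this Python
+      sketch shows.  The constant names are chosen for illustration
+      and are not taken from the text.</para>
+
+    <programlisting><![CDATA[
```python
SECTOR_SIZE = 512
MEM_BUF = 0x8c00      # transfer buffer used by nread
USER_BASE = 0xa000    # start of the segment BTX sets up for its client
CLIENT_LINK = 0x2000  # address boot2 is linked to execute at


def buffer_layout(base=MEM_BUF):
    """Return where each part of the boot file lands in the buffer."""
    boot1 = (base, base + SECTOR_SIZE - 1)
    label = (boot1[1] + 1, boot1[1] + SECTOR_SIZE)
    return {"boot1": boot1, "bsdlabel": label, "btx": label[1] + 1}


def client_load_address(user_base=USER_BASE, link=CLIENT_LINK):
    """Physical address at which the boot2 client must be placed."""
    return user_base + link


if __name__ == "__main__":
    for name, where in buffer_layout().items():
        print(name, where)
    print(hex(client_load_address()))
```
+]]></programlisting>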
+
+ <para>The last code block of <filename>boot1</filename> enables
+      access to memory above 1&nbsp;MB
+ <footnote>
+ <para>This is necessary for legacy reasons. Interested
+ readers should see <link
+ xlink:href="http://en.wikipedia.org/wiki/A20_line"/>.</para></footnote>
+ and concludes with a jump to the starting point of the
+ <acronym>BTX</acronym> server:</para>
+
+ <figure xml:id="boot-boot1-seta20">
+ <title><filename>sys/boot/i386/boot2/boot1.S</filename></title>
+
+ <programlisting>seta20:
+ cli # Disable interrupts
+seta20.1:
+ dec %cx # Timeout?
+ jz seta20.3 # Yes
+
+ inb $0x64,%al # Get status
+ testb $0x2,%al # Busy?
+ jnz seta20.1 # Yes
+ movb $0xd1,%al # Command: Write
+ outb %al,$0x64 # output port
+seta20.2:
+ inb $0x64,%al # Get status
+ testb $0x2,%al # Busy?
+ jnz seta20.2 # Yes
+ movb $0xdf,%al # Enable
+ outb %al,$0x60 # A20
+seta20.3:
+ sti # Enable interrupts
+ jmp 0x9010 # Start BTX</programlisting>
+ </figure>
+
+ <para>Note that right before the jump, interrupts are
+ enabled.</para>
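+
+    <para>The control flow of <literal>seta20</literal> can be
+      followed in a Python model.  This is a sketch only: the
+      <literal>FakeController</literal> class is a hypothetical
+      stand-in for the keyboard controller, and the
+      <literal>cx</literal> argument models whatever value
+      <literal>%cx</literal> holds on entry.  Note that only the
+      first wait loop is bounded by the counter.</para>
+
+    <programlisting><![CDATA[
```python
class FakeController:
    """Stand-in for the keyboard controller; busy for a set number of reads."""

    def __init__(self, busy_reads=0):
        self.busy_reads = busy_reads
        self.writes = []

    def inb(self, port):
        if port == 0x64 and self.busy_reads > 0:
            self.busy_reads -= 1
            return 0x02  # bit 1 set: controller busy
        return 0x00

    def outb(self, value, port):
        self.writes.append((value, port))


def enable_a20(ctrl, cx=0):
    """Follow seta20 to seta20.3; return False if the first wait times out."""
    while True:
        cx = (cx - 1) & 0xffff  # dec %cx
        if cx == 0:             # jz seta20.3
            return False
        if not ctrl.inb(0x64) & 0x02:
            break
    ctrl.outb(0xd1, 0x64)       # command: write output port
    while ctrl.inb(0x64) & 0x02:
        pass
    ctrl.outb(0xdf, 0x60)       # enable A20
    return True


def run(busy_reads, cx=0):
    """Run the model once and return (result, list of port writes)."""
    ctrl = FakeController(busy_reads)
    return enable_a20(ctrl, cx), ctrl.writes


if __name__ == "__main__":
    print(run(3))
    print(run(100, cx=5))
```
+]]></programlisting>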
+ </sect1>
+
+ <sect1 xml:id="btx-server">
+ <title>The <acronym>BTX</acronym> Server</title>
+
+ <para>Next in our boot sequence is the
+ <acronym>BTX</acronym> Server. Let us quickly remember how we
+ got here:</para>
<itemizedlist>
<listitem>
- <para>modifies the bootable flag for the selected partition to
- make it bootable, and clears the previous</para>
+ <para>The <acronym>BIOS</acronym> loads the absolute sector
+ one (the <acronym>MBR</acronym>, or
+ <filename>boot0</filename>), to address
+ <literal>0x7c00</literal> and jumps there.</para>
</listitem>
<listitem>
- <para>saves itself to disk to remember what partition (slice)
- has been selected so to use it as the default on the next
- boot</para>
+ <para><filename>boot0</filename> relocates itself to
+ <literal>0x600</literal>, the address it was linked to
+ execute, and jumps over there. It then reads the first
+ sector of the &os; slice (which consists of
+ <filename>boot1</filename>) into address
+ <literal>0x7c00</literal> and jumps over there.</para>
</listitem>
<listitem>
- <para>loads the first sector of the selected partition (slice)
- into memory and jumps there</para>
+ <para><filename>boot1</filename> loads the first 16 sectors
+ of the &os; slice into address <literal>0x8c00</literal>.
+	  These 16 sectors, or 8192 bytes, are the whole file
+ <filename>boot</filename>. The file is a
+ concatenation of <filename>boot1</filename> and
+ <filename>boot2</filename>. <filename>boot2</filename>, in
+ turn, contains the <acronym>BTX</acronym> server and the
+ <filename>boot2</filename> client. Finally, a jump is made
+ to address <literal>0x9010</literal>, the entry point of the
+ <acronym>BTX</acronym> server.</para>
</listitem>
</itemizedlist>
- <para>What kind of data should reside on the very first sector of
- a bootable partition (slice), in our case, a FreeBSD slice? As
- you may have already guessed, it is
- <filename>boot2</filename>.</para>
- </sect1>
-
- <sect1 xml:id="boot-boot2">
- <title><literal>boot2</literal> Stage</title>
-
- <para>You might wonder, why <literal>boot2</literal> comes after
- <literal>boot0</literal>, and not boot1. Actually, there is a
- 512-byte file called <filename>boot1</filename> in the directory
- <filename>/boot</filename> as well. It is used for booting from
- a floppy. When booting from a floppy,
- <filename>boot1</filename> plays the same role as
- <filename>boot0</filename> for a harddisk: it locates
- <filename>boot2</filename> and runs it.</para>
-
- <para>You may have realized that a file
- <filename>/boot/mbr</filename> exists as well. It is a
- simplified version of <filename>boot0</filename>. The code in
- <filename>mbr</filename> does not provide a menu for the user,
- it just blindly boots the partition marked active.</para>
-
- <para>The code implementing <filename>boot2</filename> resides in
- <filename>sys/boot/i386/boot2/</filename>, and the executable
- itself is in <filename>/boot</filename>. The files
- <filename>boot0</filename> and <filename>boot2</filename> that
- are in <filename>/boot</filename> are not used by the bootstrap,
- but by utilities such as <application>boot0cfg</application>.
- The actual position for <filename>boot0</filename> is in the
- MBR. For <filename>boot2</filename> it is the beginning of a
- bootable FreeBSD slice. These locations are not under the
- filesystem's control, so they are invisible to commands like
- <application>ls</application>.</para>
-
- <para>The main task for <literal>boot2</literal> is to load the
- file <filename>/boot/loader</filename>, which is the third stage
- in the bootstrapping procedure. The code in
- <literal>boot2</literal> cannot use any services like
- <function>open()</function> and <function>read()</function>,
- since the kernel is not yet loaded. It must scan the harddisk,
- knowing about the filesystem structure, find the file
- <filename>/boot/loader</filename>, read it into memory using a
- BIOS service, and then pass the execution to the loader's entry
- point.</para>
-
- <para>Besides that, <literal>boot2</literal> prompts for user
- input so the loader can be booted from different disk, unit,
- slice and partition.</para>
-
- <para>The <literal>boot2</literal> binary is created in special
- way:</para>
-
- <programlisting><filename>sys/boot/i386/boot2/Makefile:</filename>
-boot2.ld: boot2.ldr boot2.bin ${BTXKERN}
- btxld -v -E ${ORG2} -f bin -b ${BTXKERN} -l boot2.ldr \
- -o ${.TARGET} -P 1 boot2.bin</programlisting>
-
- <indexterm><primary>BTX</primary></indexterm>
- <para>This Makefile snippet shows that &man.btxld.8; is used to
- link the binary. BTX, which stands for BooT eXtender, is a
- piece of code that provides a protected mode environment for the
- program, called the client, that it is linked with. So
- <literal>boot2</literal> is a BTX client, i.e., it uses the
- service provided by BTX.</para>
-
- <indexterm><primary>linker</primary></indexterm>
- <para>The <application>btxld</application> utility is the linker.
- It links two binaries together. The difference between
- &man.btxld.8; and &man.ld.1; is that
- <application>ld</application> usually links object files into a
- shared object or executable, while
- <application>btxld</application> links an object file with the
- BTX, producing the binary file suitable to be put on the
- beginning of the partition for the system boot.</para>
-
- <para><literal>boot0</literal> passes the execution to BTX's entry
- point. BTX then switches the processor to protected mode, and
- prepares a simple environment before calling the client. This
- includes:</para>
+ <para>Before studying the <acronym>BTX</acronym> Server in detail,
+ let us further review how the single, all-in-one
+ <filename>boot</filename> file is created. The way
+ <filename>boot</filename> is built is defined in its
+ <filename>Makefile</filename>
+ (<filename>/usr/src/sys/boot/i386/boot2/Makefile</filename>).
+ Let us look at the rule that creates the
+ <filename>boot</filename> file:</para>
+
+ <figure xml:id="boot-boot1-make-boot">
+ <title><filename>sys/boot/i386/boot2/Makefile</filename></title>
+
+ <programlisting> boot: boot1 boot2
+ cat boot1 boot2 > boot</programlisting>
+ </figure>
+
+ <para>This tells us that <filename>boot1</filename> and
+ <filename>boot2</filename> are needed, and the rule simply
+ concatenates them to produce a single file called
+ <filename>boot</filename>. The rules for creating
+ <filename>boot1</filename> are also quite simple:</para>
+
+ <figure xml:id="boot-boot1-make-boot1">
+ <title><filename>sys/boot/i386/boot2/Makefile</filename></title>
+
+ <programlisting> boot1: boot1.out
+ objcopy -S -O binary boot1.out boot1
+
+ boot1.out: boot1.o
+ ld -e start -Ttext 0x7c00 -o boot1.out boot1.o</programlisting>
+ </figure>
+
+ <para>To apply the rule for creating
+ <filename>boot1</filename>, <filename>boot1.out</filename> must
+ be resolved. This, in turn, depends on the existence of
+ <filename>boot1.o</filename>. This last file is simply the
+ result of assembling our familiar <filename>boot1.S</filename>,
+ without linking. Now, the rule for creating
+ <filename>boot1.out</filename> is applied. This tells us that
+ <filename>boot1.o</filename> should be linked with
+ <literal>start</literal> as its entry point, and starting at
+ address <literal>0x7c00</literal>. Finally,
+ <filename>boot1</filename> is created from
+ <filename>boot1.out</filename> applying the appropriate rule.
+ This rule is the <filename>objcopy</filename> command applied to
+ <filename>boot1.out</filename>. Note the flags passed to
+ <filename>objcopy</filename>: <literal>-S</literal> tells it to
+ strip all relocation and symbolic information;
+ <literal>-O binary</literal> indicates the output format, that
+ is, a simple, unformatted binary file.</para>
+
+ <para>Having <filename>boot1</filename>, let us take a look at how
+ <filename>boot2</filename> is constructed:</para>
+
+ <figure xml:id="boot-boot1-make-boot2">
+ <title><filename>sys/boot/i386/boot2/Makefile</filename></title>
+
+ <programlisting> boot2: boot2.ld
+ @set -- `ls -l boot2.ld`; x=$$((7680-$$5)); \
+ echo "$$x bytes available"; test $$x -ge 0
+ dd if=boot2.ld of=boot2 obs=7680 conv=osync
+
+ boot2.ld: boot2.ldr boot2.bin ../btx/btx/btx
+ btxld -v -E 0x2000 -f bin -b ../btx/btx/btx -l boot2.ldr \
+ -o boot2.ld -P 1 boot2.bin
+
+ boot2.ldr:
+ dd if=/dev/zero of=boot2.ldr bs=512 count=1
+
+ boot2.bin: boot2.out
+ objcopy -S -O binary boot2.out boot2.bin
+
+ boot2.out: ../btx/lib/crt0.o boot2.o sio.o
+ ld -Ttext 0x2000 -o boot2.out
+
+ boot2.o: boot2.s
+ ${CC} ${ACFLAGS} -c boot2.s
+
+ boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c
+ ${CC} ${CFLAGS} -S -o boot2.s.tmp ${.CURDIR}/boot2.c
+	sed -e '/align/d' -e '/nop/d' &lt; boot2.s.tmp > boot2.s
+ rm -f boot2.s.tmp
+
+ boot2.h: boot1.out
+ ${NM} -t d ${.ALLSRC} | awk '/([0-9])+ T xread/ \
+ { x = $$1 - ORG1; \
+ printf("#define XREADORG %#x\n", REL1 + x) }' \
+ ORG1=`printf "%d" ${ORG1}` \
+ REL1=`printf "%d" ${REL1}` > ${.TARGET}</programlisting>
+ </figure>
+
+ <para>The mechanism for building <filename>boot2</filename> is
+ far more elaborate. Let us point out the most relevant facts.
+ The dependency list is as follows:</para>
+
+ <figure xml:id="boot-boot1-make-boot2-more">
+ <title><filename>sys/boot/i386/boot2/Makefile</filename></title>
+
+ <programlisting> boot2: boot2.ld
+ boot2.ld: boot2.ldr boot2.bin ${BTXDIR}/btx/btx
+ boot2.bin: boot2.out
+ boot2.out: ${BTXDIR}/lib/crt0.o boot2.o sio.o
+ boot2.o: boot2.s
+ boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c
+ boot2.h: boot1.out</programlisting>
+ </figure>
+
+ <para>Note that initially there is no header file
+ <filename>boot2.h</filename>, but its creation depends on
+ <filename>boot1.out</filename>, which we already have. The rule
+ for its creation is a bit terse, but the important thing is that
+ the output, <filename>boot2.h</filename>, is something like
+ this:</para>
+
+ <figure xml:id="boot-boot1-make-boot2h">
+ <title><filename>sys/boot/i386/boot2/boot2.h</filename></title>
+
+ <programlisting>
+ #define XREADORG 0x725</programlisting>
+ </figure>
+
+ <para>Recall that <filename>boot1</filename> was relocated (i.e.,
+ copied from <literal>0x7c00</literal> to
+ <literal>0x700</literal>). This relocation will now make sense,
+ because as we will see, the <acronym>BTX</acronym> server
+ reclaims some memory, including the space where
+ <filename>boot1</filename> was originally loaded. However, the
+ <acronym>BTX</acronym> server needs access to
+ <filename>boot1</filename>'s <literal>xread</literal> function;
+ this function, according to the output of
+ <filename>boot2.h</filename>, is at location
+ <literal>0x725</literal>. Indeed, the
+ <acronym>BTX</acronym> server uses the
+ <literal>xread</literal> function from
+ <filename>boot1</filename>'s relocated code. This function is
+      now accessible from within the <filename>boot2</filename>
+ client.</para>
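+
+    <para>The value of <literal>XREADORG</literal> is the address of
+      <literal>xread</literal> after relocation.  The Python sketch
+      below repeats the arithmetic of the <command>awk</command>
+      script, using the link address <literal>0x7c00</literal> and
+      the relocation address <literal>0x700</literal> from the text.
+      The linked address <literal>0x7c25</literal> used in the
+      example is an assumption, inferred backwards from the
+      <literal>0x725</literal> shown above.</para>
+
+    <programlisting><![CDATA[
```python
ORG1 = 0x7c00  # address boot1 is linked to execute at
REL1 = 0x700   # address boot1 relocates itself to


def xreadorg(xread_linked, org1=ORG1, rel1=REL1):
    """Address of xread inside the relocated copy of boot1."""
    return rel1 + (xread_linked - org1)


if __name__ == "__main__":
    # 0x7c25 is an assumed symbol address, not a value from the source.
    print("#define XREADORG %#x" % xreadorg(0x7c25))
```
+]]></programlisting>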
+
+ <para>We next build <filename>boot2.s</filename> from files
+ <filename>boot2.h</filename>, <filename>boot2.c</filename> and
+ <filename>/usr/src/sys/boot/common/ufsread.c</filename>. The
+ rule for this is to compile the code in
+ <filename>boot2.c</filename> (which includes
+ <filename>boot2.h</filename> and <filename>ufsread.c</filename>)
+ into assembly code. Having <filename>boot2.s</filename>, the
+ next rule assembles <filename>boot2.s</filename>, creating the
+ object file <filename>boot2.o</filename>. The
+ next rule directs the linker to link various files
+ (<filename>crt0.o</filename>,
+ <filename>boot2.o</filename> and <filename>sio.o</filename>).
+ Note that the output file, <filename>boot2.out</filename>, is
+ linked to execute at address <literal>0x2000</literal>. Recall
+ that <filename>boot2</filename> will be executed in user mode,
+ within a special user segment set up by the
+ <acronym>BTX</acronym> server. This segment starts at
+ <literal>0xa000</literal>. Also, remember that the
+ <filename>boot2</filename> portion of <filename>boot</filename>
+ was copied to address <literal>0xc000</literal>, that is, offset
+ <literal>0x2000</literal> from the start of the user segment, so
+ <filename>boot2</filename> will work properly when we transfer
+ control to it. Next, <filename>boot2.bin</filename> is created
+ from <filename>boot2.out</filename> by stripping its symbols and
+ format information; <filename>boot2.bin</filename> is a <emphasis>raw</emphasis>
+ binary. Now, note that a file <filename>boot2.ldr</filename> is
+ created as a 512-byte file full of zeros. This space is
+ reserved for the bsdlabel.</para>
+
+ <para>Now that we have files <filename>boot1</filename>,
+ <filename>boot2.bin</filename> and
+ <filename>boot2.ldr</filename>, only the
+ <acronym>BTX</acronym> server is missing before creating the
+ all-in-one <filename>boot</filename> file. The
+ <acronym>BTX</acronym> server is located in
+ <filename>/usr/src/sys/boot/i386/btx/btx</filename>; it has its
+ own <filename>Makefile</filename> with its own set of rules for
+ building. The important thing to notice is that it is also
+ compiled as a <emphasis>raw</emphasis> binary, and that it is
+ linked to execute at address <literal>0x9000</literal>. The
+ details can be found in
+ <filename>/usr/src/sys/boot/i386/btx/btx/Makefile</filename>.</para>
+
+ <para>Having the files that comprise the <filename>boot</filename>
+ program, the final step is to <emphasis>merge</emphasis> them.
+ This is done by a special program called
+ <filename>btxld</filename> (source located in
+ <filename>/usr/src/usr.sbin/btxld</filename>). Some arguments
+ to this program include the name of the output file
+ (<filename>boot</filename>), its entry point
+ (<literal>0x2000</literal>) and its file format
+ (raw binary). The various files are
+ finally merged by this utility into the file
+ <filename>boot</filename>, which consists of
+ <filename>boot1</filename>, <filename>boot2</filename>, the
+ <literal>bsdlabel</literal> and the
+ <acronym>BTX</acronym> server. This file, which takes
+ exactly 16 sectors, or 8192 bytes, is what is
+ actually written to the beginning of the &os; slice
+ during installation. Let us now proceed to study the
+ <acronym>BTX</acronym> server program.</para>
+
+ <para>The <acronym>BTX</acronym> server prepares a simple
+ environment and switches from 16-bit real mode to 32-bit
+ protected mode, right before passing control to the client.
+ This includes initializing and updating the following data
+ structures:</para>
<indexterm><primary>virtual v86 mode</primary></indexterm>
<itemizedlist>
<listitem>
- <para>virtual v86 mode. That means, the BTX is a v86 monitor.
- Real mode instructions like pushf, popf, cli, sti, if called
- by the client, will work.</para>
+ <para>The
+ <literal>Interrupt Vector Table (IVT)</literal> is modified. The
+ <acronym>IVT</acronym> provides exception and interrupt
+ handlers for Real-Mode code.</para>
+ </listitem>
+
+ <listitem>
+ <para>The <literal>Interrupt Descriptor Table (IDT)</literal>
+ is created. Entries are provided for processor exceptions,
+ hardware interrupts, two system calls and the V86 interface.
+ The IDT provides exception and interrupt handlers for
+ Protected-Mode code.</para>
</listitem>
<listitem>
- <para>Interrupt Descriptor Table (IDT) is set up so all
- hardware interrupts are routed to the default BIOS's
- handlers, and interrupt 0x30 is set up to be the syscall
- gate.</para>
+ <para>A <literal>Task-State Segment (TSS)</literal> is
+ created. This is necessary because the processor works in
+ the <emphasis>least</emphasis> privileged level when
+ executing the client (<filename>boot2</filename>), but in
+ the <emphasis>most</emphasis> privileged level when
+ executing the <acronym>BTX</acronym> server.</para>
</listitem>
<listitem>
- <para>Two system calls: <function>exec</function> and
- <function>exit</function>, are defined:</para>
-
- <programlisting><filename>sys/boot/i386/btx/lib/btxsys.s:</filename>
- .set INT_SYS,0x30 # Interrupt number
-#
-# System call: exit
-#
-__exit: xorl %eax,%eax # BTX system
- int $INT_SYS # call 0x0
-#
-# System call: exec
-#
-__exec: movl $0x1,%eax # BTX system
- int $INT_SYS # call 0x1</programlisting>
+ <para>The <acronym>GDT</acronym> (Global Descriptor Table) is
+ set up. Entries (descriptors) are provided for
+ supervisor code and data, user code and data, and real-mode
+ code and data.
+ <footnote>
+ <para>Real-mode code and data are necessary when switching
+ back to real mode from protected mode, as suggested by
+ the Intel manuals.</para></footnote></para>
</listitem>
</itemizedlist>
- <para>BTX creates a Global Descriptor Table (GDT):</para>
-
- <programlisting><filename>sys/boot/i386/btx/btx/btx.s:</filename>
-gdt: .word 0x0,0x0,0x0,0x0 # Null entry
- .word 0xffff,0x0,0x9a00,0xcf # SEL_SCODE
- .word 0xffff,0x0,0x9200,0xcf # SEL_SDATA
- .word 0xffff,0x0,0x9a00,0x0 # SEL_RCODE
- .word 0xffff,0x0,0x9200,0x0 # SEL_RDATA
- .word 0xffff,MEM_USR,0xfa00,0xcf# SEL_UCODE
- .word 0xffff,MEM_USR,0xf200,0xcf# SEL_UDATA
- .word _TSSLM,MEM_TSS,0x8900,0x0 # SEL_TSS</programlisting>
-
- <para>The client's code and data start from address MEM_USR
- (0xa000), and a selector (SEL_UCODE) points to the client's code
- segment. The SEL_UCODE descriptor has Descriptor Privilege
- Level (DPL) 3, which is the lowest privilege level. But the
- <literal>INT 0x30</literal> instruction handler resides in a
- segment pointed to by the SEL_SCODE (supervisor code) selector,
- as shown from the code that creates an IDT:</para>
-
- <programlisting> mov $SEL_SCODE,%dh # Segment selector
+ <para>Let us now start studying the actual implementation. Recall
+ that <filename>boot1</filename> made a jump to address
+ <literal>0x9010</literal>, the <acronym>BTX</acronym> server's
+ entry point. Before studying program execution there,
+ note that the <acronym>BTX</acronym> server has a special header
+ at address range <literal>0x9000-0x900f</literal>, right before
+ its entry point. This header is defined as follows:</para>
+
+ <figure xml:id="btx-header">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>start: # Start of code
+/*
+ * BTX header.
+ */
+btx_hdr: .byte 0xeb # Machine ID
+ .byte 0xe # Header size
+ .ascii "BTX" # Magic
+ .byte 0x1 # Major version
+ .byte 0x2 # Minor version
+ .byte BTX_FLAGS # Flags
+ .word PAG_CNT-MEM_ORG>>0xc # Paging control
+ .word break-start # Text size
+ .long 0x0 # Entry address</programlisting>
+ </figure>
+
+ <para>Note the first two bytes are <literal>0xeb</literal> and
+ <literal>0xe</literal>. In the IA-32 architecture, these two
+ bytes are interpreted as a relative jump past the header into
+ the entry point, so in theory, <filename>boot1</filename> could
+ jump here (address <literal>0x9000</literal>) instead of address
+ <literal>0x9010</literal>. Note that the last field in the
+ <acronym>BTX</acronym> header is a pointer to the client's
+ (<filename>boot2</filename>) entry point. This field is patched
+ at link time.</para>
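+
+ <para>The header layout and the short jump can be checked with a
+ small sketch. It is only an illustration: the field order is taken
+ from the listing above, while the helper name
+ <literal>short_jump_target</literal> and the sample values for the
+ flags, paging control, text size and entry address are
+ assumptions, not values read from a real
+ <filename>boot</filename> file.</para>
+

```python
import struct

# Field layout copied from the btx_hdr listing: two bytes, a 3-byte magic,
# three more bytes, two 16-bit words and one 32-bit entry address.
BTX_HDR = struct.Struct("<BB3sBBBHHI")


def short_jump_target(addr, opcode, disp):
    """Return the target of a 2-byte IA-32 `jmp rel8` located at addr."""
    if opcode != 0xEB:
        raise ValueError("not a short jump opcode")
    if disp >= 0x80:           # rel8 is a signed displacement
        disp -= 0x100
    return addr + 2 + disp     # relative to the following instruction


# Hypothetical header: flags, paging, text size and entry are made up.
sample = BTX_HDR.pack(0xEB, 0x0E, b"BTX", 0x1, 0x2, 0x0, 0x0, 0x0, 0x2000)
fields = BTX_HDR.unpack(sample)

print(hex(BTX_HDR.size))
print(hex(short_jump_target(0x9000, fields[0], fields[1])))
```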
+
+ <para>Immediately following the header is the
+ <acronym>BTX</acronym> server's entry point:</para>
+
+ <figure xml:id="btx-init">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>/*
+ * Initialization routine.
+ */
+init: cli # Disable interrupts
+ xor %ax,%ax # Zero/segment
+ mov %ax,%ss # Set up
+ mov $0x1800,%sp # stack
+ mov %ax,%es # Address
+ mov %ax,%ds # data
+ pushl $0x2 # Clear
+ popfl # flags</programlisting>
+ </figure>
+
+ <para>This code disables interrupts, sets up a working stack
+ (starting at address <literal>0x1800</literal>) and clears the
+ flags in the EFLAGS register. Note that the
+ <literal>popfl</literal> instruction pops out a doubleword (4
+ bytes) from the stack and places it in the EFLAGS register.
+ Because the value actually popped is <literal>2</literal>, the
+ EFLAGS register is effectively cleared (IA-32 requires that bit
+ 1 of the EFLAGS register, whose value is 2, always be 1).</para>
+
+ <para>Our next code block clears (sets to <literal>0</literal>)
+ the memory range <literal>0x5e00-0x8fff</literal>. This range
+ is where the various data structures will be created:</para>
+
+ <figure xml:id="btx-clear-mem">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>/*
+ * Initialize memory.
+ */
+ mov $0x5e00,%di # Memory to initialize
+ mov $(0x9000-0x5e00)/2,%cx # Words to zero
+ rep # Zero-fill
+ stosw # memory</programlisting>
+ </figure>
+
+ <para>Recall that <filename>boot1</filename> was originally loaded
+ to address <literal>0x7c00</literal>, so, with this memory
+ initialization, that copy effectively disappeared. However,
+ also recall that <filename>boot1</filename> was relocated to
+ <literal>0x700</literal>, so <emphasis>that</emphasis> copy is
+ still in memory, and the <acronym>BTX</acronym> server will make
+ use of it.</para>
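+
+ <para>The arithmetic behind this observation can be sketched as
+ follows. The range limits come from the listing above; the
+ 512-byte size of <filename>boot1</filename> (one sector) and the
+ helper name <literal>is_zeroed</literal> are assumptions made for
+ illustration.</para>
+

```python
CLEAR_START = 0x5E00
CLEAR_END = 0x9000               # exclusive: first byte of the BTX server

# Operand loaded into %cx: the number of 16-bit words that stosw writes.
words = (CLEAR_END - CLEAR_START) // 2


def is_zeroed(addr):
    """True if addr lies inside the range cleared by `rep stosw`."""
    return CLEAR_START <= addr < CLEAR_END


# Assumption: boot1 occupies exactly one 512-byte sector.
BOOT1_ORIGINAL = range(0x7C00, 0x7C00 + 512)
BOOT1_RELOCATED = range(0x700, 0x700 + 512)

original_wiped = all(is_zeroed(a) for a in BOOT1_ORIGINAL)
relocated_touched = any(is_zeroed(a) for a in BOOT1_RELOCATED)

print(words, original_wiped, relocated_touched)
```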
+
+ <para>Next, the real-mode <acronym>IVT</acronym> (Interrupt Vector
+ Table) is updated. The <acronym>IVT</acronym> is an array of
+ segment/offset pairs for exception and interrupt handlers. The
+ <acronym>BIOS</acronym> normally maps hardware interrupts to
+ interrupt vectors <literal>0x8</literal> to
+ <literal>0xf</literal> and <literal>0x70</literal> to
+ <literal>0x77</literal> but, as will be seen, the 8259A
+ Programmable Interrupt Controller, the chip controlling the
+ actual mapping of hardware interrupts to interrupt vectors, is
+ programmed to remap these interrupt vectors from
+ <literal>0x8-0xf</literal> to <literal>0x20-0x27</literal> and
+ from <literal>0x70-0x77</literal> to
+ <literal>0x28-0x2f</literal>. Thus, interrupt handlers are
+ provided for interrupt vectors <literal>0x20-0x2f</literal>.
+ The <acronym>BIOS</acronym>-provided handlers are not
+ used directly because they work in 16-bit real mode, but not in
+ 32-bit protected mode. Processor mode will be switched to
+ 32-bit protected mode shortly. However, the
+ <acronym>BTX</acronym> server sets up a mechanism to effectively
+ use the handlers provided by the <acronym>BIOS</acronym>:</para>
+
+ <figure xml:id="btx-ivt">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>/*
+ * Update real mode IDT for reflecting hardware interrupts.
+ */
+ mov $intr20,%bx # Address first handler
+ mov $0x10,%cx # Number of handlers
+ mov $0x20*4,%di # First real mode IDT entry
+init.0: mov %bx,(%di) # Store IP
+ inc %di # Address next
+ inc %di # entry
+ stosw # Store CS
+ add $4,%bx # Next handler
+ loop init.0 # Next IRQ</programlisting>
+ </figure>
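+
+ <para>The effect of the <literal>init.0</literal> loop can be
+ modeled with a short sketch. It assumes that
+ <literal>%ax</literal> is still zero at this point (so every
+ entry gets segment <literal>0</literal>) and that each handler
+ stub is 4 bytes long, as the <literal>add $4,%bx</literal>
+ suggests. The address used for <literal>intr20</literal> is
+ hypothetical.</para>
+

```python
def build_ivt_entries(intr20, first_vector=0x20, count=0x10, stub_size=4):
    """Mimic the init.0 loop: return {entry address: (ip, cs)}."""
    entries = {}
    di = first_vector * 4       # each real-mode vector occupies 4 bytes
    bx = intr20
    for _ in range(count):
        entries[di] = (bx, 0)   # IP = stub address, CS = %ax, assumed 0
        di += 4                 # two inc's plus the 2 bytes stosw advances
        bx += stub_size         # next handler stub
    return entries


INTR20 = 0x9200                 # hypothetical address of the first stub
ivt = build_ivt_entries(INTR20)

for addr in sorted(ivt):
    ip, cs = ivt[addr]
    print(f"vector {addr // 4:#x}: {cs:04x}:{ip:04x}")
```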
+
+ <para>The next block creates the <acronym>IDT</acronym> (Interrupt
+ Descriptor Table). The <acronym>IDT</acronym> is analogous, in
+ protected mode, to the <acronym>IVT</acronym> in real mode.
+ That is, the <acronym>IDT</acronym> describes the various
+ exception and interrupt handlers used when the processor is
+ executing in protected mode. In essence, it also consists of an
+ array of segment/offset pairs, although the structure is
+ somewhat more complex, because segments in protected mode are
+ different than in real mode, and various protection mechanisms
+ apply:</para>
+
+ <figure xml:id="btx-idt">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>/*
+ * Create IDT.
+ */
+ mov $0x5e00,%di # IDT's address
+ mov $idtctl,%si # Control string
+init.1: lodsb # Get entry
+ cbw # count
+ xchg %ax,%cx # as word
+ jcxz init.4 # If done
+ lodsb # Get segment
+ xchg %ax,%dx # P:DPL:type
+ lodsw # Get control
+ xchg %ax,%bx # set
+ lodsw # Get handler offset
+ mov $SEL_SCODE,%dh # Segment selector
init.2: shr %bx # Handle this int?
jnc init.3 # No
mov %ax,(%di) # Set handler offset
mov %dh,0x2(%di) # and selector
mov %dl,0x5(%di) # Set P:DPL:type
- add $0x4,%ax # Next handler</programlisting>
+ add $0x4,%ax # Next handler
+init.3: lea 0x8(%di),%di # Next entry
+ loop init.2 # Till set done
+ jmp init.1 # Continue</programlisting>
+ </figure>
+
+ <para>Each entry in the <literal>IDT</literal> is 8 bytes long.
+ Besides the segment/offset information, they also describe the
+ segment type, privilege level, and whether the segment is
+ present in memory or not. The construction is such that
+ interrupt vectors from <literal>0</literal> to
+ <literal>0xf</literal> (exceptions) are handled by function
+ <literal>intx00</literal>; vector <literal>0x10</literal> (also
+ an exception) is handled by <literal>intx10</literal>; hardware
+ interrupts, which are later configured to start at interrupt
+ vector <literal>0x20</literal> all the way to interrupt vector
+ <literal>0x2f</literal>, are handled by function
+ <literal>intx20</literal>. Lastly, interrupt vector
+ <literal>0x30</literal>, which is used for system calls, is
+ handled by <literal>intx30</literal>, and vectors
+ <literal>0x31</literal> and <literal>0x32</literal> are handled
+ by <literal>intx31</literal>. It must be noted that only
+ descriptors for interrupt vectors <literal>0x30</literal>,
+ <literal>0x31</literal> and <literal>0x32</literal> are given
+ privilege level 3, the same privilege level as the
+ <filename>boot2</filename> client, which means the client can
+ execute a software-generated interrupt to these vectors through
+ the <literal>int</literal> instruction without failing (this is
+ the way <filename>boot2</filename> uses the services provided by
+ the <acronym>BTX</acronym> server). Also, note that
+ <emphasis>only</emphasis> software-generated interrupts are
+ protected from code executing in lesser privilege levels.
+ Hardware-generated interrupts and processor-generated exceptions
+ are <emphasis>always</emphasis> handled adequately, regardless
+ of the actual privileges involved.</para>
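+
+ <para>As a rough illustration of the 8-byte entry format and of
+ the privilege check applied to software interrupts, consider the
+ sketch below. The byte positions follow the stores in the
+ listing above (offset at byte 0, selector at byte 2, P:DPL:type
+ at byte 5). The concrete values <literal>0x8e</literal> and
+ <literal>0xee</literal> (32-bit interrupt gates with DPL 0 and
+ DPL 3), the selector value <literal>0x8</literal> and the handler
+ offsets are assumptions chosen for the example, not values read
+ from <literal>idtctl</literal>.</para>
+

```python
import struct


def make_gate(offset, selector, p_dpl_type):
    """Pack one 8-byte IA-32 gate descriptor."""
    return struct.pack(
        "<HHBBH",
        offset & 0xFFFF,           # handler offset, bits 0-15
        selector,                  # code segment selector
        0,                         # reserved byte
        p_dpl_type,                # P:DPL:type byte
        (offset >> 16) & 0xFFFF,   # handler offset, bits 16-31
    )


def gate_dpl(gate):
    """Extract the descriptor privilege level (bits 5-6 of byte 5)."""
    return (gate[5] >> 5) & 0x3


def int_allowed(cpl, gate):
    """A software `int n` passes the check only if CPL <= gate DPL."""
    return cpl <= gate_dpl(gate)


SEL_SCODE = 0x8                                # assumed selector value
hw_gate = make_gate(0x9300, SEL_SCODE, 0x8E)   # hypothetical DPL 0 gate
sys_gate = make_gate(0x9400, SEL_SCODE, 0xEE)  # hypothetical DPL 3 gate

print(int_allowed(3, hw_gate), int_allowed(3, sys_gate))
```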
+
+ <para>The next step is to initialize the <acronym>TSS</acronym>
+ (Task-State Segment). The <acronym>TSS</acronym> is a hardware
+ feature that helps the operating system or executive software
+ implement multitasking functionality through process
+ abstraction. The IA-32 architecture demands the creation and
+ use of <emphasis>at least</emphasis> one <acronym>TSS</acronym>
+ if multitasking facilities are used or different privilege
+ levels are defined. Because the <filename>boot2</filename>
+ client is executed in privilege level 3, but the
+ <acronym>BTX</acronym> server runs in privilege level 0, a
+ <acronym>TSS</acronym> must be defined:</para>
+
+ <figure xml:id="btx-tss">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>/*
+ * Initialize TSS.
+ */
+init.4: movb $_ESP0H,TSS_ESP0+1(%di) # Set ESP0
+ movb $SEL_SDATA,TSS_SS0(%di) # Set SS0
+ movb $_TSSIO,TSS_MAP(%di) # Set I/O bit map base</programlisting>
+ </figure>
+
+ <para>Note that a value is given for the Privilege Level 0 stack
+ pointer and stack segment in the <acronym>TSS</acronym>. This is needed because,
+ if an interrupt or exception is received while executing
+ <filename>boot2</filename> in Privilege Level 3, a change to
+ Privilege Level 0 is automatically performed by the processor,
+ so a new working stack is needed. Finally, the I/O Map Base
+ Address field of the <acronym>TSS</acronym> is given a value, which is a 16-bit
+ offset from the beginning of the <acronym>TSS</acronym> to the I/O Permission
+ Bitmap and the Interrupt Redirection Bitmap.</para>
+
+ <para>After the <acronym>IDT</acronym> and <acronym>TSS</acronym> are created, the processor is ready to
+ switch to protected mode. This is done in the next
+ block:</para>
+
+ <figure xml:id="btx-prot">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>/*
+ * Bring up the system.
+ */
+ mov $0x2820,%bx # Set protected mode
+ callw setpic # IRQ offsets
+ lidt idtdesc # Set IDT
+ lgdt gdtdesc # Set GDT
+ mov %cr0,%eax # Switch to protected
+ inc %ax # mode
+ mov %eax,%cr0 #
+ ljmp $SEL_SCODE,$init.8 # To 32-bit code
+ .code32
+init.8: xorl %ecx,%ecx # Zero
+ movb $SEL_SDATA,%cl # To 32-bit
+ movw %cx,%ss # stack</programlisting>
+ </figure>
+
+ <para>First, a call is made to <literal>setpic</literal> to
+ program the 8259A <acronym>PIC</acronym> (Programmable Interrupt Controller).
+ This chip is connected to multiple hardware interrupt sources.
+ Upon receiving an interrupt from a device, it
+ signals the processor with the appropriate interrupt vector.
+ This can be customized so that specific interrupts are
+ associated with specific interrupt vectors, as explained before.
+ Next, the <acronym>IDTR</acronym> (Interrupt Descriptor Table Register) and
+ <acronym>GDTR</acronym> (Global Descriptor Table Register) are loaded with the
+ instructions <literal>lidt</literal> and <literal>lgdt</literal>, respectively. These registers are
+ loaded with the base address and limit of the <acronym>IDT</acronym> and
+ <acronym>GDT</acronym>. The following three instructions set the Protection Enable
+ (PE) bit of the <literal>%cr0</literal> register. This
+ effectively switches the processor to
+ 32-bit protected mode. Next, a long jump is made to
+ <literal>init.8</literal> using segment selector SEL_SCODE,
+ which selects the Supervisor Code Segment. The processor is
+ effectively executing in CPL 0, the most privileged level, after
+ this jump. Finally, the Supervisor Data Segment is selected for
+ the stack by assigning the segment selector SEL_SDATA to the
+ <literal>%ss</literal> register. This data segment also has a
+ privilege level of <literal>0</literal>.</para>
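+
+ <para>The remapping requested through the value
+ <literal>0x2820</literal> can be sketched as below. The sketch
+ assumes that the low byte of <literal>%bx</literal> is the vector
+ base for the first <acronym>PIC</acronym> (IRQ 0 to 7) and the
+ high byte is the base for the second one (IRQ 8 to 15), which is
+ consistent with the vector ranges given earlier. The function
+ name <literal>irq_to_vector</literal> is hypothetical.</para>
+

```python
def irq_to_vector(irq, bx=0x2820):
    """Map a hardware IRQ number to its interrupt vector after setpic."""
    if not 0 <= irq <= 15:
        raise ValueError("IRQ must be in the range 0..15")
    master_base = bx & 0xFF          # assumed: %bl, base for IRQ 0-7
    slave_base = (bx >> 8) & 0xFF    # assumed: %bh, base for IRQ 8-15
    if irq < 8:
        return master_base + irq
    return slave_base + (irq - 8)


vectors = [irq_to_vector(irq) for irq in range(16)]
print([hex(v) for v in vectors])
```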
+
+ <para>Our last code block is responsible for loading the
+ <acronym>TR</acronym> (Task Register) with the segment selector for the <acronym>TSS</acronym> we created
+ earlier, and setting the User Mode environment before passing
+ execution control to the <filename>boot2</filename>
+ client.</para>
+
+ <figure xml:id="btx-end">
+ <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title>
+
+ <programlisting>/*
+ * Launch user task.
+ */
+ movb $SEL_TSS,%cl # Set task
+ ltr %cx # register
+ movl $0xa000,%edx # User base address
+ movzwl %ss:BDA_MEM,%eax # Get free memory
+ shll $0xa,%eax # To bytes
+ subl $ARGSPACE,%eax # Less arg space
+ subl %edx,%eax # Less base
+ movb $SEL_UDATA,%cl # User data selector
+ pushl %ecx # Set SS
+ pushl %eax # Set ESP
+ push $0x202 # Set flags (IF set)
+ push $SEL_UCODE # Set CS
+ pushl btx_hdr+0xc # Set EIP
+ pushl %ecx # Set GS
+ pushl %ecx # Set FS
+ pushl %ecx # Set DS
+ pushl %ecx # Set ES
+ pushl %edx # Set EAX
+ movb $0x7,%cl # Set remaining
+init.9: push $0x0 # general
+ loop init.9 # registers
+ popa # and initialize
+ popl %es # Initialize
+ popl %ds # user
+ popl %fs # segment
+ popl %gs # registers
+ iret # To user mode</programlisting>
+ </figure>
+
+ <para>Note that the client's environment includes a stack segment
+ selector and stack pointer (registers <literal>%ss</literal> and
+ <literal>%esp</literal>). Indeed, once the <acronym>TR</acronym> is loaded with
+ the <acronym>TSS</acronym> segment selector (instruction
+ <literal>ltr</literal>), the stack pointer is calculated and
+ pushed onto the stack along with the stack's segment selector.
+ Next, the value <literal>0x202</literal> is pushed onto the
+ stack; it is the value that the EFLAGS will get when control is
+ passed to the client. Also, the User Mode code segment selector
+ and the client's entry point are pushed. Recall that this entry
+ point is patched in the <acronym>BTX</acronym> header at link time. Finally,
+ segment selectors (stored in register <literal>%ecx</literal>)
+ for the segment registers
+ <literal>%gs, %fs, %ds and %es</literal> are pushed onto the
+ stack, along with the value at <literal>%edx</literal>
+ (<literal>0xa000</literal>). Keep in mind the various values
+ that have been pushed onto the stack (they will be popped out
+ shortly). Next, values for the remaining general purpose
+ registers are also pushed onto the stack (note the
+ <literal>loop</literal> that pushes the value
+ <literal>0</literal> seven times). Now the values are
+ popped off the stack. First, the
+ <literal>popa</literal> instruction pops the last eight
+ values pushed: the seven zeros and the value taken from
+ <literal>%edx</literal>. They are stored in the general
+ purpose registers in the order
+ <literal>%edi, %esi, %ebp, %ebx, %edx, %ecx, %eax</literal>
+ (the slot that corresponds to <literal>%esp</literal> is
+ discarded), so <literal>%eax</literal> receives
+ <literal>0xa000</literal>.
+ Then, the various segment selectors pushed are popped into the
+ various segment registers. Five values still remain on the
+ stack. They are popped when the <literal>iret</literal>
+ instruction is executed. This instruction first pops
+ the value that was pushed from the <acronym>BTX</acronym> header. This value is a
+ pointer to <filename>boot2</filename>'s entry point. It is
+ placed in the register <literal>%eip</literal>, the instruction
+ pointer register. Next, the segment selector for the User
+ Code Segment is popped and copied to register
+ <literal>%cs</literal>. Remember that
+ this segment's privilege level is 3, the least privileged
+ level. This means that we must provide values for the stack of
+ this privilege level. This is why the processor, besides
+ further popping the value for the EFLAGS register, does two more
+ pops out of the stack. These values go to the stack
+ pointer (<literal>%esp</literal>) and the stack segment
+ (<literal>%ss</literal>). Now, execution continues at
+ <filename>boot2</filename>'s entry point.</para>
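+
+ <para>The sequence of pushes and pops is easier to follow when
+ simulated. The sketch below models the stack as a list and
+ replays the instructions of the last listing. The selector
+ values (<literal>0x2b</literal> and <literal>0x33</literal>), the
+ entry point and the user stack pointer passed in are assumptions
+ used only for illustration; note that <literal>popa</literal>
+ pops eight slots and discards the one that corresponds to
+ <literal>%esp</literal>.</para>
+

```python
SEL_UCODE = 0x2B    # assumed value: user code selector with RPL 3
SEL_UDATA = 0x33    # assumed value: user data selector with RPL 3


def launch(entry, user_esp, user_base=0xA000):
    """Replay the pushes and pops of the 'Launch user task' block."""
    stack = []
    push = stack.append

    # Frame consumed by iret (pushed first, popped last).
    push(SEL_UDATA)          # SS
    push(user_esp)           # ESP
    push(0x202)              # EFLAGS, IF set
    push(SEL_UCODE)          # CS
    push(entry)              # EIP, read from the BTX header
    for _ in range(4):       # GS, FS, DS, ES
        push(SEL_UDATA)
    push(user_base)          # slot that popa loads into EAX
    for _ in range(7):       # the init.9 loop
        push(0)

    regs = {}
    # popa pops eight doublewords; the saved-ESP slot is thrown away.
    for name in ("edi", "esi", "ebp", None, "ebx", "edx", "ecx", "eax"):
        value = stack.pop()
        if name is not None:
            regs[name] = value
    for name in ("es", "ds", "fs", "gs"):
        regs[name] = stack.pop()
    # iret to a less privileged level pops five values.
    for name in ("eip", "cs", "eflags", "esp", "ss"):
        regs[name] = stack.pop()
    regs["leftover"] = len(stack)
    return regs


state = launch(entry=0x2000, user_esp=0x90000)   # hypothetical inputs
print({k: hex(v) for k, v in state.items()})
```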
+
+ <para>It is important to note how the User Code Segment is
+ defined. This segment's <emphasis>base address</emphasis> is
+ set to <literal>0xa000</literal>. This means that code memory
+ addresses are <emphasis>relative</emphasis> to address <literal>0xa000</literal>;
+ if code being executed is fetched from address
+ <literal>0x2000</literal>, the <emphasis>actual</emphasis>
+ memory addressed is
+ <literal>0xa000+0x2000=0xc000</literal>.</para>
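+
+ <para>The same translation can be written as a one-line helper.
+ This is only a sketch of flat base-plus-offset addressing with
+ 32-bit wraparound; limit checks and paging are ignored, and the
+ name <literal>linear_address</literal> is hypothetical.</para>
+

```python
USER_BASE = 0xA000    # base address of the user code and data segments


def linear_address(offset, base=USER_BASE):
    """Translate a segment-relative offset into a linear address."""
    return (base + offset) & 0xFFFFFFFF


print(hex(linear_address(0x2000)))
```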
+ </sect1>
- <para>So, when the client calls <function>__exec()</function>, the
- code will be executed with the highest privileges. This allows
- the kernel to change the protected mode data structures, such as
- page tables, GDT, IDT, etc later, if needed.</para>
+ <sect1 xml:id="boot2">
+ <title><application>boot2</application> Stage</title>
<para><literal>boot2</literal> defines an important structure,
<literal>struct bootinfo</literal>. This structure is
@@ -416,7 +1760,7 @@ init.2: shr %bx # Handle this int?
loader, and then further to the kernel. Some nodes of this
structures are set by <literal>boot2</literal>, the rest by the
loader. This structure, among other information, contains the
- kernel filename, BIOS harddisk geometry, BIOS drive number for
+ kernel filename, <acronym>BIOS</acronym> harddisk geometry, <acronym>BIOS</acronym> drive number for
boot device, physical memory available, <literal>envp</literal>
pointer etc. The definition for it is:</para>
@@ -451,8 +1795,8 @@ struct bootinfo {
<function>ino_t lookup(char *filename)</function> and
<function>int xfsread(ino_t inode, void *buf, size_t
nbyte)</function> are used to read the content of a file into
- memory. <filename>/boot/loader</filename> is an ELF binary, but
- where the ELF header is prepended with a.out's <literal>struct
+ memory. <filename>/boot/loader</filename> is an <acronym>ELF</acronym> binary, but
+ where the <acronym>ELF</acronym> header is prepended with <filename>a.out</filename>'s <literal>struct
exec</literal> structure. <function>load()</function> scans the
loader's ELF header, loading the content of
<filename>/boot/loader</filename> into memory, and passing the
@@ -467,10 +1811,10 @@ struct bootinfo {
<sect1 xml:id="boot-loader">
<title><application>loader</application> Stage</title>
- <para><application>loader</application> is a BTX client as well.
+ <para><application>loader</application> is a <acronym>BTX</acronym> client as well.
I will not describe it here in detail, there is a comprehensive
manpage written by Mike Smith, &man.loader.8;. The underlying
- mechanisms and BTX were discussed above.</para>
+ mechanisms and <acronym>BTX</acronym> were discussed above.</para>
<para>The main task for the loader is to boot the kernel. When
the kernel is loaded into memory, it is being called by the