author | Warren Block <wblock@FreeBSD.org> | 2014-01-26 02:30:34 +0000 |
---|---|---|
committer | Warren Block <wblock@FreeBSD.org> | 2014-01-26 02:30:34 +0000 |
commit | 31d08ba8b68f57e6289b544a4c28e01cc73be9a3 (patch) | |
tree | 08651d6e017ff305ce3f6fba8d276dc3580f7252 | |
parent | 9d06fb107b9c5a204abaaf669641e2c4c7737b3c (diff) |
-rw-r--r-- | en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml | 1884 |
1 files changed, 1614 insertions, 270 deletions
diff --git a/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml b/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml index e58c126978..53ab4b8579 100644 --- a/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml +++ b/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml @@ -4,6 +4,8 @@ The FreeBSD Documentation Project Copyright (c) 2002 Sergey Lyubka <devnull@uptsoft.com> All rights reserved +Copyright (c) 2014 Sergio Andrés Gómez del Real <Sergio.G.delReal@gmail.com> +All rights reserved $FreeBSD$ --> @@ -25,6 +27,18 @@ $FreeBSD$ </author> <!-- devnull@uptsoft.com 12 Jun 2002 --> </authorgroup> + + <authorgroup> + <author> + <personname> + <firstname>Sergio Andrés</firstname> + <surname> Gómez del Real</surname> + </personname> + + <contrib>Updated and enhanced by </contrib> + </author> + <!-- Sergio.G.DelReal@gmail.com Jan 2014 --> + </authorgroup> </info> <sect1 xml:id="boot-synopsis"> @@ -37,88 +51,103 @@ $FreeBSD$ <indexterm><primary>booting</primary></indexterm> <indexterm><primary>system initialization</primary></indexterm> <para>This chapter is an overview of the boot and system - initialization process, starting from the BIOS (firmware) POST, - to the first user process creation. Since the initial steps of - system startup are very architecture dependent, the IA-32 - architecture is used as an example.</para> + initialization processes, starting from the <acronym>BIOS</acronym> (firmware) + <acronym>POST</acronym>, to the first user process creation. Since the initial + steps of system startup are very architecture dependent, the + IA-32 architecture is used as an example.</para> + + <para>The &os; boot process can be surprisingly complex. After + control is passed from the <acronym>BIOS</acronym>, a considerable amount of + low-level configuration must be done before the kernel can be + loaded and executed. 
This setup must be done in a simple and + flexible manner, allowing the user a great deal of customization + possibilities.</para> </sect1> <sect1 xml:id="boot-overview"> <title>Overview</title> - <para>A computer running FreeBSD can boot by several methods, - although the most common method, booting from a harddisk where - the OS is installed, will be discussed here. The boot process - is divided into several steps:</para> - - <itemizedlist> - <listitem><para>BIOS POST</para></listitem> - <listitem><para><literal>boot0</literal> stage</para></listitem> - <listitem><para><literal>boot2</literal> stage</para></listitem> - <listitem><para>loader stage</para></listitem> - <listitem><para>kernel initialization</para></listitem> - </itemizedlist> - - <indexterm><primary>BIOS POST</primary></indexterm> - <indexterm><primary>boot0</primary></indexterm> - <indexterm><primary>boot2</primary></indexterm> - <indexterm><primary>loader</primary></indexterm> - <para>The <literal>boot0</literal> and <literal>boot2</literal> - stages are also referred to as <emphasis>bootstrap stages 1 and - 2</emphasis> in &man.boot.8; as the first steps in FreeBSD's - 3-stage bootstrapping procedure. Various information is printed - on the screen at each stage, so you may visually recognize them - using the table that follows. Please note that the actual data + <para>The boot process is an extremely machine-dependent + activity. Not only must code be written for every computer + architecture, but there may also be multiple types of booting on + the same architecture. For example, looking at + <filename class="directory">/usr/src/sys/boot</filename> + reveals a great amount of architecture-dependent code. There is + a directory for each of the various supported architectures. 
In + the x86-specific <filename class="directory">i386</filename> + directory, there are subdirectories for different boot standards + like <filename>mbr</filename> (Master Boot Record), + <filename>gpt</filename> (<acronym>GUID</acronym> Partition + Table), and <filename>efi</filename> (Extensible Firmware + Interface). Each boot standard has its own conventions and data + structures. The example that follows shows booting an x86 + computer from an <acronym>MBR</acronym> hard drive with the &os; + <filename>boot0</filename> multi-boot loader stored in the very + first sector. That boot code starts the &os; three-stage boot + process.</para> + + <para>The key to understanding this process is that it is a series + of stages of increasing complexity. These stages are + <filename>boot1</filename>, <filename>boot2</filename>, and + <filename>loader</filename> (see &man.boot.8; for more detail). + The boot system executes each stage in sequence. The last + stage, <filename>loader</filename>, is responsible for loading + the &os; kernel. Each stage is examined in the following + sections.</para> + + <para>Here is an example of the output generated by the + different boot stages. 
Actual output may differ from machine to machine:</para> <informaltable frame="none" pgwide="0"> <tgroup cols="2"> <tbody> <row> - <entry><para>Output (may vary)</para></entry> - <entry><para>BIOS (firmware) messages</para></entry> + <entry>&os; Component</entry> + <entry>Output (may vary)</entry> </row> <row> - <entry><para><screen>F1 FreeBSD + <entry><literal>boot0</literal></entry> + <entry><screen>F1 FreeBSD F2 BSD -F5 Disk 2</screen></para></entry> - <entry><para><literal>boot0</literal></para></entry> +F5 Disk 2</screen></entry> </row> <row> - <entry><para><screen>>>FreeBSD/i386 BOOT -Default: 1:ad(1,a)/boot/loader -boot:</screen></para></entry> - <entry><para><literal>boot2</literal> + <entry><literal>boot2</literal> <footnote><para>This prompt will appear if the user presses a key just after selecting an OS to boot at the <literal>boot0</literal> - stage.</para></footnote></para></entry> + stage.</para></footnote></entry> + <entry><screen>>>FreeBSD/i386 BOOT +Default: 1:ad(1,a)/boot/loader +boot:</screen></entry> </row> <row> - <entry><para><screen>BTX loader 1.0 BTX version is 1.01 -BIOS drive A: is disk0 -BIOS drive C: is disk1 -BIOS 639kB/64512kB available memory -FreeBSD/i386 bootstrap loader, Revision 0.8 + <entry><filename>loader</filename></entry> + <entry><screen>BTX loader 1.00 BTX version is 1.02 +Consoles: internal video/keyboard +BIOS drive C: is disk0 +BIOS 639kB/2096064kB available memory + +FreeBSD/x86 bootstrap loader, Revision 1.1 Console internal video/keyboard -(jkh@bento.freebsd.org, Mon Nov 20 11:41:23 GMT 2000) -/kernel text=0x1234 data=0x2345 syms=[0x4+0x3456] -Hit [Enter] to boot immediately, or any other key for command prompt -Booting [kernel] in 9 seconds..._</screen></para></entry> - <entry><para>loader</para></entry> +(root@snap.freebsd.org, Thu Jan 16 22:18:05 UTC 2014) +Loading /boot/defaults/loader.conf +/boot/kernel/kernel text=0xed9008 data=0x117d28+0x176650 syms=[0x8+0x137988+0x8+0x1515f8]</screen></entry> </row> <row> - 
<entry><para><screen>Copyright (c) 1992-2002 The FreeBSD Project. + <entry>kernel</entry> + <entry><screen>Copyright (c) 1992-2013 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. -FreeBSD 4.6-RC #0: Sat May 4 22:49:02 GMT 2002 - devnull@kukas:/usr/obj/usr/src/sys/DEVNULL -Timecounter "i8254" frequency 1193182 Hz</screen></para></entry> - <entry><para>kernel</para></entry> +FreeBSD is a registered trademark of The FreeBSD Foundation. +FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014 + root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 +FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610</screen></entry> </row> </tbody> </tgroup> @@ -126,84 +155,114 @@ Timecounter "i8254" frequency 1193182 Hz</screen></para></entry> </sect1> <sect1 xml:id="boot-bios"> - <title>BIOS POST</title> + <title>The <acronym>BIOS</acronym></title> - <para>When the PC powers on, the processor's registers are set - to some predefined values. One of the registers is the + <para>When the computer powers on, the processor's registers are + set to some predefined values. One of the registers is the <emphasis>instruction pointer</emphasis> register, and its value after a power on is well defined: it is a 32-bit value of - 0xfffffff0. The instruction pointer register points to code to - be executed by the processor. One of the registers is the + <literal>0xfffffff0</literal>. The instruction pointer register + (also known as the Program Counter) points to code to be + executed by the processor. Another important register is the <literal>cr0</literal> 32-bit control register, and its value - just after the reboot is 0. One of the cr0's bits, the bit PE - (Protection Enabled) indicates whether the processor is running - in protected or real mode. Since at boot time this bit is - cleared, the processor boots in real mode. 
Real mode means, + just after a reboot is <literal>0</literal>. One of + <literal>cr0</literal>'s bits, the PE (Protection Enabled) bit, + indicates whether the processor is running in 32-bit protected + mode or 16-bit real mode. Since this bit is cleared at boot + time, the processor boots in 16-bit real mode. Real mode means, among other things, that linear and physical addresses are - identical.</para> - - <para>The value of 0xfffffff0 is slightly less then 4Gb, so unless - the machine has 4Gb physical memory, it cannot point to a valid - memory address. The computer's hardware translates this address - so that it points to a BIOS memory block.</para> - - <para>BIOS stands for <emphasis>Basic Input Output - System</emphasis>, and it is a chip on the motherboard that - has a relatively small amount of read-only memory (ROM). This + identical. The reason for the processor not to start + immediately in 32-bit protected mode is backwards compatibility. + In particular, the boot process relies on the services provided + by the <acronym>BIOS</acronym>, and the <acronym>BIOS</acronym> + itself works in legacy, 16-bit code.</para> + + <para>The value of <literal>0xfffffff0</literal> is slightly less + than 4 GB, so unless the machine has 4 GB of physical + memory, it cannot point to a valid memory address. The + computer's hardware translates this address so that it points to + a <acronym>BIOS</acronym> memory block.</para> + + <para>The <acronym>BIOS</acronym> (Basic Input Output + System) is a chip on the motherboard that has a relatively small + amount of read-only memory (<acronym>ROM</acronym>). This memory contains various low-level routines that are specific to - the hardware supplied with the motherboard. So, the processor - will first jump to the address 0xfffffff0, which really resides - in the BIOS's memory. Usually this address contains a jump - instruction to the BIOS's POST routines.</para> - - <para>POST stands for <emphasis>Power On Self Test</emphasis>. 
- This is a set of routines including the memory check, system bus - check and other low-level stuff so that the CPU can initialize - the computer properly. The important step on this stage is - determining the boot device. All modern BIOS's allow the boot - device to be set manually, so you can boot from a floppy, - CD-ROM, harddisk etc.</para> - - <para>The very last thing in the POST is the <literal>INT - 0x19</literal> instruction. That instruction reads 512 bytes - from the first sector of boot device into the memory at address - 0x7c00. The term <emphasis>first sector</emphasis> originates - from harddrive architecture, where the magnetic plate is divided - to a number of cylindrical tracks. Tracks are numbered, and - every track is divided by a number (usually 64) sectors. Track - number 0 is the outermost on the magnetic plate, and sector 1, - the first sector (tracks, or, cylinders, are numbered starting - from 0, but sectors - starting from 1), has a special meaning. - It is also called Master Boot Record, or MBR. The remaining - sectors on the first track are never used <footnote><para>Some - utilities such as &man.disklabel.8; may store the - information in this area, mostly in the second - sector.</para></footnote>.</para> + the hardware supplied with the motherboard. The processor will + first jump to the address 0xfffffff0, which really resides in + the <acronym>BIOS</acronym>'s memory. Usually this address + contains a jump instruction to the <acronym>BIOS</acronym>'s + POST routines.</para> + + <para>The <acronym>POST</acronym> (Power On Self Test) + is a set of routines including the memory check, system bus + check, and other low-level initialization so the + <acronym>CPU</acronym> can set up the computer properly. The + important step of this stage is determining the boot device. 
+ Modern <acronym>BIOS</acronym> implementations permit the + selection of a boot device, allowing booting from a floppy, + <acronym>CD-ROM</acronym>, hard disk, or other devices.</para> + + <para>The very last thing in the <acronym>POST</acronym> is the + <literal>INT 0x19</literal> instruction. The + <literal>INT 0x19</literal> handler reads 512 bytes from the + first sector of boot device into the memory at address + <literal>0x7c00</literal>. The term + <emphasis>first sector</emphasis> originates from hard drive + architecture, where the magnetic plate is divided into a number + of cylindrical tracks. Tracks are numbered, and every track is + divided into a number (usually 64) of sectors. Track numbers + start at 0, but sector numbers start from 1. Track 0 is the + outermost on the magnetic plate, and sector 1, the first sector, + has a special purpose. It is also called the + <acronym>MBR</acronym>, or Master Boot Record. The remaining + sectors on the first track are never used.</para> + + <para>This sector is our boot-sequence starting point. As we will + see, this sector contains a copy of our + <filename>boot0</filename> program. A jump is made by the + <acronym>BIOS</acronym> to address <literal>0x7c00</literal> so + it starts executing.</para> </sect1> <sect1 xml:id="boot-boot0"> - <title><literal>boot0</literal> Stage</title> + <title>The Master Boot Record (<literal>boot0</literal>)</title> <indexterm><primary>MBR</primary></indexterm> - <para>Take a look at the file <filename>/boot/boot0</filename>. - This is a small 512-byte file, and it is exactly what FreeBSD's - installation procedure wrote to your harddisk's MBR if you chose - the <quote>bootmanager</quote> option at installation - time.</para> + + <para>After control is received from the <acronym>BIOS</acronym> + at memory address <literal>0x7c00</literal>, + <filename>boot0</filename> starts executing. It is the first + piece of code under &os; control. 
The task of + <filename>boot0</filename> is quite simple: scan the partition + table and let the user choose which partition to boot from. The + Partition Table is a special, standard data structure embedded + in the <acronym>MBR</acronym> (hence embedded in + <filename>boot0</filename>) describing the four standard PC + <quote>partitions</quote> + <footnote> + <para><link + xlink:href="http://en.wikipedia.org/wiki/Master_boot_record"></link></para></footnote>. + <filename>boot0</filename> resides in the filesystem as + <filename>/boot/boot0</filename>. It is a small 512-byte file, + and it is exactly what &os;'s installation procedure wrote to + the hard disk's <acronym>MBR</acronym> if you chose the <quote>bootmanager</quote> + option at installation time. Indeed, + <filename>boot0</filename> <emphasis>is</emphasis> the + <acronym>MBR</acronym>.</para> <para>As mentioned previously, the <literal>INT 0x19</literal> - instruction loads an MBR, i.e., the <filename>boot0</filename> - content, into the memory at address 0x7c00. Taking a look at - the file <filename>sys/boot/i386/boot0/boot0.S</filename> can - give a guess at what is happening there - this is the boot - manager, which is an awesome piece of code written by Robert - Nordier.</para> - - <para>The MBR, or, <filename>boot0</filename>, has a special - structure starting from offset 0x1be, called the - <emphasis>partition table</emphasis>. It has 4 records of 16 - bytes each, called <emphasis>partition records</emphasis>, which - represent how the harddisk(s) are partitioned, or, in FreeBSD's + instruction causes the <literal>INT 0x19</literal> handler to + load an <acronym>MBR</acronym> (<filename>boot0</filename>) into + memory at address <literal>0x7c00</literal>. 
The source file + for <filename>boot0</filename> can be found in + <filename>sys/boot/i386/boot0/boot0.S</filename> - which is an + awesome piece of code written by Robert Nordier.</para> + + <para>A special structure starting from offset + <literal>0x1be</literal> in the <acronym>MBR</acronym> is called + the <emphasis>partition table</emphasis>. It has four records + of 16 bytes each, called <emphasis>partition records</emphasis>, + which represent how the hard disk is partitioned, or, in &os;'s terminology, sliced. One byte of those 16 says whether a partition (slice) is bootable or not. Exactly one record must have that flag set, otherwise <filename>boot0</filename>'s code @@ -229,186 +288,1471 @@ Timecounter "i8254" frequency 1193182 Hz</screen></para></entry> </listitem> </itemizedlist> - <para>A partition record descriptor has the information about + <para>A partition record descriptor contains information about where exactly the partition resides on the drive. Both - descriptors, LBA and CHS, describe the same information, but in - different ways: LBA (Logical Block Addressing) has the starting - sector for the partition and the partition's length, while CHS - (Cylinder Head Sector) has coordinates for the first and last - sectors of the partition.</para> - - <para>The boot manager scans the partition table and prints the - menu on the screen so the user can select what disk and what - slice to boot. By pressing an appropriate key, - <filename>boot0</filename> performs the following - actions:</para> + descriptors, <acronym>LBA</acronym> and <acronym>CHS</acronym>, + describe the same information, but in different ways: + <acronym>LBA</acronym> (Logical Block Addressing) has the + starting sector for the partition and the partition's length, + while <acronym>CHS</acronym> (Cylinder Head Sector) has + coordinates for the first and last sectors of the partition. 
+ The partition table ends with the special signature + <literal>0xaa55</literal>.</para> + + <para>The <acronym>MBR</acronym> must fit into 512 bytes, a single + disk sector. This program uses low-level <quote>tricks</quote> + like taking advantage of the side effects of certain + instructions and reusing register values from previous + operations to make the most out of the fewest possible + instructions. Care must also be taken when handling the + partition table, which is embedded in the <acronym>MBR</acronym> + itself. For these reasons, be very careful when modifying + <filename>boot0.S</filename>.</para> + + <para>Note that the <filename>boot0.S</filename> source file + is assembled <quote>as is</quote>: instructions are translated + one by one to binary, with no additional information (no + <acronym>ELF</acronym> file format, for example). This kind of + low-level control is achieved at link time through special + control flags passed to the linker. For example, the text + section of the program is set to be located at address + <literal>0x600</literal>. In practice this means that + <filename>boot0</filename> must be loaded to memory address + <literal>0x600</literal> in order to function properly.</para> + + <para>It is worth looking at the <filename>Makefile</filename> for + <filename>boot0</filename> + (<filename>sys/boot/i386/boot0/Makefile</filename>), as it + defines some of the run-time behavior of + <filename>boot0</filename>. For instance, if a terminal + connected to the serial port (COM1) is used for I/O, the macro + <literal>SIO</literal> must be defined + (<literal>-DSIO</literal>). <literal>-DPXE</literal> enables + boot through <acronym>PXE</acronym> by pressing + <keycap>F6</keycap>. Additionally, the program defines a set of + <emphasis>flags</emphasis> that allow further modification of + its behavior. All of this is illustrated in the + <filename>Makefile</filename>. 
For example, look at the + linker directives which command the linker to start the text + section at address <literal>0x600</literal>, and to build the + output file <quote>as is</quote> (strip out any file + formatting):</para> + + <figure xml:id="boot-boot0-makefile-as-is"> + <title><filename>sys/boot/i386/boot0/Makefile</filename></title> + + <programlisting> BOOT_BOOT0_ORG?=0x600 + LDFLAGS=-e start -Ttext ${BOOT_BOOT0_ORG} \ + -Wl,-N,-S,--oformat,binary</programlisting> + </figure> + + <para>Let us now start our study of the <acronym>MBR</acronym>, or + <filename>boot0</filename>, starting where execution + begins.</para> + + <note> + <para>Some modifications have been made to some instructions in + favor of better exposition. For example, some macros are + expanded, and some macro tests are omitted when the result of + the test is known. This applies to all of the code examples + shown.</para> + </note> + + <figure xml:id="boot-boot0-entrypoint"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting>start: + cld # String ops inc + xorw %ax,%ax # Zero + movw %ax,%es # Address + movw %ax,%ds # data + movw %ax,%ss # Set up + movw $0x7c00,%sp # stack</programlisting> + </figure> + + <para>This first block of code is the entry point of the program. + It is where the <acronym>BIOS</acronym> transfers control. + First, it makes sure that the string operations autoincrement + their pointer operands (the <literal>cld</literal> instruction) + <footnote> + <para>When in doubt, we refer the reader to the official Intel + manuals, which describe the exact semantics for each + instruction: <link + xlink:href="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html"></link>.</para></footnote>. + Then, as it makes no assumption about the state of the segment + registers, it initializes them. 
Finally, it sets the stack + pointer register (<literal>%sp</literal>) to address + <literal>0x7c00</literal>, so we have a working stack.</para> + + <para>The next block is responsible for the relocation and + subsequent jump to the relocated code.</para> + + <figure xml:id="boot-boot0-relocation"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting> movw $0x7c00,%si # Source + movw $0x600,%di # Destination + movw $512,%cx # Byte count + rep # Relocate + movsb # code + movw %di,%bp # Address variables + movb $16,%cl # Bytes to clear + rep # Zero + stosb # them + incb -0xe(%di) # Set the S field to 1 + jmp main-0x7c00+0x600 # Jump to relocated code</programlisting> + </figure> + + <para>Because <filename>boot0</filename> is loaded by the + <acronym>BIOS</acronym> to address <literal>0x7c00</literal>, it + copies itself to address <literal>0x600</literal> and then + transfers control there (recall that it was linked to execute at + address <literal>0x600</literal>). The source address, + <literal>0x7c00</literal>, is copied to register + <literal>%si</literal>. The destination address, + <literal>0x600</literal>, is copied to register <literal>%di</literal>. + The number of bytes to copy, <literal>512</literal> (the + program's size), is copied to register <literal>%cx</literal>. + Next, the <literal>rep</literal> instruction repeats the + instruction that follows, that is, <literal>movsb</literal>, the + number of times dictated by the <literal>%cx</literal> register. + The <literal>movsb</literal> instruction copies the byte pointed + to by <literal>%si</literal> to the address pointed to by + <literal>%di</literal>. This is repeated another 511 times. On + each repetition, both the source and destination registers, + <literal>%si</literal> and <literal>%di</literal>, are + incremented by one. 
Thus, upon completion of the 512-byte copy, + <literal>%di</literal> has the value + <literal>0x600</literal>+<literal>512</literal>= + <literal>0x800</literal>, and <literal>%si</literal> has the + value <literal>0x7c00</literal>+<literal>512</literal>= + <literal>0x7e00</literal>; we have thus completed the code + <emphasis>relocation</emphasis>.</para> + + <para>Next, the destination register + <literal>%di</literal> is copied to <literal>%bp</literal>. + <literal>%bp</literal> gets the value <literal>0x800</literal>. + The value <literal>16</literal> is copied to + <literal>%cl</literal> in preparation for a new string operation + (like our previous <literal>movsb</literal>). Now, + <literal>stosb</literal> is executed 16 times. This instruction + copies a <literal>0</literal> value to the address pointed to by + the destination register (<literal>%di</literal>, which is + <literal>0x800</literal>), and increments it. This is repeated + another 15 times, so <literal>%di</literal> ends up with value + <literal>0x810</literal>. Effectively, this clears the address + range <literal>0x800</literal>-<literal>0x80f</literal>. This + range is used as a (fake) partition table for writing the + <acronym>MBR</acronym> back to disk. Finally, the sector field + for the <acronym>CHS</acronym> addressing of this fake partition + is given the value 1 and a jump is made to the main function + from the relocated code. Note that until this jump to the + relocated code, any reference to an absolute address was + avoided.</para> + + <para>The following code block tests whether the drive number + provided by the <acronym>BIOS</acronym> should be used, or + the one stored in <filename>boot0</filename>.</para> + + <figure xml:id="boot-boot0-drivenumber"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting>main: + testb $SETDRV,-69(%bp) # Set drive number? + jnz disable_update # Yes + testb %dl,%dl # Drive number valid? 
+ js save_curdrive # Possibly (0x80 set)</programlisting> + </figure> + + <para>This code tests the <literal>SETDRV</literal> bit + (<literal>0x20</literal>) in the <emphasis>flags</emphasis> + variable. Recall that register <literal>%bp</literal> points to + address location <literal>0x800</literal>, so the test is done + to the <emphasis>flags</emphasis> variable at address + <literal>0x800</literal>-<literal>69</literal>= + <literal>0x7bb</literal>. This is an example of the type of + modifications that can be done to <filename>boot0</filename>. + The <literal>SETDRV</literal> flag is not set by default, but it + can be set in the <filename>Makefile</filename>. When set, the + drive number stored in the <acronym>MBR</acronym> is used + instead of the one provided by the <acronym>BIOS</acronym>. We + assume the defaults, and that the <acronym>BIOS</acronym> + provided a valid drive number, so we jump to + <literal>save_curdrive</literal>.</para> + + <para>The next block saves the drive number provided by the + <acronym>BIOS</acronym>, and calls <literal>putn</literal> to + print a new line on the screen.</para> + + <figure xml:id="boot-boot0-savedrivenumber"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting>save_curdrive: + movb %dl, (%bp) # Save drive number + pushw %dx # Also in the stack +#ifdef TEST /* test code, print internal bios drive */ + rolb $1, %dl + movw $drive, %si + call putkey +#endif + callw putn # Print a newline</programlisting> + </figure> + + <para>Note that we assume <varname>TEST</varname> is not defined, + so the conditional code in it is not assembled and will not + appear in our executable <filename>boot0</filename>.</para> + + <para>Our next block implements the actual scanning of the + partition table. It prints to the screen the partition type for + each of the four entries in the partition table. It compares + each type with a list of well-known operating system file + systems. 
Examples of recognized partition types are + <acronym>NTFS</acronym> (&windows;, ID 0x7), + <literal>ext2fs</literal> (&linux;, ID 0x83), and, of course, + <literal>ffs</literal>/<literal>ufs2</literal> (&os;, ID 0xa5). + The implementation is fairly simple.</para> + + <figure xml:id="boot-boot0-partition-scan"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting> movw $(partbl+0x4),%bx # Partition table (+4) + xorw %dx,%dx # Item number + +read_entry: + movb %ch,-0x4(%bx) # Zero active flag (ch == 0) + btw %dx,_FLAGS(%bp) # Entry enabled? + jnc next_entry # No + movb (%bx),%al # Load type + test %al, %al # skip empty partition + jz next_entry + movw $bootable_ids,%di # Lookup tables + movb $(TLEN+1),%cl # Number of entries + repne # Locate + scasb # type + addw $(TLEN-1), %di # Adjust + movb (%di),%cl # Partition + addw %cx,%di # description + callw putx # Display it + +next_entry: + incw %dx # Next item + addb $0x10,%bl # Next entry + jnc read_entry # Till done</programlisting> + </figure> + + <para>It is important to note that the active flag for each entry + is cleared, so after the scanning, <emphasis>no</emphasis> + partition entry is active in our memory copy of + <filename>boot0</filename>. Later, the active flag will be set + for the selected partition. This ensures that only one active + partition exists if the user chooses to write the changes back + to disk.</para> + + <para>The next block tests for other drives. At startup, + the <acronym>BIOS</acronym> writes the number of drives present + in the computer to address <literal>0x475</literal>. If there + are any other drives present, <filename>boot0</filename> prints + the current drive to screen. 
The user may command + <filename>boot0</filename> to scan partitions on another drive + later.</para> + + <figure xml:id="boot-boot0-test-drives"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting> popw %ax # Drive number + subb $0x7f,%al # Does next + cmpb 0x475,%al # drive exist? (from BIOS?) + jb print_drive # Yes + decw %ax # Already drive 0? + jz print_prompt # Yes</programlisting> + </figure> + + <para>We make the assumption that a single drive is present, so + the jump to <literal>print_drive</literal> is not performed. We + also assume nothing strange happened, so we jump to + <literal>print_prompt</literal>.</para> + + <para>This next block just prints out a prompt followed by the + default option:</para> + + <figure xml:id="boot-boot0-prompt"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting>print_prompt: + movw $prompt,%si # Display + callw putstr # prompt + movb _OPT(%bp),%dl # Display + decw %si # default + callw putkey # key + jmp start_input # Skip beep</programlisting> + </figure> + + <para>Finally, a jump is performed to + <literal>start_input</literal>, where the + <acronym>BIOS</acronym> services are used to start a timer and + for reading user input from the keyboard; if the timer expires, + the default option will be selected:</para> + + <figure xml:id="boot-boot0-start-input"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting>start_input: + xorb %ah,%ah # BIOS: Get + int $0x1a # system time + movw %dx,%di # Ticks when + addw _TICKS(%bp),%di # timeout +read_key: + movb $0x1,%ah # BIOS: Check + int $0x16 # for keypress + jnz got_key # Have input + xorb %ah,%ah # BIOS: int 0x1a, 00 + int $0x1a # get system time + cmpw %di,%dx # Timeout? + jb read_key # No</programlisting> + </figure> + + <para>An interrupt is requested with number + <literal>0x1a</literal> and argument <literal>0</literal> in + register <literal>%ah</literal>. 
The <acronym>BIOS</acronym> + has a predefined set of services, requested by applications as + software-generated interrupts through the <literal>int</literal> + instruction and receiving arguments in registers (in this case, + <literal>%ah</literal>). Here, particularly, we are requesting + the number of clock ticks since last midnight; this value is + computed by the <acronym>BIOS</acronym> through the + <acronym>RTC</acronym> (Real Time Clock). This clock can be + programmed to work at frequencies ranging from 2 Hz to + 8192 Hz. The <acronym>BIOS</acronym> sets it to + 18.2 Hz at startup. When the request is satisfied, a + 32-bit result is returned by the <acronym>BIOS</acronym> in + registers <literal>%cx</literal> and <literal>%dx</literal> + (lower bytes in <literal>%dx</literal>). This result (the + <literal>%dx</literal> part) is copied to register + <literal>%di</literal>, and the value of the + <varname>TICKS</varname> variable is added to + <literal>%di</literal>. This variable resides in + <filename>boot0</filename> at offset <literal>_TICKS</literal> + (a negative value) from register <literal>%bp</literal> (which, + recall, points to <literal>0x800</literal>). The default value + of this variable is <literal>0xb6</literal> (182 in decimal). + Now, the idea is that <filename>boot0</filename> constantly + requests the time from the <acronym>BIOS</acronym>, and when the + value returned in register <literal>%dx</literal> is greater + than the value stored in <literal>%di</literal>, the time is up + and the default selection will be made. Since the RTC ticks + 18.2 times per second, this condition will be met after 10 + seconds (this default behaviour can be changed in the + <filename>Makefile</filename>). 
Until this time has passed, + <filename>boot0</filename> continually asks the + <acronym>BIOS</acronym> for any user input; this is done through + <literal>int 0x16</literal>, argument <literal>1</literal> in + <literal>%ah</literal>.</para> + + <para>Whether a key was pressed or the time expired, subsequent + code validates the selection. Based on the selection, the + register <literal>%si</literal> is set to point to the + appropriate partition entry in the partition table. This new + selection overrides the previous default one. Indeed, it + becomes the new default. Finally, the ACTIVE flag of the + selected partition is set. If it was enabled at compile time, + the in-memory version of <filename>boot0</filename> with these + modified values is written back to the <acronym>MBR</acronym> on + disk. We leave the details of this implementation to the + reader.</para> + + <para>We now end our study with the last code block from the + <filename>boot0</filename> program:</para> + + <figure xml:id="boot-boot0-check-bootable"> + <title><filename>sys/boot/i386/boot0/boot0.S</filename></title> + + <programlisting> movw $0x7c00,%bx # Address for read + movb $0x2,%ah # Read sector + callw intx13 # from disk + jc beep # If error + cmpw $0xaa55,0x1fe(%bx) # Bootable? + jne beep # No + pushw %si # Save ptr to selected part. + callw putn # Leave some space + popw %si # Restore, next stage uses it + jmp *%bx # Invoke bootstrap</programlisting> + </figure> + + <para>Recall that <literal>%si</literal> points to the selected + partition entry. This entry tells us where the partition begins + on disk. 
We assume, of course, that the partition selected is + actually a &os; slice.</para> + + <note> + <para>From now on, we will favor the use of the technically + more accurate term <quote>slice</quote> rather than + <quote>partition</quote>.</para> + </note> + + <para>The transfer buffer is set to <literal>0x7c00</literal> + (register <literal>%bx</literal>), and a read for the first + sector of the &os; slice is requested by calling + <literal>intx13</literal>. We assume that everything went okay, + so a jump to <literal>beep</literal> is not performed. In + particular, the new sector read must end with the magic sequence + <literal>0xaa55</literal>. Finally, the value at + <literal>%si</literal> (the pointer to the selected partition + table) is preserved for use by the next stage, and a jump is + performed to address <literal>0x7c00</literal>, where execution + of our next stage (the just-read block) is started.</para> + </sect1> + + <sect1 xml:id="boot-boot1"> + <title><literal>boot1</literal> Stage</title> + + <para>So far we have gone through the following sequence:</para> + + <itemizedlist> + <listitem> + <para>The <acronym>BIOS</acronym> did some early hardware + initialization, including the <acronym>POST</acronym>. The + <acronym>MBR</acronym> (<filename>boot0</filename>) was + loaded from absolute disk sector one to address + <literal>0x7c00</literal>. Execution control was passed to + that location.</para> + </listitem> + + <listitem> + <para><filename>boot0</filename> relocated itself to the + location it was linked to execute + (<literal>0x600</literal>), followed by a jump to continue + execution at the appropriate place. Finally, + <filename>boot0</filename> loaded the first disk sector from + the &os; slice to address <literal>0x7c00</literal>. + Execution control was passed to that location.</para> + </listitem> + </itemizedlist> + + <para><filename>boot1</filename> is the next step in the + boot-loading sequence. 
It is the first of three boot stages. + Note that we have been dealing exclusively + with disk sectors. Indeed, the <acronym>BIOS</acronym> loads + the absolute first sector, while <filename>boot0</filename> + loads the first sector of the &os; slice. Both loads are to + address <literal>0x7c00</literal>. We can conceptually think of + these disk sectors as containing the files + <filename>boot0</filename> and <filename>boot1</filename>, + respectively, but in reality this is not entirely true for + <filename>boot1</filename>. Strictly speaking, unlike + <filename>boot0</filename>, <filename>boot1</filename> is not + part of the boot blocks + <footnote> + <para>There is a file <filename>/boot/boot1</filename>, but it + is not written to the beginning of the &os; slice. + Instead, it is concatenated with <filename>boot2</filename> + to form <filename>boot</filename>, which + <emphasis>is</emphasis> written to the beginning of the &os; + slice and read at boot time.</para></footnote>. + Instead, a single, full-blown file, <filename>boot</filename> + (<filename>/boot/boot</filename>), is what ultimately is + written to disk. This file is a combination of + <filename>boot1</filename>, <filename>boot2</filename> and the + <literal>Boot Extender</literal> (or <acronym>BTX</acronym>). + This single file is greater in size than a single sector + (greater than 512 bytes). Fortunately, + <filename>boot1</filename> occupies <emphasis>exactly</emphasis> + the first 512 bytes of this single file, so when + <filename>boot0</filename> loads the first sector of the &os; + slice (512 bytes), it is actually loading + <filename>boot1</filename> and transferring control to + it.</para> + + <para>The main task of <filename>boot1</filename> is to load the + next boot stage. This next stage is somewhat more complex. It + is composed of a server called the <quote>Boot Extender</quote>, + or <acronym>BTX</acronym>, and a client, called + <filename>boot2</filename>. 
As we will see, the last boot + stage, <filename>loader</filename>, is also a client of the + <acronym>BTX</acronym> server.</para> + + <para>Let us now look in detail at what exactly is done by + <filename>boot1</filename>, starting like we did for + <filename>boot0</filename>, at its entry point:</para> + + <figure xml:id="boot-boot1-entry"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting>start: + jmp main</programlisting> + </figure> + + <para>The entry point at <literal>start</literal> simply jumps + past a special data area to the label <literal>main</literal>, + which in turn looks like this:</para> + + <figure xml:id="boot-boot1-main"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting>main: + cld # String ops inc + xor %cx,%cx # Zero + mov %cx,%es # Address + mov %cx,%ds # data + mov %cx,%ss # Set up + mov $start,%sp # stack + mov %sp,%si # Source + mov $0x700,%di # Destination + incb %ch # Word count + rep # Copy + movsw # code</programlisting> + </figure> + + <para>Just like <filename>boot0</filename>, this + code relocates <filename>boot1</filename>, + this time to memory address <literal>0x700</literal>. However, + unlike <filename>boot0</filename>, it does not jump there. + <filename>boot1</filename> is linked to execute at + address <literal>0x7c00</literal>, effectively where it was + loaded in the first place. The reason for this relocation will + be discussed shortly.</para> + + <para>Next comes a loop that looks for the &os; slice. Although + <filename>boot0</filename> loaded <filename>boot1</filename> + from the &os; slice, no information was passed to it about this + <footnote> + <para>Actually we did pass a pointer to the slice entry in + register <literal>%si</literal>. 
However, + <filename>boot1</filename> does not assume that it was + loaded by <filename>boot0</filename> (perhaps some other + <acronym>MBR</acronym> loaded it, and did not pass this + information), so it assumes nothing.</para></footnote>, + so <filename>boot1</filename> must rescan the + partition table to find where the &os; slice starts. Therefore + it rereads the <acronym>MBR</acronym>:</para> + + <figure xml:id="boot-boot1-find-freebsd"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting> mov $part4,%si # Partition + cmpb $0x80,%dl # Hard drive? + jb main.4 # No + movb $0x1,%dh # Block count + callw nread # Read MBR</programlisting> + </figure> + + <para>In the code above, register <literal>%dl</literal> + maintains information about the boot device. This is passed on + by the <acronym>BIOS</acronym> and preserved by the + <acronym>MBR</acronym>. Numbers <literal>0x80</literal> and + greater tell us that we are dealing with a hard drive, so a + call is made to <literal>nread</literal>, where the + <acronym>MBR</acronym> is read. Arguments to + <literal>nread</literal> are passed through + <literal>%si</literal> and <literal>%dh</literal>. The memory + address at label <literal>part4</literal> is copied to + <literal>%si</literal>. This memory address holds a + <quote>fake partition</quote> to be used by + <literal>nread</literal>. The following is the data in the fake + partition:</para> + + <figure xml:id="boot-boot2-make-fake-partition"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting> part4: + .byte 0x80, 0x00, 0x01, 0x00 + .byte 0xa5, 0xfe, 0xff, 0xff + .byte 0x00, 0x00, 0x00, 0x00 + .byte 0x50, 0xc3, 0x00, 0x00</programlisting> + </figure> + + <para>In particular, the <acronym>LBA</acronym> for this fake + partition is hardcoded to zero. This is used as an argument to + the <acronym>BIOS</acronym> for reading absolute sector one from + the hard drive. 
Alternatively, CHS addressing could be used. + In this case, the fake partition holds cylinder 0, head 0 and + sector 1, which is equivalent to absolute sector one.</para> + + <para>Let us now proceed to take a look at + <literal>nread</literal>:</para> + + <figure xml:id="boot-boot1-nread"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting>nread: + mov $0x8c00,%bx # Transfer buffer + mov 0x8(%si),%ax # Get + mov 0xa(%si),%cx # LBA + push %cs # Read from + callw xread.1 # disk + jnc return # If success, return</programlisting> + </figure> + + <para>Recall that <literal>%si</literal> points to the fake + partition. The word + <footnote> + <para>In the context of 16-bit real mode, a word is 2 + bytes.</para></footnote> + at offset <literal>0x8</literal> is copied to register + <literal>%ax</literal> and word at offset <literal>0xa</literal> + to <literal>%cx</literal>. They are interpreted by the + <acronym>BIOS</acronym> as the lower 4-byte value denoting the + LBA to be read (the upper four bytes are assumed to be zero). + Register <literal>%bx</literal> holds the memory address where + the <acronym>MBR</acronym> will be loaded. The instruction + pushing <literal>%cs</literal> onto the stack is very + interesting. In this context, it accomplishes nothing. However, as + we will see shortly, <filename>boot2</filename>, in conjunction + with the <acronym>BTX</acronym> server, also uses + <literal>xread.1</literal>. 
This mechanism will be discussed in + the next section.</para> + + <para>The code at <literal>xread.1</literal> further calls + the <literal>read</literal> function, which actually calls the + <acronym>BIOS</acronym> asking for the disk sector:</para> + + <figure xml:id="boot-boot1-xread1"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting>xread.1: + pushl $0x0 # absolute + push %cx # block + push %ax # number + push %es # Address of + push %bx # transfer buffer + xor %ax,%ax # Number of + movb %dh,%al # blocks to + push %ax # transfer + push $0x10 # Size of packet + mov %sp,%bp # Packet pointer + callw read # Read from disk + lea 0x10(%bp),%sp # Clear stack + lret # To far caller</programlisting> + </figure> + + <para>Note the long return instruction at the end of this block. + This instruction pops out the <literal>%cs</literal> register + pushed by <literal>nread</literal>, and returns. Finally, + <literal>nread</literal> also returns.</para> + + <para>With the <acronym>MBR</acronym> loaded to memory, the actual + loop for searching the &os; slice begins:</para> + + <figure xml:id="boot-boot1-find-part"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting> mov $0x1,%cx # Two passes +main.1: + mov $0x8dbe,%si # Partition table + movb $0x1,%dh # Partition +main.2: + cmpb $0xa5,0x4(%si) # Our partition type? + jne main.3 # No + jcxz main.5 # If second pass + testb $0x80,(%si) # Active? + jnz main.5 # Yes +main.3: + add $0x10,%si # Next entry + incb %dh # Partition + cmpb $0x5,%dh # In table? + jb main.2 # Yes + dec %cx # Do two + jcxz main.1 # passes</programlisting> + </figure> + + <para>If a &os; slice is identified, execution continues at + <literal>main.5</literal>. Note that when a &os; slice is found + <literal>%si</literal> points to the appropriate entry in the + partition table, and <literal>%dh</literal> holds the partition + number. 
We assume that a &os; slice is found, so we continue + execution at <literal>main.5</literal>:</para> + + <figure xml:id="boot-boot1-main5"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting>main.5: + mov %dx,0x900 # Save args + movb $0x10,%dh # Sector count + callw nread # Read disk + mov $0x9000,%bx # BTX + mov 0xa(%bx),%si # Get BTX length and set + add %bx,%si # %si to start of boot2.bin + mov $0xc000,%di # Client page 2 + mov $0xa200,%cx # Byte + sub %si,%cx # count + rep # Relocate + movsb # client</programlisting> + </figure> + + <para>Recall that at this point, register <literal>%si</literal> + points to the &os; slice entry in the <acronym>MBR</acronym> + partition table, so a call to <literal>nread</literal> will + effectively read sectors at the beginning of this partition. + The argument passed on register <literal>%dh</literal> tells + <literal>nread</literal> to read 16 disk sectors. Recall that + the first 512 bytes, or the first sector of the &os; slice, + coincides with the <filename>boot1</filename> program. Also + recall that the file written to the beginning of the &os; + slice is not <filename>/boot/boot1</filename>, but + <filename>/boot/boot</filename>. Let us look at the size of + these files in the filesystem:</para> + + <screen xml:id="boot-boot1-filesize">-r--r--r-- 1 root wheel 512B Jan 8 00:15 /boot/boot0 +-r--r--r-- 1 root wheel 512B Jan 8 00:15 /boot/boot1 +-r--r--r-- 1 root wheel 7.5K Jan 8 00:15 /boot/boot2 +-r--r--r-- 1 root wheel 8.0K Jan 8 00:15 /boot/boot</screen> + + <para>Both <filename>boot0</filename> and + <filename>boot1</filename> are 512 bytes each, so they fit + <emphasis>exactly</emphasis> in one disk sector. + <filename>boot2</filename> is much bigger, holding both + the <acronym>BTX</acronym> server and the <filename>boot2</filename> client. + Finally, a file called simply <filename>boot</filename> is 512 + bytes larger than <filename>boot2</filename>. 
This file is a + concatenation of <filename>boot1</filename> and + <filename>boot2</filename>. As already noted, + <filename>boot0</filename> is the file written to the absolute + first disk sector (the <acronym>MBR</acronym>), and + <filename>boot</filename> is the file written to the first + sector of the &os; slice; <filename>boot1</filename> and + <filename>boot2</filename> are <emphasis>not</emphasis> written + to disk. The command used to concatenate + <filename>boot1</filename> and <filename>boot2</filename> into a + single <filename>boot</filename> is merely + <command>cat boot1 boot2 > boot</command>.</para> + + <para>So <filename>boot1</filename> occupies exactly the first 512 + bytes of <filename>boot</filename> and, because + <filename>boot</filename> is written to the first sector of the + &os; slice, <filename>boot1</filename> fits exactly in this + first sector. Because <literal>nread</literal> reads the first + 16 sectors of the &os; slice, it effectively reads the entire + <filename>boot</filename> file + <footnote> + <para>512*16=8192 bytes, exactly the size of + <filename>boot</filename></para></footnote>. + We will see more details about how <filename>boot</filename> is + formed from <filename>boot1</filename> and + <filename>boot2</filename> in the next section.</para> + + <para>Recall that <literal>nread</literal> uses memory address + <literal>0x8c00</literal> as the transfer buffer to hold the + sectors read. This address is conveniently chosen. Indeed, + because <filename>boot1</filename> belongs to the first 512 + bytes, it ends up in the address range + <literal>0x8c00</literal>-<literal>0x8dff</literal>. The 512 + bytes that follow (range + <literal>0x8e00</literal>-<literal>0x8fff</literal>) are used to + store the <emphasis>bsdlabel</emphasis> + <footnote> + <para>Historically known as <quote>disklabel</quote>. If you + ever wondered where &os; stored this information, it is in + this region. 
See &man.bsdlabel.8;</para></footnote>.</para> + + <para>Starting at address <literal>0x9000</literal> is the + beginning of the <acronym>BTX</acronym> server, and immediately + following is the <filename>boot2</filename> client. The + <acronym>BTX</acronym> server acts as a kernel, and executes in + protected mode in the most privileged level. In contrast, the + <acronym>BTX</acronym> clients (<filename>boot2</filename>, for + example), execute in user mode. We will see how this is + accomplished in the next section. The code after the call to + <literal>nread</literal> locates the beginning of + <filename>boot2</filename> in the memory buffer, and copies it + to memory address <literal>0xc000</literal>. This is because + the <acronym>BTX</acronym> server arranges + <filename>boot2</filename> to execute in a segment starting at + <literal>0xa000</literal>. We explore this in detail in the + following section.</para> + + <para>The last code block of <filename>boot1</filename> enables + access to memory above 1MB + <footnote> + <para>This is necessary for legacy reasons. Interested + readers should see <link + xlink:href="http://en.wikipedia.org/wiki/A20_line"/>.</para></footnote> + and concludes with a jump to the starting point of the + <acronym>BTX</acronym> server:</para> + + <figure xml:id="boot-boot1-seta20"> + <title><filename>sys/boot/i386/boot2/boot1.S</filename></title> + + <programlisting>seta20: + cli # Disable interrupts +seta20.1: + dec %cx # Timeout? + jz seta20.3 # Yes + + inb $0x64,%al # Get status + testb $0x2,%al # Busy? + jnz seta20.1 # Yes + movb $0xd1,%al # Command: Write + outb %al,$0x64 # output port +seta20.2: + inb $0x64,%al # Get status + testb $0x2,%al # Busy? 
+ jnz seta20.2 # Yes + movb $0xdf,%al # Enable + outb %al,$0x60 # A20 +seta20.3: + sti # Enable interrupts + jmp 0x9010 # Start BTX</programlisting> + </figure> + + <para>Note that right before the jump, interrupts are + enabled.</para> + </sect1> + + <sect1 xml:id="btx-server"> + <title>The <acronym>BTX</acronym> Server</title> + + <para>Next in our boot sequence is the + <acronym>BTX</acronym> Server. Let us quickly remember how we + got here:</para> <itemizedlist> <listitem> - <para>modifies the bootable flag for the selected partition to - make it bootable, and clears the previous</para> + <para>The <acronym>BIOS</acronym> loads the absolute sector + one (the <acronym>MBR</acronym>, or + <filename>boot0</filename>), to address + <literal>0x7c00</literal> and jumps there.</para> </listitem> <listitem> - <para>saves itself to disk to remember what partition (slice) - has been selected so to use it as the default on the next - boot</para> + <para><filename>boot0</filename> relocates itself to + <literal>0x600</literal>, the address it was linked to + execute, and jumps over there. It then reads the first + sector of the &os; slice (which consists of + <filename>boot1</filename>) into address + <literal>0x7c00</literal> and jumps over there.</para> </listitem> <listitem> - <para>loads the first sector of the selected partition (slice) - into memory and jumps there</para> + <para><filename>boot1</filename> loads the first 16 sectors + of the &os; slice into address <literal>0x8c00</literal>. + These 16 sectors, or 8192 bytes, are the whole file + <filename>boot</filename>. The file is a + concatenation of <filename>boot1</filename> and + <filename>boot2</filename>. <filename>boot2</filename>, in + turn, contains the <acronym>BTX</acronym> server and the + <filename>boot2</filename> client. 
Finally, a jump is made + to address <literal>0x9010</literal>, the entry point of the + <acronym>BTX</acronym> server.</para> </listitem> </itemizedlist> - <para>What kind of data should reside on the very first sector of - a bootable partition (slice), in our case, a FreeBSD slice? As - you may have already guessed, it is - <filename>boot2</filename>.</para> - </sect1> - - <sect1 xml:id="boot-boot2"> - <title><literal>boot2</literal> Stage</title> - - <para>You might wonder, why <literal>boot2</literal> comes after - <literal>boot0</literal>, and not boot1. Actually, there is a - 512-byte file called <filename>boot1</filename> in the directory - <filename>/boot</filename> as well. It is used for booting from - a floppy. When booting from a floppy, - <filename>boot1</filename> plays the same role as - <filename>boot0</filename> for a harddisk: it locates - <filename>boot2</filename> and runs it.</para> - - <para>You may have realized that a file - <filename>/boot/mbr</filename> exists as well. It is a - simplified version of <filename>boot0</filename>. The code in - <filename>mbr</filename> does not provide a menu for the user, - it just blindly boots the partition marked active.</para> - - <para>The code implementing <filename>boot2</filename> resides in - <filename>sys/boot/i386/boot2/</filename>, and the executable - itself is in <filename>/boot</filename>. The files - <filename>boot0</filename> and <filename>boot2</filename> that - are in <filename>/boot</filename> are not used by the bootstrap, - but by utilities such as <application>boot0cfg</application>. - The actual position for <filename>boot0</filename> is in the - MBR. For <filename>boot2</filename> it is the beginning of a - bootable FreeBSD slice. 
These locations are not under the - filesystem's control, so they are invisible to commands like - <application>ls</application>.</para> - - <para>The main task for <literal>boot2</literal> is to load the - file <filename>/boot/loader</filename>, which is the third stage - in the bootstrapping procedure. The code in - <literal>boot2</literal> cannot use any services like - <function>open()</function> and <function>read()</function>, - since the kernel is not yet loaded. It must scan the harddisk, - knowing about the filesystem structure, find the file - <filename>/boot/loader</filename>, read it into memory using a - BIOS service, and then pass the execution to the loader's entry - point.</para> - - <para>Besides that, <literal>boot2</literal> prompts for user - input so the loader can be booted from different disk, unit, - slice and partition.</para> - - <para>The <literal>boot2</literal> binary is created in special - way:</para> - - <programlisting><filename>sys/boot/i386/boot2/Makefile:</filename> -boot2.ld: boot2.ldr boot2.bin ${BTXKERN} - btxld -v -E ${ORG2} -f bin -b ${BTXKERN} -l boot2.ldr \ - -o ${.TARGET} -P 1 boot2.bin</programlisting> - - <indexterm><primary>BTX</primary></indexterm> - <para>This Makefile snippet shows that &man.btxld.8; is used to - link the binary. BTX, which stands for BooT eXtender, is a - piece of code that provides a protected mode environment for the - program, called the client, that it is linked with. So - <literal>boot2</literal> is a BTX client, i.e., it uses the - service provided by BTX.</para> - - <indexterm><primary>linker</primary></indexterm> - <para>The <application>btxld</application> utility is the linker. - It links two binaries together. 
The difference between - &man.btxld.8; and &man.ld.1; is that - <application>ld</application> usually links object files into a - shared object or executable, while - <application>btxld</application> links an object file with the - BTX, producing the binary file suitable to be put on the - beginning of the partition for the system boot.</para> - - <para><literal>boot0</literal> passes the execution to BTX's entry - point. BTX then switches the processor to protected mode, and - prepares a simple environment before calling the client. This - includes:</para> + <para>Before studying the <acronym>BTX</acronym> Server in detail, + let us further review how the single, all-in-one + <filename>boot</filename> file is created. The way + <filename>boot</filename> is built is defined in its + <filename>Makefile</filename> + (<filename>/usr/src/sys/boot/i386/boot2/Makefile</filename>). + Let us look at the rule that creates the + <filename>boot</filename> file:</para> + + <figure xml:id="boot-boot1-make-boot"> + <title><filename>sys/boot/i386/boot2/Makefile</filename></title> + + <programlisting> boot: boot1 boot2 + cat boot1 boot2 > boot</programlisting> + </figure> + + <para>This tells us that <filename>boot1</filename> and + <filename>boot2</filename> are needed, and the rule simply + concatenates them to produce a single file called + <filename>boot</filename>. The rules for creating + <filename>boot1</filename> are also quite simple:</para> + + <figure xml:id="boot-boot1-make-boot1"> + <title><filename>sys/boot/i386/boot2/Makefile</filename></title> + + <programlisting> boot1: boot1.out + objcopy -S -O binary boot1.out boot1 + + boot1.out: boot1.o + ld -e start -Ttext 0x7c00 -o boot1.out boot1.o</programlisting> + </figure> + + <para>To apply the rule for creating + <filename>boot1</filename>, <filename>boot1.out</filename> must + be resolved. This, in turn, depends on the existence of + <filename>boot1.o</filename>. 
This last file is simply the + result of assembling our familiar <filename>boot1.S</filename>, + without linking. Now, the rule for creating + <filename>boot1.out</filename> is applied. This tells us that + <filename>boot1.o</filename> should be linked with + <literal>start</literal> as its entry point, and starting at + address <literal>0x7c00</literal>. Finally, + <filename>boot1</filename> is created from + <filename>boot1.out</filename> applying the appropriate rule. + This rule is the <filename>objcopy</filename> command applied to + <filename>boot1.out</filename>. Note the flags passed to + <filename>objcopy</filename>: <literal>-S</literal> tells it to + strip all relocation and symbolic information; + <literal>-O binary</literal> indicates the output format, that + is, a simple, unformatted binary file.</para> + + <para>Having <filename>boot1</filename>, let us take a look at how + <filename>boot2</filename> is constructed:</para> + + <figure xml:id="boot-boot1-make-boot2"> + <title><filename>sys/boot/i386/boot2/Makefile</filename></title> + + <programlisting> boot2: boot2.ld + @set -- `ls -l boot2.ld`; x=$$((7680-$$5)); \ + echo "$$x bytes available"; test $$x -ge 0 + dd if=boot2.ld of=boot2 obs=7680 conv=osync + + boot2.ld: boot2.ldr boot2.bin ../btx/btx/btx + btxld -v -E 0x2000 -f bin -b ../btx/btx/btx -l boot2.ldr \ + -o boot2.ld -P 1 boot2.bin + + boot2.ldr: + dd if=/dev/zero of=boot2.ldr bs=512 count=1 + + boot2.bin: boot2.out + objcopy -S -O binary boot2.out boot2.bin + + boot2.out: ../btx/lib/crt0.o boot2.o sio.o + ld -Ttext 0x2000 -o boot2.out + + boot2.o: boot2.s + ${CC} ${ACFLAGS} -c boot2.s + + boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c + ${CC} ${CFLAGS} -S -o boot2.s.tmp ${.CURDIR}/boot2.c + sed -e '/align/d' -e '/nop/d' "MISSING" boot2.s.tmp > boot2.s + rm -f boot2.s.tmp + + boot2.h: boot1.out + ${NM} -t d ${.ALLSRC} | awk '/([0-9])+ T xread/ \ + { x = $$1 - ORG1; \ + printf("#define XREADORG %#x\n", REL1 + x) }' \ + 
ORG1=`printf "%d" ${ORG1}` \ + REL1=`printf "%d" ${REL1}` > ${.TARGET}</programlisting> + </figure> + + <para>The mechanism for building <filename>boot2</filename> is + far more elaborate. Let us point out the most relevant facts. + The dependency list is as follows:</para> + + <figure xml:id="boot-boot1-make-boot2-more"> + <title><filename>sys/boot/i386/boot2/Makefile</filename></title> + + <programlisting> boot2: boot2.ld + boot2.ld: boot2.ldr boot2.bin ${BTXDIR}/btx/btx + boot2.bin: boot2.out + boot2.out: ${BTXDIR}/lib/crt0.o boot2.o sio.o + boot2.o: boot2.s + boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c + boot2.h: boot1.out</programlisting> + </figure> + + <para>Note that initially there is no header file + <filename>boot2.h</filename>, but its creation depends on + <filename>boot1.out</filename>, which we already have. The rule + for its creation is a bit terse, but the important thing is that + the output, <filename>boot2.h</filename>, is something like + this:</para> + + <figure xml:id="boot-boot1-make-boot2h"> + <title><filename>sys/boot/i386/boot2/boot2.h</filename></title> + + <programlisting> + #define XREADORG 0x725</programlisting> + </figure> + + <para>Recall that <filename>boot1</filename> was relocated (i.e., + copied from <literal>0x7c00</literal> to + <literal>0x700</literal>). This relocation will now make sense, + because as we will see, the <acronym>BTX</acronym> server + reclaims some memory, including the space where + <filename>boot1</filename> was originally loaded. However, the + <acronym>BTX</acronym> server needs access to + <filename>boot1</filename>'s <literal>xread</literal> function; + this function, according to the output of + <filename>boot2.h</filename>, is at location + <literal>0x725</literal>. Indeed, the + <acronym>BTX</acronym> server uses the + <literal>xread</literal> function from + <filename>boot1</filename>'s relocated code. 
This function is + now accessible from within the <filename>boot2</filename> + client.</para> + + <para>We next build <filename>boot2.s</filename> from files + <filename>boot2.h</filename>, <filename>boot2.c</filename> and + <filename>/usr/src/sys/boot/common/ufsread.c</filename>. The + rule for this is to compile the code in + <filename>boot2.c</filename> (which includes + <filename>boot2.h</filename> and <filename>ufsread.c</filename>) + into assembly code. Having <filename>boot2.s</filename>, the + next rule assembles <filename>boot2.s</filename>, creating the + object file <filename>boot2.o</filename>. The + next rule directs the linker to link various files + (<filename>crt0.o</filename>, + <filename>boot2.o</filename> and <filename>sio.o</filename>). + Note that the output file, <filename>boot2.out</filename>, is + linked to execute at address <literal>0x2000</literal>. Recall + that <filename>boot2</filename> will be executed in user mode, + within a special user segment set up by the + <acronym>BTX</acronym> server. This segment starts at + <literal>0xa000</literal>. Also, remember that the + <filename>boot2</filename> portion of <filename>boot</filename> + was copied to address <literal>0xc000</literal>, that is, offset + <literal>0x2000</literal> from the start of the user segment, so + <filename>boot2</filename> will work properly when we transfer + control to it. Next, <filename>boot2.bin</filename> is created + from <filename>boot2.out</filename> by stripping its symbols and + format information; boot2.bin is a <emphasis>raw</emphasis> + binary. Now, note that a file <filename>boot2.ldr</filename> is + created as a 512-byte file full of zeros. This space is + reserved for the bsdlabel.</para> + + <para>Now that we have files <filename>boot1</filename>, + <filename>boot2.bin</filename> and + <filename>boot2.ldr</filename>, only the + <acronym>BTX</acronym> server is missing before creating the + all-in-one <filename>boot</filename> file. 
The + <acronym>BTX</acronym> server is located in + <filename>/usr/src/sys/boot/i386/btx/btx</filename>; it has its + own <filename>Makefile</filename> with its own set of rules for + building. The important thing to notice is that it is also + compiled as a <emphasis>raw</emphasis> binary, and that it is + linked to execute at address <literal>0x9000</literal>. The + details can be found in + <filename>/usr/src/sys/boot/i386/btx/btx/Makefile</filename>.</para> + + <para>Having the files that comprise the <filename>boot</filename> + program, the final step is to <emphasis>merge</emphasis> them. + This is done by a special program called + <filename>btxld</filename> (source located in + <filename>/usr/src/usr.sbin/btxld</filename>). Some arguments + to this program include the name of the output file + (<filename>boot</filename>), its entry point + (<literal>0x2000</literal>) and its file format + (raw binary). The various files are + finally merged by this utility into the file + <filename>boot</filename>, which consists of + <filename>boot1</filename>, <filename>boot2</filename>, the + <literal>bsdlabel</literal> and the + <acronym>BTX</acronym> server. This file, which takes + exactly 16 sectors, or 8192 bytes, is what is + actually written to the beginning of the &os; slice + during installation. Let us now proceed to study the + <acronym>BTX</acronym> server program.</para> + + <para>The <acronym>BTX</acronym> server prepares a simple + environment and switches from 16-bit real mode to 32-bit + protected mode, right before passing control to the client. + This includes initializing and updating the following data + structures:</para> <indexterm><primary>virtual v86 mode</primary></indexterm> <itemizedlist> <listitem> - <para>virtual v86 mode. That means, the BTX is a v86 monitor. - Real mode instructions like pushf, popf, cli, sti, if called - by the client, will work.</para> + <para>Modifies the + <literal>Interrupt Vector Table (IVT)</literal>. 
The + <acronym>IVT</acronym> provides exception and interrupt + handlers for Real-Mode code.</para> + </listitem> + + <listitem> + <para>The <literal>Interrupt Descriptor Table (IDT)</literal> + is created. Entries are provided for processor exceptions, + hardware interrupts, two system calls and V86 interface. + The IDT provides exception and interrupt handlers for + Protected-Mode code.</para> </listitem> <listitem> - <para>Interrupt Descriptor Table (IDT) is set up so all - hardware interrupts are routed to the default BIOS's - handlers, and interrupt 0x30 is set up to be the syscall - gate.</para> + <para>A <literal>Task-State Segment (TSS)</literal> is + created. This is necessary because the processor works in + the <emphasis>least</emphasis> privileged level when + executing the client (<filename>boot2</filename>), but in + the <emphasis>most</emphasis> privileged level when + executing the <acronym>BTX</acronym> server.</para> </listitem> <listitem> - <para>Two system calls: <function>exec</function> and - <function>exit</function>, are defined:</para> - - <programlisting><filename>sys/boot/i386/btx/lib/btxsys.s:</filename> - .set INT_SYS,0x30 # Interrupt number -# -# System call: exit -# -__exit: xorl %eax,%eax # BTX system - int $INT_SYS # call 0x0 -# -# System call: exec -# -__exec: movl $0x1,%eax # BTX system - int $INT_SYS # call 0x1</programlisting> + <para>The <acronym>GDT</acronym> (Global Descriptor Table) is + set up. Entries (descriptors) are provided for + supervisor code and data, user code and data, and real-mode + code and data. 
+ <footnote> + <para>Real-mode code and data are necessary when switching + back to real mode from protected mode, as suggested by + the Intel manuals.</para></footnote></para> </listitem> </itemizedlist> - <para>BTX creates a Global Descriptor Table (GDT):</para> - - <programlisting><filename>sys/boot/i386/btx/btx/btx.s:</filename> -gdt: .word 0x0,0x0,0x0,0x0 # Null entry - .word 0xffff,0x0,0x9a00,0xcf # SEL_SCODE - .word 0xffff,0x0,0x9200,0xcf # SEL_SDATA - .word 0xffff,0x0,0x9a00,0x0 # SEL_RCODE - .word 0xffff,0x0,0x9200,0x0 # SEL_RDATA - .word 0xffff,MEM_USR,0xfa00,0xcf# SEL_UCODE - .word 0xffff,MEM_USR,0xf200,0xcf# SEL_UDATA - .word _TSSLM,MEM_TSS,0x8900,0x0 # SEL_TSS</programlisting> - - <para>The client's code and data start from address MEM_USR - (0xa000), and a selector (SEL_UCODE) points to the client's code - segment. The SEL_UCODE descriptor has Descriptor Privilege - Level (DPL) 3, which is the lowest privilege level. But the - <literal>INT 0x30</literal> instruction handler resides in a - segment pointed to by the SEL_SCODE (supervisor code) selector, - as shown from the code that creates an IDT:</para> - - <programlisting> mov $SEL_SCODE,%dh # Segment selector + <para>Let us now start studying the actual implementation. Recall + that <filename>boot1</filename> made a jump to address + <literal>0x9010</literal>, the <acronym>BTX</acronym> server's + entry point. Before studying program execution there, + note that the <acronym>BTX</acronym> server has a special header + at address range <literal>0x9000-0x900f</literal>, right before + its entry point. This header is defined as follows:</para> + + <figure xml:id="btx-header"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>start: # Start of code +/* + * BTX header. 
+ */ +btx_hdr: .byte 0xeb # Machine ID + .byte 0xe # Header size + .ascii "BTX" # Magic + .byte 0x1 # Major version + .byte 0x2 # Minor version + .byte BTX_FLAGS # Flags + .word PAG_CNT-MEM_ORG>>0xc # Paging control + .word break-start # Text size + .long 0x0 # Entry address</programlisting> + </figure> + + <para>Note the first two bytes are <literal>0xeb</literal> and + <literal>0xe</literal>. In the IA-32 architecture, these two + bytes are interpreted as a relative jump past the header into + the entry point, so in theory, <filename>boot1</filename> could + jump here (address <literal>0x9000</literal>) instead of address + <literal>0x9010</literal>. Note that the last field in the + <acronym>BTX</acronym> header is a pointer to the client's + (<filename>boot2</filename>) entry point. This field is patched + at link time.</para> + + <para>Immediately following the header is the + <acronym>BTX</acronym> server's entry point:</para> + + <figure xml:id="btx-init"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>/* + * Initialization routine. + */ +init: cli # Disable interrupts + xor %ax,%ax # Zero/segment + mov %ax,%ss # Set up + mov $0x1800,%sp # stack + mov %ax,%es # Address + mov %ax,%ds # data + pushl $0x2 # Clear + popfl # flags</programlisting> + </figure> + + <para>This code disables interrupts, sets up a working stack + (starting at address <literal>0x1800</literal>) and clears the + flags in the EFLAGS register. Note that the + <literal>popfl</literal> instruction pops out a doubleword (4 + bytes) from the stack and places it in the EFLAGS register. + Because the value actually popped is <literal>2</literal>, the + EFLAGS register is effectively cleared (IA-32 requires that bit + 1 of the EFLAGS register always be 1).</para> + + <para>Our next code block clears (sets to <literal>0</literal>) + the memory range <literal>0x5e00-0x8fff</literal>. 
This range + is where the various data structures will be created:</para> + + <figure xml:id="btx-clear-mem"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>/* + * Initialize memory. + */ + mov $0x5e00,%di # Memory to initialize + mov $(0x9000-0x5e00)/2,%cx # Words to zero + rep # Zero-fill + stosw # memory</programlisting> + </figure> + + <para>Recall that <filename>boot1</filename> was originally loaded + to address <literal>0x7c00</literal>, so, with this memory + initialization, that copy has effectively disappeared. However, + also recall that <filename>boot1</filename> was relocated to + <literal>0x700</literal>, so <emphasis>that</emphasis> copy is + still in memory, and the <acronym>BTX</acronym> server will make + use of it.</para> + + <para>Next, the real-mode <acronym>IVT</acronym> (Interrupt Vector + Table) is updated. The <acronym>IVT</acronym> is an array of + segment/offset pairs for exception and interrupt handlers. The + <acronym>BIOS</acronym> normally maps hardware interrupts to + interrupt vectors <literal>0x8</literal> to + <literal>0xf</literal> and <literal>0x70</literal> to + <literal>0x77</literal> but, as will be seen, the 8259A + Programmable Interrupt Controller, the chip controlling the + actual mapping of hardware interrupts to interrupt vectors, is + programmed to remap these interrupt vectors from + <literal>0x8-0xf</literal> to <literal>0x20-0x27</literal> and + from <literal>0x70-0x77</literal> to + <literal>0x28-0x2f</literal>. Thus, interrupt handlers are + provided for interrupt vectors <literal>0x20-0x2f</literal>. + The <acronym>BIOS</acronym>-provided handlers are not + used directly because they work in 16-bit real mode, but not in + 32-bit protected mode. Processor mode will be switched to + 32-bit protected mode shortly. 
However, the + <acronym>BTX</acronym> server sets up a mechanism to effectively + use the handlers provided by the <acronym>BIOS</acronym>:</para> + + <figure xml:id="btx-ivt"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>/* + * Update real mode IDT for reflecting hardware interrupts. + */ + mov $intr20,%bx # Address first handler + mov $0x10,%cx # Number of handlers + mov $0x20*4,%di # First real mode IDT entry +init.0: mov %bx,(%di) # Store IP + inc %di # Address next + inc %di # entry + stosw # Store CS + add $4,%bx # Next handler + loop init.0 # Next IRQ</programlisting> + </figure> + + <para>The next block creates the <acronym>IDT</acronym> (Interrupt + Descriptor Table). The <acronym>IDT</acronym> is analogous, in + protected mode, to the <acronym>IVT</acronym> in real mode. + That is, the <acronym>IDT</acronym> describes the various + exception and interrupt handlers used when the processor is + executing in protected mode. In essence, it also consists of an + array of segment/offset pairs, although the structure is + somewhat more complex, because segments in protected mode are + different than in real mode, and various protection mechanisms + apply:</para> + + <figure xml:id="btx-idt"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>/* + * Create IDT. + */ + mov $0x5e00,%di # IDT's address + mov $idtctl,%si # Control string +init.1: lodsb # Get entry + cbw # count + xchg %ax,%cx # as word + jcxz init.4 # If done + lodsb # Get segment + xchg %ax,%dx # P:DPL:type + lodsw # Get control + xchg %ax,%bx # set + lodsw # Get handler offset + mov $SEL_SCODE,%dh # Segment selector init.2: shr %bx # Handle this int? 
jnc init.3 # No mov %ax,(%di) # Set handler offset mov %dh,0x2(%di) # and selector mov %dl,0x5(%di) # Set P:DPL:type - add $0x4,%ax # Next handler</programlisting> + add $0x4,%ax # Next handler +init.3: lea 0x8(%di),%di # Next entry + loop init.2 # Till set done + jmp init.1 # Continue</programlisting> + </figure> + + <para>Each entry in the <literal>IDT</literal> is 8 bytes long. + Besides the segment/offset information, they also describe the + segment type, privilege level, and whether the segment is + present in memory or not. The construction is such that + interrupt vectors from <literal>0</literal> to + <literal>0xf</literal> (exceptions) are handled by function + <literal>intx00</literal>; vector <literal>0x10</literal> (also + an exception) is handled by <literal>intx10</literal>; hardware + interrupts, which are later configured to start at interrupt + vector <literal>0x20</literal> all the way to interrupt vector + <literal>0x2f</literal>, are handled by function + <literal>intx20</literal>. Lastly, interrupt vector + <literal>0x30</literal>, which is used for system calls, is + handled by <literal>intx30</literal>, and vectors + <literal>0x31</literal> and <literal>0x32</literal> are handled + by <literal>intx31</literal>. It must be noted that only + descriptors for interrupt vectors <literal>0x30</literal>, + <literal>0x31</literal> and <literal>0x32</literal> are given + privilege level 3, the same privilege level as the + <filename>boot2</filename> client, which means the client can + execute a software-generated interrupt to these vectors through + the <literal>int</literal> instruction without failing (this is + the way <filename>boot2</filename> uses the services provided by + the <acronym>BTX</acronym> server). Also, note that + <emphasis>only</emphasis> software-generated interrupts are + protected from code executing in lesser privilege levels. 
+ Hardware-generated interrupts and processor-generated exceptions + are <emphasis>always</emphasis> handled adequately, regardless + of the actual privileges involved.</para> + + <para>The next step is to initialize the <acronym>TSS</acronym> + (Task-State Segment). The <acronym>TSS</acronym> is a hardware + feature that helps the operating system or executive software + implement multitasking functionality through process + abstraction. The IA-32 architecture demands the creation and + use of <emphasis>at least</emphasis> one <acronym>TSS</acronym> + if multitasking facilities are used or different privilege + levels are defined. Because the <filename>boot2</filename> + client is executed in privilege level 3, but the + <acronym>BTX</acronym> server runs in privilege level 0, a + <acronym>TSS</acronym> must be defined:</para> + + <figure xml:id="btx-tss"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>/* + * Initialize TSS. + */ +init.4: movb $_ESP0H,TSS_ESP0+1(%di) # Set ESP0 + movb $SEL_SDATA,TSS_SS0(%di) # Set SS0 + movb $_TSSIO,TSS_MAP(%di) # Set I/O bit map base</programlisting> + </figure> + + <para>Note that a value is given for the Privilege Level 0 stack + pointer and stack segment in the <acronym>TSS</acronym>. This is needed because, + if an interrupt or exception is received while executing + <filename>boot2</filename> in Privilege Level 3, a change to + Privilege Level 0 is automatically performed by the processor, + so a new working stack is needed. Finally, the I/O Map Base + Address field of the <acronym>TSS</acronym> is given a value, which is a 16-bit + offset from the beginning of the <acronym>TSS</acronym> to the I/O Permission + Bitmap and the Interrupt Redirection Bitmap.</para> + + <para>After the <acronym>IDT</acronym> and <acronym>TSS</acronym> are created, the processor is ready to + switch to protected mode. 
This is done in the next + block:</para> + + <figure xml:id="btx-prot"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>/* + * Bring up the system. + */ + mov $0x2820,%bx # Set protected mode + callw setpic # IRQ offsets + lidt idtdesc # Set IDT + lgdt gdtdesc # Set GDT + mov %cr0,%eax # Switch to protected + inc %ax # mode + mov %eax,%cr0 # + ljmp $SEL_SCODE,$init.8 # To 32-bit code + .code32 +init.8: xorl %ecx,%ecx # Zero + movb $SEL_SDATA,%cl # To 32-bit + movw %cx,%ss # stack</programlisting> + </figure> + + <para>First, a call is made to <literal>setpic</literal> to + program the 8259A <acronym>PIC</acronym> (Programmable Interrupt Controller). + This chip is connected to multiple hardware interrupt sources. + Upon receiving an interrupt from a device, it + signals the processor with the appropriate interrupt vector. + This can be customized so that specific interrupts are + associated with specific interrupt vectors, as explained before. + Next, the <acronym>IDTR</acronym> (Interrupt Descriptor Table Register) and + <acronym>GDTR</acronym> (Global Descriptor Table Register) are loaded with the + instructions <literal>lidt</literal> and <literal>lgdt</literal>, respectively. These registers are + loaded with the base address and limit address for the <acronym>IDT</acronym> and + <acronym>GDT</acronym>. The following three instructions set the Protection Enable + (PE) bit of the <literal>%cr0</literal> register. This + effectively switches the processor to + 32-bit protected mode. Next, a long jump is made to + <literal>init.8</literal> using segment selector SEL_SCODE, + which selects the Supervisor Code Segment. The processor is + effectively executing in CPL 0, the most privileged level, after + this jump. Finally, the Supervisor Data Segment is selected for + the stack by assigning the segment selector SEL_SDATA to the + <literal>%ss</literal> register. 
This data segment also has a + privilege level of <literal>0</literal>.</para> + + <para>Our last code block is responsible for loading the + <acronym>TR</acronym> (Task Register) with the segment selector for the <acronym>TSS</acronym> we created + earlier, and setting the User Mode environment before passing + execution control to the <filename>boot2</filename> + client.</para> + + <figure xml:id="btx-end"> + <title><filename>sys/boot/i386/btx/btx/btx.S</filename></title> + + <programlisting>/* + * Launch user task. + */ + movb $SEL_TSS,%cl # Set task + ltr %cx # register + movl $0xa000,%edx # User base address + movzwl %ss:BDA_MEM,%eax # Get free memory + shll $0xa,%eax # To bytes + subl $ARGSPACE,%eax # Less arg space + subl %edx,%eax # Less base + movb $SEL_UDATA,%cl # User data selector + pushl %ecx # Set SS + pushl %eax # Set ESP + push $0x202 # Set flags (IF set) + push $SEL_UCODE # Set CS + pushl btx_hdr+0xc # Set EIP + pushl %ecx # Set GS + pushl %ecx # Set FS + pushl %ecx # Set DS + pushl %ecx # Set ES + pushl %edx # Set EAX + movb $0x7,%cl # Set remaining +init.9: push $0x0 # general + loop init.9 # registers + popa # and initialize + popl %es # Initialize + popl %ds # user + popl %fs # segment + popl %gs # registers + iret # To user mode</programlisting> + </figure> + + <para>Note that the client's environment includes a stack segment + selector and stack pointer (registers <literal>%ss</literal> and + <literal>%esp</literal>). Indeed, once the <acronym>TR</acronym> is loaded with + the <acronym>TSS</acronym> segment selector (instruction + <literal>ltr</literal>), the stack pointer is calculated and + pushed onto the stack along with the stack's segment selector. + Next, the value <literal>0x202</literal> is pushed onto the + stack; it is the value that the EFLAGS register will get when control is + passed to the client. Also, the User Mode code segment selector + and the client's entry point are pushed. 
Recall that this entry + point is patched in the <acronym>BTX</acronym> header at link time. Finally, + segment selectors (stored in register <literal>%ecx</literal>) + for the segment registers + <literal>%gs, %fs, %ds and %es</literal> are pushed onto the + stack, along with the value at <literal>%edx</literal> + (<literal>0xa000</literal>). Keep in mind the various values + that have been pushed onto the stack (they will be popped out + shortly). Next, values for the remaining general purpose + registers are also pushed onto the stack (note the + <literal>loop</literal> that pushes the value + <literal>0</literal> seven times). Now the values start + being popped off the stack. First, the + <literal>popa</literal> instruction pops the + latest eight values pushed. Seven of them are stored in the general + purpose registers in order + <literal>%edi, %esi, %ebp, %ebx, %edx, %ecx, %eax</literal>; + the value in the slot for <literal>%esp</literal> is discarded. + Then, the various segment selectors pushed are popped into the + various segment registers. Five values still remain on the + stack. They are popped when the <literal>iret</literal> + instruction is executed. This instruction first pops + the value that was pushed from the <acronym>BTX</acronym> header. This value is a + pointer to <filename>boot2</filename>'s entry point. It is + placed in the register <literal>%eip</literal>, the instruction + pointer register. Next, the segment selector for the User + Code Segment is popped and copied to register + <literal>%cs</literal>. Remember that + this segment's privilege level is 3, the least privileged + level. This means that we must provide values for the stack of + this privilege level. This is why the processor, besides + further popping the value for the EFLAGS register, does two more + pops out of the stack. These values go to the stack + pointer (<literal>%esp</literal>) and the stack segment + (<literal>%ss</literal>). 
Now, execution continues at + <filename>boot2</filename>'s entry point.</para> + + <para>It is important to note how the User Code Segment is + defined. This segment's <emphasis>base address</emphasis> is + set to <literal>0xa000</literal>. This means that code memory + addresses are <emphasis>relative</emphasis> to address <literal>0xa000</literal>; + if code being executed is fetched from address + <literal>0x2000</literal>, the <emphasis>actual</emphasis> + memory addressed is + <literal>0xa000+0x2000=0xc000</literal>.</para> + </sect1> - <para>So, when the client calls <function>__exec()</function>, the - code will be executed with the highest privileges. This allows - the kernel to change the protected mode data structures, such as - page tables, GDT, IDT, etc later, if needed.</para> + <sect1 xml:id="boot2"> + <title><application>boot2</application> Stage</title> <para><literal>boot2</literal> defines an important structure, <literal>struct bootinfo</literal>. This structure is @@ -416,7 +1760,7 @@ init.2: shr %bx # Handle this int? loader, and then further to the kernel. Some nodes of this structures are set by <literal>boot2</literal>, the rest by the loader. This structure, among other information, contains the - kernel filename, BIOS harddisk geometry, BIOS drive number for + kernel filename, <acronym>BIOS</acronym> harddisk geometry, <acronym>BIOS</acronym> drive number for boot device, physical memory available, <literal>envp</literal> pointer etc. The definition for it is:</para> @@ -451,8 +1795,8 @@ struct bootinfo { <function>ino_t lookup(char *filename)</function> and <function>int xfsread(ino_t inode, void *buf, size_t nbyte)</function> are used to read the content of a file into - memory. + memory. 
<filename>/boot/loader</filename> is an <acronym>ELF</acronym> binary, but + where the <acronym>ELF</acronym> header is prepended with <filename>a.out</filename>'s <literal>struct exec</literal> structure. <function>load()</function> scans the loader's ELF header, loading the content of <filename>/boot/loader</filename> into memory, and passing the @@ -467,10 +1811,10 @@ struct bootinfo { <sect1 xml:id="boot-loader"> <title><application>loader</application> Stage</title> - <para><application>loader</application> is a BTX client as well. + <para><application>loader</application> is a <acronym>BTX</acronym> client as well. I will not describe it here in detail, there is a comprehensive manpage written by Mike Smith, &man.loader.8;. The underlying - mechanisms and BTX were discussed above.</para> + mechanisms and <acronym>BTX</acronym> were discussed above.</para> <para>The main task for the loader is to boot the kernel. When the kernel is loaded into memory, it is being called by the |