# SOME DESCRIPTIVE TITLE # Copyright (C) YEAR The FreeBSD Project # This file is distributed under the same license as the FreeBSD Documentation package. # FIRST AUTHOR , YEAR. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: FreeBSD Documentation VERSION\n" "POT-Creation-Date: 2024-01-17 20:35-0300\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "Language: \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=UTF-8\n" "Content-Transfer-Encoding: 8bit\n" #. type: YAML Front Matter: description #: documentation/content/en/articles/linux-emulation/_index.adoc:1 #, no-wrap msgid "A technical description about the internals of the Linux emulation layer in FreeBSD" msgstr "" #. type: YAML Front Matter: title #: documentation/content/en/articles/linux-emulation/_index.adoc:1 #, no-wrap msgid "Linux® emulation in FreeBSD" msgstr "" #. type: Title = #: documentation/content/en/articles/linux-emulation/_index.adoc:11 #, no-wrap msgid "Linux(R) emulation in FreeBSD" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:44 msgid "Abstract" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:54 msgid "" "This masters thesis deals with updating the Linux(R) emulation layer (the so " "called _Linuxulator_). The task was to update the layer to match the " "functionality of Linux(R) 2.6. As a reference implementation, the Linux(R) " "2.6.16 kernel was chosen. The concept is loosely based on the NetBSD " "implementation. Most of the work was done in the summer of 2006 as a part " "of the Google Summer of Code students program. The focus was on bringing " "the _NPTL_ (new POSIX(R) thread library) support into the emulation layer, " "including _TLS_ (thread local storage), _futexes_ (fast user space mutexes), " "_PID mangling_, and some other minor things. Many small problems were " "identified and fixed in the process. 
My work was integrated into the main " "FreeBSD source repository and will be shipped in the upcoming 7.0R release. " "We, the emulation development team, are working on making the Linux(R) 2.6 " "emulation the default emulation layer in FreeBSD." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:56 msgid "'''" msgstr "" #. type: Title == #: documentation/content/en/articles/linux-emulation/_index.adoc:60 #, no-wrap msgid "Introduction" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:67 msgid "" "In the last few years the open source UNIX(R) based operating systems " "started to be widely deployed on server and client machines. Among these " "operating systems I would like to point out two: FreeBSD, for its BSD " "heritage, time proven code base and many interesting features, and Linux(R) " "for its wide user base, enthusiastic open developer community and support " "from large companies. FreeBSD tends to be used on server class machines " "serving heavy duty networking tasks with less usage on desktop class " "machines for ordinary users. Linux(R) has the same usage on servers, " "but it is used much more by home based users. This leads to a situation " "where there are many binary only programs available for Linux(R) that lack " "support for FreeBSD." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:69 msgid "" "Naturally, a need for the ability to run Linux(R) binaries on a FreeBSD " "system arises and this is what this thesis deals with: the emulation of the " "Linux(R) kernel in the FreeBSD operating system." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:72 msgid "" "During the Summer of 2006 Google Inc. sponsored a project which focused on " "extending the Linux(R) emulation layer (the so called Linuxulator) in " "FreeBSD to include Linux(R) 2.6 facilities. 
This thesis is written as a " "part of this project." msgstr "" #. type: Title == #: documentation/content/en/articles/linux-emulation/_index.adoc:74 #, no-wrap msgid "A look inside..." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:80 msgid "" "In this section we describe each operating system in " "question: how it deals with syscalls, trapframes, and other low-level " "details. We also describe the way they understand common UNIX(R) primitives " "like what a PID is, what a thread is, etc. In the third subsection we talk " "about how UNIX(R) on UNIX(R) emulation could be done in general." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:82 #, no-wrap msgid "What is UNIX(R)" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:94 msgid "" "UNIX(R) is an operating system with a long history that has influenced " "almost every other operating system currently in use. Starting in the " "1960s, its development continues to this day (although in different " "projects). UNIX(R) development soon forked into two main branches: the BSD and " "System III/V families. They influenced each other while growing a " "common UNIX(R) standard. Among the contributions originating in BSD we can " "name virtual memory, TCP/IP networking, FFS, and many others. The System V " "branch contributed SysV interprocess communication primitives, copy-on-" "write, etc. UNIX(R) itself does not exist any more but its ideas have been " "used by many other operating systems worldwide, thus forming the so-called " "UNIX(R)-like operating systems. These days the most influential ones are " "Linux(R), Solaris, and possibly (to some extent) FreeBSD. There are in-" "company UNIX(R) derivatives (AIX, HP-UX etc.), but these have been more and " "more migrated to the aforementioned systems. Let us summarize typical " "UNIX(R) characteristics." 
msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:96 #: documentation/content/en/articles/linux-emulation/_index.adoc:187 #: documentation/content/en/articles/linux-emulation/_index.adoc:279 #, no-wrap msgid "Technical details" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:104 msgid "" "Every running program constitutes a process that represents a state of the " "computation. Running process is divided between kernel-space and user-" "space. Some operations can be done only from kernel space (dealing with " "hardware etc.), but the process should spend most of its lifetime in the " "user space. The kernel is where the management of the processes, hardware, " "and low-level details take place. The kernel provides a standard unified " "UNIX(R) API to the user space. The most important ones are covered below." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:106 #, no-wrap msgid "Communication between kernel and user space process" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:114 msgid "" "Common UNIX(R) API defines a syscall as a way to issue commands from a user " "space process to the kernel. The most common implementation is either by " "using an interrupt or specialized instruction (think of `SYSENTER`/`SYSCALL` " "instructions for ia32). Syscalls are defined by a number. For example in " "FreeBSD, the syscall number 85 is the man:swapon[2] syscall and the syscall " "number 132 is man:mkfifo[2]. Some syscalls need parameters, which are " "passed from the user-space to the kernel-space in various ways " "(implementation dependent). Syscalls are synchronous." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:118 msgid "" "Another possible way to communicate is by using a _trap_. 
Traps occur " "asynchronously after some event occurs (division by zero, page fault etc.). " "A trap can be transparent for a process (page fault) or can result in a " "reaction like sending a _signal_ (division by zero)." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:120 #, no-wrap msgid "Communication between processes" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:125 msgid "" "There are other APIs (System V IPC, shared memory etc.) but the single most " "important API is the signal. Signals are sent by processes or by the kernel and " "received by processes. Some signals can be ignored or handled by a user " "supplied routine, some result in a predefined action that cannot be altered " "or ignored." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:127 #, no-wrap msgid "Process management" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:137 msgid "" "The kernel instantiates the first process in the system (the so-called init). Every " "running process can create its identical copy using the man:fork[2] " "syscall. Some slightly modified versions of this syscall were introduced " "but the basic semantics are the same. Every running process can morph into " "some other process using the man:exec[3] syscall. Some modifications of " "this syscall were introduced but all serve the same basic purpose. " "Processes end their lives by calling the man:exit[2] syscall. Every process " "is identified by a unique number called PID. Every process has a defined " "parent (identified by its PID)." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:139 #, no-wrap msgid "Thread management" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:145 msgid "" "Traditional UNIX(R) does not define any API nor implementation for " "threading, while POSIX(R) defines its threading API but the implementation " "is undefined. Traditionally there were two ways of implementing threads: " "handling them as separate processes (1:1 threading), or enveloping the whole " "thread group in one process and managing the threading in userspace (1:N " "threading). Comparing main features of each approach:" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:147 msgid "1:1 threading" msgstr "" #. type: Bullet: '- ' #: documentation/content/en/articles/linux-emulation/_index.adoc:149 msgid "heavyweight threads" msgstr "" #. type: Bullet: '- ' #: documentation/content/en/articles/linux-emulation/_index.adoc:150 msgid "" "the scheduling cannot be altered by the user (slightly mitigated by the " "POSIX(R) API)" msgstr "" #. type: Bullet: '+ ' #: documentation/content/en/articles/linux-emulation/_index.adoc:151 msgid "no syscall wrapping necessary" msgstr "" #. type: Bullet: '+ ' #: documentation/content/en/articles/linux-emulation/_index.adoc:152 msgid "can utilize multiple CPUs" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:154 msgid "1:N threading" msgstr "" #. type: Bullet: '+ ' #: documentation/content/en/articles/linux-emulation/_index.adoc:156 msgid "lightweight threads" msgstr "" #. type: Bullet: '+ ' #: documentation/content/en/articles/linux-emulation/_index.adoc:157 msgid "scheduling can be easily altered by the user" msgstr "" #. type: Bullet: '- ' #: documentation/content/en/articles/linux-emulation/_index.adoc:158 msgid "syscalls must be wrapped" msgstr "" #. type: Bullet: '- ' #: documentation/content/en/articles/linux-emulation/_index.adoc:159 msgid "cannot utilize more than one CPU" msgstr "" #. 
type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:161 #, no-wrap msgid "What is FreeBSD?" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:169 msgid "" "The FreeBSD project is one of the oldest open source operating systems " "currently available for daily use. It is a direct descendant of the genuine " "UNIX(R) so it could be claimed that it is a true UNIX(R) although licensing " "issues do not permit that. The start of the project dates back to the early " "1990's when a crew of fellow BSD users patched the 386BSD operating system. " "Based on this patchkit a new operating system arose named FreeBSD for its " "liberal license. Another group created the NetBSD operating system with " "different goals in mind. We will focus on FreeBSD." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:174 msgid "" "FreeBSD is a modern UNIX(R)-based operating system with all the features of " "UNIX(R). Preemptive multitasking, multiuser facilities, TCP/IP networking, " "memory protection, symmetric multiprocessing support, virtual memory with " "merged VM and buffer cache, they are all there. One of the interesting and " "extremely useful features is the ability to emulate other UNIX(R)-like " "operating systems. As of December 2006 and 7-CURRENT development, the " "following emulation functionalities are supported:" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:176 msgid "FreeBSD/i386 emulation on FreeBSD/amd64" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:177 msgid "FreeBSD/i386 emulation on FreeBSD/ia64" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:178 msgid "Linux(R)-emulation of Linux(R) operating system on FreeBSD" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:179 msgid "NDIS-emulation of Windows networking drivers interface" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:180 msgid "NetBSD-emulation of NetBSD operating system" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:181 msgid "PECoff-support for PECoff FreeBSD executables" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:182 msgid "SVR4-emulation of System V revision 4 UNIX(R)" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:185 msgid "" "Actively developed emulations are the Linux(R) layer and various FreeBSD-on-" "FreeBSD layers. The others are not expected to work properly or to be usable " "these days." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:195 msgid "" "FreeBSD is a traditional flavor of UNIX(R) in the sense of dividing the run of " "processes into two halves: kernel space and user space run. There are two " "types of process entry to the kernel: a syscall and a trap. There is only " "one way to return. In the subsequent sections we will describe the three " "gates to/from the kernel. The whole description applies to the i386 " "architecture as the Linuxulator only exists there but the concept is similar " "on other architectures. The information was taken from [1] and the source " "code." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:197 #, no-wrap msgid "System entries" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:204 msgid "" "FreeBSD has an abstraction called an execution class loader, which is a " "wedge into the man:execve[2] syscall. This employs a structure `sysentvec`, " "which describes an executable ABI. 
It contains things like an errno " "translation table, a signal translation table, and various functions to serve " "syscall needs (stack fixup, coredumping, etc.). Every ABI the FreeBSD " "kernel wants to support must define this structure, as it is used later in " "the syscall processing code and at some other places. System entries are " "handled by trap handlers, where we can access both the kernel-space and the " "user-space at once." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:206 #: documentation/content/en/articles/linux-emulation/_index.adoc:288 #, no-wrap msgid "Syscalls" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:209 msgid "" "Syscalls on FreeBSD are issued by executing interrupt `0x80` with register " "`%eax` set to a desired syscall number with arguments passed on the stack." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:215 msgid "" "When a process issues an interrupt `0x80`, the `int0x80` syscall trap " "handler is invoked (defined in [.filename]#sys/i386/i386/exception.s#), which " "prepares arguments (i.e. copies them on to the stack) for a call to a C " "function man:syscall[2] (defined in [.filename]#sys/i386/i386/trap.c#), " "which processes the passed in trapframe. The processing consists of " "preparing the syscall (depending on the `sysvec` entry), determining if the " "syscall is a 32-bit or 64-bit one (changes size of the parameters), then the " "parameters are copied, including the syscall. Next, the actual syscall " "function is executed with processing of the return code (special cases for " "`ERESTART` and `EJUSTRETURN` errors). Finally a `userret()` is scheduled, " "switching the process back to userspace. 
The parameters to the actual " "syscall handler are passed in the form of `struct thread *td`, `struct " "syscall_args *` arguments where the second parameter is a pointer to the " "copied in structure of parameters." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:217 #: documentation/content/en/articles/linux-emulation/_index.adoc:307 #: documentation/content/en/articles/linux-emulation/_index.adoc:794 #, no-wrap msgid "Traps" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:224 msgid "" "Handling of traps in FreeBSD is similar to the handling of syscalls. " "Whenever a trap occurs, an assembler handler is called. It is chosen " "between `alltraps`, `alltraps_with_regs_pushed` or `calltrap` depending on the " "type of the trap. This handler prepares arguments for a call to a C " "function `trap()` (defined in [.filename]#sys/i386/i386/trap.c#), which then " "processes the occurred trap. After the processing it might send a signal to " "the process and/or exit to userland using `userret()`." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:226 #: documentation/content/en/articles/linux-emulation/_index.adoc:312 #, no-wrap msgid "Exits" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:230 msgid "" "Exits from kernel to userspace happen using the assembler routine `doreti` " "regardless of whether the kernel was entered via a trap or via a syscall. " "This restores the program status from the stack and returns to the userspace." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:232 #: documentation/content/en/articles/linux-emulation/_index.adoc:318 #, no-wrap msgid "UNIX(R) primitives" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:238 msgid "" "The FreeBSD operating system adheres to the traditional UNIX(R) scheme, where " "every process has a unique identification number, the so called _PID_ " "(Process ID). PID numbers are allocated either linearly or randomly ranging " "from `0` to `PID_MAX`. The allocation of PID numbers is done using linear " "searching of PID space. Every thread in a process receives the same PID " "number as a result of the man:getpid[2] call." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:249 msgid "" "There are currently two ways to implement threading in FreeBSD. The first " "way is M:N threading followed by the 1:1 threading model. The default " "library used is M:N threading (`libpthread`) and you can switch at runtime " "to 1:1 threading (`libthr`). The plan is to switch to the 1:1 library by " "default soon. Although those two libraries use the same kernel primitives, " "they are accessed through different APIs. The M:N library uses the " "`kse_*` family of syscalls while the 1:1 library uses the `thr_*` family of " "syscalls. Due to this, there is no general concept of thread ID shared " "between kernel and userspace. Of course, both threading libraries implement " "the pthread thread ID API. Every kernel thread (as described by `struct " "thread`) has a `td_tid` identifier but this is not directly accessible from " "userland and solely serves the kernel's needs. It is also used by the 1:1 " "threading library as pthread's thread ID but handling of this is internal to " "the library and cannot be relied on." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:257 msgid "" "As stated previously there are two implementations of threading in FreeBSD. " "The M:N library divides the work between kernel space and userspace. 
Thread " "is an entity that gets scheduled in the kernel but it can represent various " "number of userspace threads. M userspace threads get mapped to N kernel " "threads thus saving resources while keeping the ability to exploit " "multiprocessor parallelism. Further information about the implementation " "can be obtained from the man page or [1]. The 1:1 library directly maps a " "userland thread to a kernel thread thus greatly simplifying the scheme. " "None of these designs implement a fairness mechanism (such a mechanism was " "implemented but it was removed recently because it caused serious slowdown " "and made the code more difficult to deal with)." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:259 #, no-wrap msgid "What is Linux(R)" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:263 msgid "" "Linux(R) is a UNIX(R)-like kernel originally developed by Linus Torvalds, " "and now being contributed to by a massive crowd of programmers all around " "the world. From its mere beginnings to today, with wide support from " "companies such as IBM or Google, Linux(R) is being associated with its fast " "development pace, full hardware support and benevolent dictator model of " "organization." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:267 msgid "" "Linux(R) development started in 1991 as a hobbyist project at University of " "Helsinki in Finland. Since then it has obtained all the features of a " "modern UNIX(R)-like OS: multiprocessing, multiuser support, virtual memory, " "networking, basically everything is there. There are also highly advanced " "features like virtualization etc." msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:270 msgid "" "As of 2006 Linux(R) seems to be the most widely used open source operating " "system with support from independent software vendors like Oracle, " "RealNetworks, Adobe, etc. Most of the commercial software distributed for " "Linux(R) can only be obtained in a binary form so recompilation for other " "operating systems is impossible." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:275 msgid "" "Most of the Linux(R) development happens in a Git version control system. " "Git is a distributed system so there is no central source of the Linux(R) " "code, but some branches are considered prominent and official. The version " "number scheme implemented by Linux(R) consists of four numbers A.B.C.D. " "Currently development happens in 2.6.C.D, where C represents major version, " "where new features are added or changed while D is a minor version for " "bugfixes only." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:277 msgid "More information can be obtained from [3]." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:286 msgid "" "Linux(R) follows the traditional UNIX(R) scheme of dividing the run of a " "process in two halves: the kernel and user space. The kernel can be entered " "in two ways: via a trap or via a syscall. The return is handled only in one " "way. The further description applies to Linux(R) 2.6 on the i386(TM) " "architecture. This information was taken from [2]." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:296 msgid "" "Syscalls in Linux(R) are performed (in userspace) using `syscallX` macros " "where X substitutes a number representing the number of parameters of the " "given syscall. 
This macro translates to a code that loads `%eax` register " "with a number of the syscall and executes interrupt `0x80`. After this " "syscall return is called, which translates negative return values to " "positive `errno` values and sets `res` to `-1` in case of an error. " "Whenever the interrupt `0x80` is called the process enters the kernel in " "system call trap handler. This routine saves all registers on the stack and " "calls the selected syscall entry. Note that the Linux(R) calling convention " "expects parameters to the syscall to be passed via registers as shown here:" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:298 msgid "parameter -> `%ebx`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:299 msgid "parameter -> `%ecx`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:300 msgid "parameter -> `%edx`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:301 msgid "parameter -> `%esi`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:302 msgid "parameter -> `%edi`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:303 msgid "parameter -> `%ebp`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:305 msgid "" "There are some exceptions to this, where Linux(R) uses different calling " "convention (most notably the `clone` syscall)." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:310 msgid "" "The trap handlers are introduced in [.filename]#arch/i386/kernel/traps.c# " "and most of these handlers live in [.filename]#arch/i386/kernel/entry.S#, " "where handling of the traps happens." msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:316 msgid "" "Return from the syscall is managed by syscall man:exit[3], which checks for " "the process having unfinished work, then checks whether we used user-" "supplied selectors. If this happens stack fixing is applied and finally the " "registers are restored from the stack and the process returns to the " "userspace." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:327 msgid "" "In the 2.6 version, the Linux(R) operating system redefined some of the " "traditional UNIX(R) primitives, notably PID, TID and thread. PID is defined " "not to be unique for every process, so for some processes (threads) man:" "getppid[2] returns the same value. Unique identification of process is " "provided by TID. This is because _NPTL_ (New POSIX(R) Thread Library) " "defines threads to be normal processes (so called 1:1 threading). Spawning " "a new process in Linux(R) 2.6 happens using the `clone` syscall (fork " "variants are reimplemented using it). This clone syscall defines a set of " "flags that affect behavior of the cloning process regarding thread " "implementation. The semantic is a bit fuzzy as there is no single flag " "telling the syscall to create a thread." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:329 msgid "Implemented clone flags are:" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:331 msgid "`CLONE_VM` - processes share their memory space" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:332 msgid "`CLONE_FS` - share umask, cwd and namespace" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:333 msgid "`CLONE_FILES` - share open files" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:334 msgid "`CLONE_SIGHAND` - share signal handlers and blocked signals" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:335 msgid "`CLONE_PARENT` - share parent" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:336 msgid "`CLONE_THREAD` - be thread (further explanation below)" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:337 msgid "`CLONE_NEWNS` - new namespace" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:338 msgid "`CLONE_SYSVSEM` - share SysV undo structures" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:339 msgid "`CLONE_SETTLS` - setup TLS at supplied address" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:340 msgid "`CLONE_PARENT_SETTID` - set TID in the parent" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:341 msgid "`CLONE_CHILD_CLEARTID` - clear TID in the child" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:342 msgid "`CLONE_CHILD_SETTID` - set TID in the child" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:348 msgid "" "`CLONE_PARENT` sets the real parent to the parent of the caller. This is " "useful for threads because if thread A creates thread B we want thread B to " "be parented to the parent of the whole thread group. `CLONE_THREAD` does " "exactly the same thing as `CLONE_PARENT`, `CLONE_VM` and `CLONE_SIGHAND`, " "rewrites PID to be the same as PID of the caller, sets exit signal to be " "none and enters the thread group. `CLONE_SETTLS` sets up GDT entries for " "TLS handling. 
The `CLONE_*_*TID` set of flags sets/clears user supplied " "address to TID or 0." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:352 msgid "" "As you can see the `CLONE_THREAD` does most of the work and does not seem to " "fit the scheme very well. The original intention is unclear (even for " "authors, according to comments in the code) but I think originally there was " "one threading flag, which was then parcelled among many other flags but this " "separation was never fully finished. It is also unclear what this partition " "is good for as glibc does not use that so only hand-written use of the clone " "permits a programmer to access these features." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:355 msgid "" "For non-threaded programs the PID and TID are the same. For threaded " "programs the first thread's PID and TID are the same and every created thread " "shares the same PID and gets assigned a unique TID (because `CLONE_THREAD` " "is passed in); the parent is also shared by all processes forming this threaded " "program." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:357 msgid "" "The code that implements man:pthread_create[3] in NPTL defines the clone " "flags like this:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:361 #, no-wrap msgid "int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:363 #, no-wrap msgid " | CLONE_SETTLS | CLONE_PARENT_SETTID\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:366 #, no-wrap msgid "" "| CLONE_CHILD_CLEARTID | CLONE_SYSVSEM\n" "#if __ASSUME_NO_CLONE_DETACHED == 0\n" msgstr "" #. type: delimited block . 
4 #: documentation/content/en/articles/linux-emulation/_index.adoc:369 #, no-wrap msgid "" "| CLONE_DETACHED\n" "#endif\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:371 #, no-wrap msgid "| 0);\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:374 msgid "`CLONE_SIGNAL` is defined as" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:378 #, no-wrap msgid "#define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:381 msgid "The last 0 means no signal is sent when any of the threads exits." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:383 #, no-wrap msgid "What is emulation" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:388 msgid "" "According to a dictionary definition, emulation is the ability of a program " "or device to imitate another program or device. This is achieved by " "providing the same reaction to a given stimulus as the emulated object. In " "practice, the software world mostly sees three types of emulation - a " "program used to emulate a machine (QEMU, various game console emulators " "etc.), software emulation of a hardware facility (OpenGL emulators, floating " "point unit emulation etc.) and operating system emulation (either in the kernel " "of the operating system or as a userspace program)." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:395 msgid "" "Emulation is usually used where using the original component is " "not feasible or not possible at all. For example, someone might want to use a " "program developed for a different operating system than the one they use. Then " "emulation comes in handy. 
Sometimes there is no other way but to use " "emulation, e.g. when the hardware device you try to use does not exist (yet/" "anymore). This happens often when " "porting an operating system to a new (non-existent) platform. Sometimes it " "is just cheaper to emulate." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:407 msgid "" "Looking from an implementation point of view, there are two main approaches " "to the implementation of emulation. You can emulate the whole thing " "by accepting possible inputs of the original object, maintaining inner state " "and emitting correct output based on the state and/or input. This kind of " "emulation does not require any special conditions and basically can be " "implemented anywhere for any device/program. The drawback is that " "implementing such emulation is quite difficult, time-consuming and error-" "prone. In some cases we can use a simpler approach. Imagine you want to " "emulate a printer that prints from left to right on a printer that prints " "from right to left. It is obvious that there is no need for a complex " "emulation layer; simply reversing the printed text is sufficient. " "Sometimes the emulating environment is very similar to the emulated one so " "just a thin layer of some translation is necessary to provide fully working " "emulation. As you can see, this is much less demanding to implement, and so less " "time-consuming and error-prone than the previous approach. But the " "necessary condition is that the two environments must be similar enough. " "A third approach combines the previous two. Most of the time the objects " "do not provide the same capabilities, so when emulating the more " "powerful one on the less powerful one we have to emulate the missing features " "with the full emulation described above." msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:410 msgid "" "This master's thesis deals with emulation of UNIX(R) on UNIX(R), which is " "exactly the case where a thin layer of translation is sufficient to " "provide full emulation. The UNIX(R) API consists of a set of syscalls, " "which are usually self-contained and do not affect any global kernel state." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:412 msgid "" "There are a few syscalls that affect inner state but this can be dealt with " "by providing some structures that maintain the extra state." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:416 msgid "" "No emulation is perfect and emulations tend to lack some parts but this " "usually does not cause any serious drawbacks. Imagine a game console " "emulator that emulates everything but music output. No doubt the games " "are playable and one can use the emulator. It might not be as comfortable " "as the original game console but it is an acceptable compromise between price " "and comfort." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:420 msgid "" "The same goes for the UNIX(R) API. Most programs can live with a very " "limited set of syscalls working. Those syscalls tend to be the oldest ones " "(man:read[2]/man:write[2], man:fork[2] family, man:signal[3] handling, man:" "exit[3], man:socket[2] API), hence they are easy to emulate because their " "semantics are shared among all UNIX(R)es that exist today." msgstr "" #. type: Title == #: documentation/content/en/articles/linux-emulation/_index.adoc:422 #, no-wrap msgid "Emulation" msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:424 #, no-wrap msgid "How emulation works in FreeBSD" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:429 msgid "" "As stated earlier, FreeBSD supports running binaries from several other " "UNIX(R)es. This works because FreeBSD has an abstraction called the " "execution class loader. This wedges into the man:execve[2] syscall, so when " "man:execve[2] is about to execute a binary it examines its type." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:435 msgid "" "There are basically two types of binaries in FreeBSD: shell-like text " "scripts, which are identified by `#!` as their first two characters, and " "normal (typically _ELF_) binaries, which are a representation of a compiled " "executable object. The vast majority (one could say all of them) of " "binaries in FreeBSD are of type ELF. ELF files contain a header, which " "specifies the OS ABI for this ELF file. By reading this information, the " "operating system can accurately determine what type of binary the given file " "is." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:441 msgid "" "Every OS ABI must be registered in the FreeBSD kernel. This applies to the " "FreeBSD native OS ABI, as well. So when man:execve[2] executes a binary it " "iterates through the list of registered ABIs and when it finds the right one " "it starts to use the information contained in the OS ABI description (its " "syscall table, `errno` translation table, etc.). So every time the process " "calls a syscall, it uses its own set of syscalls instead of some global " "one. This effectively provides a very elegant and easy way of supporting " "execution of various binary formats." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:446 msgid "" "The nature of emulation of different OSes (and also some other subsystems) " "led developers to invent an event handler mechanism. 
There are various " "places in the kernel where a list of event handlers is called. Every " "subsystem can register an event handler and they are called accordingly. " "For example, when a process exits there is a handler called that possibly " "cleans up whatever the subsystem needs cleaned up." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:448 msgid "" "Those simple facilities provide basically everything that is needed for the " "emulation infrastructure and in fact these are basically the only things " "necessary to implement the Linux(R) emulation layer." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:450 #, no-wrap msgid "Common primitives in the FreeBSD kernel" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:454 msgid "" "Emulation layers need some support from the operating system. I am going to " "describe some of the supported primitives in the FreeBSD operating system." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:456 #, no-wrap msgid "Locking primitives" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:459 msgid "Contributed by: `{attilio}`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:461 msgid "" "The FreeBSD synchronization primitive set is based on the idea of supplying a " "rather large number of different primitives so that the best one can " "be used for each particular situation." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:463 msgid "" "From a high-level point of view, you can consider three kinds of " "synchronization primitives in the FreeBSD kernel:" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:465 msgid "atomic operations and memory barriers" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:466 msgid "locks" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:467 msgid "scheduling barriers" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:470 msgid "" "Below are descriptions of the 3 families. For every lock, you should " "really check the linked manpage (where possible) for more detailed " "explanations." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:472 #, no-wrap msgid "Atomic operations and memory barriers" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:483 msgid "" "Atomic operations are implemented through a set of functions performing " "simple arithmetic on memory operands in an atomic way with respect to " "external events (interrupts, preemption, etc.). Atomic operations can " "guarantee atomicity only on small data types (on the order of magnitude of the " "`.long.` architecture C data type), so they should rarely be used directly in " "end-level code, except for very simple operations (like flag setting in " "a bitmap, for example). In fact, it is rather easy and common to write " "incorrect code based on just atomic operations (usually referred to as " "lock-less). The FreeBSD kernel offers a way to perform atomic operations in " "conjunction with a memory barrier. The memory barriers will guarantee that " "an atomic operation will happen following some specified ordering with " "respect to other memory accesses. 
For example, if we need an atomic " "operation to happen just after all other pending writes (in terms of " "instructions reordering buffers activities) are completed, we need to " "explicitly use a memory barrier in conjunction with this atomic operation. So " "it is simple to understand why memory barriers play a key role in building higher-" "level locks (such as refcounts, mutexes, etc.). For a detailed " "explanation of atomic operations, please refer to man:atomic[9]. It is worth " "noting, however, that atomic operations (and memory barriers as well) should " "ideally only be used for building front-end locks (such as mutexes)." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:485 #, no-wrap msgid "Refcounts" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:491 msgid "" "Refcounts are interfaces for handling reference counters. They are " "implemented through atomic operations and are intended to be used only in " "cases where the reference counter is the only thing to be protected, so " "even something like a spin-mutex is discouraged. Using the refcount " "interface for structures where a mutex is already used is often wrong, since " "we should probably handle the reference counter within some already protected " "paths. A manpage discussing refcount does not exist currently; just check [." "filename]#sys/refcount.h# for an overview of the existing API." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:493 #, no-wrap msgid "Locks" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:498 msgid "" "The FreeBSD kernel has a large number of lock classes. Every lock is defined by some " "peculiar properties, but probably the most important is the event linked to " "contending threads (or in other terms, the behavior of threads unable to " "acquire the lock). 
FreeBSD's locking scheme presents three different " "behaviors for contenders:" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:500 msgid "spinning" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:501 msgid "blocking" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:502 msgid "sleeping" msgstr "" #. type: delimited block = 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:506 msgid "The numbers are not arbitrary" msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:509 #, no-wrap msgid "Spinning locks" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:515 msgid "" "Spin locks let waiters spin until they can acquire the lock. An " "important matter to deal with is what happens when a thread contends on a spin lock and " "is not descheduled. Since the FreeBSD kernel is preemptive, this exposes " "spin locks to the risk of deadlocks that can be solved only by disabling " "interrupts while they are acquired. For this and other reasons (like lack " "of priority propagation support, poor load balancing between " "CPUs, etc.), spin locks are intended to protect very small paths of code, or " "ideally not to be used at all if not explicitly requested (explained later)." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:517 #, no-wrap msgid "Blocking" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:522 msgid "" "Block locks let waiters be descheduled and blocked until the lock owner " "drops it and wakes up one or more contenders. To avoid starvation " "issues, blocking locks do priority propagation from the waiters to the " "owner. 
Block locks must be implemented through the turnstile interface and " "are intended to be the most used kind of locks in the kernel, if no " "particular conditions are met." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:524 #, no-wrap msgid "Sleeping" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:529 msgid "" "Sleep locks let waiters be descheduled and fall asleep until the lock " "holder drops it and wakes up one or more waiters. Since sleep locks " "are intended to protect large paths of code and to cater to asynchronous " "events, they do not do any form of priority propagation. They must be " "implemented through the man:sleepqueue[9] interface." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:533 msgid "" "The order used to acquire locks is very important, not only because of the " "possibility of deadlock due to lock order reversals, but also because lock " "acquisition should follow specific rules linked to the nature of the locks. If you " "look at the table above, the practical rule is that if a thread holds " "a lock of level n (where the level is the number listed next to the kind of " "lock) it is not allowed to acquire a lock of a higher level, since this " "would break the specified semantics for a path. For example, if a thread " "holds a block lock (level 2), it is allowed to acquire a spin lock (level 1) " "but not a sleep lock (level 3), since block locks are intended to protect " "smaller paths than sleep locks (these rules are not about atomic operations " "or scheduling barriers, however)." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:535 msgid "This is a list of locks with their respective behaviors:" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:537 msgid "spin mutex - spinning - man:mutex[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:538 msgid "sleep mutex - blocking - man:mutex[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:539 msgid "pool mutex - blocking - man:mtx_pool[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:540 msgid "" "sleep family - sleeping - man:sleep[9]: pause, tsleep, msleep, msleep spin, " "msleep rw, msleep sx" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:541 msgid "condvar - sleeping - man:condvar[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:542 msgid "rwlock - blocking - man:rwlock[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:543 msgid "sxlock - sleeping - man:sx[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:544 msgid "lockmgr - sleeping - man:lockmgr[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:545 msgid "semaphores - sleeping - man:sema[9]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:547 msgid "" "Among these locks only mutexes, sxlocks, rwlocks and lockmgrs are intended " "to handle recursion, but currently recursion is only supported by mutexes " "and lockmgrs." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:549 #, no-wrap msgid "Scheduling barriers" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:553 msgid "" "Scheduling barriers are intended to be used to drive the scheduling of " "threads. They consist mainly of three different stubs:" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:555 msgid "critical sections (and preemption)" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:556 msgid "sched_bind" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:557 msgid "sched_pin" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:559 msgid "" "Generally, these should be used only in a particular context and even if " "they can often replace locks, they should be avoided because they do not allow " "even simple problems to be diagnosed with locking debugging tools (such as " "man:witness[4])." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:561 #, no-wrap msgid "Critical sections" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:569 msgid "" "The FreeBSD kernel has been made preemptive basically to deal with interrupt " "threads. In fact, to avoid high interrupt latency, time-sharing priority " "threads can be preempted by interrupt threads (in this way, they do not need " "to wait to be scheduled as the normal path would require). Preemption, however, " "introduces new race conditions that need to be handled, as well. Often, to " "deal with preemption, the simplest thing to do is to completely disable it. " "A critical section defines a piece of code (bounded by the pair of " "functions man:critical_enter[9] and man:critical_exit[9]) where preemption " "is guaranteed not to happen (until the protected code is fully executed). " "This can often replace a lock effectively but should be used carefully so as " "not to lose the whole advantage that preemption brings." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:571 #, no-wrap msgid "sched_pin/sched_unpin" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:577 msgid "" "Another way to deal with preemption is the `sched_pin()` interface. If a " "piece of code is enclosed in the `sched_pin()` and `sched_unpin()` pair of " "functions it is guaranteed that the respective thread, even if it can be " "preempted, will always be executed on the same CPU. Pinning is very " "effective in the particular case when we have to access per-CPU data and " "we assume other threads will not change those data. The latter condition " "makes a critical section too strong a requirement for our code." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:579 #, no-wrap msgid "sched_bind/sched_unbind" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:584 msgid "" "`sched_bind` is an API used to bind a thread to a particular CPU for all the " "time it executes the code, until a `sched_unbind` function call " "unbinds it. This feature has a key role in situations where you cannot trust " "the current state of CPUs (for example, at very early stages of boot), as " "you want to prevent your thread from migrating to inactive CPUs. Since " "`sched_bind` and `sched_unbind` manipulate internal scheduler structures, " "they need to be enclosed in `sched_lock` acquisition/releasing when used." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:586 #, no-wrap msgid "Proc structure" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:592 msgid "" "Various emulation layers sometimes require some additional per-process " "data. They can manage separate structures (a list, a tree etc.) containing " "these data for every process but this tends to be slow and memory " "consuming. 
To solve this problem the FreeBSD `proc` structure contains " "`p_emuldata`, which is a void pointer to some emulation layer specific " "data. This `proc` entry is protected by the proc mutex." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:597 msgid "" "The FreeBSD `proc` structure contains a `p_sysent` entry that identifies " "which ABI this process is running. In fact, it is a pointer to the " "`sysentvec` described above. So by comparing this pointer to the address " "where the `sysentvec` structure for the given ABI is stored we can " "effectively determine whether the process belongs to our emulation layer. " "The code typically looks like:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:602 #, no-wrap msgid "" "if (__predict_true(p->p_sysent != &elf_linux_sysvec))\n" "\t return;\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:606 msgid "" "As you can see, we effectively use the `__predict_true` modifier to collapse " "the most common case (FreeBSD process) to a simple return operation thus " "preserving high performance. This code should be turned into a macro " "because currently it is not very flexible, i.e. we support neither Linux(R)64 " "emulation nor A.OUT Linux(R) processes on i386." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:608 #, no-wrap msgid "VFS" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:617 msgid "" "The FreeBSD VFS subsystem is very complex but the Linux(R) emulation layer " "uses just a small subset via a well defined API. It can operate either on " "vnodes or on file handlers. A vnode represents a virtual node, i.e. a " "representation of a node in VFS. Another representation is a file handler, " "which represents an opened file from the perspective of a process. 
A file " "handler can represent a socket or an ordinary file. A file handler contains " "a pointer to its vnode. More than one file handler can point to the same " "vnode." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:619 #, no-wrap msgid "namei" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:626 msgid "" "The man:namei[9] routine is a central entry point to pathname lookup and " "translation. It traverses the path component by component from the starting point " "to the end point using the lookup function, which is internal to VFS. The man:" "namei[9] routine can cope with symlinks, absolute and relative paths. When " "a path is looked up using man:namei[9] it is entered into the name cache. This " "behavior can be suppressed. This routine is used all over the kernel and " "its performance is critical." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:628 #, no-wrap msgid "vn_fullpath" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:634 msgid "" "The man:vn_fullpath[9] function makes a best effort to traverse the VFS name " "cache and returns a path for a given (locked) vnode. This process is " "unreliable but works just fine for the most common cases. The unreliability " "is because it relies on the VFS cache (it does not traverse the on-medium " "structures), it does not work with hardlinks, etc. This routine is used in " "several places in the Linuxulator." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:636 #, no-wrap msgid "Vnode operations" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:639 msgid "" "`fgetvp` - given a thread and a file descriptor number it returns the " "associated vnode" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:640 msgid "man:vn_lock[9] - locks a vnode" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:641 msgid "`vn_unlock` - unlocks a vnode" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:642 msgid "man:VOP_READDIR[9] - reads a directory referenced by a vnode" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:643 msgid "" "man:VOP_GETATTR[9] - gets attributes of a file or a directory referenced by " "a vnode" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:644 msgid "man:VOP_LOOKUP[9] - looks up a path to a given directory" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:645 msgid "man:VOP_OPEN[9] - opens a file referenced by a vnode" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:646 msgid "man:VOP_CLOSE[9] - closes a file referenced by a vnode" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:647 msgid "man:vput[9] - decrements the use count for a vnode and unlocks it" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:648 msgid "man:vrele[9] - decrements the use count for a vnode" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:649 msgid "man:vref[9] - increments the use count for a vnode" msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:651 #, no-wrap msgid "File handler operations" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:654 msgid "" "`fget` - given a thread and a file descriptor number it returns associated " "file handler and references it" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:655 msgid "`fdrop` - drops a reference to a file handler" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:656 msgid "`fhold` - references a file handler" msgstr "" #. type: Title == #: documentation/content/en/articles/linux-emulation/_index.adoc:658 #, no-wrap msgid "Linux(R) emulation layer - MD part" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:666 msgid "" "This section deals with the implementation of the Linux(R) emulation layer in " "the FreeBSD operating system. It first describes the machine dependent part, " "talking about how and where interaction between userland and kernel is " "implemented. It talks about syscalls, signals, ptrace, traps, stack fixup. " "This part discusses i386 but it is written generally so other architectures " "should not differ very much. The next part is the machine independent part " "of the Linuxulator. This section only covers i386 and ELF handling. A.OUT " "is obsolete and untested." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:668 #, no-wrap msgid "Syscall handling" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:672 msgid "" "Syscall handling is mostly written in [.filename]#linux_sysvec.c#, which " "covers most of the routines pointed out in the `sysentvec` structure. When " "a Linux(R) process running on FreeBSD issues a syscall, the general syscall " "routine calls the Linux(R) prepsyscall routine for the Linux(R) ABI." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:674 #, no-wrap msgid "Linux(R) prepsyscall" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:681 msgid "" "Linux(R) passes arguments to syscalls via registers (that is why it is " "limited to 6 parameters on i386) while FreeBSD uses the stack. The Linux(R) " "prepsyscall routine must copy parameters from registers to the stack. The " "order of the registers is: `%ebx`, `%ecx`, `%edx`, `%esi`, `%edi`, `%ebp`. " "The catch is that this is true for only _most_ of the syscalls. Some (most " "notably `clone`) use a different order but this is luckily easy to fix by " "inserting a dummy parameter in the `linux_clone` prototype." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:683 #, no-wrap msgid "Syscall writing" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:687 msgid "" "Every syscall implemented in the Linuxulator must have its prototype with " "various flags in [.filename]#syscalls.master#. The form of the file is:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:695 #, no-wrap msgid "" "...\n" "\tAUE_FORK STD\t\t{ int linux_fork(void); }\n" "...\n" "\tAUE_CLOSE NOPROTO\t{ int close(int fd); }\n" "...\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:705 msgid "" "The first column represents the syscall number. The second column is for " "auditing support. The third column represents the syscall type. It is " "one of `STD`, `OBSOL`, `NOPROTO` or `UNIMPL`. `STD` is a standard syscall " "with full prototype and implementation. `OBSOL` is obsolete and defines " "just the prototype. `NOPROTO` means that the syscall is implemented " "elsewhere so do not prepend ABI prefix, etc. `UNIMPL` means that the " "syscall will be substituted with the `nosys` syscall (a syscall just " "printing out a message about the syscall not being implemented and returning " "`ENOSYS`)." msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:708 msgid "" "From [.filename]#syscalls.master# a script generates three files: [." "filename]#linux_syscall.h#, [.filename]#linux_proto.h# and [." "filename]#linux_sysent.c#. The [.filename]#linux_syscall.h# contains " "definitions of syscall names and their numerical value, e.g.:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:716 #, no-wrap msgid "" "...\n" "#define LINUX_SYS_linux_fork 2\n" "...\n" "#define LINUX_SYS_close 6\n" "...\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:719 msgid "" "The [.filename]#linux_proto.h# contains structure definitions of arguments " "to every syscall, e.g.:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:725 #, no-wrap msgid "" "struct linux_fork_args {\n" " register_t dummy;\n" "};\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:728 msgid "" "And finally, [.filename]#linux_sysent.c# contains structure describing the " "system entry table, used to actually dispatch a syscall, e.g.:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:733 #, no-wrap msgid "" "{ 0, (sy_call_t *)linux_fork, AUE_FORK, NULL, 0, 0 }, /* 2 = linux_fork */\n" "{ AS(close_args), (sy_call_t *)close, AUE_CLOSE, NULL, 0, 0 }, /* 6 = close */\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:737 msgid "" "As you can see `linux_fork` is implemented in Linuxulator itself so the " "definition is of `STD` type and has no argument, which is exhibited by the " "dummy argument structure. 
On the other hand, `close` is just an alias for " "the real FreeBSD man:close[2], so it has no Linux(R) argument structure associated " "with it, and in the system entry table it is not prefixed with `linux` because it calls the " "real man:close[2] in the kernel." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:739 #, no-wrap msgid "Dummy syscalls" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:745 msgid "" "The Linux(R) emulation layer is not complete, as some syscalls are not " "implemented properly and some are not implemented at all. The emulation " "layer employs a facility to mark unimplemented syscalls with the `DUMMY` " "macro. These dummy definitions reside in [.filename]#linux_dummy.c# in the " "form `DUMMY(syscall);`, which is then translated to various syscall " "auxiliary files, and the implementation consists of printing a message saying " "that this syscall is not implemented. The `UNIMPL` prototype is not used " "because we want to be able to identify the name of the syscall that was " "called, to know which syscalls are more important to implement." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:747 #, no-wrap msgid "Signal handling" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:751 msgid "" "Signal handling is done generally in the FreeBSD kernel for all binary " "compatibilities with a call to a compat-dependent layer. The Linux(R) " "compatibility layer defines the `linux_sendsig` routine for this purpose." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:753 #, no-wrap msgid "Linux(R) sendsig" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:760 msgid "" "This routine first checks whether the signal has been installed with the " "`SA_SIGINFO` flag, in which case it calls the `linux_rt_sendsig` routine instead. 
" "Furthermore, it allocates (or reuses an already existing) signal handler " "context, then it builds a list of arguments for the signal handler. It " "translates the signal number based on the signal translation table, assigns " "a handler, and translates the sigset. Then it saves context for the `sigreturn` " "routine (various registers, translated trap number and signal mask). " "Finally, it copies out the signal context to the userspace and prepares " "context for the actual signal handler to run." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:762 #, no-wrap msgid "linux_rt_sendsig" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:767 msgid "" "This routine is similar to `linux_sendsig`; only the signal context " "preparation is different. It adds `siginfo`, `ucontext`, and some POSIX(R) " "parts. It might be worth considering whether those two functions could " "be merged with a benefit of less code duplication and possibly even faster " "execution." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:769 #, no-wrap msgid "linux_sigreturn" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:774 msgid "" "This syscall is used for return from the signal handler. It does some " "security checks and restores the original process context. It also unmasks " "the signal in the process signal mask." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:776 #, no-wrap msgid "Ptrace" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:782 msgid "" "Many UNIX(R) derivatives implement the man:ptrace[2] syscall to allow various " "tracking and debugging features. 
This facility enables the tracing process " "to obtain various information about the traced process, like register dumps, " "any memory from the process address space, etc. and also to trace the " "process, as in stepping an instruction or stopping between system entries (syscalls " "and traps). man:ptrace[2] also lets you set various information in the " "traced process (registers etc.). man:ptrace[2] is a UNIX(R)-wide standard " "implemented in most UNIX(R)es around the world." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:788 msgid "" "Linux(R) emulation in FreeBSD implements the man:ptrace[2] facility in [." "filename]#linux_ptrace.c#. This file contains the routines for converting registers between " "Linux(R) and FreeBSD and the actual man:ptrace[2] syscall " "emulation. The syscall is a long switch block that implements its counterpart " "in FreeBSD for every man:ptrace[2] command. The man:ptrace[2] commands are " "mostly equal between Linux(R) and FreeBSD so usually just a small " "modification is needed. For example, `PT_GETREGS` in Linux(R) operates on " "direct data while FreeBSD uses a pointer to the data so after performing a " "(native) man:ptrace[2] syscall, a copyout must be done to preserve Linux(R) " "semantics." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:792 msgid "" "The man:ptrace[2] implementation in Linuxulator has some known weaknesses. " "Panics have been seen when using `strace` (which is a man:ptrace[2] " "consumer) in the Linuxulator environment. Also `PT_SYSCALL` is not " "implemented." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:799 msgid "" "Whenever a Linux(R) process running in the emulation layer traps, the trap " "itself is handled transparently, with the sole exception of trap " "translation. Linux(R) and FreeBSD differ in opinion on what a trap is, so " "this is dealt with here. 
The code is actually very short:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:805 #, no-wrap msgid "" "static int\n" "translate_traps(int signal, int trap_code)\n" "{\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:808 #, no-wrap msgid "" " if (signal != SIGBUS)\n" " return signal;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:810 #, no-wrap msgid " switch (trap_code) {\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:816 #, no-wrap msgid "" " case T_PROTFLT:\n" " case T_TSSFLT:\n" " case T_DOUBLEFLT:\n" " case T_PAGEFLT:\n" " return SIGSEGV;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:821 #, no-wrap msgid "" " default:\n" " return signal;\n" " }\n" "}\n" msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:824 #, no-wrap msgid "Stack fixup" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:831 msgid "" "The RTLD run-time link-editor expects so-called AUX tags on the stack during an " "`execve`, so a fixup must be done to ensure this. Of course, every RTLD " "system is different so the emulation layer must provide its own stack fixup " "routine to do this. The Linuxulator does so as well. The `elf_linux_fixup` routine simply " "copies out the AUX tags to the stack and adjusts the stack of the user space " "process to point right after those tags. So RTLD works in a smart way." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:833 #, no-wrap msgid "A.OUT support" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:840 msgid "" "The Linux(R) emulation layer on i386 also supports Linux(R) A.OUT binaries. 
" "Pretty much everything described in the previous sections must be " "implemented for A.OUT support (besides trap translation and signal " "sending). The support for A.OUT binaries is no longer maintained; " "in particular, the 2.6 emulation does not work with it, but this does not cause " "any problem, as the linux-base in ports probably does not support A.OUT " "binaries at all. This support will probably be removed in the future. Most of " "the code necessary for loading Linux(R) A.OUT binaries is in the [." "filename]#imgact_linux.c# file." msgstr "" #. type: Title == #: documentation/content/en/articles/linux-emulation/_index.adoc:842 #, no-wrap msgid "Linux(R) emulation layer -MI part" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:847 msgid "" "This section talks about the machine independent part of the Linuxulator. It " "covers the emulation infrastructure needed for Linux(R) 2.6 emulation, the " "thread local storage (TLS) implementation (on i386) and futexes. Then we " "talk briefly about some syscalls." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:849 #, no-wrap msgid "Description of NPTL" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:857 msgid "" "One of the major areas of progress in development of Linux(R) 2.6 was " "threading. Prior to 2.6, the Linux(R) threading support was implemented in " "the linuxthreads library. The library was a partial implementation of " "POSIX(R) threading. The threading was implemented using separate processes " "for each thread using the `clone` syscall to let them share the address " "space (and other things). The main weaknesses of this approach were that " "every thread had a different PID, signal handling was broken (from the " "pthreads perspective), etc. Also the performance was not very good (use of " "`SIGUSR` signals for thread synchronization, kernel resource consumption, " "etc.) 
so to overcome these problems a new threading system was developed and " "named NPTL." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:863 msgid "" "The NPTL library focused on two things but a third thing came along so it is " "usually considered a part of NPTL. Those two things were embedding of " "threads into a process structure and futexes. The additional third thing " "was TLS, which is not directly required by NPTL but the whole NPTL userland " "library depends on it. Those improvements resulted in much improved " "performance and standards conformance. NPTL is the standard threading library " "in Linux(R) systems these days." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:867 msgid "" "The FreeBSD Linuxulator implementation approaches the NPTL in three main " "areas: TLS, futexes, and PID mangling, which is meant to simulate " "Linux(R) threads. Further sections describe each of these areas." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:869 #, no-wrap msgid "Linux(R) 2.6 emulation infrastructure" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:872 msgid "" "These sections deal with the way Linux(R) threads are managed and how we " "simulate that in FreeBSD." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:874 #, no-wrap msgid "Runtime determining of 2.6 emulation" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:883 msgid "" "The Linux(R) emulation layer in FreeBSD supports runtime setting of the " "emulated version. This is done via man:sysctl[8], namely `compat.linux." "osrelease`. Setting this man:sysctl[8] affects runtime behavior of the " "emulation layer. When set to 2.6.x it sets the value of `linux_use_linux26` " "while setting it to anything else keeps it unset. 
This variable (plus per-" "prison variables of the very same kind) determines whether 2.6 " "infrastructure (mainly PID mangling) is used in the code or not. The " "version setting is done system-wide and this affects all Linux(R) " "processes. The man:sysctl[8] should not be changed while any " "Linux(R) binary is running, as it might harm things." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:885 #, no-wrap msgid "Linux(R) processes and thread identifiers" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:892 msgid "" "The semantics of Linux(R) threading are a little confusing and use entirely " "different nomenclature from FreeBSD. A process in Linux(R) consists of a " "`struct task` embedding two identifier fields - PID and TGID. PID is _not_ " "a process ID but it is a thread ID. The TGID identifies a thread group, in " "other words a process. For a single-threaded process the PID equals the TGID." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:898 msgid "" "The thread in NPTL is just an ordinary process that happens to have TGID not " "equal to PID and have a group leader not equal to itself (and shared VM etc. " "of course). Everything else happens in the same way as for an ordinary " "process. There is no separation of a shared status to some external " "structure like in FreeBSD. This creates some duplication of information and " "possible data inconsistency. The Linux(R) kernel seems to use task -> group " "information in some places and task information elsewhere and it is really " "not very consistent and looks error-prone." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:901 msgid "" "Every NPTL thread is created by a call to the `clone` syscall with a " "specific set of flags (more in the next subsection). The NPTL implements " "strict 1:1 threading." msgstr "" #.
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:903 msgid "" "In FreeBSD we emulate NPTL threads with ordinary FreeBSD processes that " "share VM space, etc. and the PID gymnastics is just mimicked in the emulation " "specific structure attached to the process. The structure attached to the " "process looks like:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:908 #, no-wrap msgid "" "struct linux_emuldata {\n" " pid_t pid;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:911 #, no-wrap msgid "" " int *child_set_tid; /* in clone(): Child's TID to set on clone */\n" " int *child_clear_tid;/* in clone(): Child's TID to clear on exit */\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:913 #, no-wrap msgid " struct linux_emuldata_shared *shared;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:915 #, no-wrap msgid " int pdeath_signal; /* parent death signal */\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:918 #, no-wrap msgid "" " LIST_ENTRY(linux_emuldata) threads; /* list of linux threads */\n" "};\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:925 msgid "" "The PID is used to identify the FreeBSD process that attaches this " "structure. The `child_set_tid` and `child_clear_tid` are used for TID " "address copyout when a process is created and when it exits. The `shared` pointer " "points to a structure shared among threads. The `pdeath_signal` variable " "identifies the parent death signal and the `threads` pointer is used to link " "this structure to the list of threads. The `linux_emuldata_shared` " "structure looks like:" msgstr "" #. type: delimited block . 
4 #: documentation/content/en/articles/linux-emulation/_index.adoc:929 #, no-wrap msgid "struct linux_emuldata_shared {\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:931 #, no-wrap msgid " int refs;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:933 #, no-wrap msgid " pid_t group_pid;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:936 #, no-wrap msgid "" " LIST_HEAD(, linux_emuldata) threads; /* head of list of linux threads */\n" "};\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:941 msgid "" "The `refs` is a reference counter being used to determine when we can free " "the structure to avoid memory leaks. The `group_pid` is to identify PID ( = " "TGID) of the whole process ( = thread group). The `threads` pointer is the " "head of the list of threads in the process." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:944 msgid "" "The `linux_emuldata` structure can be obtained from the process using " "`em_find`. The prototype of the function is:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:948 #, no-wrap msgid "struct linux_emuldata *em_find(struct proc *, int locked);\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:953 msgid "" "Here, `proc` is the process we want the emuldata structure from and the " "locked parameter determines whether we want to lock or not. The accepted " "values are `EMUL_DOLOCK` and `EMUL_DOUNLOCK`. More about locking later." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:955 #, no-wrap msgid "PID mangling" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:962 msgid "" "As FreeBSD and Linux(R) differ in their view of what a process ID and a " "thread ID are, we have to translate between the two views " "somehow. We do it by PID mangling. This means that we fake what a PID " "(=TGID) and a TID (=PID) are between kernel and userland. The rule of thumb is " "that in the kernel (in the Linuxulator) `PID = PID` and `TGID = shared -> group_pid`, and " "to userland we present `PID = shared -> group_pid` and `TID = proc -> " "p_pid`. The PID member of the `linux_emuldata` structure is a FreeBSD PID." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:966 msgid "" "The above affects mainly the `getpid`, `getppid`, and `gettid` syscalls, where we use PID/" "TGID respectively. In copyout of TIDs in `child_clear_tid` and " "`child_set_tid` we copy out the FreeBSD PID." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:968 #, no-wrap msgid "Clone syscall" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:972 msgid "" "The `clone` syscall is the way threads are created in Linux(R). The syscall " "prototype looks like this:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:977 #, no-wrap msgid "" "int linux_clone(l_int flags, void *stack, void *parent_tidptr, int dummy,\n" "void * child_tidptr);\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:987 msgid "" "The `flags` parameter tells the syscall how exactly the processes should be " "cloned. As described above, Linux(R) can create processes sharing various " "things independently, for example two processes can share file descriptors " "but not VM, etc. The last byte of the `flags` parameter is the exit signal of " "the newly created process. 
The `stack` parameter, if non-`NULL`, tells where " "the thread stack is; if it is `NULL`, we are supposed to copy-on-write the " "calling process stack (i.e. do what the normal man:fork[2] routine does). The " "`parent_tidptr` parameter is used as an address for copying out the process PID " "(i.e. thread id) once the process is sufficiently instantiated but is not " "runnable yet. The `dummy` parameter is here because of the very strange " "calling convention of this syscall on i386. It uses the registers directly " "and does not let the compiler do it, which results in the need for a dummy " "parameter. The `child_tidptr` parameter is used as an address for copying out " "PID once the process has finished forking and when the process exits." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1001 msgid "" "The syscall itself proceeds by setting corresponding flags depending on the " "flags passed in. For example, `CLONE_VM` maps to RFMEM (sharing of VM), " "etc. The only nit here is `CLONE_FS` and `CLONE_FILES` because FreeBSD does " "not allow setting these separately so we fake it by not setting RFFDG " "(copying of fd table and other fs information) if either of these is " "defined. This does not cause any problems, because those flags are always " "set together. After setting the flags the process is forked using the " "internal `fork1` routine; the process is instrumented not to be put on a run " "queue, i.e. not to be set runnable. After the forking is done we possibly " "reparent the newly created process to emulate `CLONE_PARENT` semantics. " "The next part is creating the emulation data. Threads in Linux(R) do not " "signal their parents, so we set the exit signal to 0 to disable this. After " "that, `child_set_tid` and `child_clear_tid` are set, enabling " "the functionality later in the code. At this point we copy out the PID to " "the address specified by `parent_tidptr`. 
The process stack is " "set by simply rewriting the thread frame `%esp` register (`%rsp` on amd64). " "The next part is setting up TLS for the newly created process. After this man:" "vfork[2] semantics might be emulated and finally the newly created process " "is put on a run queue and its PID is copied out to the parent process via " "the `clone` return value." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1004 msgid "" "The `clone` syscall can be, and in fact is, used for emulating the classic man:" "fork[2] and man:vfork[2] syscalls. Newer glibc on a 2.6 kernel uses " "`clone` to implement the man:fork[2] and man:vfork[2] syscalls." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1006 #, no-wrap msgid "Locking" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1012 msgid "" "The locking is implemented to be per-subsystem because we do not expect a " "lot of contention on these. There are two locks: `emul_lock`, used to " "protect manipulation of `linux_emuldata`, and `emul_shared_lock`, used to " "protect manipulation of `linux_emuldata_shared`. The `emul_lock` is a nonsleepable " "blocking mutex while `emul_shared_lock` is a sleepable blocking `sx_lock`. " "Due to the per-subsystem locking we can coalesce some locks, and that is " "why `em_find` offers non-locking access." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:1014 #, no-wrap msgid "TLS" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1017 msgid "This section deals with TLS, also known as thread local storage." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1019 #, no-wrap msgid "Introduction to threading" msgstr "" #.
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1040 msgid "" "Threads in computer science are entities within a process that can be " "scheduled independently from each other. The threads in the process share " "process-wide data (file descriptors, etc.) but also have their own stack for " "their own data. Sometimes there is a need for process-wide data specific to " "a given thread. Imagine the name of the thread in execution or something like " "that. The traditional UNIX(R) threading API, pthreads, provides a way to do " "it via man:pthread_key_create[3], man:pthread_setspecific[3] and man:" "pthread_getspecific[3] where a thread can create a key to the thread local " "data and use man:pthread_setspecific[3] or man:pthread_getspecific[3] to " "manipulate those data. You can easily see that this is not the most " "comfortable way this could be accomplished. So various producers of C/C++ " "compilers introduced a better way. They defined a new modifier keyword " "`__thread` that specifies that a variable is thread specific. A new method of " "accessing such variables was developed as well (at least on i386). The " "pthreads method tends to be implemented in userspace as a trivial lookup " "table. The performance of such a solution is not very good. So the new " "method uses (on i386) segment registers to address a segment where the TLS area " "is stored, so actually accessing a thread variable just means adding a " "segment register prefix to the address, thus addressing via it. The segment " "registers are usually `%gs` and `%fs` acting like segment selectors. Every " "thread has its own area where the thread local data are stored and the " "segment must be loaded on every context switch. This method is very fast " "and used almost exclusively in the whole i386 UNIX(R) world. Both FreeBSD " "and Linux(R) implement this approach and it yields very good results. 
The " "only drawback is the need to reload the segment on every context switch " "which can slow down context switches. FreeBSD tries to avoid this overhead " "by using only 1 segment descriptor for this while Linux(R) uses 3. " "The interesting thing is that almost nothing uses more than 1 descriptor (only " "Wine seems to use 2) so Linux(R) pays this unnecessary price for context " "switches." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1042 #, no-wrap msgid "Segments on i386" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1049 msgid "" "The i386 architecture implements so-called segments. A segment is a " "description of an area of memory: the base address (bottom) of the memory " "area, its end (ceiling), type, protection, etc. The memory described " "by a segment can be accessed using segment selector registers (`%cs`, `%ds`, " "`%ss`, `%es`, `%fs`, `%gs`). For example, let us suppose we have a segment " "with base address 0x1234 and some length, and this code:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1053 #, no-wrap msgid "mov %edx,%gs:0x10\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1062 msgid "" "This will store the content of the `%edx` register into memory location " "0x1244 (0x1234 + 0x10). Some segment registers have a special use, for example `%cs` is " "used for the code segment and `%ss` is used for the stack segment but `%fs` and " "`%gs` are generally unused. Segments are either stored in a global GDT " "table or in a local LDT table. LDT is accessed via an entry in the GDT. " "The LDT can store more types of segments. LDT can be per process. Both " "tables define up to 8191 entries." msgstr "" #.
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1072 msgid "" "There are two main ways of setting up TLS in Linux(R). It can be set when " "cloning a process using the `clone` syscall, or the process can call " "`set_thread_area`. When a process passes the `CLONE_SETTLS` flag to `clone`, " "the kernel expects the memory pointed to by the `%esi` register to hold a Linux(R) " "user space representation of a segment, which gets translated to the machine " "representation of a segment and loaded into a GDT slot. The GDT slot can be " "specified with a number, or -1 can be used, meaning that the system itself " "should choose the first free slot. In practice, the vast majority of " "programs use only one TLS entry and do not care about the number of the " "entry. We exploit this in the emulation and in fact depend on it." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1074 #, no-wrap msgid "Emulation of Linux(R) TLS" msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1077 #, no-wrap msgid "i386" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1095 msgid "" "Loading of TLS for the current thread happens by calling `set_thread_area` " "while loading TLS for a second process in `clone` is done in a separate " "block in `clone`. Those two functions are very similar, the only " "difference being the actual loading of the GDT segment, which happens on the " "next context switch for the newly created process while `set_thread_area` " "must load this directly. The code basically does the following. It copies the " "Linux(R) form segment descriptor from the userland. The code checks for the " "number of the descriptor but because this differs between FreeBSD and " "Linux(R) we fake it a little. We only support indexes of 6, 3 and -1. The " "6 is the genuine Linux(R) number, 3 is the genuine FreeBSD one and -1 means " "autoselection. 
Then we set the descriptor number to the constant 3 and copy " "this out to userspace. We rely on the userspace process using the number " "from the descriptor but this works most of the time (we have never seen a case " "where this did not work) as the userspace process typically passes in 1. " "Then we convert the descriptor from the Linux(R) form to a machine dependent " "form (i.e. operating system independent form) and copy this to the FreeBSD " "defined segment descriptor. Finally we can load it. We assign the " "descriptor to the thread's PCB (process control block) and load the `%gs` segment " "using `load_gs`. This loading must be done in a critical section so that " "nothing can interrupt us. The `CLONE_SETTLS` case works exactly like this, " "except that the loading using `load_gs` is not performed. The segment used for " "this (segment number 3) is shared for this use between FreeBSD processes and " "Linux(R) processes so the Linux(R) emulation layer does not add any overhead " "over plain FreeBSD." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1097 #, no-wrap msgid "amd64" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1101 msgid "" "The amd64 implementation is similar to the i386 one but there was initially " "no 32bit segment descriptor used for this purpose (hence not even native " "32bit TLS users worked) so we had to add such a segment and implement its " "loading on every context switch (when a flag signaling use of 32bit is " "set). Apart from this, the TLS loading is exactly the same; only the segment " "numbers are different, and the descriptor format and the loading differ " "slightly." msgstr "" #.
type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1106 #, no-wrap msgid "Introduction to synchronization" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1116 msgid "" "Threads need some kind of synchronization and POSIX(R) provides several " "primitives: mutexes for mutual exclusion, read-write locks for mutual exclusion " "with a biased ratio of reads and writes, and condition variables for signaling " "a status change.  It is interesting to note that the POSIX(R) threading API " "lacks support for semaphores.  The implementations of those synchronization " "routines are heavily dependent on the type of threading support we " "have.  In the pure 1:M (userspace) model the implementation can be done solely " "in userspace and thus be very fast (the condition variables will probably " "end up being implemented using signals, i.e. not fast) and simple.  In the 1:1 " "model, the situation is also quite clear - the threads must be synchronized " "using kernel facilities (which is very slow because a syscall must be " "performed).  The mixed M:N scenario just combines the first and second " "approach or relies solely on the kernel.  Thread synchronization is a vital part " "of thread-enabled programming and its performance can affect the resulting " "program a lot.  Recent benchmarks on the FreeBSD operating system showed that an " "improved sx_lock implementation yielded a 40% speedup in _ZFS_ (a heavy sx " "user).  This is in-kernel stuff, but it shows clearly how important the " "performance of synchronization primitives is." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1120 msgid "" "Threaded programs should be written with as little contention on locks as " "possible.  Otherwise, instead of doing useful work the thread just waits on " "a lock.  As a result of this, most well written threaded programs show " "little lock contention." msgstr "" #. 
type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1122 #, no-wrap msgid "Futexes introduction" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1127 msgid "" "Linux(R) implements 1:1 threading, i.e. it has to use in-kernel " "synchronization primitives.  As stated earlier, well written threaded " "programs have little lock contention.  So a typical sequence could be " "performed as two atomic increments/decrements of the mutex reference counter, which is " "very fast, as presented by the following example:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1133 #, no-wrap msgid "" "pthread_mutex_lock(&mutex);\n" "...\n" "pthread_mutex_unlock(&mutex);\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1136 msgid "" "1:1 threading forces us to perform two syscalls for those mutex calls, which " "is very slow." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1141 msgid "" "The solution Linux(R) 2.6 implements is called futexes.  Futexes implement " "the check for contention in userspace and call kernel primitives only in " "case of contention.  Thus the typical case takes place without any kernel " "intervention.  This yields a reasonably fast and flexible implementation of " "synchronization primitives." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1143 #, no-wrap msgid "Futex API" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1146 msgid "The futex syscall looks like this:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1150 #, no-wrap msgid "int futex(void *uaddr, int op, int val, struct timespec *timeout, void *uaddr2, int val3);\n" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1153 msgid "" "In this example `uaddr` is an address of the mutex in userspace, `op` is an " "operation we are about to perform and the other parameters have per-" "operation meaning." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1155 msgid "Futexes implement the following operations:" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1157 msgid "`FUTEX_WAIT`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1158 msgid "`FUTEX_WAKE`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1159 msgid "`FUTEX_FD`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1160 msgid "`FUTEX_REQUEUE`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1161 msgid "`FUTEX_CMP_REQUEUE`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1162 msgid "`FUTEX_WAKE_OP`" msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1164 #, no-wrap msgid "FUTEX_WAIT" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1169 msgid "" "This operation verifies that on address `uaddr` the value `val` is written. " "If not, `EWOULDBLOCK` is returned, otherwise the thread is queued on the " "futex and gets suspended. If the argument `timeout` is non-zero it " "specifies the maximum time for the sleeping, otherwise the sleeping is " "infinite." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1171 #, no-wrap msgid "FUTEX_WAKE" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1174 msgid "" "This operation takes a futex at `uaddr` and wakes up the first `val` threads " "queued on this futex." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1176 #, no-wrap msgid "FUTEX_FD" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1179 msgid "This operation associates a file descriptor with a given futex." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1181 #, no-wrap msgid "FUTEX_REQUEUE" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1184 msgid "" "This operation takes `val` threads queued on the futex at `uaddr`, wakes them " "up, and takes the next `val2` threads and requeues them on the futex at `uaddr2`." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1186 #, no-wrap msgid "FUTEX_CMP_REQUEUE" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1189 msgid "" "This operation does the same as `FUTEX_REQUEUE` but it first checks that the " "value at `uaddr` equals `val3`." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1191 #, no-wrap msgid "FUTEX_WAKE_OP" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1195 msgid "" "This operation performs an atomic operation on `val3` (which encodes " "the operation and its arguments) and `uaddr2`.  Then it wakes up `val` threads on the futex at " "`uaddr` and if the atomic operation returned a positive number it wakes up " "`val2` threads on the futex at `uaddr2`." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1197 msgid "The operations implemented in `FUTEX_WAKE_OP`:" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1199 msgid "`FUTEX_OP_SET`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1200 msgid "`FUTEX_OP_ADD`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1201 msgid "`FUTEX_OP_OR`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1202 msgid "`FUTEX_OP_AND`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1203 msgid "`FUTEX_OP_XOR`" msgstr "" #. type: delimited block = 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1208 msgid "" "There is no `val2` parameter in the futex prototype. The `val2` is taken " "from the `struct timespec *timeout` parameter for operations " "`FUTEX_REQUEUE`, `FUTEX_CMP_REQUEUE` and `FUTEX_WAKE_OP`." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1211 #, no-wrap msgid "Futex emulation in FreeBSD" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1216 msgid "" "The futex emulation in FreeBSD is taken from NetBSD and further extended by " "us. It is placed in `linux_futex.c` and [.filename]#linux_futex.h# files. " "The `futex` structure looks like:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1222 #, no-wrap msgid "" "struct futex {\n" " void *f_uaddr;\n" " int f_refcount;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1224 #, no-wrap msgid " LIST_ENTRY(futex) f_list;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1227 #, no-wrap msgid "" " TAILQ_HEAD(lf_waiting_paroc, waiting_proc) f_waiting_proc;\n" "};\n" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1230 msgid "And the structure `waiting_proc` is:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1234 #, no-wrap msgid "struct waiting_proc {\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1236 #, no-wrap msgid " struct thread *wp_t;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1238 #, no-wrap msgid " struct futex *wp_new_futex;\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1241 #, no-wrap msgid "" " TAILQ_ENTRY(waiting_proc) wp_list;\n" "};\n" msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1244 #, no-wrap msgid "futex_get / futex_put" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1248 msgid "" "A futex is obtained using the `futex_get` function, which searches a linear " "list of futexes and returns the found one or creates a new futex.  When " "releasing a futex from use we call the `futex_put` function, which " "decreases the reference counter of the futex and, if the refcount reaches zero, " "releases it." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1250 #, no-wrap msgid "futex_sleep" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1258 msgid "" "When a futex queues a thread for sleeping it creates a `waiting_proc` " "structure and puts this structure on the list inside the futex structure, " "then it just performs a man:tsleep[9] to suspend the thread.  The sleep can " "be timed out.  After man:tsleep[9] returns (the thread was woken up or it " "timed out) the `waiting_proc` structure is removed from the list and is " "destroyed.  
All this is done in the `futex_sleep` function.  If we got woken " "up from `futex_wake` we have `wp_new_futex` set so we sleep on it.  This way " "the actual requeueing is done in this function." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1260 #, no-wrap msgid "futex_wake" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1268 msgid "" "Waking up a thread sleeping on a futex is performed in the `futex_wake` " "function.  First in this function we mimic the strange Linux(R) behavior, " "where it wakes up N threads for all operations, the only exception being that " "the REQUEUE operations are performed on N+1 threads.  But this usually does " "not make any difference as we are waking up all threads.  Next, in a " "loop, we wake up n threads, and after this we check if there is a " "new futex for requeueing.  If so, we requeue up to n2 threads on the new " "futex.  This cooperates with `futex_sleep`." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1270 #, no-wrap msgid "futex_wake_op" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1275 msgid "" "The `FUTEX_WAKE_OP` operation is quite complicated.  First we obtain two " "futexes at addresses `uaddr` and `uaddr2`, then we perform the atomic " "operation using `val3` and `uaddr2`.  Then `val` waiters on the first futex " "are woken up and if the atomic operation condition holds we wake up `val2` " "(i.e. `timeout`) waiters on the second futex." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1277 #, no-wrap msgid "futex atomic operation" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1282 msgid "" "The atomic operation takes two parameters, `encoded_op` and `uaddr`.  
The " "encoded operation encodes the operation itself, comparing value, operation " "argument, and comparing argument.  The pseudocode for the operation is:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1287 #, no-wrap msgid "" "oldval = *uaddr2\n" "*uaddr2 = oldval OP oparg\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1291 msgid "" "And this is done atomically.  First the number at `uaddr` is copied in " "and the operation is performed.  The code handles page faults and if no " "page fault occurs `oldval` is compared to the `cmparg` argument with the cmp " "comparator." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1293 #, no-wrap msgid "Futex locking" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1297 msgid "" "The futex implementation uses two locks: an `sx_lock` protecting the lists and a global " "lock (either Giant or another `sx_lock`).  Every operation is performed " "locked from the start to the very end." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:1299 #, no-wrap msgid "Various syscalls implementation" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1302 msgid "" "In this section I am going to describe some smaller syscalls that are worth " "mentioning because their implementation is not obvious or those syscalls are " "interesting from another point of view." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1304 #, no-wrap msgid "*at family of syscalls" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1313 msgid "" "During development of the Linux(R) 2.6.16 kernel, the *at syscalls were added.  
" "Those syscalls (`openat` for example) work exactly like their at-less " "counterparts with the slight exception of the `dirfd` parameter.  This " "parameter changes where the given file, on which the syscall is to be " "performed, is looked up.  When the `filename` parameter is absolute `dirfd` is ignored, " "but when the path to the file is relative, it comes into play.  The " "`dirfd` parameter is a directory relative to which the relative pathname is " "resolved.  The `dirfd` parameter is a file descriptor of some directory or " "`AT_FDCWD`.  So for example the `openat` syscall can be like this:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1317 #, no-wrap msgid "file descriptor 123 = /tmp/foo/, current working directory = /tmp/\n" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1322 #, no-wrap msgid "" "openat(123, \"/tmp/bah\", flags, mode)\t/* opens /tmp/bah */\n" "openat(123, \"bah\", flags, mode)\t\t/* opens /tmp/foo/bah */\n" "openat(AT_FDCWD, \"bah\", flags, mode)\t/* opens /tmp/bah */\n" "openat(stdio, \"bah\", flags, mode)\t/* returns error because stdio is not a directory */\n" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1331 msgid "" "This infrastructure is necessary to avoid races when opening files outside " "the working directory.  Imagine that a process consists of two threads, " "thread A and thread B.  Thread A issues `open(\"/tmp/foo/bah\", flags, mode)` " "and before returning it gets preempted and thread B runs.  Thread B does not " "care about the needs of thread A and renames or removes [.filename]#/tmp/foo/" "#.  We have a race.  To avoid this we can open [.filename]#/tmp/foo# and use " "it as `dirfd` for the `openat` syscall.  This also enables the user to implement per-" "thread working directories." msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1334 msgid "" "The Linux(R) family of *at syscalls contains: `linux_openat`, `linux_mkdirat`, " "`linux_mknodat`, `linux_fchownat`, `linux_futimesat`, `linux_fstatat64`, " "`linux_unlinkat`, `linux_renameat`, `linux_linkat`, `linux_symlinkat`, " "`linux_readlinkat`, `linux_fchmodat` and `linux_faccessat`.  All these are " "implemented using the modified man:namei[9] routine and a simple wrapping " "layer." msgstr "" #. type: Title ===== #: documentation/content/en/articles/linux-emulation/_index.adoc:1336 #, no-wrap msgid "Implementation" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1344 msgid "" "The implementation is done by altering the man:namei[9] routine (described " "above) to take an additional parameter `dirfd` in its `nameidata` structure, " "which specifies the starting point of the pathname lookup instead of using " "the current working directory every time.  The resolution of `dirfd` from " "a file descriptor number to a vnode is done in the native *at syscalls.  When " "`dirfd` is `AT_FDCWD` the `dvp` entry in the `nameidata` structure is `NULL`, but " "when `dirfd` is a different number we obtain a file for this file " "descriptor, check whether this file is valid and, if there is a vnode attached " "to it, we get that vnode.  Then we check that this vnode is a directory.  " "In the actual man:namei[9] routine we simply substitute the `dvp` vnode for " "the `dp` variable in the man:namei[9] function, which determines the starting " "point.  The man:namei[9] routine is not used directly but via a chain of different " "functions on various levels.  For example the `openat` goes like this:" msgstr "" #. type: delimited block . 4 #: documentation/content/en/articles/linux-emulation/_index.adoc:1348 #, no-wrap msgid "openat() --> kern_openat() --> vn_open() --> namei()\n" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1354 msgid "" "For this reason `kern_open` and `vn_open` must be altered to incorporate the " "additional `dirfd` parameter.  No compat layer is created for those because " "there are not many users of this and the users can be easily converted.  " "This general implementation enables FreeBSD to implement its own *at " "syscalls.  This is being discussed right now." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1356 #, no-wrap msgid "Ioctl" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1370 msgid "" "The ioctl interface is quite fragile due to its generality.  We have to bear " "in mind that devices differ between Linux(R) and FreeBSD so some care must " "be applied to make ioctl emulation work right.  The ioctl handling is " "implemented in [.filename]#linux_ioctl.c#, where the `linux_ioctl` function is " "defined.  This function simply iterates over sets of ioctl handlers to find " "a handler that implements a given command.  The ioctl syscall has three " "parameters: the file descriptor, a command and an argument.  The command is a " "16-bit number, which in theory is divided into the high 8 bits determining the class " "of the ioctl command and the low 8 bits, which are the actual command within the " "given set.  The emulation takes advantage of this division.  We implement " "handlers for each set, like `sound_handler` or `disk_handler`.  Each handler " "has a maximum command and a minimum command defined, which are used for " "determining which handler is used.  There are slight problems with this " "approach because Linux(R) does not use the set division consistently so " "sometimes ioctls for a different set are inside a set they should not belong " "to (SCSI generic ioctls inside cdrom set, etc.).  
FreeBSD currently does not " "implement many Linux(R) ioctls (compared to NetBSD, for example) but the " "plan is to port those from NetBSD.  The trend is to use Linux(R) ioctls even " "in the native FreeBSD drivers because of the easy porting of applications." msgstr "" #. type: Title ==== #: documentation/content/en/articles/linux-emulation/_index.adoc:1372 #, no-wrap msgid "Debugging" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1379 msgid "" "Every syscall should be debuggable.  For this purpose we introduce a small " "infrastructure.  We have the ldebug facility, which tells whether a given " "syscall should be debugged (settable via a sysctl).  For printing we have " "the LMSG and ARGS macros.  Those are used for altering a printable string for " "uniform debugging messages." msgstr "" #. type: Title == #: documentation/content/en/articles/linux-emulation/_index.adoc:1381 #, no-wrap msgid "Conclusion" msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:1384 #, no-wrap msgid "Results" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1390 msgid "" "As of April 2007 the Linux(R) emulation layer is capable of emulating the " "Linux(R) 2.6.16 kernel quite well.  The remaining problems concern futexes, " "the unfinished *at family of syscalls, problematic signal delivery, missing " "`epoll` and `inotify` and probably some bugs we have not discovered yet.  " "Despite this we are capable of running basically all the Linux(R) programs " "included in the FreeBSD Ports Collection with Fedora Core 4 at 2.6.16 and there " "are some rudimentary reports of success with Fedora Core 6 at 2.6.16.  The " "Fedora Core 6 linux_base was recently committed, enabling some further " "testing of the emulation layer and giving us some more hints where we should " "put our effort in implementing missing stuff." msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1394 msgid "" "We are able to run the most used applications like package:www/linux-" "firefox[], package:net-im/skype[] and some games from the Ports Collection.  " "Some of the programs exhibit bad behavior under 2.6 emulation but this is " "currently under investigation and hopefully will be fixed soon.  The only " "big application that is known not to work is the Linux(R) Java(TM) " "Development Kit and this is because of the requirement of the `epoll` facility, " "which is not directly related to the Linux(R) kernel 2.6." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1397 msgid "" "We hope to enable 2.6.16 emulation by default some time after FreeBSD 7.0 is " "released, at least to expose the 2.6 emulation parts for some wider testing.  " "Once this is done we can switch to Fedora Core 6 linux_base, which is the " "ultimate plan." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:1399 #, no-wrap msgid "Future work" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1402 msgid "" "Future work should focus on fixing the remaining issues with futexes, " "implementing the rest of the *at family of syscalls, fixing the signal delivery " "and possibly implementing the `epoll` and `inotify` facilities." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1404 msgid "" "We hope to be able to run the most important programs flawlessly soon, so we " "will be able to switch to the 2.6 emulation by default and make " "Fedora Core 6 the default linux_base because our currently used Fedora Core 4 is " "not supported any more." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1408 msgid "" "The other possible goal is to share our code with NetBSD and DragonflyBSD.  
" "NetBSD has some support for 2.6 emulation but it is far from finished and not " "really tested.  DragonflyBSD has expressed some interest in porting the 2.6 " "improvements." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1413 msgid "" "Generally, as Linux(R) develops we would like to keep up with its " "development, implementing newly added syscalls.  Splice comes to mind " "first.  Some already implemented syscalls are also suboptimal, for example " "`mremap` and others.  Some performance improvements can also be made, such as finer " "grained locking." msgstr "" #. type: Title === #: documentation/content/en/articles/linux-emulation/_index.adoc:1415 #, no-wrap msgid "Team" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1418 msgid "I cooperated on this project with (in alphabetical order):" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1420 msgid "`{jhb}`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1421 msgid "`{kib}`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1422 msgid "Emmanuel Dreyfus" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1423 msgid "Scot Hetzel" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1424 msgid "`{jkim}`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1425 msgid "`{netchild}`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1426 msgid "`{ssouhlal}`" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1427 msgid "Li Xiao" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1428 msgid "`{davidxu}`" msgstr "" #. 
type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1430 msgid "" "I would like to thank all those people for their advice, code reviews and " "general support." msgstr "" #. type: Title == #: documentation/content/en/articles/linux-emulation/_index.adoc:1432 #, no-wrap msgid "Literature" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1435 msgid "" "Marshall Kirk McKusick - George V. Neville-Neil. The Design and Implementation of " "the FreeBSD Operating System. Addison-Wesley, 2005." msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1436 msgid "https://tldp.org[https://tldp.org]" msgstr "" #. type: Plain text #: documentation/content/en/articles/linux-emulation/_index.adoc:1436 msgid "https://www.kernel.org[https://www.kernel.org]" msgstr ""