path: root/sys/compat/linux/linux_fork.c
Commit message  Author  Age  Files  Lines
* linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES  Conrad Meyer  2020-11-17  1  -8/+10
|
| The two flags are distinct and it is impossible to correctly handle
| clone(2) without the assistance of fork1().
|
| This change depends on the pwddesc split introduced in r367777.
|
| I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
| should be treated the opposite way p_fd is (based on RFFDG flag). This
| is a little ugly, but the benefit is that existing RFFDG API is
| preserved. Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd
| and p_pd are copied, while !RFFDG indicates both should be shared.
|
| In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and
| expects independent fd tables.
|
| The previous conflation of CLONE_FS and CLONE_FILES was introduced in
| r163371 (2006).
|
| Discussed with: markj, trasz (earlier version)
| Differential Revision: https://reviews.freebsd.org/D27016
|
| Notes:
|     svn path=/head/; revision=367778
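The flag handling described in this commit message can be read as a small decision table. The following is a minimal sketch of that reading, not the kernel code: every `sk_`/`SK_` name is hypothetical, the flag values are placeholders, and the rule "set the flip flag exactly when CLONE_FS and CLONE_FILES disagree" is an assumption derived from the phrase "treated the opposite way p_fd is".

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; NOT the kernel's. */
#define SK_CLONE_FS    0x1u	/* caller wants cwd/root/umask shared */
#define SK_CLONE_FILES 0x2u	/* caller wants the fd table shared */

enum sk_mode { SK_COPY, SK_SHARE };

/* Hypothetical model of the two fork_req bits the message talks about. */
struct sk_fork_req {
	bool rffdg;		/* models RFFDG: copy the fd table */
	bool flip_paths;	/* models FR2_SHARE_PATHS: p_pd is the opposite of p_fd */
};

/* Translate clone(2)-style flags into the modeled request (assumed rule). */
static struct sk_fork_req
sk_clone_to_fork_req(unsigned int clone_flags)
{
	bool share_files = (clone_flags & SK_CLONE_FILES) != 0;
	bool share_fs = (clone_flags & SK_CLONE_FS) != 0;
	struct sk_fork_req fr;

	fr.rffdg = !share_files;
	/* Flip only when the two sharing wishes differ. */
	fr.flip_paths = (share_fs != share_files);
	return (fr);
}

/* What happens to the fd table (p_fd in the message). */
static enum sk_mode
sk_fd_mode(struct sk_fork_req fr)
{
	return (fr.rffdg ? SK_COPY : SK_SHARE);
}

/* What happens to the path state (p_pd in the message). */
static enum sk_mode
sk_pd_mode(struct sk_fork_req fr)
{
	enum sk_mode fd = sk_fd_mode(fr);

	if (!fr.flip_paths)
		return (fd);
	return (fd == SK_COPY ? SK_SHARE : SK_COPY);
}
```

Under these assumptions, the Chrome case from the message (CLONE_FS without CLONE_FILES) yields a copied fd table and shared path state, while callers that never set the flip flag keep the pre-existing RFFDG behaviour.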
* compat: clean up empty lines in .c and .h files  Mateusz Guzik  2020-09-01  1  -1/+0
|
| Notes:
|     svn path=/head/; revision=365080
* Remove "emulation" of clone(CLONE_PARENT | CLONE_THREAD).  Mark Johnston  2020-08-17  1  -5/+3
|
| On Linux this is supposed to result in EINVAL.
|
| Reported by: syzkaller
| MFC after: 1 week
| Sponsored by: The FreeBSD Foundation
|
| Notes:
|     svn path=/head/; revision=364329
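The rejection described in this commit message amounts to a flag check before any thread is created. A minimal sketch follows; the function name is hypothetical, and the two constants are the Linux UAPI values for CLONE_PARENT and CLONE_THREAD as far as this example assumes, so treat it as an illustration rather than the code that was committed.

```c
#include <errno.h>

/* Assumed Linux UAPI flag values, renamed to avoid clashing with system headers. */
#define SK_CLONE_PARENT 0x00008000ul
#define SK_CLONE_THREAD 0x00010000ul

/*
 * Hypothetical validation helper: returns EINVAL for the combination the
 * commit message says Linux rejects, and 0 otherwise.
 */
static int
sk_check_clone_flags(unsigned long flags)
{
	if ((flags & SK_CLONE_PARENT) != 0 && (flags & SK_CLONE_THREAD) != 0)
		return (EINVAL);
	return (0);
}
```

Each flag on its own passes this check; only the combination is refused.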
* schedlock 1/4  Jeff Roberson  2019-12-15  1  -4/+0
|
| Eliminate recursion from most thread_lock consumers. Return from
| sched_add() without the thread_lock held. This eliminates unnecessary
| atomics and lock word loads as well as reducing the hold time for
| scheduler locks. This will eventually allow for lockless remote adds.
|
| Discussed with: kib
| Reviewed by: jhb
| Tested by: pho
| Differential Revision: https://reviews.freebsd.org/D22626
|
| Notes:
|     svn path=/head/; revision=355779
* Linuxulator depends on fundamental kernel settings such as SMP.  Dmitry Chagin  2019-05-13  1  -41/+2
|
| Many of them are listed in opt_global.h, which is not generated while
| building modules outside of a kernel, so such modules never match the
| real configured kernel.
|
| So, we should prevent our users from building obviously defective
| modules. Therefore, remove the root cause of building modules outside
| of a kernel: the possibility of building modules with DEBUG or KTR
| flags. Also remove all of the DEBUG printfs, as they are incomplete and
| not informative in threaded programs; half of the system calls have no
| DEBUG printf anyway.
|
| For debugging Linux programs we have dtrace, ktr and ktrace.
|
| PR: 222861
| Reviewed by: trasz
| MFC after: 2 weeks
| Differential Revision: https://reviews.freebsd.org/D20178
|
| Notes:
|     svn path=/head/; revision=347538
* Whitespace cleanup (annoying).  Dmitry Chagin  2019-03-24  1  -1/+1
|
| MFC after: 1 month
|
| Notes:
|     svn path=/head/; revision=345473
* proc: always store parent pid in p_oppid  Mateusz Guzik  2018-11-16  1  -1/+1
|
| Doing so removes the dependency on proctree lock from sysctl process
| list export which further reduces contention during poudriere -j 128
| runs.
|
| Reviewed by: kib (previous version)
| Sponsored by: The FreeBSD Foundation
| Differential Revision: https://reviews.freebsd.org/D17825
|
| Notes:
|     svn path=/head/; revision=340482
* linux_clone_thread: mark new thread as TDB_BORN.  Konstantin Belousov  2018-06-21  1  -0/+4
|
| So that the ptrace code will catch it and report it to the attached
| debugger. This enables debugging of threaded Linux binaries with the
| FreeBSD debugger.
|
| Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com>
| MFC after: 1 week
| Differential revision: https://reviews.freebsd.org/D15880
|
| Notes:
|     svn path=/head/; revision=335505
* linuxulator: do not include legacy syscalls on arm64  Ed Maste  2018-06-15  1  -0/+2
|
| Existing linuxulator platforms (i386, amd64) support legacy syscalls,
| such as non-*at ones like open, but arm64 and other new platforms do
| not.
|
| Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
| files. We may need finer grained control in the future but this is
| sufficient for now.
|
| Reviewed by: andrew
| Sponsored by: Turing Robotic Industries
| Differential Revision: https://reviews.freebsd.org/D15237
|
| Notes:
|     svn path=/head/; revision=335201
* Linuxolator whitespace cleanup  Ed Maste  2018-02-05  1  -8/+8
|
| A version of each of the MD files by necessity exists for each CPU
| architecture supported by the Linuxolator. Clean these up so that new
| architectures do not inherit whitespace issues.
|
| Clean up shared Linuxolator files while here.
|
| Sponsored by: Turing Robotic Industries Inc.
|
| Notes:
|     svn path=/head/; revision=328890
* sys/compat: further adoption of SPDX licensing ID tags.  Pedro F. Giffuni  2017-11-27  1  -0/+2
|
| Mainly focus on files that use BSD 2-Clause license, however the tool I
| was using misidentified many licenses so this was mostly a manual
| (error prone) task.
|
| The Software Package Data Exchange (SPDX) group provides a specification
| to make it easier for automated tools to detect and summarize well known
| opensource licenses. We are gradually adopting the specification, noting
| that the tags are considered only advisory and do not, in any way,
| supersede or replace the license texts.
|
| Notes:
|     svn path=/head/; revision=326266
* Update comments for the MD functions managing contexts for new  Konstantin Belousov  2016-06-16  1  -2/+2
|
| threads, to make it less confusing and using modern kernel terms.
|
| Rename the functions to reflect current use of the functions, instead
| of the historic KSE conventions:
|   cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
|   cpu_set_upcall       -> cpu_copy_thread (for forks)
|   cpu_set_upcall_kse   -> cpu_set_upcall (for new threads creation)
|
| Reviewed by: jhb (previous version)
| Sponsored by: The FreeBSD Foundation
| MFC after: 1 week
| Approved by: re (hrs)
| Differential revision: https://reviews.freebsd.org/D6731
|
| Notes:
|     svn path=/head/; revision=301961
* Add implementation of robust mutexes, hopefully close enough to the  Konstantin Belousov  2016-05-17  1  -0/+3
|
| intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.
|
| A robust mutex is guaranteed to be cleared by the system upon either
| thread or process owner termination while the mutex is held. The next
| mutex locker is then notified about inconsistent mutex state and can
| execute (or abandon) corrective actions.
|
| The patch mostly consists of small changes here and there, adding
| necessary checks for the inconsistent and abandoned conditions into
| existing paths. Additionally, the thread exit handler was extended to
| iterate over the userspace-maintained list of owned robust mutexes,
| unlocking and marking as terminated each of them.
|
| The list of owned robust mutexes cannot be maintained atomically
| synchronous with the mutex lock state (it is possible in kernel, but
| is too expensive). Instead, for the duration of lock or unlock
| operation, the current mutex is remembered in a special slot that is
| also checked by the kernel at thread termination.
|
| Kernel must be aware about the per-thread location of the heads of
| robust mutex lists and the current active mutex slot. When a thread
| touches a robust mutex for the first time, a new umtx op syscall is
| issued which informs about location of lists heads.
|
| The umtx sleep queues for PP and PI mutexes are split between
| non-robust and robust.
|
| Somewhat unrelated changes in the patch:
| 1. Style.
| 2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
|    pi mutexes.
| 3. Removal of the userspace struct pthread_mutex m_owner field.
| 4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
|    the lifetime of the shared mutex associated with a vnode's page.
|
| Reviewed by: jilles (previous version, supposedly the objection was fixed)
| Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects)
| Tested by: pho
| Sponsored by: The FreeBSD Foundation
|
| Notes:
|     svn path=/head/; revision=300043
* sys/compat/linux*: spelling fixes.  Pedro F. Giffuni  2016-04-30  1  -1/+1
|
| Mostly on comments but there are some user-visible messages as well.
|
| MFC after: 2 weeks
|
| Notes:
|     svn path=/head/; revision=298829
* Link the newly created process to the corresponding parent as  Dmitry Chagin  2016-03-08  1  -0/+12
|
| if CLONE_PARENT is set, then the parent of the new process will be the
| same as that of the calling process.
|
| MFC after: 1 week
|
| Notes:
|     svn path=/head/; revision=296501
* fork: pass arguments to fork1 in a dedicated structure  Mateusz Guzik  2016-02-04  1  -5/+15
|
| Suggested by: kib
|
| Notes:
|     svn path=/head/; revision=295232
* Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers.  Bryan Drewery  2015-10-07  1  -28/+5
|
| r161611 added some of the code from sys_vfork() directly into the Linux
| module wrappers since they use RFSTOPPED. In r232240, the RFFPWAIT
| handling was moved to syscallret(), thus this code in the Linux module
| is no longer needed as it will be called later.
|
| This also allows the Linux wrappers to benefit from the fix in r275616
| for threads not getting suspended if their vforked child is stopped
| while they wait on them.
|
| Reviewed by: jhb, kib
| MFC after: 2 weeks
| Differential Revision: https://reviews.freebsd.org/D3828
|
| Notes:
|     svn path=/head/; revision=288994
* Fixes a panic triggered by threaded Linux applications when running  Edward Tomasz Napierala  2015-09-02  1  -1/+21
|
| with RACCT/RCTL enabled.
|
| Reviewed by: ngie@, ed@
| Tested by: Larry Rosenman <ler@lerctr.org>
| MFC after: 1 month
| Sponsored by: The FreeBSD Foundation
| Differential Revision: https://reviews.freebsd.org/D3470
|
| Notes:
|     svn path=/head/; revision=287395
* Limit rights on process descriptors.  Ed Schouten  2015-07-31  1  -4/+4
|
| On CloudABI, the rights bits returned by cap_rights_get() match up with
| the operations that you can actually perform on the file descriptor.
|
| Limiting the rights is good, because it makes it easier to get uniform
| behaviour across different operating systems. If process descriptors on
| FreeBSD would suddenly gain support for any new file operation, this
| wouldn't become exposed to CloudABI processes without first extending
| the rights.
|
| Extend fork1() to gain a 'struct filecaps' argument that allows you to
| construct process descriptors with custom rights. Use this in
| cloudabi_sys_proc_fork() to limit the rights to just fstat() and
| pdwait().
|
| Obtained from: https://github.com/NuxiNL/freebsd
|
| Notes:
|     svn path=/head/; revision=286122
* The si_status field of the siginfo_t, provided by the waitid(2) and  Konstantin Belousov  2015-07-18  1  -1/+1
|
| SIGCHLD signal, should keep full 32 bits of the status passed to the
| _exit(2). Split the combined p_xstat of the struct proc into the
| separate exit status p_xexit for normal process exit, and signalled
| termination information p_xsig. Kernel-visible macro KW_EXITCODE()
| reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains the
| complete status and is copied out into si_status.
|
| Requested by: Joerg Schilling
| Reviewed by: jilles (previous version), pho
| Tested by: pho
| Sponsored by: The FreeBSD Foundation
|
| Notes:
|     svn path=/head/; revision=285670
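The split described in this commit message can be illustrated with the traditional wait-status encoding. This is a sketch under stated assumptions: the `SK_` macros are local stand-ins written for this example (the message names KW_EXITCODE() but does not show its definition), and the layout assumed here puts the low 8 bits of the exit code in bits 8 to 15 and the signal number in the low bits.

```c
/* Local stand-in for the classic wait-status packing (assumed layout). */
#define SK_W_EXITCODE(ret, sig)  ((ret) << 8 | (sig))

/* Local stand-in for a KW_EXITCODE()-style macro: only 8 bits of the exit code fit. */
#define SK_KW_EXITCODE(ret, sig) SK_W_EXITCODE((ret) & 0xff, (sig))

/* Hypothetical model of the two fields the message introduces. */
struct sk_exit_info {
	int xexit;	/* models p_xexit: full 32-bit value passed to _exit(2) */
	int xsig;	/* models p_xsig: terminating signal, 0 for a normal exit */
};

/* Rebuild the old combined status for consumers that still expect it. */
static int
sk_legacy_status(const struct sk_exit_info *ei)
{
	return (SK_KW_EXITCODE(ei->xexit, ei->xsig));
}
```

With this model, an exit value such as 0x12345 survives intact in `xexit` (and hence in si_status), while the reconstructed legacy status keeps only its low byte.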
* linux: make sure to grab all cow structs when creating a thread  Mateusz Guzik  2015-06-10  1  -1/+1
|
| This is a fixup for r284214.
|
| Reported and tested by: Ivan Klymenko <fidaj ukr.net>
|
| Notes:
|     svn path=/head/; revision=284226
* Rework signal code to allow using it by other modules, like linprocfs:  Dmitry Chagin  2015-05-24  1  -4/+1
|
| 1. Linux sigset is always 64 bits on all platforms. In order to move
|    Linux sigset code to the linux_common module, define it as a 64-bit
|    int. Move Linux sigset manipulation routines to the MI path.
|
| 2. Move Linux signal number definitions to the MI path. In general,
|    they are the same on all platforms except for a few signals.
|
| 3. Map Linux RT signals to the FreeBSD RT signals and hide signal
|    conversion tables to avoid conversion errors.
|
| 4. Emulate the Linux SIGPWR signal via the FreeBSD SIGRTMIN signal,
|    which is outside the range of signal numbers allowed on Linux.
|
| PR: 197216
|
| Notes:
|     svn path=/head/; revision=283474
* Improve ktr(9) records in thread management code.  Dmitry Chagin  2015-05-24  1  -4/+4
|
| Differential Revision: https://reviews.freebsd.org/D1464
| Reviewed by: trasz
|
| Notes:
|     svn path=/head/; revision=283456
* td_sigmask of a newly created thread is copied from td.  Dmitry Chagin  2015-05-24  1  -1/+0
|
| Remove excess initialization of td_sigmask.
|
| Differential Revision: https://reviews.freebsd.org/D1128
| Reviewed by: emaste
|
| Notes:
|     svn path=/head/; revision=283450
* Refund the proc emuldata struct for future use. For now move flags from  Dmitry Chagin  2015-05-24  1  -0/+61
|
| thread emuldata to proc emuldata as it was originally intended.
|
| As we can have both 64 & 32 bit Linuxulator running any eventhandler
| can be called twice for us. To prevent this move eventhandlers code
| from linux_emul.c to the linux_common.ko module.
|
| Differential Revision: https://reviews.freebsd.org/D1073
|
| Notes:
|     svn path=/head/; revision=283422
* Switch linuxulator to use the native 1:1 threads.  Dmitry Chagin  2015-05-24  1  -70/+163
|
| The reasons:
| 1. Get rid of the stubs/quirks with process dethreading, process
|    reparenting when the process group leader exits, and related
|    problems in wait(), waitpid(), etc.
| 2. Reuse our kernel code instead of writing excessive thread
|    management routines in Linuxulator.
|
| Implementation details:
| 1. The thread is created via kern_thr_new() in the clone() call with
|    the CLONE_THREAD parameter. Thus, everything else is a process.
| 2. The test that the process has threads is done via the P_HADTHREADS
|    bit in p_flag of struct proc.
| 3. Per thread emulator state data structure is now located in the
|    struct thread and freed in the thread_dtor() hook. Holding p_mtx is
|    mandatory when referencing emuldata from other threads.
| 4. PID mangling has changed. Now Linux pid is the native tid and Linux
|    tgid is the native pid, with the exception of the first thread in
|    the process where tid and pid are one and the same.
|
| Ugliness:
|
| In case when the Linux thread is the initial thread in the thread
| group thread id is equal to the process id. Glibc depends on this magic
| (assert in pthread_getattr_np.c). So for system calls that take thread
| id as a parameter we should use the special method to reference struct
| thread.
|
| Differential Revision: https://reviews.freebsd.org/D1039
|
| Notes:
|     svn path=/head/; revision=283383
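The PID mangling rule in item 4 of the implementation details can be sketched as two small accessors. This is only a model for illustration: `struct sk_thread`, its fields, and both function names are hypothetical and do not correspond to the kernel's struct thread or to the Linuxulator's actual helpers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a native thread; not the kernel's struct thread. */
struct sk_thread {
	int native_pid;	/* native process id */
	int native_tid;	/* native thread id */
	bool initial;	/* true for the first thread of the process */
};

/* Linux "pid" of a thread: the native tid, except for the initial thread. */
static int
sk_linux_pid(const struct sk_thread *td)
{
	return (td->initial ? td->native_pid : td->native_tid);
}

/* Linux tgid: always the native pid. */
static int
sk_linux_tgid(const struct sk_thread *td)
{
	return (td->native_pid);
}
```

For the initial thread both accessors return the same number, which is the property the message says glibc asserts on; for every later thread they differ.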
* In preparation for switching linuxulator to use the native 1:1  Dmitry Chagin  2015-05-24  1  -0/+14
|
| threads, introduce a linux_exit() stub instead of the sys_exit() call
| (which terminates the process).
|
| In the new linuxulator the exit() system call terminates the calling
| thread (not the whole process).
|
| Differential Revision: https://reviews.freebsd.org/D1027
| Reviewed by: trasz
|
| Notes:
|     svn path=/head/; revision=283370
* - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging  Attilio Rao  2013-11-25  1  -1/+0
|
|   option, unbreak the lock tracing release semantic by embedding
|   calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined
|   version of the releasing functions for mutex, rwlock and sxlock.
|   Failing to do so skips the lockstat_probe_func invocation for
|   unlocking.
| - As part of the LOCKSTAT support is inlined in mutex operation, for
|   kernel compiled without lock debugging options, potentially every
|   consumer must be compiled including opt_kdtrace.h.
|   Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
|   dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
|   is linked there and it is only used as a compile-time stub [0].
|
| [0] immediately shows some new bug as DTRACE-derived support for debug
| in sfxge is broken and it was never really tested. As it was not
| including correctly opt_kdtrace.h before it was never enabled so it
| was kept broken for a while. Fix this by using a protection stub,
| leaving sfxge driver authors the responsibility for fixing it
| appropriately [1].
|
| Sponsored by: EMC / Isilon storage division
| Discussed with: rstone
| [0] Reported by: rstone
| [1] Discussed with: philip
|
| Notes:
|     svn path=/head/; revision=258541
* Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h  John Baldwin  2013-01-29  1  -0/+1
|
| by moving bits that are MI out into headers in compat/linux.
|
| Reviewed by: Chagin Dmitry dmitry | gmail
| MFC after: 2 weeks
|
| Notes:
|     svn path=/head/; revision=246085
* - >500 static DTrace probes for the linuxulator  Alexander Leidinger  2012-05-05  1  -0/+10
|
| - DTrace scripts to check for errors, performance, ...
|   they serve mostly as examples of what you can do with the static
|   probes; with moderate load the scripts may be overwhelmed, excessive
|   lock-tracing may influence program behavior (see the last design
|   decision)
|
| Design decisions:
| - use "linuxulator" as the provider for the native bitsize; add the
|   bitsize for the non-native emulation (e.g. "linuxulator32" on amd64)
| - Add probes only for locks which are acquired in one function and
|   released in another function. Locks which are acquired and released
|   in the same function should be easy to pair in the code;
|   inter-function locking is easier to verify in DTrace.
| - Probes for locks should be fired after locking and before releasing
|   to prevent races (to provide data/function stability in DTrace, see
|   the man-page of "dtrace -v ..." and the corresponding DTrace docs).
|
| Notes:
|     svn path=/head/; revision=235063
* Add experimental support for process descriptors  Jonathan Anderson  2011-08-18  1  -3/+5
|
| A "process descriptor" file descriptor is used to manage processes
| without using the PID namespace. This is required for Capsicum's
| Capability Mode, where the PID namespace is unavailable.
|
| New system calls pdfork(2) and pdkill(2) offer the functional
| equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID
| of the remote process for debugging purposes. The currently-unimplemented
| pdwait(2) will, in the future, allow querying rusage/exit status. In the
| interim, poll(2) may be used to check (and wait for) process termination.
|
| When a process is referenced by a process descriptor, it does not issue
| SIGCHLD to the parent, making it suitable for use in libraries---a common
| scenario when using library compartmentalisation from within large
| applications (such as web browsers). Some observers may note a similarity
| to Mach task ports; process descriptors provide a subset of this
| behaviour, but in a UNIX style.
|
| This feature is enabled by "options PROCDESC", but as with several other
| Capsicum kernel features, is not enabled by default in GENERIC 9.0.
|
| Reviewed by: jhb, kib
| Approved by: re (kib), mentor (rwatson)
| Sponsored by: Google Inc
|
| Notes:
|     svn path=/head/; revision=224987
* Do not clobber %rdx.  Dmitry Chagin  2011-02-20  1  -1/+0
|
| Before calling the vfork() syscall, Linux user space stores the current
| PID in %rdx and restores it when the parent process leaves the kernel.
|
| Notes:
|     svn path=/head/; revision=218879
* Slightly rewrite linux_fork:  Dmitry Chagin  2011-02-12  1  -13/+6
|
| 1) Remove bogus error checking.
| 2) A new process exits from the kernel through fork_trampoline(), so
|    remove the bogus check.
|
| Notes:
|     svn path=/head/; revision=218618
* Remove bogus include <machine/frame.h>  Dmitry Chagin  2011-02-12  1  -2/+0
|
| Notes:
|     svn path=/head/; revision=218617
* Move linux_clone(), linux_fork(), linux_vfork() to a MI path.  Dmitry Chagin  2011-02-12  1  -0/+297

  Notes:
      svn path=/head/; revision=218616