path: root/sys/compat/linux
Commit message | Author | Age | Files | Lines
* Place hostnames and similar information fully under the prison system. | Jamie Gritton | 2009-05-29 | 1 | -5/+5
  The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible.

  The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed.

  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=193066
* linux_ioctl_cdrom: reduce stack usage | Andriy Gapon | 2009-05-27 | 1 | -11/+16
  ... by moving two ~2KB structures from stack to heap allocation. I experienced stack overflow in linux emulation on i386 (8K stack) when LINUX_DVD_READ_STRUCT ioctl was performed on atapicam cd device and there was an error that resulted in additional quite heavy stack use in cam layer.

  Reviewed by: dchagin
  Approved by: jhb (mentor)
  Notes: svn path=/head/; revision=192899
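The stack-to-heap move described in the entry above can be illustrated with a minimal userland sketch. It assumes a hypothetical ~2KB structure `big_buf` and a hypothetical helper `process_with_heap()`; the kernel change itself would use the kernel allocator rather than the libc calls shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ~2KB structure standing in for the large ioctl structures. */
struct big_buf {
	unsigned char data[2048];
};

/*
 * Keep the stack frame small by allocating the scratch structure on the
 * heap. Returns 0 on success, -1 if the allocation fails.
 */
int
process_with_heap(unsigned char fill, unsigned char *first_byte)
{
	struct big_buf *buf;

	buf = malloc(sizeof(*buf));
	if (buf == NULL)
		return (-1);
	memset(buf->data, fill, sizeof(buf->data));
	*first_byte = buf->data[0];
	free(buf);
	return (0);
}
```

Only the pointer lives on the stack, so a deep call chain below this function has roughly 2KB more headroom than it would with a local `struct big_buf`.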
* Add hierarchical jails. A jail may further virtualize its environment | Jamie Gritton | 2009-05-27 | 1 | -140/+92
  by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.

  Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().

  Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.

  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=192895
* Validate user-supplied argument values. | Dmitry Chagin | 2009-05-19 | 1 | -1/+28
  The args argument is a pointer to a structure located in user space in which the socketcall arguments are packed. The structure must be copied into the kernel instead of being dereferenced directly.

  Approved by: kib (mentor)
  MFC after: 1 week
  Notes: svn path=/head/; revision=192373
* Implement MSG_CMSG_CLOEXEC flag for linux_recvmsg(). | Dmitry Chagin | 2009-05-18 | 2 | -9/+25
  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=192284
* Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and | Dmitry Chagin | 2009-05-16 | 2 | -2/+30
  SOCK_NONBLOCK flags, which make separate fcntl() calls unnecessary. Implement a variation of the socket() syscall which takes flags in addition to the type argument.

  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=192206
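From the application side, the flags described in the entry above are OR-ed into the type argument of socket(). The following is a small userland sketch, not the emulation code itself; the helper names `open_cloexec_nonblock_socket()` and `has_cloexec_and_nonblock()` are made up for illustration.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a local stream socket that is close-on-exec and non-blocking
 * in a single call. Returns the descriptor, or -1 on failure.
 */
int
open_cloexec_nonblock_socket(void)
{
	return (socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0));
}

/* Return 1 if both properties are set on fd, 0 otherwise. */
int
has_cloexec_and_nonblock(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);
	int flflags = fcntl(fd, F_GETFL);

	if (fdflags == -1 || flflags == -1)
		return (0);
	return ((fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0);
}
```

Without the flags, the same result needs two extra fcntl() calls after socket(), which also leaves a window in which another thread could fork and exec with the descriptor still inheritable.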
* Return EINVAL when an incorrect or unsupported | Dmitry Chagin | 2009-05-16 | 2 | -0/+12
  type argument is specified. Do not map the type argument value, as its Linux values are identical to the FreeBSD values.

  Approved by: kib (mentor)
  Notes: svn path=/head/; revision=192205
* Use the protocol family constants for the domain argument validation. | Dmitry Chagin | 2009-05-16 | 1 | -3/+5
  Return immediately when socket() fails.

  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=192204
* Emulate SO_PEERCRED socket option. | Dmitry Chagin | 2009-05-16 | 2 | -1/+26
  Temporarily use 0 for the pid member, as FreeBSD does not cache the remote UNIX domain socket peer pid.

  PR: kern/102956
  Reviewed by: rwatson
  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=192203
* Translate l_timeval arg to native struct timeval in | Dmitry Chagin | 2009-05-11 | 1 | -0/+40
  linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO, SO_SNDTIMEO opts as l_timeval has MD members. Remove bogus __packed attribute from l_timeval struct on __amd64__.

  PR: kern/134276
  Submitted by: Thomas Mueller <tmueller sysgo com>
  Approved by: kib (mentor)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=191989
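The translation mentioned in the entry above amounts to a member-by-member copy between two differently sized structures. A rough sketch, assuming a hypothetical 32-bit layout named `l_timeval32` (the real l_timeval definition is machine-dependent and is not reproduced here):

```c
#include <stdint.h>
#include <sys/time.h>

/* Assumed 32-bit guest layout; illustrative only. */
struct l_timeval32 {
	int32_t tv_sec;
	int32_t tv_usec;
};

/*
 * Copy member by member so that each field is widened to the native
 * type, instead of reinterpreting the guest structure's bytes.
 */
void
l_timeval32_to_native(const struct l_timeval32 *ltv, struct timeval *tv)
{
	tv->tv_sec = ltv->tv_sec;
	tv->tv_usec = ltv->tv_usec;
}
```

A plain memcpy() between the two would only be correct when both layouts happen to match, which is exactly what machine-dependent members rule out.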
* Add the forgotten Linux-to-BSD flags argument mapping to linux_recv(). | Dmitry Chagin | 2009-05-11 | 1 | -1/+1
  PR: kern/134276
  Submitted by: Thomas Mueller <tmueller sysgo com>
  Approved by: kib (mentor)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=191988
* Do not export AT_CLKTCK when emulating a Linux kernel prior | Dmitry Chagin | 2009-05-10 | 2 | -1/+14
  to 2.4.0, as it first appeared in 2.4.0-rc7.

  When exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK); glibc falls back to the hard-coded CLK_TCK value when the aux entry is not present. Glibc versions prior to 2.2.1 always use the hard-coded CLK_TCK value.

  For older applications/libcs which depend on the hard-coded CLK_TCK value, the user should set compat.linux.osrelease to less than 2.4.0.

  Approved by: kib (mentor)
  Notes: svn path=/head/; revision=191973
* Introduce linux_kernver() interface which is intended for an exact | Dmitry Chagin | 2009-05-10 | 2 | -17/+62
  designation of the emulated kernel version. linux_kernver() returns integer value formatted as 'VVVMMMIII' where VVV - version, MMM - major revision, III - minor revision.

  Approved by: kib (mentor)
  Notes: svn path=/head/; revision=191972
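The 'VVVMMMIII' format described above can be read as three decimal fields of three digits each. A minimal sketch of how such a value might be built and compared; `kernver_encode()` and `kernver_at_least()` are hypothetical helpers, not part of the interface named in the commit.

```c
/* Hypothetical helper: pack version, major and minor as VVVMMMIII. */
int
kernver_encode(int version, int major, int minor)
{
	return (version * 1000000 + major * 1000 + minor);
}

/*
 * Return non-zero if the packed value `current` designates a kernel
 * at least as new as version.major.minor.
 */
int
kernver_at_least(int current, int version, int major, int minor)
{
	return (current >= kernver_encode(version, major, minor));
}
```

Under this reading, 2.6.16 becomes 2006016, and a check such as "emulating 2.4.0 or newer" reduces to a single integer comparison.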
* Rework r189362, r191883. | Dmitry Chagin | 2009-05-10 | 2 | -1/+5
  The frequency of the statistics clock is given by stathz. Use stathz if it is available, otherwise use hz.

  Pointed out by: bde
  Approved by: kib (mentor)
  Notes: svn path=/head/; revision=191966
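The selection rule above is a one-line fallback. The sketch below assumes that an unavailable statistics clock is represented by a stathz value of 0; the function name `stat_clock_frequency()` is invented for illustration.

```c
/*
 * Pick the frequency used for statistics ticks: stathz when a separate
 * statistics clock exists (assumed here to mean stathz != 0), else hz.
 */
int
stat_clock_frequency(int stathz, int hz)
{
	return (stathz != 0 ? stathz : hz);
}
```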
* Give vfs_getopt the type it's expecting. | Jamie Gritton | 2009-05-07 | 1 | -4/+2
  Write 100 times: "32 bits is so twentieth century."

  Noticed by: dchagin
  Notes: svn path=/head/; revision=191898
* Move the per-prison Linux MIB from a private one-off pointer to the new | Jamie Gritton | 2009-05-07 | 3 | -96/+326
  OSD-based jail extensions. This allows the Linux MIB to be accessed via jail_set and jail_get, and serves as a demonstration of adding jail support to a module.

  Reviewed by: dchagin, kib
  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=191896
* Add KTR(9) tracing for futex emulation. | Dmitry Chagin | 2009-05-07 | 1 | -11/+49
  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191887
* Linux exports HZ value to user space via AT_CLKTCK auxiliary vector entry, | Dmitry Chagin | 2009-05-07 | 1 | -3/+1
  which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is not exported, Glibc uses 100.

  linux_times() shall use the value that is exported to user space.

  Pointyhat to: dchagin
  PR: kern/134251
  Approved by: kib (mentor)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=191883
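The reason the kernel and user space must agree on this value is that applications divide tick counts from times(2) by sysconf(_SC_CLK_TCK). A small userland sketch; the helper names and the fallback to 100 when sysconf() fails are assumptions made for this example.

```c
#include <unistd.h>

/*
 * Return the number of clock ticks per second as seen by user space.
 * Falling back to 100 on failure is an assumption of this sketch.
 */
long
clock_ticks_per_second(void)
{
	long tck = sysconf(_SC_CLK_TCK);

	return (tck > 0 ? tck : 100);
}

/* Convert a tick count, such as a times(2) field, to milliseconds. */
long long
ticks_to_ms(long long ticks, long tck)
{
	return (ticks * 1000 / tck);
}
```

If the emulation layer counted ticks at one rate while user space divided by another, every CPU time reported through this path would be off by the ratio of the two.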
* Change linux struct tms definition to match actual linux one. | Dmitry Chagin | 2009-05-07 | 1 | -4/+4
  Approved by: kib (mentor)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=191880
* Add preliminary KTR(9) support to the linux emulation layer. | Dmitry Chagin | 2009-05-07 | 2 | -2/+31
  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191877
* To avoid excessive code duplication move MI definitions to the MI | Dmitry Chagin | 2009-05-07 | 2 | -0/+11
  header file, as is done in Linux.

  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191876
* Return EAFNOSUPPORT instead of EINVAL when an incorrect or | Dmitry Chagin | 2009-05-07 | 1 | -1/+1
  unsupported domain argument is specified.

  Approved by: kib (mentor)
  Notes: svn path=/head/; revision=191875
* Rework r191742. | Dmitry Chagin | 2009-05-07 | 1 | -5/+12
  Use the protocol family constants for the domain argument validation. Return EAFNOSUPPORT when an incorrect domain argument is specified. Return EPROTONOSUPPORT instead of passing values that are not 0 to the BSD layer.

  Suggested by: rwatson
  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191871
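The validation order described above can be sketched as a standalone helper. This is an illustration under stated assumptions, not the committed code: `check_socketpair_args()` is a made-up name, restricting the domain to PF_LOCAL is assumed because r191742 concerns socketpair() on AF_LOCAL, and treating PF_UNIX as the one explicit protocol value a Linux caller may pass is also an assumption.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical validation helper. On success returns 0 and stores the
 * protocol to hand to the native layer; otherwise returns an errno value.
 */
int
check_socketpair_args(int domain, int protocol, int *native_protocol)
{
	/* Validate the domain against a protocol family constant. */
	if (domain != PF_LOCAL)
		return (EAFNOSUPPORT);
	/* Assumed: only 0 or PF_UNIX is acceptable as the protocol. */
	if (protocol != 0 && protocol != PF_UNIX)
		return (EPROTONOSUPPORT);
	/* The native layer expects 0 for the local domain. */
	*native_protocol = 0;
	return (0);
}
```

The point of the rework is that an unexpected protocol value is rejected up front with a specific error instead of being forwarded to the native socket code.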
* Mark Linux MIB sysctls MPSAFE. | Jamie Gritton | 2009-05-04 | 1 | -3/+3
  Reviewed by: dchagin, kib
  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=191792
* Linux socketpair() call expects an explicitly specified protocol for | Dmitry Chagin | 2009-05-02 | 1 | -1/+4
  the AF_LOCAL domain, unlike FreeBSD, which expects 0 in this case.

  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191742
* Move extern variable definitions to the header file. | Dmitry Chagin | 2009-05-02 | 2 | -1/+4
  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191741
* Reimplement futexes. | Dmitry Chagin | 2009-05-01 | 1 | -355/+446
  The old implementation used Giant to protect the kernel data structures, but at the same time called malloc(M_WAITOK), which could cause the calling thread to sleep and lose Giant protection. The user-visible result was a missed wakeup.

  The new implementation uses one sx lock per futex. The sx protects the futex structures and allows sleeping while copyin or copyout are performed.

  Unlike linux, we return EINVAL when the FUTEX_CMP_REQUEUE operation is requested and either the caller-specified futexes are equal or the second futex already exists. This is acceptable since the situation can only arise from an application error, and glibc falls back to the old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191719
* In preparation for turning on options VIMAGE in next commits, | Marko Zec | 2009-04-26 | 1 | -0/+2
  rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace.

  Reviewed by: bz (an older version of the patch)
  Approved by: julian (mentor)
  Notes: svn path=/head/; revision=191548
* Remove support for FUTEX_REQUEUE operation. | Dmitry Chagin | 2009-04-19 | 3 | -13/+20
  Glibc has not used this operation since version 2.3.3 (Jun 2004), as it is racy and was replaced by the FUTEX_CMP_REQUEUE operation. Glibc versions prior to 2.3.3 fall back to FUTEX_WAKE when FUTEX_REQUEUE returns EINVAL.

  Any application directly using FUTEX_REQUEUE without checking the return value is definitely broken.

  Limit the number of messages per process about the unsupported operation.

  Approved by: kib (mentor)
  MFC after: 1 month
  Notes: svn path=/head/; revision=191269
* Add stuff to support upcoming BMC/IPMI flashing of newer Dell machine | Doug Ambrisko | 2009-03-26 | 1 | -0/+9
  via the Linux tool.
  - Add Linux shim to ipmi(4)
  - Create a partitions file to linprocfs to make Linux fdisk see disks. This file is dynamic so we can see disks come and go.
  - Convert msdosfs to vfat in mtab since Linux uses that for msdosfs.
  - In the Linux mount path convert vfat passed in to msdosfs so Linux mount works on FreeBSD. Note that tasting works so that if da0 is a msdos file system /compat/linux/bin/mount /dev/da0 /mnt works.
  - Fix a 64-bit bug for l_off_t.

  Grabbing sh, mount, fdisk, df from Linux, creating a symlink of mtab to /compat/linux/etc/mtab and then some careful unpacking of the Linux bmc update tool and hacking makes it work on newer Dell boxes. Note, probably if you can't figure out how to do this, then you probably shouldn't be doing it :-)

  Notes: svn path=/head/; revision=190445
* Sort include files in the alphabetical order. | Dmitry Chagin | 2009-03-16 | 1 | -5/+4
  Approved by: kib (mentor)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=189867
* Ignore FUTEX_FD op, as it is done by linux. | Dmitry Chagin | 2009-03-15 | 2 | -8/+1
  Approved by: kib (mentor)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=189862
* Include linux_futex.h before linux_emul.h | Dmitry Chagin | 2009-03-15 | 2 | -3/+1
  Approved by: kib (mentor)
  MFC after: 6 days
  Notes: svn path=/head/; revision=189861
* A better fix for handling different FPU initial control words for different | John Baldwin | 2009-03-05 | 1 | -0/+5
  ABIs:
  - Store the FPU initial control word in the pcb for each thread.
  - When first using the FPU, load the initial control word after restoring the clean state if it is not the standard control word.
  - Provide a correct control word for Linux/i386 binaries under FreeBSD/amd64.
  - Adjust the control word returned for fpugetregs()/npxgetregs() when a thread hasn't used the FPU yet to reflect the real initial control word for the current ABI.
  - The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control word instead of trashing whatever the current state of the FPU is.

  Reviewed by: bde
  Notes: svn path=/head/; revision=189423
* Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which | Dmitry Chagin | 2009-03-04 | 2 | -31/+16
  are used by glibc. This silences the message "2.4+ kernel w/o ELF notes?" from some programs at start, among them top and pkill.

  Do the assignment of the vector entries in elf_linux_fixup() as it is done in glibc.

  Fix some minor style issues.

  Submitted by: Marcin Cieslak <saper at SYSTEM PL>
  Approved by: kib (mentor)
  MFC after: 1 week
  Notes: svn path=/head/; revision=189362
* For all files including net/vnet.h directly include opt_route.h and | Bjoern A. Zeeb | 2009-02-27 | 1 | -0/+2
  net/route.h.

  Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

  We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong.

  This does not change the list of files which depend on opt_route.h but we can identify them now more easily.

  Notes: svn path=/head/; revision=189106
* Don't make Linux stat() open character devices to resolve its name. | Ed Schouten | 2009-02-20 | 1 | -47/+49
  The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example.

  Changes I have made:
  - Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode.
  - Remove unneeded printf's from stat() and statfs().
  - Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at().
  - Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev.

  Result:

    crw-rw-rw- 1 root root   0, 14 Feb 20 13:54 /dev/ptmx
    crw--w---- 1 root adm  136,  0 Feb 20 14:03 /dev/pts/0
    crw--w---- 1 root adm  136,  1 Feb 20 14:02 /dev/pts/1
    crw--w---- 1 ed   tty  136,  2 Feb 20 14:03 /dev/pts/2

  Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers.

  Reviewed by: kib, netchild, rdivacky (thanks!)
  Notes: svn path=/head/; revision=188849
* Use shared vnode locks when invoking VOP_READDIR(). | John Baldwin | 2009-02-13 | 2 | -2/+2
  MFC after: 1 month
  Notes: svn path=/head/; revision=188588
* Fix an edge-case of the linux readdir: We need the size of a linux dirent | Alexander Leidinger | 2009-02-13 | 1 | -1/+1
  structure, not the size of a pointer to it.

  PR: 131099
  Submitted by: Andreas Kies <andikies@gmail.com>
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=188572
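The class of bug fixed above is the familiar difference between sizeof applied to a pointer and sizeof applied to the object it points to. A short sketch with a hypothetical stand-in structure, `l_dirent_sketch`, whose layout is invented for illustration:

```c
#include <stddef.h>

/* Hypothetical stand-in for a Linux dirent; the real layout differs. */
struct l_dirent_sketch {
	unsigned long d_ino;
	long d_off;
	unsigned short d_reclen;
	char d_name[256];
};

/* Wrong for buffer sizing: yields the size of the pointer itself. */
size_t
size_of_pointer(void)
{
	struct l_dirent_sketch *p = NULL;

	return (sizeof(p));
}

/* Intended: yields the size of the structure (operand is not evaluated). */
size_t
size_of_struct(void)
{
	struct l_dirent_sketch *p = NULL;

	return (sizeof(*p));
}
```

The two expressions differ by a single `*`, which is why this kind of mistake tends to surface only in edge cases, such as a buffer that is almost but not quite full.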
* Last step of splitting up minor and unit numbers: remove minor(). | Ed Schouten | 2009-01-28 | 1 | -3/+3
  Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev().

  We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check.

  This commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now.

  I suspect umajor() and uminor() aren't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.

  Notes: svn path=/head/; revision=187830
* Push down Giant inside sysctl. Also add some more assertions to the code. | Ed Schouten | 2008-12-29 | 1 | -12/+4
  In the existing code we didn't really enforce that callers hold Giant before calling userland_sysctl(), even though there is no guarantee it is safe. Fix this by just placing Giant locks around the call to the oid handler. This also means we only pick up Giant for a very short period of time. Maybe we should add MPSAFE flags to sysctl or phase it out altogether.

  I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root() and name2oid() are called with the sysctl lock held.

  Reviewed by: Jille Timmermans <jille quis cx>
  Notes: svn path=/head/; revision=186564
* Rather than using hidden includes (with circular dependencies), | Bjoern A. Zeeb | 2008-12-02 | 2 | -0/+4
  directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files.

  For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h.

  Reviewed by: brooks, gnn, des, zec, imp
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=185571
* Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64. | Konstantin Belousov | 2008-11-29 | 2 | -52/+269
  Change types used in the linux' struct msghdr and struct cmsghdr definitions to the properly-sized architecture-specific types. Move ancillary data handler from linux_sendit() to linux_sendmsg().

  Submitted by: dchagin
  Notes: svn path=/head/; revision=185442
* Document that all the other commands are either | Roman Divacky | 2008-11-26 | 1 | -0/+16
  identical to the FreeBSD ones or rejected by kern_msgctl().

  Found with: Coverity Prevent(tm)
  CID: 3456
  Approved by: kib (mentor)
  Notes: svn path=/head/; revision=185337
* In the robust futexes list head, futex_offset shall be signed, | Konstantin Belousov | 2008-11-16 | 1 | -2/+2
  and glibc actually supplies negative offsets. Change l_ulong to l_long.

  Submitted by: dchagin
  Notes: svn path=/head/; revision=185002
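One way an unsigned offset field can go wrong is when a 32-bit value is added to a 64-bit address: a negative offset is zero-extended instead of sign-extended. The sketch below assumes 32-bit guest types on a 64-bit host, as with linux32 on amd64; the function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned 32-bit offset: zero-extended, so -8 becomes +4294967288. */
uint64_t
entry_plus_offset_unsigned(uint64_t entry, uint32_t futex_offset)
{
	return (entry + futex_offset);
}

/* Signed 32-bit offset: sign-extended, so -8 really moves back 8 bytes. */
uint64_t
entry_plus_offset_signed(uint64_t entry, int32_t futex_offset)
{
	return (entry + (uint64_t)(int64_t)futex_offset);
}
```

With an entry address of 1000 and an offset of -8, the signed variant yields 992, while the unsigned variant lands about 4 GB past the entry.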
* Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4. | Ed Schouten | 2008-11-09 | 1 | -2/+26
  Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls got deprecated somewhere after FreeBSD 1.1, but they have never been phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4.

  Also fix the Linuxolator to build without the setdomainname() routine by just making it call userland_sysctl on kern.domainname. Also replace the setdomainname()'s implementation to use this approach, because we're duplicating code with sysctl_domainname().

  I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because this routine is probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality.

  Reviewed by: rdivacky, kib
  Notes: svn path=/head/; revision=184789
* The code in linux_proc_exit() contains a race when multiple linux based | Konstantin Belousov | 2008-10-31 | 1 | -3/+3
  processes exit at the same time. The linux_emuldata structure is freed but p->p_emuldata is left as a dangling pointer to the just freed memory. The check for W_EXIT in the loop scanning the child processes isn't safe since the state of the child process can change right afterwards.

  Lock the process and check W_EXIT before delivering the signal.

  Submitted by: tegge
  Reviewed by: davidxu
  MFC after: 1 week
  Notes: svn path=/head/; revision=184501
* Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary | Edward Tomasz Napierala | 2008-10-28 | 1 | -3/+3
  to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit.

  Approved by: rwatson (mentor)
  Notes: svn path=/head/; revision=184413
* Retire the MALLOC and FREE macros. They are an abomination unto style(9). | Dag-Erling Smørgrav | 2008-10-23 | 3 | -11/+11
  MFC after: 3 months
  Notes: svn path=/head/; revision=184205
* Correctly fill siginfo for the signals delivered by linux tkill/tgkill. | Konstantin Belousov | 2008-10-19 | 2 | -24/+92
  It is required for async cancellation to work.

  Fix PROC_LOCK leak in linux_tgkill when a signal delivery attempt is made to a non-linux process.

  Do not call em_find(p, ...) with p unlocked.

  Move common code for linux_tkill() and linux_tgkill() into linux_do_tkill().

  Change linux siginfo_t definition to match actual linux one. Extend uid fields to 4 bytes from 2. The extension does not change structure layout and is binary compatible with previous definition, because i386 is little endian, and each uid field has 2 byte padding after it.

  Reported by: Nicolas Joly <njoly pasteur fr>
  Submitted by: dchangin
  MFC after: 1 month
  Notes: svn path=/head/; revision=184058
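The binary-compatibility argument for widening the uid fields can be checked with a small sketch. The two structures below are hypothetical reductions of a single field (a 2-byte uid followed by 2 bytes of padding, versus a 4-byte uid), not the actual siginfo definitions.

```c
#include <stdint.h>
#include <string.h>

/* Old shape: 2-byte uid followed by 2 bytes of zero padding. */
struct old_uid_field {
	uint16_t uid;
	uint16_t pad;
};

/* New shape: the uid widened to 4 bytes, occupying the same space. */
struct new_uid_field {
	uint32_t uid;
};

/* Return 1 on a little-endian host, 0 otherwise. */
int
host_is_little_endian(void)
{
	uint16_t probe = 1;
	unsigned char low;

	memcpy(&low, &probe, 1);
	return (low == 1);
}

/* Write a value in the old layout and read it back through the new one. */
uint32_t
read_old_as_new(uint16_t uid)
{
	struct old_uid_field o = { uid, 0 };
	struct new_uid_field n;

	memcpy(&n, &o, sizeof(n));
	return (n.uid);
}
```

On a little-endian machine the low-order bytes come first, so the old uid bytes and the zeroed padding together read back as the same number in the wider field. On a big-endian machine the value would come back shifted into the high half.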