aboutsummaryrefslogtreecommitdiff
path: root/sys/net/if_ethersubr.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Initial implementation of draft-ietf-6man-ipv6only-flag.Bjoern A. Zeeb2018-10-301-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change defines the RA "6" (IPv6-Only) flag which routers may advertise, kernel logic to check if all routers on a link have the flag set and accordingly update a per-interface flag. If all routers agree that it is an IPv6-only link, ether_output_frame(), based on the interface flag, will filter out all ETHERTYPE_IP/ARP frames, drop them, and return EAFNOSUPPORT to upper layers. The change also updates ndp to show the "6" flag, ifconfig to display the IPV6_ONLY nd6 flag if set, and rtadvd to allow announcing the flag. Further changes to tcpdump (contrib code) are availble and will be upstreamed. Tested the code (slightly earlier version) with 2 FreeBSD IPv6 routers, a FreeBSD laptop on ethernet as well as wifi, and with Win10 and OSX clients (which did not fall over with the "6" flag set but not understood). We may also want to (a) implement and RX filter, and (b) over time enahnce user space to, say, stop dhclient from running when the interface flag is set. Also we might want to start IPv6 before IPv4 in the future. All the code is hidden under the EXPERIMENTAL option and not compiled by default as the draft is a work-in-progress and we cannot rely on the fact that IANA will assign the bits as requested by the draft and hence they may change. Dear 6man, you have running code. Discussed with: Bob Hinden, Brian E Carpenter Notes: svn path=/head/; revision=339929
* Remove the Yarrow PRNG algorithm option in accordance with due noticeMark Murray2018-08-261-1/+1
| | | | | | | | | | | | | | | | | | | given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898 Notes: svn path=/head/; revision=338324
* Unbreak VLANs after r337943.Navdeep Parhar2018-08-241-1/+2
| | | | | | | | | | | | | | | | | ether_set_pcp should not be called from ether_output_frame for VLAN interfaces -- the vid + pcp will be inserted during vlan_transmit in that case. r337943 sets the VLAN's ifnet's if_pcp to a proper PCP value and this led to double encapsulation (once with vid 0 and second time with vid+pcp). PR: 230794 Reviewed by: kib@ Approved by: re@ (gjb@) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D16887 Notes: svn path=/head/; revision=338305
* Use the new VNET_DEFINE_STATIC macro when we are defining static VNETAndrew Turner2018-07-241-1/+1
| | | | | | | | | | | variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147 Notes: svn path=/head/; revision=336676
* Reduce overhead of entropy collectionMatt Macy2018-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | - move harvest mask check inline - move harvest mask to frequently_read out of actively modified cache line - disable ether_input collection and describe its limitations in NOTES Typically entropy collection in ether_input was stirring zero in to the entropy pool while at the same time greatly reducing max pps. This indicates that perhaps we should more closely scrutinize how much entropy we're getting from a given source as well as what our actual entropy collection needs are for seeding Yarrow. Reviewed by: cem, gallatin, delphij Approved by: secteam Differential Revision: https://reviews.freebsd.org/D15526 Notes: svn path=/head/; revision=334450
* Allow different bridge types to coexistMatt Macy2018-05-111-3/+0
| | | | | | | | | | | | | | if_bridge has a lot of limitations that make it scale poorly to higher data rates. In my projects/VPC branch I leverage the bridge interface between layers for my high speed soft switch as well as for purposes of stacking in general. Reviewed by: sbruno@ Approved by: sbruno@ Differential Revision: https://reviews.freebsd.org/D15344 Notes: svn path=/head/; revision=333481
* Add network device event for priority code point, PCP, changes.Hans Petter Selasky2018-04-261-2/+5
| | | | | | | | | | | | | | | When the PCP is changed for either a VLAN network interface or when prio tagging is enabled for a regular ethernet network interface, broadcast the IFNET_EVENT_PCP event so applications like ibcore can update its GID tables accordingly. MFC after: 3 days Reviewed by: ae, kib Differential Revision: https://reviews.freebsd.org/D15040 Sponsored by: Mellanox Technologies Notes: svn path=/head/; revision=333015
* Improve copy-and-pasted versions of SIOCGIFADDR.Brooks Davis2018-03-271-7/+2
| | | | | | | | | | | | | | The original implementation used a reference to ifr_data and a cast to do the equivalent of accessing ifr_addr. This was copied multiple times since 1996. Approved by: kib MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14873 Notes: svn path=/head/; revision=331648
* Allow to specify PCP on packets not belonging to any VLAN.Konstantin Belousov2018-03-271-5/+120
| | | | | | | | | | | | | | | | | | | | | | | | According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be considered as untagged, and only PCP and DEI values from the VLAN tag are meaningful. See for instance https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html. Make it possible to specify PCP value for outgoing packets on an ethernet interface. When PCP is supplied, the tag is appended, VLAN id set to 0, and PCP is filled by the supplied value. The code to do VLAN tag encapsulation is refactored from the if_vlan.c and moved into if_ethersubr.c. Drivers might have issues with filtering VID 0 packets on receive. This bug should be fixed for each driver. Reviewed by: ae (previous version), hselasky, melifaro Sponsored by: Mellanox Technologies MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D14702 Notes: svn path=/head/; revision=331622
* netpfil: Introduce PFIL_FWD flagKristof Provost2018-03-231-2/+4
| | | | | | | | | | | | | | | | | | | Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715 Notes: svn path=/head/; revision=331436
* Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.Alexander V. Chernikov2018-03-171-1/+7
| | | | | | | | | | | | | | | | | | | | | | Current arp/nd code relies on the feedback from the datapath indicating that the entry is still used. This mechanism is incorporated into the arpresolve()/nd6_resolve() routines. After the inpcb route cache introduction, the packet path for the locally-originated packets changed, passing cached lle pointer to the ether_output() directly. This resulted in the arp/ndp entry expire each time exactly after the configured max_age interval. During the small window between the ARP/NDP request and reply from the router, most of the packets got lost. Fix this behaviour by plugging datapath notification code to the packet path used by route cache. Unify the notification code by using single inlined function with the per-AF callbacks. Reported by: sthaug at nethelp.no Reviewed by: ae MFC after: 2 weeks Notes: svn path=/head/; revision=331098
* Do pass removing some write-only variables from the kernel.Alexander Kabaev2017-12-251-2/+0
| | | | | | | | | | | | This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385 Notes: svn path=/head/; revision=327173
* sys: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326023
* ethernet: Add ethernet interface attached event and devctl notification.Sepherosa Ziehau2017-07-241-0/+7
| | | | | | | | | | | | | ifnet_arrival_event may not be adequate under certain situation; e.g. when the LLADDR is needed. So the ethernet ifattach event is announced after all necessary bits are setup. MFC after: 3 days Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D11617 Notes: svn path=/head/; revision=321406
* Persistently store NIC's hardware MAC address, and add a way to retrive itRavi Pokala2017-05-111-1/+2
| | | | | | | | | | | | | | | An earlier version of r318160 allocated if_hw_addr unconditionally; when it became conditional, I forgot to check for NULL in ether_ifattach(). Reviewed by: kp MFC after: 1 week MFC with: r318160 Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D10678 Pointy-hat to: rpokala Notes: svn path=/head/; revision=318176
* Persistently store NIC's hardware MAC address, and add a way to retrive itRavi Pokala2017-05-101-0/+2
| | | | | | | | | | | | | | | | | | | | | The MAC address reported by `ifconfig ${nic} ether' does not always match the address in the hardware, as reported by the driver during attach. In particular, NICs which are components of a lagg(4) interface all report the same MAC. When attaching, the NIC driver passes the MAC address it read from the hardware as an argument to ether_ifattach(). Keep a second copy of it, and create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along with the active MAC address. PR: 194386 Reviewed by: glebius MFC after: 1 week Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D10609 Notes: svn path=/head/; revision=318160
* Renumber copyright clause 4Warner Losh2017-02-281-1/+1
| | | | | | | | | | | | Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96 Notes: svn path=/head/; revision=314436
* Extract out the various local definitions of ETHER_IS_BROADCAST() andAdrian Chadd2016-08-071-2/+0
| | | | | | | | | | | | turn them into a shared definition. Set M_MCAST/M_BCAST appropriately upon packet reception in net80211, just before they are delivered up to the ethernet stack. Submitted by: rstone Notes: svn path=/head/; revision=303811
* Make the KASSERT message more helpful by also printing the ifp informationBjoern A. Zeeb2016-06-061-1/+2
| | | | | | | | | | | which we are asserting. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=301498
* Introduce a per-VNET flag to enable/disable netisr prcessing on that VNET.Bjoern A. Zeeb2016-06-031-1/+17
| | | | | | | | | | | | | | | | | | | | | | | Add accessor functions to toggle the state per VNET. The base system (vnet0) will always enable itself with the normal registration. We will share the registered protocol handlers in all VNETs minimising duplication and management. Upon disabling netisr processing for a VNET drain the netisr queue from packets for that VNET. Update netisr consumers to (de)register on a per-VNET start/teardown using VNET_SYS(UN)INIT functionality. The change should be transparent for non-VIMAGE kernels. Reviewed by: gnn (, hiren) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6691 Notes: svn path=/head/; revision=301270
* This change re-adds L2 caching for TCP and UDP, as originally added in D4306George V. Neville-Neil2016-06-021-7/+40
| | | | | | | | | | | | but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262 Notes: svn path=/head/; revision=301217
* Remove the most useful INET || INET6 check leftover from whenever,Bjoern A. Zeeb2016-05-031-3/+0
| | | | | | | | | | doing nothing. MFC after: 1 week Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=298985
* sys/net* : for pointers replace 0 with NULL.Pedro F. Giffuni2016-04-151-2/+2
| | | | | | | | | Mostly cosmetical, no functional change. Found with devel/coccinelle. Notes: svn path=/head/; revision=298075
* Finish r275196: do not dereference rtentry in if_output() routines.Alexander V. Chernikov2016-01-091-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only piece of information that is required is rt_flags subset. In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where RTF_HOST value won't be available as lookup result. All other functions require RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of EHOSTDOWN error. There are 11 places where non-zero 'struct route' is passed to if_output(). For most of the callers (forwarding, bpf, arp) does not care about exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()). Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary rte flags to ro_flags. Call this function in ip_output() after looking up/ verifying rte. Reviewed by: ae Notes: svn path=/head/; revision=293544
* Remove second EVENTHANDLER_REGISTER slipped in r292978.Alexander V. Chernikov2016-01-011-0/+4
| | | | | | | Describe the reason of doing unconditional M_PREPEND in ether_output(). Notes: svn path=/head/; revision=293035
* Implement interface link header precomputation API.Alexander V. Chernikov2015-12-311-116/+173
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102 Notes: svn path=/head/; revision=292978
* Replace the fastforward path with tryforward which does not require aGeorge V. Neville-Neil2015-11-051-2/+0
| | | | | | | | | | | | | | sysctl and will always be on. The former split between default and fast forwarding is removed by this commit while preserving the ability to use all network stack features. Differential Revision: https://reviews.freebsd.org/D4042 Reviewed by: ae, melifaro, olivier, rwatson MFC after: 1 month Sponsored by: Rubicon Communications (Netgate) Notes: svn path=/head/; revision=290383
* Simplify the way of attaching IPv6 link-layer header.Alexander V. Chernikov2015-09-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether|atm|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_ouput() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performes lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output|forward> -> nd6_output() -> nd6_output _lle() which returns 0. In new scenario packet is propagated to <ether|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469 Notes: svn path=/head/; revision=287861
* Use KASSERT for some checks, that are late to do.Andrey V. Elsukov2015-09-161-23/+3
| | | | | | | Discussed with: melifaro, glebius Notes: svn path=/head/; revision=287859
* Remove superfluous m_freem().Oleg Bulyzhin2015-09-161-1/+0
| | | | | | | MFC after: 1 month Notes: svn path=/head/; revision=287856
* Huge cleanup of random(4) code.Mark Murray2015-06-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.*. * live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'staic struct start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij) Notes: svn path=/head/; revision=284959
* Refactor / restructure the RSS code into generic, IPv4 and IPv6 specificAdrian Chadd2015-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * topelitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits) Notes: svn path=/head/; revision=277331
* Fix build broken by r275195.Alexander V. Chernikov2014-11-271-1/+3
| | | | Notes: svn path=/head/; revision=275197
* Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr -Alexander V. Chernikov2014-11-271-6/+14
| | | | | | | | return lle flags IFF needed. Do not pass rte to arpresolve - pass is_gateway flag instead. Notes: svn path=/head/; revision=275196
* Do not try to copy header to @dst and than back to ethernet in case ofAlexander V. Chernikov2014-11-271-15/+7
| | | | | | | | | | pseudo_AF_HDRCMPLT: we copy media header from mbuf to 'struct sockaddr' @dst in bpf_movein, so mbuf already contains valid info. Notes: svn path=/head/; revision=275195
* Remove remnants of if_ef(4).Gleb Smirnoff2014-11-091-5/+0
| | | | Notes: svn path=/head/; revision=274307
* Remove struct arpcom. It is unused by most interface types, that allocateGleb Smirnoff2014-11-071-47/+6
| | | | | | | | | | | it, except Ethernet, where it carried ng_ether(4) pointer. For now carry the pointer in if_l2com directly. Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=274231
* This is the much-discussed major upgrade to the random(4) device, known to ↵Mark Murray2014-10-301-2/+1
| | | | | | | | | | | | | | | | | | | | | | you all as /dev/random. This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources. The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people. The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway. Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to. My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise. My Nomex pants are on. Let the feedback commence! Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?) Approved by: so(des) Notes: svn path=/head/; revision=273872
* Mechanically convert to if_inc_counter().Gleb Smirnoff2014-09-191-8/+8
| | | | Notes: svn path=/head/; revision=271867
* Several years after initial development, merge prototype support forRobert Watson2014-03-151-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge) Notes: svn path=/head/; revision=263198
* Remove AppleTalk support.Gleb Smirnoff2014-03-141-83/+0
| | | | | | | | | | | | | AppleTalk was a network transport protocol for Apple Macintosh devices in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was a legacy protocol and primary networking protocol is TCP/IP. The last Mac OS X release to support AppleTalk happened in 2009. The same year routing equipment vendors (namely Cisco) end their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE. Notes: svn path=/head/; revision=263152
* Remove IPX support.Gleb Smirnoff2014-03-141-54/+0
| | | | | | | | | | | | | | IPX was a network transport protocol in Novell's NetWare network operating system from late 80s and then 90s. The NetWare itself switched to TCP/IP as default transport in 1998. Later, in this century the Novell Open Enterprise Server became successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011. Thus, IPX won't be supported in FreeBSD 11.0-RELEASE. Notes: svn path=/head/; revision=263140
* Simplify filling sockaddr_dl structure for if_resolvemulti()Alexander V. Chernikov2014-01-181-16/+2
| | | | | | | | | | | | | | | | | callback providers. link_init_sdl() function can be used to fill most of the parameters. Use caller stack instead of allocation / freing memory for each request. Do not drop support for extra-long (probably non-existing) link-layer protocols by introducing link_alloc_sdl() (used by if_resolvemulti() callback) and link_free_sdl() (used by caller). Since this change breaks KBI, MFC requires slightly different approach (link_init_sdl() auto-allocating buffer if necessary to handle cases with unmodified if_resolvemulti() callers). MFC after: 2 weeks Notes: svn path=/head/; revision=260870
* Allow ethernet drivers to pass in packets connected via the nextpkt pointer.George V. Neville-Neil2013-11-181-4/+16
| | | | | | | | | | Handling packets in this way allows drivers to amortize work during packet reception. Submitted by: Vijay Singh Sponsored by: NetApp Notes: svn path=/head/; revision=258328
* Restore the entropy gathering from the m_data pointer value, not theAdrian Chadd2013-11-021-1/+1
| | | | | | | | | m_data payload. After talking with markm/bde, this is what markm actually intended. Notes: svn path=/head/; revision=257548
* Convert the random entropy harvesting code to use a const void * pointerAdrian Chadd2013-11-011-1/+1
| | | | | | | | | | | | | | rather than just void *. Then, as part of this, convert a couple of mbuf m->m_data accesses to mtod(m, const void *). Reviewed by: markm Approved by: security-officer (delphij) Sponsored by: Netflix, Inc. Notes: svn path=/head/; revision=257525
* Move new pf includes to the pf directory. The pfvar.h remainGleb Smirnoff2013-10-271-1/+2
| | | | | | | | | | | | | in net, to avoid compatibility breakage for no sake. The future plan is to split most of non-kernel parts of pfvar.h into pf.h, and then make pfvar.h a kernel only include breaking compatibility. Discussed with: bz Notes: svn path=/head/; revision=257215
* The r48589 promised to remove implicit inclusion of if_var.h soon. PrepareGleb Smirnoff2013-10-261-0/+1
| | | | | | | | | | | to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=257176
* MFC - tracking commit.Mark Murray2013-10-091-1/+2
|\ | | | | | | Notes: svn path=/projects/random_number_generator/; revision=256243
| * There are some high performance NICs that count statistics in hardware,Gleb Smirnoff2013-10-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | and there are ifnets, that do that via counter(9). Provide a flag that would skip cache line trashing '+=' operation in ether_input(). Sponsored by: Netflix Sponsored by: Nginx, Inc. Reviewed by: melifaro, adrian Approved by: re (marius) Notes: svn path=/head/; revision=256218