Diffstat (limited to 'sbin')
45 files changed, 1421 insertions, 680 deletions
diff --git a/sbin/bectl/bectl.8 b/sbin/bectl/bectl.8 index cc88c7019d13..0e08b3383e9a 100644 --- a/sbin/bectl/bectl.8 +++ b/sbin/bectl/bectl.8 @@ -3,12 +3,12 @@ .\" .\" SPDX-License-Identifier: BSD-2-Clause .\" -.Dd April 9, 2024 +.Dd June 13, 2025 .Dt BECTL 8 .Os .Sh NAME .Nm bectl -.Nd Utility to manage boot environments on ZFS +.Nd manage ZFS boot environments .Sh SYNOPSIS .Nm .Op Fl h @@ -80,34 +80,31 @@ .Sh DESCRIPTION The .Nm -command is used to setup and interact with ZFS boot environments, which are -bootable clones of datasets. -.Pp -A boot environment allows the system to be upgraded, while preserving the -pre-upgrade system environment. -.Pp -.Nm -itself accepts an -.Fl r -flag specified before the command to indicate the -.Ar beroot -that should be used as the boot environment root, or the dataset whose children -are all boot environments. -Normally this information is derived from the bootfs property of the pool that -is mounted at -.Pa / , -but it is useful when the system has not been booted into a ZFS root or a -different pool should be operated on. -For instance, booting into the recovery media and manually importing a pool from -one of the system's resident disks will require the -.Fl r -flag to work. +utility manages bootable ZFS clones called boot environments. +Boot envionments allow system changes to be tested safely, +as they are selectable directly from the boot +.Xr loader 8 . +This utility can +.Cm create , +.Cm list , +.Cm mount , +or +.Cm jail +boot environments. +Once the changes have been tested, the boot environment can be +.Cm unmount Ns ed , +.Cm activate Ns d , +.Cm rename Ns d , +and +.Cm destroy Ns ed . .Ss Supported Subcommands and Flags -.Bl -tag -width activate -.It Xo -.Fl h -.Xc +.Bl -tag -width indent +.It Fl h Print usage information and exit. +.It Fl r Ar beroot Sy Ar subcommand +Specify a parent dataset for the boot environment to use for +.Ar subcommand +for operation on manually imported pools or unusual layouts. 
.It Xo .Cm activate .Op Fl t | Fl T @@ -122,19 +119,19 @@ flag is given, this takes effect only for the next boot. Flag .Fl T removes temporary boot once configuration. -Without temporary configuration, the next boot will use zfs dataset specified -in boot pool +Without temporary configuration, +the next boot will use zfs dataset specified in boot pool .Ar bootfs property. .It Xo .Cm check .Xc -Performs a silent sanity check on the current system. +Perform a check to see if the current system can use boot environments. If boot environments are supported and used, .Nm will exit with a status code of 0. -Any other status code is not currently defined and may, in the future, grow -special meaning for different degrees of sanity check failures. +Any other status code is not currently defined and may, in the future, +grow special meaning for different degrees of sanity check failures. .It Xo .Cm create .Op Fl r @@ -162,8 +159,8 @@ environment. .Pp If .Nm -is creating from another boot environment, a snapshot of that boot environment -will be created to clone from. +is creating from another boot environment, +a snapshot of that boot environment will be created to clone from. .It Xo .Cm create .Op Fl r @@ -174,8 +171,10 @@ Create a snapshot of the boot environment named .Pp If the .Fl r -flag is given, a recursive snapshot of the boot environment will be created. -A snapshot is created for each descendant dataset of the boot environment. +flag is given, +a recursive snapshot of the boot environment will be created. +A snapshot is created for each descendant dataset +of the boot environment. See .Sx Boot Environment Structures for a discussion on different layouts. @@ -241,8 +240,8 @@ If .Ar utility is specified, it will be executed instead of .Pa /bin/sh . 
-The jail will be destroyed and the boot environment unmounted when the command -finishes executing, unless the +The jail will be destroyed and the boot environment unmounted +when the command finishes executing, unless the .Fl U argument is specified. .Pp @@ -269,11 +268,11 @@ The following default parameters are provided: .It Va allow.mount Ta Cm true .It Va allow.mount.devfs Ta Cm true .It Va enforce_statfs Ta Cm 1 -.It Va name Ta Set to jail ID. +.It Va name Ta set to jail ID .It Va host.hostname Ta Va bootenv -.It Va path Ta Set to a path in Pa /tmp +.It Va path Ta set to a path in Pa /tmp generated by -.Xr libbe 3 . +.Xr libbe 3 .El .Pp All default parameters may be overwritten. @@ -298,8 +297,8 @@ or combination of .It Fl a Display all datasets. .It Fl D -Display the full space usage for each boot environment, assuming all -other boot environments were destroyed. +Display the full space usage for each boot environment, +assuming all other boot environments were destroyed. .It Fl H Used for scripting. Do not print headers and separate fields by a single tab instead of @@ -351,8 +350,8 @@ will make a directory such as .Pa be_mount.c6Sf in .Pa /tmp . -Randomness in the last four characters of the directory name will prevent -mount point conflicts. +Randomness in the last four characters of the directory name +will prevent mount point conflicts. Unmount of an environment, followed by mount of the same environment without giving a .Ar mountpoint , @@ -362,7 +361,7 @@ Rename the given .Ar origBeName to the given .Ar newBeName . -The boot environment will not be unmounted in order for this rename to occur. +The boot environment will not be unmounted for this rename to occur. .It Cm ujail Brq Ar jailId | jailName | beName .It Cm unjail Brq Ar jailId | jailName | beName Destroy the jail created from the given boot environment. 
@@ -390,8 +389,8 @@ boot environment layout, as created by the Auto ZFS option to .Xr bsdinstall 8 , is a .Dq shallow -boot environment structure, where boot environment datasets do not have any -directly subordinate datasets. +boot environment structure, where boot environment datasets +do not have any directly subordinate datasets. Instead, they're organized off in .Pa zroot/ROOT , and they rely on datasets elsewhere in the pool having @@ -419,7 +418,8 @@ set to .Dv off , thus files in .Pa /usr -typically fall into the boot environment because this dataset is not mounted. +typically fall into the boot environment +because this dataset is not mounted. .Pa zroot/usr/src is mounted, thus files in .Pa /usr/src @@ -445,8 +445,8 @@ Note that the subordinate datasets now have .Dv canmount set to .Dv noauto . -These are more obviously a part of the boot environment, as indicated by their -positioning in the layout. +These are more obviously a part of the boot environment, +as indicated by their positioning in the layout. These subordinate datasets will be mounted by the .Dv zfsbe .Xr rc 8 @@ -468,16 +468,25 @@ A future version of may default to handling both styles and deprecate the various .Fl r flags. -.\" .Sh EXAMPLES -.\" .Bl -bullet -.\" .It +.Sh EXAMPLES +Create a boot environment, named with today's date, +containing snapshots of the root dataset and of all child datasets: +.Pp +.Dl bectl create -r `date +%Y%m%d` +.Pp +Mount a previous boot environment, +.Ar yesterdaysbe , +to +.Pa /mnt : +.Pp +.Dl bectl mount yesterdaysbe /mnt .\" To fill in with jail upgrade example when behavior is firm. 
-.\" .El .Sh SEE ALSO .Xr libbe 3 , .Xr zfsprops 7 , .Xr beinstall.sh 8 , .Xr jail 8 , +.Xr loader 8 , .Xr zfs 8 , .Xr zpool 8 .Sh HISTORY diff --git a/sbin/devd/Makefile b/sbin/devd/Makefile index 4ff0187a5a22..5d5721d16884 100644 --- a/sbin/devd/Makefile +++ b/sbin/devd/Makefile @@ -46,6 +46,11 @@ HYPERV+= hyperv.conf HYPERVPACKAGE= hyperv-tools .endif +CONFGROUPS+= NVME +NVMEDIR= ${DEVDDIR} +NVME+= nvmf.conf +NVMEPACKAGE= nvme-tools + .if ${MK_USB} != "no" DEVD+= uath.conf ulpt.conf .endif diff --git a/sbin/devd/devd.cc b/sbin/devd/devd.cc index d7a3fee57870..1ff405244cde 100644 --- a/sbin/devd/devd.cc +++ b/sbin/devd/devd.cc @@ -153,6 +153,8 @@ static volatile sig_atomic_t romeo_must_die = 0; static const char *configfile = CF; +static char vm_guest[80]; + static void devdlog(int priority, const char* message, ...) __printflike(2, 3); static void event_loop(void); @@ -867,6 +869,8 @@ process_event(char *buffer) cfg.set_variable("timestamp", timestr); free(timestr); + cfg.set_variable("vm_guest", vm_guest); + // Match doesn't have a device, and the format is a little // different, so handle it separately. switch (type) { @@ -1107,6 +1111,14 @@ event_loop(void) err(1, "select"); } else if (rv == 0) check_clients(); + /* + * Aside from the socket type, both sockets use the same + * protocol, so we can process clients the same way. + */ + if (FD_ISSET(stream_fd, &fds)) + new_client(stream_fd, SOCK_STREAM); + if (FD_ISSET(seqpacket_fd, &fds)) + new_client(seqpacket_fd, SOCK_SEQPACKET); if (FD_ISSET(fd, &fds)) { rv = read(fd, buffer, sizeof(buffer) - 1); if (rv > 0) { @@ -1135,14 +1147,6 @@ event_loop(void) break; } } - if (FD_ISSET(stream_fd, &fds)) - new_client(stream_fd, SOCK_STREAM); - /* - * Aside from the socket type, both sockets use the same - * protocol, so we can process clients the same way. 
- */ - if (FD_ISSET(seqpacket_fd, &fds)) - new_client(seqpacket_fd, SOCK_SEQPACKET); } cfg.remove_pidfile(); close(seqpacket_fd); @@ -1322,6 +1326,7 @@ int main(int argc, char **argv) { int ch; + size_t len; check_devd_enabled(); while ((ch = getopt(argc, argv, "df:l:nq")) != -1) { @@ -1346,6 +1351,12 @@ main(int argc, char **argv) } } + len = sizeof(vm_guest); + if (sysctlbyname("kern.vm_guest", vm_guest, &len, NULL, 0) < 0) { + devdlog(LOG_ERR, + "sysctlbyname(kern.vm_guest) failed: %d\n", errno); + } + cfg.parse(); if (!no_daemon && daemonize_quick) { cfg.open_pidfile(); diff --git a/sbin/devd/devd.conf.5 b/sbin/devd/devd.conf.5 index 4dbd7338edb1..baf4b9d3a183 100644 --- a/sbin/devd/devd.conf.5 +++ b/sbin/devd/devd.conf.5 @@ -38,7 +38,7 @@ .\" ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS .\" SOFTWARE. .\" -.Dd July 8, 2025 +.Dd July 9, 2025 .Dt DEVD.CONF 5 .Os .Sh NAME @@ -517,6 +517,8 @@ and representing the start of a controller reset, the successful completion of a controller reset, or a timeout while waiting for the controller to reset, respectively. +.It Li nvme Ta Li controller Ta Li RECONNECT Ta +An NVMe over Fabrics host has disconnected and is requesting a reconnect. 
.El .Pp .Bl -column "SYSTEM" "SUBSYSTEM" "SHUTDOWN-THRESHOLD" -compact diff --git a/sbin/devd/hyperv.conf b/sbin/devd/hyperv.conf index 13695a0c75b6..70108ac36e54 100644 --- a/sbin/devd/hyperv.conf +++ b/sbin/devd/hyperv.conf @@ -103,5 +103,6 @@ notify 10 { notify 10 { match "system" "ETHERNET"; match "type" "IFATTACH"; + match "vm_guest" "hv"; action "/usr/libexec/hyperv/hyperv_vfattach $subsystem 0"; }; diff --git a/sbin/devd/moused.conf b/sbin/devd/moused.conf index 002edad9a8a9..ed1060ffdf2e 100644 --- a/sbin/devd/moused.conf +++ b/sbin/devd/moused.conf @@ -31,5 +31,5 @@ notify 100 { match "type" "DESTROY"; match "cdev" "ums[0-9]+"; - action "service moused stop $cdev"; + action "service moused quietstop $cdev"; }; diff --git a/sbin/devd/nvmf.conf b/sbin/devd/nvmf.conf new file mode 100644 index 000000000000..eaf3ebe86cec --- /dev/null +++ b/sbin/devd/nvmf.conf @@ -0,0 +1,7 @@ +# Attempt to reconnect NVMeoF host devices when requested +notify 100 { + match "system" "nvme"; + match "subsystem" "controller"; + match "type" "RECONNECT"; + action "nvmecontrol reconnect $name"; +}; diff --git a/sbin/dhclient/dhclient.c b/sbin/dhclient/dhclient.c index cbab3fa2973c..5d2a7453578b 100644 --- a/sbin/dhclient/dhclient.c +++ b/sbin/dhclient/dhclient.c @@ -539,7 +539,7 @@ main(int argc, char *argv[]) setproctitle("%s", ifi->name); /* setgroups(2) is not permitted in capability mode. 
*/ - if (setgroups(1, &pw->pw_gid) != 0) + if (setgroups(0, NULL) != 0) error("can't restrict groups: %m"); if (caph_enter_casper() < 0) diff --git a/sbin/ifconfig/af_inet6.c b/sbin/ifconfig/af_inet6.c index 17dc068ee875..7986edf490b4 100644 --- a/sbin/ifconfig/af_inet6.c +++ b/sbin/ifconfig/af_inet6.c @@ -759,7 +759,7 @@ static struct afswtch af_inet6 = { #else .af_difaddr = NL_RTM_DELADDR, .af_aifaddr = NL_RTM_NEWADDR, - .af_ridreq = &in6_add, + .af_ridreq = &in6_del, .af_addreq = &in6_add, .af_exec = in6_exec_nl, #endif diff --git a/sbin/ifconfig/ifbridge.c b/sbin/ifconfig/ifbridge.c index ce5d2f4894fa..a75c37e640a2 100644 --- a/sbin/ifconfig/ifbridge.c +++ b/sbin/ifconfig/ifbridge.c @@ -80,6 +80,20 @@ get_val(const char *cp, u_long *valp) } static int +get_vlan_id(const char *cp, ether_vlanid_t *valp) +{ + u_long val; + + if (get_val(cp, &val) == -1) + return (-1); + if (val < DOT1Q_VID_MIN || val > DOT1Q_VID_MAX) + return (-1); + + *valp = (ether_vlanid_t)val; + return (0); +} + +static int do_cmd(if_ctx *ctx, u_long op, void *arg, size_t argsize, int set) { struct ifdrv ifd = {}; @@ -217,8 +231,9 @@ bridge_status(if_ctx *ctx) printf("%s%s ", prefix, member->ifbr_ifsname); printb("flags", member->ifbr_ifsflags, IFBIFBITS); printf("\n%s", pad); - printf("ifmaxaddr %u port %u priority %u path cost %u", - member->ifbr_addrmax, + if (member->ifbr_addrmax != 0) + printf("ifmaxaddr %u ", member->ifbr_addrmax); + printf("port %u priority %u path cost %u", member->ifbr_portno, member->ifbr_priority, member->ifbr_path_cost); @@ -241,8 +256,8 @@ bridge_status(if_ctx *ctx) else printf(" <unknown state %d>", state); } - if (member->ifbr_untagged != 0) - printf(" untagged %u", (unsigned)member->ifbr_untagged); + if (member->ifbr_pvid != 0) + printf(" untagged %u", (unsigned)member->ifbr_pvid); print_vlans(&bridge->member_vlans[i]); printf("\n"); } @@ -613,25 +628,15 @@ static void setbridge_untagged(if_ctx *ctx, const char *ifn, const char *vlanid) { struct ifbreq req; - 
u_long val; memset(&req, 0, sizeof(req)); + strlcpy(req.ifbr_ifsname, ifn, sizeof(req.ifbr_ifsname)); - if (get_val(vlanid, &val) < 0) + if (get_vlan_id(vlanid, &req.ifbr_pvid) < 0) errx(1, "invalid VLAN identifier: %s", vlanid); - /* - * Reject vlan 0, since it's not a valid vlan identifier and has a - * special meaning in the kernel interface. - */ - if (val == 0) - errx(1, "invalid VLAN identifier: %lu", val); - - strlcpy(req.ifbr_ifsname, ifn, sizeof(req.ifbr_ifsname)); - req.ifbr_untagged = val; - - if (do_cmd(ctx, BRDGSIFUNTAGGED, &req, sizeof(req), 1) < 0) - err(1, "BRDGSIFUNTAGGED %s", vlanid); + if (do_cmd(ctx, BRDGSIFPVID, &req, sizeof(req), 1) < 0) + err(1, "BRDGSIFPVID %s", vlanid); } static void @@ -642,10 +647,10 @@ unsetbridge_untagged(if_ctx *ctx, const char *ifn, int dummy __unused) memset(&req, 0, sizeof(req)); strlcpy(req.ifbr_ifsname, ifn, sizeof(req.ifbr_ifsname)); - req.ifbr_untagged = 0; + req.ifbr_pvid = 0; - if (do_cmd(ctx, BRDGSIFUNTAGGED, &req, sizeof(req), 1) < 0) - err(1, "BRDGSIFUNTAGGED"); + if (do_cmd(ctx, BRDGSIFPVID, &req, sizeof(req), 1) < 0) + err(1, "BRDGSIFPVID"); } static void diff --git a/sbin/ifconfig/ifconfig.8 b/sbin/ifconfig/ifconfig.8 index 3fb8b5f02b76..b6e7d3ff2c63 100644 --- a/sbin/ifconfig/ifconfig.8 +++ b/sbin/ifconfig/ifconfig.8 @@ -28,7 +28,7 @@ .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd July 5, 2025 +.Dd July 14, 2025 .Dt IFCONFIG 8 .Os .Sh NAME @@ -2878,34 +2878,26 @@ interfaces previously configured with Another name for the .Fl tunnel parameter. -.It Cm accept_rev_ethip_ver -Set a flag to accept both correct EtherIP packets and ones -with reversed version field. -Enabled by default. -This is for backward compatibility with -.Fx 6.1 , -6.2, 6.3, 7.0, and 7.1. -.It Cm -accept_rev_ethip_ver -Clear a flag -.Cm accept_rev_ethip_ver . 
+.It Cm noclamp +This flag prevents the MTU from being clamped to 1280 bytes, the +minimum MTU for IPv6, when the outer protocol is IPv6. When the +flag is set, the MTU value configured on the interface will be +used instead of the fixed length of 1280 bytes. For more details, +please refer to the +.Ar MTU Configuration and Path MTU Discovery +section in +.Xr gif 4 . +.It Cm -noclamp +Clear the flag +.Cm noclamp . .It Cm ignore_source Set a flag to accept encapsulated packets destined to this host independently from source address. This may be useful for hosts, that receive encapsulated packets from the load balancers. .It Cm -ignore_source -Clear a flag +Clear the flag .Cm ignore_source . -.It Cm send_rev_ethip_ver -Set a flag to send EtherIP packets with reversed version -field intentionally. -Disabled by default. -This is for backward compatibility with -.Fx 6.1 , -6.2, 6.3, 7.0, and 7.1. -.It Cm -send_rev_ethip_ver -Clear a flag -.Cm send_rev_ethip_ver . .El .Ss GRE Tunnel Parameters The following parameters apply to GRE tunnel interfaces, diff --git a/sbin/ifconfig/ifgif.c b/sbin/ifconfig/ifgif.c index 991cf110678f..9b8be210a59e 100644 --- a/sbin/ifconfig/ifgif.c +++ b/sbin/ifconfig/ifgif.c @@ -49,6 +49,7 @@ #include "ifconfig.h" static const char *GIFBITS[] = { + [0] = "NOCLAMP", [1] = "IGNORE_SOURCE", }; @@ -90,6 +91,8 @@ setgifopts(if_ctx *ctx, const char *val __unused, int d) } static struct cmd gif_cmds[] = { + DEF_CMD("noclamp", GIF_NOCLAMP, setgifopts), + DEF_CMD("-noclamp", -GIF_NOCLAMP, setgifopts), DEF_CMD("ignore_source", GIF_IGNORE_SOURCE, setgifopts), DEF_CMD("-ignore_source", -GIF_IGNORE_SOURCE, setgifopts), }; diff --git a/sbin/ifconfig/tests/inet6.sh b/sbin/ifconfig/tests/inet6.sh index edfd88d93af7..22399915a64d 100644 --- a/sbin/ifconfig/tests/inet6.sh +++ b/sbin/ifconfig/tests/inet6.sh @@ -76,8 +76,38 @@ broadcast_cleanup() vnet_cleanup } +atf_test_case "delete6" "cleanup" +delete6_head() +{ + atf_set descr 'Test removing IPv6 addresses' + 
atf_set require.user root +} + +delete6_body() +{ + vnet_init + + ep=$(vnet_mkepair) + + atf_check -s exit:0 \ + ifconfig ${ep}a inet6 fe80::42/64 + atf_check -s exit:0 -o match:"fe80::42%${ep}" \ + ifconfig ${ep}a inet6 + + atf_check -s exit:0 \ + ifconfig ${ep}a inet6 -alias fe80::42 + atf_check -s exit:0 -o not-match:"fe80::42%${ep}" \ + ifconfig ${ep}a inet6 +} + +delete6_cleanup() +{ + vnet_cleanup +} + atf_init_test_cases() { atf_add_test_case netmask atf_add_test_case broadcast + atf_add_test_case delete6 } diff --git a/sbin/kldstat/kldstat.c b/sbin/kldstat/kldstat.c index 79c647576440..3a90f1c97eb4 100644 --- a/sbin/kldstat/kldstat.c +++ b/sbin/kldstat/kldstat.c @@ -35,7 +35,7 @@ #include <libutil.h> #include <stdio.h> #include <stdlib.h> -#include <strings.h> +#include <string.h> #include <unistd.h> #define PTR_WIDTH ((int)(sizeof(void *) * 2 + 2)) @@ -51,7 +51,7 @@ printmod(int modid) { struct module_stat stat; - bzero(&stat, sizeof(stat)); + memset(&stat, 0, sizeof(stat)); stat.version = sizeof(struct module_stat); if (modstat(modid, &stat) < 0) { warn("can't stat module id %d", modid); diff --git a/sbin/mount/mount.8 b/sbin/mount/mount.8 index b584d71ea567..7bfc21ea41d5 100644 --- a/sbin/mount/mount.8 +++ b/sbin/mount/mount.8 @@ -28,7 +28,7 @@ .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd April 30, 2025 +.Dd July 16, 2025 .Dt MOUNT 8 .Os .Sh NAME @@ -80,7 +80,7 @@ Generate output via .Xr libxo 3 in a selection of different human and machine readable formats. See -.Xr xo_parse_args 3 +.Xr xo_options 7 for details on command line arguments. 
.It Fl a All the file systems described in @@ -573,7 +573,7 @@ support for a particular file system might be provided either on a static .Xr acl 3 , .Xr getmntinfo 3 , .Xr libxo 3 , -.Xr xo_parse_args 3 , +.Xr xo_options 7 , .Xr cd9660 4 , .Xr devfs 4 , .Xr ext2fs 4 , diff --git a/sbin/mount_fusefs/Makefile b/sbin/mount_fusefs/Makefile index e683b35f0c8a..a237ba99eb6b 100644 --- a/sbin/mount_fusefs/Makefile +++ b/sbin/mount_fusefs/Makefile @@ -20,7 +20,7 @@ DEBUG_FLAGS+= -DFUSE4BSD_VERSION="\"${F4BVERS}\"" PACKAGE=runtime PROG= mount_fusefs -MAN8= mount_fusefs.8 +MAN= mount_fusefs.8 LIBADD= util .include <bsd.prog.mk> diff --git a/sbin/nvmecontrol/connect.c b/sbin/nvmecontrol/connect.c index c1d5d2cbaf5a..3d6d12bf2c48 100644 --- a/sbin/nvmecontrol/connect.c +++ b/sbin/nvmecontrol/connect.c @@ -31,6 +31,8 @@ static struct options { const char *subnqn; const char *hostnqn; uint32_t kato; + uint32_t reconnect_delay; + uint32_t controller_loss_timeout; uint16_t num_io_queues; uint16_t queue_size; bool data_digests; @@ -43,6 +45,8 @@ static struct options { .subnqn = NULL, .hostnqn = NULL, .kato = NVMF_KATO_DEFAULT / 1000, + .reconnect_delay = NVMF_DEFAULT_RECONNECT_DELAY, + .controller_loss_timeout = NVMF_DEFAULT_CONTROLLER_LOSS, .num_io_queues = 1, .queue_size = 0, .data_digests = false, @@ -107,7 +111,7 @@ connect_nvm_controller(enum nvmf_trtype trtype, int adrfam, const char *address, } error = nvmf_handoff_host(dle, hostnqn, admin, opt.num_io_queues, io, - &cdata); + &cdata, opt.reconnect_delay, opt.controller_loss_timeout); if (error != 0) { warnc(error, "Failed to handoff queues to kernel"); free(io); @@ -259,6 +263,11 @@ static const struct opts connect_opts[] = { "Number of entries in each I/O queue"), OPT("keep-alive-tmo", 'k', arg_uint32, opt, kato, "Keep Alive timeout (in seconds)"), + OPT("reconnect-delay", 'r', arg_uint32, opt, reconnect_delay, + "Delay between reconnect attempts after connection loss " + "(in seconds)"), + OPT("ctrl-loss-tmo", 'l', 
arg_uint32, opt, controller_loss_timeout, + "Controller loss timeout after connection loss (in seconds)"), OPT("hostnqn", 'q', arg_string, opt, hostnqn, "Host NQN"), OPT("flow_control", 'F', arg_none, opt, flow_control, diff --git a/sbin/nvmecontrol/nvmecontrol.8 b/sbin/nvmecontrol/nvmecontrol.8 index d886b60a2545..624a0c93719b 100644 --- a/sbin/nvmecontrol/nvmecontrol.8 +++ b/sbin/nvmecontrol/nvmecontrol.8 @@ -33,7 +33,7 @@ .\" .\" Author: Jim Harris <jimharris@FreeBSD.org> .\" -.Dd April 29, 2025 +.Dd July 9, 2025 .Dt NVMECONTROL 8 .Os .Sh NAME @@ -216,6 +216,8 @@ .Op Fl c Ar cntl-id .Op Fl i Ar queues .Op Fl k Ar seconds +.Op Fl l Ar seconds +.Op Fl r Ar seconds .Op Fl t Ar transport .Op Fl q Ar HostNQN .Op Fl Q Ar entries @@ -226,6 +228,8 @@ .Op Fl FGg .Op Fl i Ar queues .Op Fl k Ar seconds +.Op Fl l Ar seconds +.Op Fl r Ar seconds .Op Fl t Ar transport .Op Fl q Ar HostNQN .Op Fl Q Ar entries @@ -241,6 +245,8 @@ .Op Fl FGg .Op Fl i Ar queues .Op Fl k Ar seconds +.Op Fl l Ar seconds +.Op Fl r Ar seconds .Op Fl t Ar transport .Op Fl q Ar HostNQN .Op Fl Q Ar entries @@ -786,6 +792,29 @@ The default is 1. .It Fl k Ar seconds Keep Alive timer duration in seconds. The default is 120. +.It Fl l Ar seconds +Controller Loss timer duration in seconds. +The default is 600. +.Pp +This timer starts when an association is lost with a remote I/O controller +and is cancelled when a new association is established. +If the timer expires, the controller device is deleted. +A setting of zero disables this timer. +.It Fl r Ar seconds +Reconnect timer duration in seconds. +The default is 10. +.Pp +When an association is lost with a remote I/O controller, +the controller device will request reconnection via periodic +.Xr devctl 4 +notifications until either a new association is established or the controller +device is deleted. +This timer sets the interval between each +.Xr devctl 4 +notification. 
+Note that the first notification is triggered immediately after an association +is lost. +A setting of zero disables this timer. .It Fl t Ar transport Transport to use. The default is diff --git a/sbin/nvmecontrol/reconnect.c b/sbin/nvmecontrol/reconnect.c index adf1edac662b..06af40624177 100644 --- a/sbin/nvmecontrol/reconnect.c +++ b/sbin/nvmecontrol/reconnect.c @@ -27,6 +27,8 @@ static struct options { const char *transport; const char *hostnqn; uint32_t kato; + uint32_t reconnect_delay; + uint32_t controller_loss_timeout; uint16_t num_io_queues; uint16_t queue_size; bool data_digests; @@ -37,6 +39,8 @@ static struct options { .transport = "tcp", .hostnqn = NULL, .kato = NVMF_KATO_DEFAULT / 1000, + .reconnect_delay = NVMF_DEFAULT_RECONNECT_DELAY, + .controller_loss_timeout = NVMF_DEFAULT_CONTROLLER_LOSS, .num_io_queues = 1, .queue_size = 0, .data_digests = false, @@ -59,6 +63,7 @@ static int reconnect_nvm_controller(int fd, const struct nvmf_association_params *aparams, enum nvmf_trtype trtype, int adrfam, const char *address, const char *port, uint16_t cntlid, const char *subnqn, const char *hostnqn, uint32_t kato, + uint32_t reconnect_delay, uint32_t controller_loss_timeout, u_int num_io_queues, u_int queue_size, const struct nvme_discovery_log_entry *dle) { @@ -88,7 +93,7 @@ reconnect_nvm_controller(int fd, const struct nvmf_association_params *aparams, } error = nvmf_reconnect_host(fd, dle, hostnqn, admin, num_io_queues, io, - &cdata); + &cdata, reconnect_delay, controller_loss_timeout); if (error != 0) { warnc(error, "Failed to handoff queues to kernel"); free(io); @@ -137,7 +142,8 @@ reconnect_by_address(int fd, const nvlist_t *rparams, const char *addr) error = reconnect_nvm_controller(fd, &aparams, trtype, AF_UNSPEC, address, port, le16toh(dle->cntlid), subnqn, hostnqn, - opt.kato * 1000, opt.num_io_queues, opt.queue_size, NULL); + opt.kato * 1000, opt.reconnect_delay, opt.controller_loss_timeout, + opt.num_io_queues, opt.queue_size, NULL); 
free(subnqn); free(tofree); return (error); @@ -196,6 +202,8 @@ reconnect_by_params(int fd, const nvlist_t *rparams) address, port, le16toh(dle->cntlid), dle->subnqn, nvlist_get_string(rparams, "hostnqn"), dnvlist_get_number(rparams, "kato", 0), + dnvlist_get_number(rparams, "reconnect_delay", 0), + dnvlist_get_number(rparams, "controller_loss_timeout", 0), nvlist_get_number(rparams, "num_io_queues"), nvlist_get_number(rparams, "io_qsize"), dle); free(subnqn); @@ -291,6 +299,11 @@ static const struct opts reconnect_opts[] = { "Number of entries in each I/O queue"), OPT("keep-alive-tmo", 'k', arg_uint32, opt, kato, "Keep Alive timeout (in seconds)"), + OPT("reconnect-delay", 'r', arg_uint32, opt, reconnect_delay, + "Delay between reconnect attempts after connection loss " + "(in seconds)"), + OPT("ctrl-loss-tmo", 'l', arg_uint32, opt, controller_loss_timeout, + "Controller loss timeout after connection loss (in seconds)"), OPT("hostnqn", 'q', arg_string, opt, hostnqn, "Host NQN"), OPT("flow_control", 'F', arg_none, opt, flow_control, diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y index af1fb95398f8..358fa909fc50 100644 --- a/sbin/pfctl/parse.y +++ b/sbin/pfctl/parse.y @@ -95,7 +95,7 @@ static struct file { int eof_reached; int lineno; int errors; -} *file; +} *file, *topfile; struct file *pushfile(const char *, int); int popfile(void); int check_file_secrecy(int, const char *); @@ -367,6 +367,7 @@ static struct node_fairq_opts fairq_opts; static struct node_state_opt *keep_state_defaults = NULL; static struct pfctl_watermarks syncookie_opts; +int validate_range(uint8_t, uint16_t, uint16_t); int disallow_table(struct node_host *, const char *); int disallow_urpf_failed(struct node_host *, const char *); int disallow_alias(struct node_host *, const char *); @@ -3231,8 +3232,7 @@ logopts : logopt { $$ = $1; } logopt : ALL { $$.log = PF_LOG_ALL; $$.logif = 0; } | MATCHES { $$.log = PF_LOG_MATCHES; $$.logif = 0; } - | USER { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif 
= 0; } - | GROUP { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; } + | USER { $$.log = PF_LOG_USER; $$.logif = 0; } | TO string { const char *errstr; u_int i; @@ -3825,9 +3825,14 @@ port_item : portrange { err(1, "port_item: calloc"); $$->port[0] = $1.a; $$->port[1] = $1.b; - if ($1.t) + if ($1.t) { $$->op = PF_OP_RRG; - else + if (validate_range($$->op, $$->port[0], + $$->port[1])) { + yyerror("invalid port range"); + YYERROR; + } + } else $$->op = PF_OP_EQ; $$->next = NULL; $$->tail = $$; @@ -3844,6 +3849,10 @@ port_item : portrange { $$->port[0] = $2.a; $$->port[1] = $2.b; $$->op = $1; + if (validate_range($$->op, $$->port[0], $$->port[1])) { + yyerror("invalid port range"); + YYERROR; + } $$->next = NULL; $$->tail = $$; } @@ -3859,6 +3868,10 @@ port_item : portrange { $$->port[0] = $1.a; $$->port[1] = $3.a; $$->op = $2; + if (validate_range($$->op, $$->port[0], $$->port[1])) { + yyerror("invalid port range"); + YYERROR; + } $$->next = NULL; $$->tail = $$; } @@ -3905,7 +3918,7 @@ uid_item : uid { $$->tail = $$; } | unaryop uid { - if ($2 == UID_MAX && $1 != PF_OP_EQ && $1 != PF_OP_NE) { + if ($2 == -1 && $1 != PF_OP_EQ && $1 != PF_OP_NE) { yyerror("user unknown requires operator = or " "!="); YYERROR; @@ -3920,7 +3933,7 @@ uid_item : uid { $$->tail = $$; } | uid PORTBINARY uid { - if ($1 == UID_MAX || $3 == UID_MAX) { + if ($1 == -1 || $3 == -1) { yyerror("user unknown requires operator = or " "!="); YYERROR; @@ -3938,7 +3951,7 @@ uid_item : uid { uid : STRING { if (!strcmp($1, "unknown")) - $$ = UID_MAX; + $$ = -1; else { uid_t uid; @@ -3983,7 +3996,7 @@ gid_item : gid { $$->tail = $$; } | unaryop gid { - if ($2 == GID_MAX && $1 != PF_OP_EQ && $1 != PF_OP_NE) { + if ($2 == -1 && $1 != PF_OP_EQ && $1 != PF_OP_NE) { yyerror("group unknown requires operator = or " "!="); YYERROR; @@ -3998,7 +4011,7 @@ gid_item : gid { $$->tail = $$; } | gid PORTBINARY gid { - if ($1 == GID_MAX || $3 == GID_MAX) { + if ($1 == -1 || $3 == -1) { yyerror("group unknown requires 
operator = or " "!="); YYERROR; @@ -4016,7 +4029,7 @@ gid_item : gid { gid : STRING { if (!strcmp($1, "unknown")) - $$ = GID_MAX; + $$ = -1; else { gid_t gid; @@ -5197,6 +5210,19 @@ yyerror(const char *fmt, ...) } int +validate_range(uint8_t op, uint16_t p1, uint16_t p2) +{ + uint16_t a = ntohs(p1); + uint16_t b = ntohs(p2); + + if ((op == PF_OP_RRG && a > b) || /* 34:12, i.e. none */ + (op == PF_OP_IRG && a >= b) || /* 34><12, i.e. none */ + (op == PF_OP_XRG && a > b)) /* 34<>22, i.e. all */ + return 1; + return 0; +} + +int disallow_table(struct node_host *h, const char *fmt) { for (; h != NULL; h = h->next) @@ -5324,6 +5350,10 @@ filter_consistent(struct pfctl_rule *r, int anchor_call) "synproxy state or modulate state"); problems++; } + if ((r->keep_state == PF_STATE_SYNPROXY) && (r->direction != PF_IN)) + fprintf(stderr, "%s:%d: warning: " + "synproxy used for inbound rules only, " + "ignored for outbound\n", file->name, yylval.lineno); if (r->rule_flag & PFRULE_AFTO && r->rt) { if (r->rt != PF_ROUTETO && r->rt != PF_REPLYTO) { yyerror("dup-to " @@ -5458,7 +5488,7 @@ process_tabledef(char *name, struct table_opts *opts, int popts) name); else yyerror("cannot define table %s: %s", name, - pfr_strerror(errno)); + pf_strerror(errno)); goto _error; } @@ -6014,8 +6044,14 @@ apply_rdr_ports(struct pfctl_rule *r, struct pfctl_pool *rpool, struct redirspec if (!rs->rport.b && rs->rport.t) { rpool->proxy_port[1] = ntohs(rs->rport.a) + (ntohs(r->dst.port[1]) - ntohs(r->dst.port[0])); - } else + } else { + if (validate_range(rs->rport.t, rs->rport.a, + rs->rport.b)) { + yyerror("invalid rdr-to port range"); + return (1); + } r->rdr.proxy_port[1] = ntohs(rs->rport.b); + } if (rs->pool_opts.staticport) { yyerror("the 'static-port' option is only valid with nat rules"); @@ -6743,7 +6779,7 @@ lgetc(int quotec) if (quotec) { if ((c = igetc()) == EOF) { yyerror("reached end of file while parsing quoted string"); - if (popfile() == EOF) + if (file == topfile || popfile() == 
EOF) return (EOF); return (quotec); } @@ -6771,7 +6807,7 @@ lgetc(int quotec) return ('\n'); } while (c == EOF) { - if (popfile() == EOF) + if (file == topfile || popfile() == EOF) return (EOF); c = igetc(); } @@ -7069,17 +7105,17 @@ popfile(void) { struct file *prev; - if ((prev = TAILQ_PREV(file, files, entry)) != NULL) { + if ((prev = TAILQ_PREV(file, files, entry)) != NULL) prev->errors += file->errors; - TAILQ_REMOVE(&files, file, entry); - fclose(file->stream); - free(file->name); - free(file->ungetbuf); - free(file); - file = prev; - return (0); - } - return (EOF); + + TAILQ_REMOVE(&files, file, entry); + fclose(file->stream); + free(file->name); + free(file->ungetbuf); + free(file); + file = prev; + + return (file ? 0 : EOF); } int @@ -7102,6 +7138,7 @@ parse_config(char *filename, struct pfctl *xpf) warn("cannot open the main config file!"); return (-1); } + topfile = file; yyparse(); errors = file->errors; @@ -7201,19 +7238,11 @@ mv_rules(struct pfctl_ruleset *src, struct pfctl_ruleset *dst) struct pfctl_rule *r; for (i = 0; i < PF_RULESET_MAX; ++i) { - while ((r = TAILQ_FIRST(src->rules[i].active.ptr)) - != NULL) { - TAILQ_REMOVE(src->rules[i].active.ptr, r, entries); - TAILQ_INSERT_TAIL(dst->rules[i].active.ptr, r, entries); + TAILQ_FOREACH(r, src->rules[i].active.ptr, entries) dst->anchor->match++; - } + TAILQ_CONCAT(dst->rules[i].active.ptr, src->rules[i].active.ptr, entries); src->anchor->match = 0; - while ((r = TAILQ_FIRST(src->rules[i].inactive.ptr)) - != NULL) { - TAILQ_REMOVE(src->rules[i].inactive.ptr, r, entries); - TAILQ_INSERT_TAIL(dst->rules[i].inactive.ptr, - r, entries); - } + TAILQ_CONCAT(dst->rules[i].inactive.ptr, src->rules[i].inactive.ptr, entries); } } diff --git a/sbin/pfctl/pfctl.8 b/sbin/pfctl/pfctl.8 index 28efff896956..f582c6301124 100644 --- a/sbin/pfctl/pfctl.8 +++ b/sbin/pfctl/pfctl.8 @@ -24,7 +24,7 @@ .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE 
POSSIBILITY OF SUCH DAMAGE. .\" -.Dd July 2, 2025 +.Dd July 7, 2025 .Dt PFCTL 8 .Os .Sh NAME @@ -186,6 +186,13 @@ as the anchor name: .Bd -literal -offset indent # pfctl -a '*' -sr .Ed +.Pp +To flush all rulesets and tables recursively, specify only +.Sq * +as the anchor name: +.Bd -literal -offset indent +# pfctl -a '*' -Fa +.Ed .It Fl D Ar macro Ns = Ns Ar value Define .Ar macro @@ -231,6 +238,19 @@ for details. .It Fl F Cm all Flush all of the above. .El +.Pp +If +.Fl a +is specified as well and +.Ar anchor +is terminated with a +.Sq * +character, +.Cm rules , +.Cm Tables +and +.Cm all +flush the given anchor recursively. .It Fl f Ar file Load the rules contained in .Ar file . @@ -467,7 +487,10 @@ Show the contents of the source tracking table. Show filter information (statistics and counters). When used together with .Fl v , -source tracking statistics are also shown. +source tracking statistics, the firewall's 32-bit hostid number and the +main ruleset's MD5 checksum for use with +.Xr pfsync 4 +are also shown. .It Fl s Cm Running Show the running status and provide a non-zero exit status when disabled. 
.It Fl s Cm labels diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c index defba3b56c44..2015e0a09549 100644 --- a/sbin/pfctl/pfctl.c +++ b/sbin/pfctl/pfctl.c @@ -59,6 +59,8 @@ #include <stdlib.h> #include <string.h> #include <unistd.h> +#include <stdarg.h> +#include <libgen.h> #include "pfctl_parser.h" #include "pfctl.h" @@ -72,7 +74,7 @@ void pfctl_check_skip_ifaces(char *); void pfctl_adjust_skip_ifaces(struct pfctl *); void pfctl_clear_interface_flags(int, int); void pfctl_flush_eth_rules(int, int, char *); -void pfctl_flush_rules(int, int, char *); +int pfctl_flush_rules(int, int, char *); void pfctl_flush_nat(int, int, char *); int pfctl_clear_altq(int, int); void pfctl_clear_src_nodes(int, int); @@ -124,6 +126,17 @@ int pfctl_load_ruleset(struct pfctl *, char *, int pfctl_load_rule(struct pfctl *, char *, struct pfctl_rule *, int); const char *pfctl_lookup_option(char *, const char * const *); void pfctl_reset(int, int); +int pfctl_walk_show(int, struct pfioc_ruleset *, void *); +int pfctl_walk_get(int, struct pfioc_ruleset *, void *); +int pfctl_walk_anchors(int, int, const char *, + int(*)(int, struct pfioc_ruleset *, void *), void *); +struct pfr_anchors * + pfctl_get_anchors(int, const char *, int); +int pfctl_recurse(int, int, const char *, + int(*)(int, int, struct pfr_anchoritem *)); +int pfctl_call_clearrules(int, int, struct pfr_anchoritem *); +int pfctl_call_cleartables(int, int, struct pfr_anchoritem *); +int pfctl_call_clearanchors(int, int, struct pfr_anchoritem *); static struct pfctl_anchor_global pf_anchors; struct pfctl_anchor pf_main_anchor; @@ -151,6 +164,7 @@ int dev = -1; struct pfctl_handle *pfh = NULL; static int first_title = 1; static int labels = 0; +static int exit_val = 0; #define INDENT(d, o) do { \ if (o) { \ @@ -269,6 +283,40 @@ usage(void) exit(1); } +void +pfctl_err(int opts, int eval, const char *fmt, ...) 
+{ + va_list ap; + + va_start(ap, fmt); + + if ((opts & PF_OPT_IGNFAIL) == 0) + verr(eval, fmt, ap); + else + vwarn(fmt, ap); + + va_end(ap); + + exit_val = eval; +} + +void +pfctl_errx(int opts, int eval, const char *fmt, ...) +{ + va_list ap; + + va_start(ap, fmt); + + if ((opts & PF_OPT_IGNFAIL) == 0) + verrx(eval, fmt, ap); + else + vwarnx(fmt, ap); + + va_end(ap); + + exit_val = eval; +} + /* * Cache protocol number to name translations. * @@ -361,7 +409,7 @@ pfctl_clear_stats(struct pfctl_handle *h, int opts) { int ret; if ((ret = pfctl_clear_status(h)) != 0) - errc(1, ret, "DIOCCLRSTATUS"); + pfctl_err(opts, 1, "DIOCCLRSTATUS"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "pf: statistics cleared\n"); } @@ -469,16 +517,19 @@ pfctl_flush_eth_rules(int dev, int opts, char *anchorname) fprintf(stderr, "Ethernet rules cleared\n"); } -void +int pfctl_flush_rules(int dev, int opts, char *anchorname) { int ret; ret = pfctl_clear_rules(dev, anchorname); - if (ret != 0) - err(1, "pfctl_clear_rules"); - if ((opts & PF_OPT_QUIET) == 0) + if (ret != 0) { + pfctl_err(opts, 1, "%s", __func__); + return (1); + } else if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "rules cleared\n"); + + return (0); } void @@ -515,7 +566,7 @@ void pfctl_clear_src_nodes(int dev, int opts) { if (ioctl(dev, DIOCCLRSRCNODES)) - err(1, "DIOCCLRSRCNODES"); + pfctl_err(opts, 1, "DIOCCLRSRCNODES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "source tracking entries cleared\n"); } @@ -530,13 +581,13 @@ pfctl_clear_iface_states(int dev, const char *iface, int opts) memset(&kill, 0, sizeof(kill)); if (iface != NULL && strlcpy(kill.ifname, iface, sizeof(kill.ifname)) >= sizeof(kill.ifname)) - errx(1, "invalid interface: %s", iface); + pfctl_errx(opts, 1, "invalid interface: %s", iface); if (opts & PF_OPT_KILLMATCH) kill.kill_match = true; if ((ret = pfctl_clear_states_h(pfh, &kill, &killed)) != 0) - errc(1, ret, "DIOCCLRSTATES"); + pfctl_err(opts, 1, "DIOCCLRSTATUS"); if ((opts & PF_OPT_QUIET) 
== 0) fprintf(stderr, "%d states cleared\n", killed); } @@ -687,7 +738,7 @@ pfctl_net_kill_states(int dev, const char *iface, int opts) memset(&last_dst, 0xff, sizeof(last_dst)); if (iface != NULL && strlcpy(kill.ifname, iface, sizeof(kill.ifname)) >= sizeof(kill.ifname)) - errx(1, "invalid interface: %s", iface); + pfctl_errx(opts, 1, "invalid interface: %s", iface); if (state_killers == 2 && (strcmp(state_kill[0], "nat") == 0)) { kill.nat = true; @@ -740,13 +791,13 @@ pfctl_net_kill_states(int dev, const char *iface, int opts) resp[1]->ai_addr); if ((ret = pfctl_kill_states_h(pfh, &kill, &newkilled)) != 0) - errc(1, ret, "DIOCKILLSTATES"); + pfctl_errx(opts, 1, "DIOCKILLSTATES"); killed += newkilled; } freeaddrinfo(res[1]); } else { if ((ret = pfctl_kill_states_h(pfh, &kill, &newkilled)) != 0) - errc(1, ret, "DIOCKILLSTATES"); + pfctl_errx(opts, 1, "DIOCKILLSTATES"); killed += newkilled; } } @@ -778,7 +829,7 @@ pfctl_gateway_kill_states(int dev, const char *iface, int opts) memset(&last_src, 0xff, sizeof(last_src)); if (iface != NULL && strlcpy(kill.ifname, iface, sizeof(kill.ifname)) >= sizeof(kill.ifname)) - errx(1, "invalid interface: %s", iface); + pfctl_errx(opts, 1, "invalid interface: %s", iface); if (opts & PF_OPT_KILLMATCH) kill.kill_match = true; @@ -799,7 +850,7 @@ pfctl_gateway_kill_states(int dev, const char *iface, int opts) copy_satopfaddr(&kill.rt_addr.addr.v.a.addr, resp->ai_addr); if (pfctl_kill_states_h(pfh, &kill, &newkilled)) - err(1, "DIOCKILLSTATES"); + pfctl_errx(opts, 1, "DIOCKILLSTATES"); killed += newkilled; } @@ -823,7 +874,7 @@ pfctl_label_kill_states(int dev, const char *iface, int opts) memset(&kill, 0, sizeof(kill)); if (iface != NULL && strlcpy(kill.ifname, iface, sizeof(kill.ifname)) >= sizeof(kill.ifname)) - errx(1, "invalid interface: %s", iface); + pfctl_errx(opts, 1, "invalid interface: %s", iface); if (opts & PF_OPT_KILLMATCH) kill.kill_match = true; @@ -833,7 +884,7 @@ pfctl_label_kill_states(int dev, const char *iface, int 
opts) errx(1, "label too long: %s", state_kill[1]); if ((ret = pfctl_kill_states_h(pfh, &kill, &killed)) != 0) - errc(1, ret, "DIOCKILLSTATES"); + pfctl_errx(opts, 1, "DIOCKILLSTATES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "killed %d states\n", killed); @@ -871,7 +922,7 @@ pfctl_id_kill_states(int dev, const char *iface, int opts) } if ((ret = pfctl_kill_states_h(pfh, &kill, &killed)) != 0) - errc(1, ret, "DIOCKILLSTATES"); + pfctl_errx(opts, 1, "DIOCKILLSTATES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "killed %d states\n", killed); @@ -895,7 +946,7 @@ pfctl_key_kill_states(int dev, const char *iface, int opts) if (iface != NULL && strlcpy(kill.ifname, iface, sizeof(kill.ifname)) >= sizeof(kill.ifname)) - errx(1, "invalid interface: %s", iface); + pfctl_errx(opts, 1, "invalid interface: %s", iface); s = strdup(state_kill[1]); if (!s) @@ -931,7 +982,7 @@ pfctl_key_kill_states(int dev, const char *iface, int opts) errx(1, "invalid host: %s", tokens[didx]); if ((ret = pfctl_kill_states_h(pfh, &kill, &killed)) != 0) - errc(1, ret, "DIOCKILLSTATES"); + pfctl_errx(opts, 1, "DIOCKILLSTATES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "killed %d states\n", killed); @@ -1289,17 +1340,12 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, u_int32_t mnr, nr; memset(&prs, 0, sizeof(prs)); - if ((ret = pfctl_get_rulesets(pfh, npath, &mnr)) != 0) { - if (ret == EINVAL) - fprintf(stderr, "Anchor '%s' " - "not found.\n", anchorname); - else - errc(1, ret, "DIOCGETRULESETS"); - } + if ((ret = pfctl_get_rulesets(pfh, npath, &mnr)) != 0) + errx(1, "%s", pf_strerror(ret)); for (nr = 0; nr < mnr; ++nr) { if ((ret = pfctl_get_ruleset(pfh, npath, nr, &prs)) != 0) - errc(1, ret, "DIOCGETRULESET"); + errx(1, "%s", pf_strerror(ret)); INDENT(depth, !(opts & PF_OPT_VERBOSE)); printf("anchor \"%s\" all {\n", prs.name); pfctl_show_rules(dev, npath, opts, @@ -1314,14 +1360,14 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show 
format, if (opts & PF_OPT_SHOWALL) { ret = pfctl_get_rules_info_h(pfh, &ri, PF_PASS, path); if (ret != 0) { - warnc(ret, "DIOCGETRULES"); + warnx("%s", pf_strerror(ret)); goto error; } header++; } ret = pfctl_get_rules_info_h(pfh, &ri, PF_SCRUB, path); if (ret != 0) { - warnc(ret, "DIOCGETRULES"); + warnx("%s", pf_strerror(ret)); goto error; } if (opts & PF_OPT_SHOWALL) { @@ -1514,12 +1560,12 @@ pfctl_show_nat(int dev, const char *path, int opts, char *anchorname, int depth, fprintf(stderr, "NAT anchor '%s' " "not found.\n", anchorname); else - errc(1, ret, "DIOCGETRULESETS"); + errx(1, "%s", pf_strerror(ret)); } for (nr = 0; nr < mnr; ++nr) { if ((ret = pfctl_get_ruleset(pfh, npath, nr, &prs)) != 0) - errc(1, ret, "DIOCGETRULESET"); + errx(1, "%s", pf_strerror(ret)); INDENT(depth, !(opts & PF_OPT_VERBOSE)); printf("nat-anchor \"%s\" all {\n", prs.name); pfctl_show_nat(dev, npath, opts, @@ -2038,8 +2084,8 @@ pfctl_load_ruleset(struct pfctl *pf, char *path, struct pfctl_ruleset *rs, if ((pf->opts & PF_OPT_NOACTION) == 0 && (error = pfctl_ruleset_trans(pf, path, rs->anchor, false))) { - printf("pfctl_load_rulesets: " - "pfctl_ruleset_trans %d\n", error); + printf("%s: " + "pfctl_ruleset_trans %d\n", __func__, error); goto error; } } else if (pf->opts & PF_OPT_VERBOSE) @@ -2805,7 +2851,7 @@ pfctl_set_interface_flags(struct pfctl *pf, char *ifname, int flags, int how) if ((pf->opts & PF_OPT_NOACTION) == 0) { if (how == 0) { if (ioctl(pf->dev, DIOCCLRIFFLAG, &pi)) - err(1, "DIOCCLRIFFLAG"); + pfctl_err(pf->opts, 1, "DIOCCLRIFFLAG"); } else { if (ioctl(pf->dev, DIOCSETIFFLAG, &pi)) err(1, "DIOCSETIFFLAG"); @@ -2864,43 +2910,178 @@ pfctl_test_altqsupport(int dev, int opts) } int -pfctl_show_anchors(int dev, int opts, char *anchorname) +pfctl_walk_show(int opts, struct pfioc_ruleset *pr, void *warg) +{ + if (pr->path[0]) { + if (pr->path[0] != '_' || (opts & PF_OPT_VERBOSE)) + printf(" %s/%s\n", pr->path, pr->name); + } else if (pr->name[0] != '_' || (opts & 
PF_OPT_VERBOSE)) + printf(" %s\n", pr->name); + + return (0); +} + +int +pfctl_walk_get(int opts, struct pfioc_ruleset *pr, void *warg) +{ + struct pfr_anchoritem *pfra; + struct pfr_anchors *anchors; + int e; + + anchors = (struct pfr_anchors *)warg; + + pfra = malloc(sizeof(*pfra)); + if (pfra == NULL) + err(1, "%s", __func__); + + if (pr->path[0]) + e = asprintf(&pfra->pfra_anchorname, "%s/%s", pr->path, + pr->name); + else + e = asprintf(&pfra->pfra_anchorname, "%s", pr->name); + + if (e == -1) + err(1, "%s", __func__); + + SLIST_INSERT_HEAD(anchors, pfra, pfra_sle); + + return (0); +} + +int +pfctl_walk_anchors(int dev, int opts, const char *anchor, + int(walkf)(int, struct pfioc_ruleset *, void *), void *warg) { struct pfioc_ruleset pr; u_int32_t mnr, nr; int ret; memset(&pr, 0, sizeof(pr)); - if ((ret = pfctl_get_rulesets(pfh, anchorname, &mnr)) != 0) { - if (ret == EINVAL) - fprintf(stderr, "Anchor '%s' not found.\n", - anchorname); - else - errc(1, ret, "DIOCGETRULESETS"); - return (-1); - } + if ((ret = pfctl_get_rulesets(pfh, anchor, &mnr)) != 0) + errx(1, "%s", pf_strerror(ret)); for (nr = 0; nr < mnr; ++nr) { char sub[MAXPATHLEN]; - if ((ret = pfctl_get_ruleset(pfh, anchorname, nr, &pr)) != 0) + if ((ret = pfctl_get_ruleset(pfh, anchor, nr, &pr)) != 0) errc(1, ret, "DIOCGETRULESET"); if (!strcmp(pr.name, PF_RESERVED_ANCHOR)) continue; sub[0] = '\0'; - if (pr.path[0]) { - strlcat(sub, pr.path, sizeof(sub)); - strlcat(sub, "/", sizeof(sub)); - } - strlcat(sub, pr.name, sizeof(sub)); - if (sub[0] != '_' || (opts & PF_OPT_VERBOSE)) - printf(" %s\n", sub); - if ((opts & PF_OPT_VERBOSE) && pfctl_show_anchors(dev, opts, sub)) + if (walkf(opts, &pr, warg)) + return (-1); + + if (pr.path[0]) + snprintf(sub, sizeof(sub), "%s/%s", pr.path, pr.name); + else + snprintf(sub, sizeof(sub), "%s", pr.name); + if (pfctl_walk_anchors(dev, opts, sub, walkf, warg)) return (-1); } return (0); } int +pfctl_show_anchors(int dev, int opts, char *anchor) +{ + return ( + 
pfctl_walk_anchors(dev, opts, anchor, pfctl_walk_show, NULL)); +} + +struct pfr_anchors * +pfctl_get_anchors(int dev, const char *anchor, int opts) +{ + struct pfioc_ruleset pr; + static struct pfr_anchors anchors; + char anchorbuf[PATH_MAX]; + char *n; + + SLIST_INIT(&anchors); + + memset(&pr, 0, sizeof(pr)); + if (*anchor != '\0') { + strlcpy(anchorbuf, anchor, sizeof(anchorbuf)); + n = dirname(anchorbuf); + if (n[0] != '.' && n[1] != '\0') + strlcpy(pr.path, n, sizeof(pr.path)); + strlcpy(anchorbuf, anchor, sizeof(anchorbuf)); + n = basename(anchorbuf); + if (n != NULL) + strlcpy(pr.name, n, sizeof(pr.name)); + } + + /* insert a root anchor first. */ + pfctl_walk_get(opts, &pr, &anchors); + + if (pfctl_walk_anchors(dev, opts, anchor, pfctl_walk_get, &anchors)) + errx(1, "%s failed to retrieve list of anchors, can't continue", + __func__); + + return (&anchors); +} + +int +pfctl_call_cleartables(int dev, int opts, struct pfr_anchoritem *pfra) +{ + /* + * PF_OPT_QUIET makes pfctl_clear_tables() to stop printing number of + * tables cleared for given anchor. + */ + opts |= PF_OPT_QUIET; + return ((pfctl_do_clear_tables(pfra->pfra_anchorname, opts) == -1) ? + 1 : 0); +} + +int +pfctl_call_clearrules(int dev, int opts, struct pfr_anchoritem *pfra) +{ + /* + * PF_OPT_QUIET makes pfctl_clear_rules() to stop printing a 'rules + * cleared' message for every anchor it deletes. 
+ */ + opts |= PF_OPT_QUIET; + return (pfctl_flush_rules(dev, opts, pfra->pfra_anchorname)); +} + +int +pfctl_call_clearanchors(int dev, int opts, struct pfr_anchoritem *pfra) +{ + int rv = 0; + + rv |= pfctl_call_cleartables(dev, opts, pfra); + rv |= pfctl_call_clearrules(dev, opts, pfra); + + return (rv); +} + +int +pfctl_recurse(int dev, int opts, const char *anchorname, + int(*walkf)(int, int, struct pfr_anchoritem *)) +{ + int rv = 0; + struct pfr_anchors *anchors; + struct pfr_anchoritem *pfra, *pfra_save; + + anchors = pfctl_get_anchors(dev, anchorname, opts); + /* + * While traversing the list, pfctl_clear_*() must always return + * so that failures on one anchor do not prevent clearing others. + */ + opts |= PF_OPT_IGNFAIL; + printf("Removing:\n"); + SLIST_FOREACH_SAFE(pfra, anchors, pfra_sle, pfra_save) { + printf(" %s\n", + (*pfra->pfra_anchorname == '\0') ? "/" : + pfra->pfra_anchorname); + rv |= walkf(dev, opts, pfra); + SLIST_REMOVE(anchors, pfra, pfr_anchoritem, pfra_sle); + free(pfra->pfra_anchorname); + free(pfra); + } + + return (rv); +} + +int pfctl_show_eth_anchors(int dev, int opts, char *anchorname) { struct pfctl_eth_rulesets_info ri; @@ -2990,7 +3171,6 @@ pfctl_reset(int dev, int opts) int main(int argc, char *argv[]) { - int error = 0; int ch; int mode = O_RDONLY; int opts = 0; @@ -3218,7 +3398,7 @@ main(int argc, char *argv[]) if (opts & PF_OPT_DISABLE) if (pfctl_disable(dev, opts)) - error = 1; + exit_val = 1; if ((path = calloc(1, MAXPATHLEN)) == NULL) errx(1, "%s: calloc", __func__); @@ -3259,7 +3439,7 @@ main(int argc, char *argv[]) pfctl_show_status(dev, opts); break; case 'R': - error = pfctl_show_running(dev); + exit_val = pfctl_show_running(dev); break; case 't': pfctl_show_timeouts(dev, opts); @@ -3321,7 +3501,11 @@ main(int argc, char *argv[]) pfctl_flush_eth_rules(dev, opts, anchorname); break; case 'r': - pfctl_flush_rules(dev, opts, anchorname); + if (opts & PF_OPT_RECURSE) + pfctl_recurse(dev, opts, anchorname, + 
pfctl_call_clearrules); + else + pfctl_flush_rules(dev, opts, anchorname); break; case 'n': pfctl_flush_nat(dev, opts, anchorname); @@ -3347,7 +3531,13 @@ main(int argc, char *argv[]) pfctl_flush_eth_rules(dev, opts, anchorname); pfctl_flush_rules(dev, opts, anchorname); pfctl_flush_nat(dev, opts, anchorname); - pfctl_do_clear_tables(anchorname, opts); + if (opts & PF_OPT_RECURSE) + pfctl_recurse(dev, opts, anchorname, + pfctl_call_clearanchors); + else { + pfctl_do_clear_tables(anchorname, opts); + pfctl_flush_rules(dev, opts, anchorname); + } if (!*anchorname) { pfctl_clear_altq(dev, opts); pfctl_clear_iface_states(dev, ifaceopt, opts); @@ -3361,7 +3551,11 @@ main(int argc, char *argv[]) pfctl_clear_fingerprints(dev, opts); break; case 'T': - pfctl_do_clear_tables(anchorname, opts); + if ((opts & PF_OPT_RECURSE) == 0) + pfctl_do_clear_tables(anchorname, opts); + else + pfctl_recurse(dev, opts, anchorname, + pfctl_call_cleartables); break; case 'R': pfctl_reset(dev, opts); @@ -3385,7 +3579,7 @@ main(int argc, char *argv[]) pfctl_kill_src_nodes(dev, opts); if (tblcmdopt != NULL) { - error = pfctl_table(argc, argv, tableopt, + exit_val = pfctl_table(argc, argv, tableopt, tblcmdopt, rulesopt, anchorname, opts); rulesopt = NULL; } @@ -3411,17 +3605,17 @@ main(int argc, char *argv[]) if (rulesopt != NULL && !(opts & PF_OPT_MERGE) && !anchorname[0] && (loadopt & PFCTL_FLAG_OPTION)) if (pfctl_file_fingerprints(dev, opts, PF_OSFP_FILE)) - error = 1; + exit_val = 1; if (rulesopt != NULL) { if (pfctl_rules(dev, rulesopt, opts, optimize, anchorname, NULL)) - error = 1; + exit_val = 1; } if (opts & PF_OPT_ENABLE) if (pfctl_enable(dev, opts)) - error = 1; + exit_val = 1; if (debugopt != NULL) { switch (*debugopt) { @@ -3440,5 +3634,19 @@ main(int argc, char *argv[]) } } - exit(error); + exit(exit_val); +} + +char * +pf_strerror(int errnum) +{ + switch (errnum) { + case ESRCH: + return "Table does not exist."; + case EINVAL: + case ENOENT: + return "Anchor does not exist."; + 
default: + return strerror(errnum); + } } diff --git a/sbin/pfctl/pfctl.h b/sbin/pfctl/pfctl.h index d8196c129187..afecc78086e0 100644 --- a/sbin/pfctl/pfctl.h +++ b/sbin/pfctl/pfctl.h @@ -55,6 +55,13 @@ struct pfr_buffer { (var) != NULL; \ (var) = pfr_buf_next((buf), (var))) +struct pfr_anchoritem { + SLIST_ENTRY(pfr_anchoritem) pfra_sle; + char *pfra_anchorname; +}; + +SLIST_HEAD(pfr_anchors, pfr_anchoritem); + int pfr_get_fd(void); int pfr_add_table(struct pfr_table *, int *, int); int pfr_del_table(struct pfr_table *, int *, int); @@ -76,12 +83,12 @@ void *pfr_buf_next(struct pfr_buffer *, const void *); int pfr_buf_grow(struct pfr_buffer *, int); int pfr_buf_load(struct pfr_buffer *, char *, int, int (*)(struct pfr_buffer *, char *, int, int), int); -char *pfr_strerror(int); +char *pf_strerror(int); int pfi_get_ifaces(const char *, struct pfi_kif *, int *); int pfi_clr_istats(const char *, int *, int); void pfctl_print_title(char *); -void pfctl_do_clear_tables(const char *, int); +int pfctl_do_clear_tables(const char *, int); void pfctl_show_tables(const char *, int); int pfctl_table(int, char *[], char *, const char *, char *, const char *, int); @@ -150,4 +157,7 @@ void expand_label(char *, size_t, struct pfctl_rule *); const char *pfctl_proto2name(int); +void pfctl_err(int, int, const char *, ...); +void pfctl_errx(int, int, const char *, ...); + #endif /* _PFCTL_H_ */ diff --git a/sbin/pfctl/pfctl_optimize.c b/sbin/pfctl/pfctl_optimize.c index b58bace326c2..1d2a60555f19 100644 --- a/sbin/pfctl/pfctl_optimize.c +++ b/sbin/pfctl/pfctl_optimize.c @@ -273,7 +273,10 @@ pfctl_optimize_ruleset(struct pfctl *pf, struct pfctl_ruleset *rs) struct pfctl_rule *r; struct pfctl_rulequeue *old_rules; - DEBUG("optimizing ruleset"); + if (TAILQ_EMPTY(rs->rules[PF_RULESET_FILTER].active.ptr)) + return (0); + + DEBUG("optimizing ruleset \"%s\"", rs->anchor->path); memset(&table_buffer, 0, sizeof(table_buffer)); skip_init(); TAILQ_INIT(&opt_queue); @@ -720,11 +723,7 @@ 
reorder_rules(struct pfctl *pf, struct superblock *block, int depth) * it based on a more optimal skipstep order. */ TAILQ_INIT(&head); - while ((por = TAILQ_FIRST(&block->sb_rules))) { - TAILQ_REMOVE(&block->sb_rules, por, por_entry); - TAILQ_INSERT_TAIL(&head, por, por_entry); - } - + TAILQ_CONCAT(&head, &block->sb_rules, por_entry); while (!TAILQ_EMPTY(&head)) { largest = 1; @@ -745,11 +744,7 @@ reorder_rules(struct pfctl *pf, struct superblock *block, int depth) * Nothing useful left. Leave remaining rules in order. */ DEBUG("(%d) no more commonality for skip steps", depth); - while ((por = TAILQ_FIRST(&head))) { - TAILQ_REMOVE(&head, por, por_entry); - TAILQ_INSERT_TAIL(&block->sb_rules, por, - por_entry); - } + TAILQ_CONCAT(&block->sb_rules, &head, por_entry); } else { /* * There is commonality. Extract those common rules @@ -860,10 +855,7 @@ block_feedback(struct pfctl *pf, struct superblock *block) */ TAILQ_INIT(&queue); - while ((por1 = TAILQ_FIRST(&block->sb_rules)) != NULL) { - TAILQ_REMOVE(&block->sb_rules, por1, por_entry); - TAILQ_INSERT_TAIL(&queue, por1, por_entry); - } + TAILQ_CONCAT(&queue, &block->sb_rules, por_entry); while ((por1 = TAILQ_FIRST(&queue)) != NULL) { TAILQ_REMOVE(&queue, por1, por_entry); @@ -900,13 +892,13 @@ load_feedback_profile(struct pfctl *pf, struct superblocks *superblocks) struct pf_opt_queue queue; struct pfctl_rules_info rules; struct pfctl_rule a, b, rule; - int nr, mnr; + int nr, mnr, ret; TAILQ_INIT(&queue); TAILQ_INIT(&prof_superblocks); - if (pfctl_get_rules_info_h(pf->h, &rules, PF_PASS, "")) { - warn("DIOCGETRULES"); + if ((ret = pfctl_get_rules_info_h(pf->h, &rules, PF_PASS, "")) != 0) { + warnx("%s", pf_strerror(ret)); return (1); } mnr = rules.nr; @@ -921,7 +913,7 @@ load_feedback_profile(struct pfctl *pf, struct superblocks *superblocks) if (pfctl_get_rule_h(pf->h, nr, rules.ticket, "", PF_PASS, &rule, anchor_call)) { - warn("DIOCGETRULENV"); + warnx("%s", pf_strerror(ret)); free(por); return (1); } @@ -1256,7 
+1248,7 @@ add_opt_table(struct pfctl *pf, struct pf_opt_tbl **tbl, sa_family_t af, /* This is just a temporary table name */ snprintf((*tbl)->pt_name, sizeof((*tbl)->pt_name), "%s%d", - PF_OPT_TABLE_PREFIX, tablenum++); + PF_OPTIMIZER_TABLE_PFX, tablenum++); DEBUG("creating table <%s>", (*tbl)->pt_name); } @@ -1323,9 +1315,9 @@ pf_opt_create_table(struct pfctl *pf, struct pf_opt_tbl *tbl) /* Now we have to pick a table name that isn't used */ again: DEBUG("translating temporary table <%s> to <%s%x_%d>", tbl->pt_name, - PF_OPT_TABLE_PREFIX, table_identifier, tablenum); + PF_OPTIMIZER_TABLE_PFX, table_identifier, tablenum); snprintf(tbl->pt_name, sizeof(tbl->pt_name), "%s%x_%d", - PF_OPT_TABLE_PREFIX, table_identifier, tablenum); + PF_OPTIMIZER_TABLE_PFX, table_identifier, tablenum); PFRB_FOREACH(t, &table_buffer) { if (strcasecmp(t->pfrt_name, tbl->pt_name) == 0) { /* Collision. Try again */ diff --git a/sbin/pfctl/pfctl_osfp.c b/sbin/pfctl/pfctl_osfp.c index 3a94c2e8c81b..5770c8343a46 100644 --- a/sbin/pfctl/pfctl_osfp.c +++ b/sbin/pfctl/pfctl_osfp.c @@ -264,7 +264,7 @@ void pfctl_clear_fingerprints(int dev, int opts) { if (ioctl(dev, DIOCOSFPFLUSH)) - err(1, "DIOCOSFPFLUSH"); + pfctl_err(opts, 1, "DIOCOSFPFLUSH"); } /* flush pfctl's view of the fingerprints */ diff --git a/sbin/pfctl/pfctl_parser.c b/sbin/pfctl/pfctl_parser.c index 26a213c3ffd9..f2eb75135609 100644 --- a/sbin/pfctl/pfctl_parser.c +++ b/sbin/pfctl/pfctl_parser.c @@ -68,7 +68,7 @@ void print_op (u_int8_t, const char *, const char *); void print_port (u_int8_t, u_int16_t, u_int16_t, const char *, int); -void print_ugid (u_int8_t, unsigned, unsigned, const char *, unsigned); +void print_ugid (u_int8_t, id_t, id_t, const char *); void print_flags (uint16_t); void print_fromto(struct pf_rule_addr *, pf_osfp_t, struct pf_rule_addr *, sa_family_t, u_int8_t, int, int); @@ -364,14 +364,14 @@ print_port(u_int8_t op, u_int16_t p1, u_int16_t p2, const char *proto, int numer } void -print_ugid(u_int8_t op, 
unsigned u1, unsigned u2, const char *t, unsigned umax) +print_ugid(u_int8_t op, id_t i1, id_t i2, const char *t) { char a1[11], a2[11]; - snprintf(a1, sizeof(a1), "%u", u1); - snprintf(a2, sizeof(a2), "%u", u2); + snprintf(a1, sizeof(a1), "%ju", (uintmax_t)i1); + snprintf(a2, sizeof(a2), "%ju", (uintmax_t)i2); printf(" %s", t); - if (u1 == umax && (op == PF_OP_EQ || op == PF_OP_NE)) + if (i1 == -1 && (op == PF_OP_EQ || op == PF_OP_NE)) print_op(op, "unknown", a2); else print_op(op, a1, a2); @@ -928,7 +928,7 @@ print_rule(struct pfctl_rule *r, const char *anchor_call, int verbose, int numer printf("%sall", count++ ? ", " : ""); if (r->log & PF_LOG_MATCHES) printf("%smatches", count++ ? ", " : ""); - if (r->log & PF_LOG_SOCKET_LOOKUP) + if (r->log & PF_LOG_USER) printf("%suser", count++ ? ", " : ""); if (r->logif) printf("%sto pflog%u", count++ ? ", " : "", @@ -977,11 +977,9 @@ print_rule(struct pfctl_rule *r, const char *anchor_call, int verbose, int numer printf(" %sreceived-on %s", r->rcvifnot ? "!" 
: "", r->rcv_ifname); if (r->uid.op) - print_ugid(r->uid.op, r->uid.uid[0], r->uid.uid[1], "user", - UID_MAX); + print_ugid(r->uid.op, r->uid.uid[0], r->uid.uid[1], "user"); if (r->gid.op) - print_ugid(r->gid.op, r->gid.gid[0], r->gid.gid[1], "group", - GID_MAX); + print_ugid(r->gid.op, r->gid.gid[0], r->gid.gid[1], "group"); if (r->flags || r->flagset) { printf(" flags "); print_flags(r->flags); @@ -1485,7 +1483,8 @@ ifa_load(void) err(1, "getifaddrs"); for (ifa = ifap; ifa; ifa = ifa->ifa_next) { - if (!(ifa->ifa_addr->sa_family == AF_INET || + if (ifa->ifa_addr == NULL || + !(ifa->ifa_addr->sa_family == AF_INET || ifa->ifa_addr->sa_family == AF_INET6 || ifa->ifa_addr->sa_family == AF_LINK)) continue; diff --git a/sbin/pfctl/pfctl_parser.h b/sbin/pfctl/pfctl_parser.h index b91d37c791ae..7a3c0c2a523f 100644 --- a/sbin/pfctl/pfctl_parser.h +++ b/sbin/pfctl/pfctl_parser.h @@ -55,6 +55,7 @@ #define PF_OPT_RECURSE 0x04000 #define PF_OPT_KILLMATCH 0x08000 #define PF_OPT_NODNS 0x10000 +#define PF_OPT_IGNFAIL 0x20000 #define PF_NAT_PROXY_PORT_LOW 50001 #define PF_NAT_PROXY_PORT_HIGH 65535 @@ -262,7 +263,6 @@ struct pf_opt_tbl { struct node_tinithead pt_nodes; struct pfr_buffer *pt_buf; }; -#define PF_OPT_TABLE_PREFIX "__automatic_" /* optimizer pf_rule container */ struct pf_opt_rule { diff --git a/sbin/pfctl/pfctl_radix.c b/sbin/pfctl/pfctl_radix.c index 21191259adff..00e4207d377b 100644 --- a/sbin/pfctl/pfctl_radix.c +++ b/sbin/pfctl/pfctl_radix.c @@ -461,16 +461,3 @@ pfr_next_token(char buf[BUF_SIZE], FILE *fp) buf[i] = '\0'; return (1); } - -char * -pfr_strerror(int errnum) -{ - switch (errnum) { - case ESRCH: - return "Table does not exist"; - case ENOENT: - return "Anchor or Ruleset does not exist"; - default: - return strerror(errnum); - } -} diff --git a/sbin/pfctl/pfctl_table.c b/sbin/pfctl/pfctl_table.c index 0842b042df41..f583f5ef8e79 100644 --- a/sbin/pfctl/pfctl_table.c +++ b/sbin/pfctl/pfctl_table.c @@ -61,7 +61,6 @@ static int load_addr(struct pfr_buffer 
*, int, char *[], char *, int, int); static void print_addrx(struct pfr_addr *, struct pfr_addr *, int); static int nonzero_astats(struct pfr_astats *); static void print_astats(struct pfr_astats *, int); -static void radix_perror(void); static void xprintf(int, const char *, ...); static void print_iface(struct pfi_kif *, int); @@ -75,13 +74,14 @@ static const char *istats_text[2][2][2] = { { { "In6/Pass:", "In6/Block:" }, { "Out6/Pass:", "Out6/Block:" } } }; -#define RVTEST(fct) do { \ - if ((!(opts & PF_OPT_NOACTION) || \ - (opts & PF_OPT_DUMMYACTION)) && \ - (fct)) { \ - radix_perror(); \ - goto _error; \ - } \ +#define RVTEST(fct) do { \ + if ((!(opts & PF_OPT_NOACTION) || \ + (opts & PF_OPT_DUMMYACTION)) && \ + (fct)) { \ + if ((opts & PF_OPT_RECURSE) == 0) \ + warnx("%s", pf_strerror(errno)); \ + goto _error; \ + } \ } while (0) #define CREATE_TABLE do { \ @@ -92,7 +92,7 @@ static const char *istats_text[2][2][2] = { (opts & PF_OPT_DUMMYACTION)) && \ (pfr_add_table(&table, &nadd, flags)) && \ (errno != EPERM)) { \ - radix_perror(); \ + warnx("%s", pf_strerror(errno)); \ goto _error; \ } \ if (nadd) { \ @@ -103,11 +103,17 @@ static const char *istats_text[2][2][2] = { table.pfrt_flags &= ~PFR_TFLAG_PERSIST; \ } while(0) -void +int pfctl_do_clear_tables(const char *anchor, int opts) { - if (pfctl_table(0, NULL, NULL, "-F", NULL, anchor, opts)) - exit(1); + int rv; + + if ((rv = pfctl_table(0, NULL, NULL, "-F", NULL, anchor, opts)) == -1) { + if ((opts & PF_OPT_IGNFAIL) == 0) + exit(1); + } + + return (rv); } void @@ -552,13 +558,6 @@ print_astats(struct pfr_astats *as, int dns) (unsigned long long)as->pfras_bytes[dir][op]); } -void -radix_perror(void) -{ - extern char *__progname; - fprintf(stderr, "%s: %s.\n", __progname, pfr_strerror(errno)); -} - int pfctl_define_table(char *name, int flags, int addrs, const char *anchor, struct pfr_buffer *ab, u_int32_t ticket) @@ -640,10 +639,8 @@ pfctl_show_ifaces(const char *filter, int opts) for (;;) { pfr_buf_grow(&b, 
b.pfrb_size); b.pfrb_size = b.pfrb_msize; - if (pfi_get_ifaces(filter, b.pfrb_caddr, &b.pfrb_size)) { - radix_perror(); - exit(1); - } + if (pfi_get_ifaces(filter, b.pfrb_caddr, &b.pfrb_size)) + errx(1, "%s", pf_strerror(errno)); if (b.pfrb_size <= b.pfrb_msize) break; } diff --git a/sbin/pfctl/tests/files/pf0088.in b/sbin/pfctl/tests/files/pf0088.in index 4700b6916b7e..a85aa84a30bb 100644 --- a/sbin/pfctl/tests/files/pf0088.in +++ b/sbin/pfctl/tests/files/pf0088.in @@ -16,7 +16,7 @@ pass to 10.0.0.2 keep state block from 10.0.0.3 to 10.0.0.2 pass to 10.0.0.2 modulate state block from 10.0.0.3 to 10.0.0.2 -pass to 10.0.0.2 synproxy state +pass in to 10.0.0.2 synproxy state pass out proto tcp from 10.0.0.4 to 10.0.0.5 keep state diff --git a/sbin/pfctl/tests/files/pf0088.ok b/sbin/pfctl/tests/files/pf0088.ok index 47251a4503dd..801056a4ab46 100644 --- a/sbin/pfctl/tests/files/pf0088.ok +++ b/sbin/pfctl/tests/files/pf0088.ok @@ -11,7 +11,7 @@ pass inet from any to 10.0.0.2 flags S/SA keep state block drop inet from 10.0.0.3 to 10.0.0.2 pass inet from any to 10.0.0.2 flags S/SA modulate state block drop inet from 10.0.0.3 to 10.0.0.2 -pass inet from any to 10.0.0.2 flags S/SA synproxy state +pass in inet from any to 10.0.0.2 flags S/SA synproxy state pass out inet proto tcp from 10.0.0.4 to 10.0.0.5 flags S/SA keep state pass out inet proto tcp from 10.0.0.4 to 10.0.0.5 port = http flags S/SA keep state pass out all flags S/SA keep state diff --git a/sbin/pfctl/tests/files/pf1072.fail b/sbin/pfctl/tests/files/pf1072.fail new file mode 100644 index 000000000000..06ef5ae457e5 --- /dev/null +++ b/sbin/pfctl/tests/files/pf1072.fail @@ -0,0 +1 @@ +invalid port range diff --git a/sbin/pfctl/tests/files/pf1072.in b/sbin/pfctl/tests/files/pf1072.in new file mode 100644 index 000000000000..e09e92388ce1 --- /dev/null +++ b/sbin/pfctl/tests/files/pf1072.in @@ -0,0 +1 @@ +pass in proto tcp from any port 500:100 to any diff --git a/sbin/pfctl/tests/macro.sh 
b/sbin/pfctl/tests/macro.sh index 9c48dbbc69f0..071c6cb4f426 100755 --- a/sbin/pfctl/tests/macro.sh +++ b/sbin/pfctl/tests/macro.sh @@ -3,6 +3,7 @@ atf_test_case "space" cleanup space_head() { atf_set descr "Test macros with spaces" + atf_set require.kmods "pf" } space_body() diff --git a/sbin/pfctl/tests/pfctl_test.c b/sbin/pfctl/tests/pfctl_test.c index dbdcaa4900ea..5f0aa7826bb4 100644 --- a/sbin/pfctl/tests/pfctl_test.c +++ b/sbin/pfctl/tests/pfctl_test.c @@ -65,24 +65,6 @@ * Copied from OpenBSD. */ -static bool -check_pf_module_available(void) -{ - int modid; - struct module_stat stat; - - if ((modid = modfind("pf")) < 0) { - warn("pf module not found"); - return false; - } - stat.version = sizeof(struct module_stat); - if (modstat(modid, &stat) < 0) { - warn("can't stat pf module id %d", modid); - return false; - } - return (true); -} - extern char **environ; static struct sbuf * @@ -185,9 +167,6 @@ run_pfctl_test(const char *input_path, const char *output_path, struct sbuf *expected_output; struct sbuf *real_output; - if (!check_pf_module_available()) - atf_tc_skip("pf(4) is not loaded"); - /* The test inputs need to be able to use relative includes. 
*/ snprintf(input_files_path, sizeof(input_files_path), "%s/files", atf_tc_get_config_var(tc, "srcdir")); @@ -292,6 +271,7 @@ do_selfpf_test(const char *number, const atf_tc_t *tc) ATF_TC_HEAD(pf##number, tc) \ { \ atf_tc_set_md_var(tc, "descr", descr); \ + atf_tc_set_md_var(tc, "require.kmods", "pf"); \ } \ ATF_TC_BODY(pf##number, tc) \ { \ @@ -301,6 +281,7 @@ do_selfpf_test(const char *number, const atf_tc_t *tc) ATF_TC_HEAD(selfpf##number, tc) \ { \ atf_tc_set_md_var(tc, "descr", "Self " descr); \ + atf_tc_set_md_var(tc, "require.kmods", "pf"); \ } \ ATF_TC_BODY(selfpf##number, tc) \ { \ @@ -312,6 +293,7 @@ do_selfpf_test(const char *number, const atf_tc_t *tc) ATF_TC_HEAD(pf##number, tc) \ { \ atf_tc_set_md_var(tc, "descr", descr); \ + atf_tc_set_md_var(tc, "require.kmods", "pf"); \ } \ ATF_TC_BODY(pf##number, tc) \ { \ @@ -325,6 +307,7 @@ do_selfpf_test(const char *number, const atf_tc_t *tc) atf_tc_set_md_var(tc, "descr", descr); \ atf_tc_set_md_var(tc, "execenv", "jail"); \ atf_tc_set_md_var(tc, "execenv.jail.params", "vnet"); \ + atf_tc_set_md_var(tc, "require.kmods", "pf"); \ } \ ATF_TC_BODY(pf##number, tc) \ { \ diff --git a/sbin/pfctl/tests/pfctl_test_list.inc b/sbin/pfctl/tests/pfctl_test_list.inc index 51729bc9adad..3a68cc06ec74 100644 --- a/sbin/pfctl/tests/pfctl_test_list.inc +++ b/sbin/pfctl/tests/pfctl_test_list.inc @@ -180,3 +180,4 @@ PFCTL_TEST(1068, "max-pkt-rate") PFCTL_TEST(1069, "max-pkt-size") PFCTL_TEST_FAIL(1070, "include line number") PFCTL_TEST(1071, "mask length on (lo0)") +PFCTL_TEST_FAIL(1072, "Invalid port range") diff --git a/sbin/ping/Makefile b/sbin/ping/Makefile index b4e3f115b245..30c68cbaba52 100644 --- a/sbin/ping/Makefile +++ b/sbin/ping/Makefile @@ -32,8 +32,6 @@ CFLAGS+=-DWITH_CASPER CFLAGS+=-DIPSEC LIBADD+= ipsec -CFLAGS+= -Wno-error=unused-but-set-variable - HAS_TESTS= SUBDIR.${MK_TESTS}+= tests diff --git a/sbin/reboot/reboot.8 b/sbin/reboot/reboot.8 index 0ddcee643244..1bbc39d52be4 100644 --- a/sbin/reboot/reboot.8 +++ 
b/sbin/reboot/reboot.8 @@ -110,6 +110,15 @@ Care should be taken if .Va value contains any characters that are special to the shell or loader's configuration parsing code. +.It Fl f +Force reboot. +Normally, +.Nm +checks for the presence of the next kernel, +and absence of the +.Pa /var/run/noshutdown +file. +Without this flag, reboot is denied if one of the conditions failed. .It Fl k Ar kname Boot the specified kernel .Ar kname diff --git a/sbin/reboot/reboot.c b/sbin/reboot/reboot.c index 9825d4f96319..f6065e80fb66 100644 --- a/sbin/reboot/reboot.c +++ b/sbin/reboot/reboot.c @@ -40,6 +40,7 @@ #include <err.h> #include <errno.h> #include <fcntl.h> +#include <paths.h> #include <pwd.h> #include <signal.h> #include <spawn.h> @@ -222,6 +223,7 @@ main(int argc, char *argv[]) { struct utmpx utx; const struct passwd *pw; + struct stat st; int ch, howto = 0, i, sverrno; bool Dflag, fflag, lflag, Nflag, nflag, qflag; uint64_t pageins; @@ -294,6 +296,11 @@ main(int argc, char *argv[]) if (argc != 0) usage(); + if (!donextboot && !fflag && stat(_PATH_NOSHUTDOWN, &st) == 0) { + errx(1, "Reboot cannot be done, " _PATH_NOSHUTDOWN + " is present"); + } + if (Dflag && ((howto & ~RB_HALT) != 0 || kernel != NULL)) errx(1, "cannot delete existing nextboot config and do anything else"); if ((howto & (RB_DUMP | RB_HALT)) == (RB_DUMP | RB_HALT)) diff --git a/sbin/recoverdisk/recoverdisk.1 b/sbin/recoverdisk/recoverdisk.1 index 2999ac6ec409..9f1deb4c0c23 100644 --- a/sbin/recoverdisk/recoverdisk.1 +++ b/sbin/recoverdisk/recoverdisk.1 @@ -27,7 +27,7 @@ .Os .Sh NAME .Nm recoverdisk -.Nd recover data from hard disk or optical media +.Nd recover data from disk-like devices. .Sh SYNOPSIS .Nm .Op Fl b Ar bigsize @@ -41,79 +41,101 @@ .Sh DESCRIPTION The .Nm -utility reads data from the +utility reads all data from the .Ar source -file until all blocks could be successfully read. +and retries read operations until they succeed. 
If .Ar destination -was specified all data is being written to that file. -It starts reading in multiples of the sector size. -Whenever a block fails, it is put to the end of the working queue and will be -read again, possibly with a smaller read size. +is specified all data read be written there. .Pp -By default it uses block sizes of roughly 1 MB, 32kB, and the native -sector size (usually 512 bytes). -These figures are adjusted slightly, for devices whose sectorsize is not a -power of 2, e.g., audio CDs with a sector size of 2352 bytes. +The internal work-list can be saved and loaded so that +.Nm +sessions can be resumed, for instance when a marginal +source hard-disk shuts down. +.Pp +The work-list is initialized with a single item which covers the entire +.Ar source +and +.Nm +always chips away at the first item on the work-list. + +When a read succeeds, that part of the current chunk is eliminated +from the work-list. + +When a read fails, that part of the item is appended to the worklist +as a separate item, and will be retried in due order. +If +.Ar destination +is specified, the corresponding range is filled with '_UNREAD_'. +.Pp +The first pass attempts to read everything in "big-size" chunks, +the second pass reads in "medium-size" chunks and third and subsequent +passes read in "small-size" chunks. This three stage process is +an attempt to optimize the case where only a few bad blocks exist +on +.Ar source . +If too many read-errors are encountered, +.Nm +will fall back to smaller sizes sooner. +.Pp +The three sizes default to 128kB (or less if the sector size does +not divide 128kB cleanly, for instance audio CD media), and the +reported +.Dv DIOCGSTRIPESIZE +and +.Dv DIOCGSECTORSIZE +respectively. .Pp The options are as follows: .Bl -tag -width indent .It Fl b Ar bigsize -The size of reads attempted first. -The middle pass is roughly the logarithmic average of the bigsize and -the sectorsize. 
-.It Fl r Ar readlist -Read the list of blocks and block sizes to read from the specified file. -.It Fl s Ar interval -How often we should update the writelist file while things go OK. -The default is 60 and the unit is "progress messages" so if things -go well, this is the same as once per minute. +The size of reads attempted in first pass. +.It Fl m Ar mediumsize +The size of reads attempted in second pass. +.It Fl s Ar smallsize +The size of reads attempted in third and subsequent passes. +.It Fl r Ar work-list-file +Read the work-list from a file. +.It Fl w Ar work-list-file +Write the work-list to a file when a read succeed, but at most once +every minute. +.It Fl l Ar log-file +Each successful read is logged with timestamp, offset and length. +.It Fl t Ar totalsize +How many bytes should be recovered. The default is what +.Dv DIOCGMEDIASIZE +reports for character and block devices or +.Dv st_size +if +.Ar source +is a regular file. +.It Fl p Ar pause +.Xr sleep 3 +this long whenever a read fails. This makes the +.Ar source +device look less sick to the operating system. .It Fl u Ar pattern -By default blocks which encounter read errors will be filled with -the pattern +By default blocks which cannot be read are filled with the pattern .Ql _UNREAD_ -in the output file. -This option can be -used to specify another pattern. -Nothing gets written if the string is empty. +in the output file. This option can be used to specify a different +pattern. If the pattern is the empty string, nothing is written. .It Fl v -Enables nicer status report using ANSI escapes and UTF-8. -.It Fl w Ar writelist -Write the list of remaining blocks to read to the specified file if -.Nm -is aborted via -.Dv SIGINT . +Produce a detailed progress report with ANSI escapes and UTF-8. .El .Pp -The -.Fl r -and -.Fl w -options can be specified together. -Especially, they can point to the same file, which will be updated on abort. 
-.Sh OUTPUT -The .Nm -utility -prints several columns, detailing the progress -.Bl -tag -width remaining -.It Va start -Starting offset of the current block. -.It Va size -Read size of the current block. -.It Va len -Length of the current block. -.It Va state -Is increased for every failed read. -.It Va done -Number of bytes already read. -.It Va remaining -Number of bytes remaining. -.It Va "% done" -Percent complete. -.El +can be aborted with +.Dv SIGINT , +but with a sick +.Ar source +it may take up to several minutes before the current read operation +returns from the kernel. +.Pp .Sh EXAMPLES .Bd -literal +# check if all sectors can be read on a USB stick: +recoverdisk /dev/da0 + # recover data from failing hard drive ada3 recoverdisk /dev/ada3 /data/disk.img @@ -129,10 +151,72 @@ recoverdisk -r worklist -w worklist /dev/cd0 /data/cd.iso # recover a single file from the unreadable media recoverdisk /cdrom/file.avi file.avi -# If the disk hangs the system on read-errors try: -recoverdisk -b 0 /dev/ada3 /somewhere - .Ed +.Sh PRACTICAL ADVICE +In Datamuseum.dk +.Nm +has been used to recover all sorts of data-media for two decades, +here are some things we have learned: +.Bl -bullet +.It +Interacting with failing hardware has a tendency to crash machines, +so it is always a good idea to use the +.Fl -w work-list-file +so that it is possible to continue. +.It +When attempting to recover hard to read data from failing hard disks, +it pays to pamper the drive as much as possible: +.It +It is generally best to keep the drive in it's usual physical orientation, +but it can also help to try other orientations. +.It +Insulate the drive from external vibrations. +.It +Keep the drive cool with a fan. +.It +If possible, power the drive from a laboratory power supply. +.It +Do not loose patience: Let +.Nm +run as long as possible. +.It +(S)ATA controllers do not handle failing disks well, if this +is a problem, use a USB-(S)ATA adapter instead. 
+.It +The +.Nm +source code is deliberately written to be easily portable to +older versions of +.Fx +and to other operating systems. +.It +If you need to read ST-506, RLL or ESDI drives +.Fx 3.5.1 +is a good compromise. +.It +Sometimes forcing the disk to step between reads helps. +Since +.Nm +process the work-list in the order it is read, this +can be accomplished by sorting the work-list with +something like: +.Dl % sort +0.5 +.It +By default the +.Xr CAM +layer will retry failing read operations, but that +will get stuck on the bad sectors for long time +and delay recovering what actually can be read from +a rapidly failing drive. +In that situation, set the appropriate +.Dl kern.cam.*.retry_count +sysctl to zero. +.It +For floppies and un-zoned hard disks (ST-506 to +early IDE) set +.Fl b Ar bigsize +to the size of a track. +.El .Sh SEE ALSO .Xr dd 1 , .Xr ada 4 , @@ -143,7 +227,8 @@ recoverdisk -b 0 /dev/ada3 /somewhere The .Nm utility first appeared in -.Fx 7.0 . +.Fx 7.0 +because Somebody™ forgot to make a backup copy. .Sh AUTHORS .An -nosplit The original implementation was done by @@ -151,34 +236,29 @@ The original implementation was done by with minor improvements from .An Ulrich Sp\(:orlein Aq Mt uqs@FreeBSD.org . .Pp -This manual page was written by +This manual page was originally written by .An Ulrich Sp\(:orlein . .Sh BUGS -Reading from media where the sectorsize is not a power of 2 will make all -1 MB reads fail. -This is due to the DMA reads being split up into blocks of at most 128kB. -These reads then fail if the sectorsize is not a divisor of 128kB. -When reading a full raw audio CD, this leads to roughly 700 error messages -flying by. -This is harmless and can be avoided by setting -.Fl b -to no more than 128kB. +If a failing device causes the machine to crash, there is +a risk that a chunk might have been successfully read +and removed from the work-list, but not yet flushed to +the +.Ar destination .
.Pp .Nm -needs to know about read errors as fast as possible, i.e., retries by lower -layers will usually slow down the operation. -When using -.Xr cam 4 -attached drives, you may want to set kern.cam.XX.retry_count to zero, e.g.: -.Bd -literal -# sysctl kern.cam.ada.retry_count=0 -# sysctl kern.cam.cd.retry_count=0 -# sysctl kern.cam.da.retry_count=0 -.Ed -.\".Pp -.\"When reading from optical media, a bug in the GEOM framework will -.\"prevent it from seeing that the media has been removed. -.\"The device can still be opened, but all reads will fail. -.\"This is usually harmless, but will send -.\".Nm -.\"into an infinite loop. +calls +.Xr fdatasync 3 +on the destination before writing the work-list to a +temporary file, and calls it again on the temporary +file before renaming it to the specified +.Fl w Ar work-file-list +filename. +But even then things dont always work out. +.Pp +.Nm +should have an option for reconstructing the work-list +from the +.Ar destination +by enumerating the +.Fl u Ar pattern +filled ranges. diff --git a/sbin/recoverdisk/recoverdisk.c b/sbin/recoverdisk/recoverdisk.c index 446266c36d50..f13a1f211863 100644 --- a/sbin/recoverdisk/recoverdisk.c +++ b/sbin/recoverdisk/recoverdisk.c @@ -8,6 +8,7 @@ * this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp * ---------------------------------------------------------------------------- */ + #include <sys/param.h> #include <sys/queue.h> #include <sys/disk.h> @@ -27,18 +28,10 @@ #include <time.h> #include <unistd.h> -/* Safe printf into a fixed-size buffer */ -#define bprintf(buf, fmt, ...) 
\ - do { \ - int ibprintf; \ - ibprintf = snprintf(buf, sizeof buf, fmt, __VA_ARGS__); \ - assert(ibprintf >= 0 && ibprintf < (int)sizeof buf); \ - } while (0) - struct lump { - off_t start; - off_t len; - int state; + uint64_t start; + uint64_t len; + unsigned pass; TAILQ_ENTRY(lump) list; }; @@ -46,25 +39,32 @@ struct period { time_t t0; time_t t1; char str[20]; - off_t bytes_read; + uint64_t bytes_read; TAILQ_ENTRY(period) list; }; TAILQ_HEAD(period_head, period); static volatile sig_atomic_t aborting = 0; static int verbose = 0; -static size_t bigsize = 1024 * 1024; -static size_t medsize; -static size_t minsize = 512; -static off_t tot_size; -static off_t done_size; +static uint64_t big_read; +static uint64_t medium_read; +static uint64_t small_read; +static uint64_t total_size; +static uint64_t done_size; static char *input; -static char *wworklist = NULL; -static char *rworklist = NULL; +static char *write_worklist_file = NULL; +static char *read_worklist_file = NULL; static const char *unreadable_pattern = "_UNREAD_"; -static const int write_errors_are_fatal = 1; -static int fdr, fdw; - +static int write_errors_are_fatal = 1; +static int read_fd, write_fd; +static FILE *log_file = NULL; +static char *work_buf; +static char *pattern_buf; +static double error_pause; + +static unsigned nlumps; +static double n_reads, n_good_reads; +static time_t t_first; static TAILQ_HEAD(, lump) lumps = TAILQ_HEAD_INITIALIZER(lumps); static struct period_head minute = TAILQ_HEAD_INITIALIZER(minute); static struct period_head quarter = TAILQ_HEAD_INITIALIZER(quarter); @@ -74,7 +74,8 @@ static struct period_head day = TAILQ_HEAD_INITIALIZER(quarter); /**********************************************************************/ static void -report_good_read2(time_t now, size_t bytes, struct period_head *ph, time_t dt) +account_good_read_period(time_t now, uint64_t bytes, + struct period_head *ph, time_t dt) { struct period *pp; const char *fmt; @@ -82,7 +83,7 @@ 
report_good_read2(time_t now, size_t bytes, struct period_head *ph, time_t dt) pp = TAILQ_FIRST(ph); if (pp == NULL || pp->t1 < now) { - pp = calloc(1, sizeof(*pp)); + pp = calloc(1UL, sizeof(*pp)); assert(pp != NULL); pp->t0 = (now / dt) * dt; pp->t1 = (now / dt + 1) * dt; @@ -98,13 +99,13 @@ report_good_read2(time_t now, size_t bytes, struct period_head *ph, time_t dt) } static void -report_good_read(time_t now, size_t bytes) +account_good_read(time_t now, uint64_t bytes) { - report_good_read2(now, bytes, &minute, 60L); - report_good_read2(now, bytes, &quarter, 900L); - report_good_read2(now, bytes, &hour, 3600L); - report_good_read2(now, bytes, &day, 86400L); + account_good_read_period(now, bytes, &minute, 60L); + account_good_read_period(now, bytes, &quarter, 900L); + account_good_read_period(now, bytes, &hour, 3600L); + account_good_read_period(now, bytes, &day, 86400L); } static void @@ -114,20 +115,18 @@ report_one_period(const char *period, struct period_head *ph) int n; n = 0; - printf("%s \xe2\x94\x82", period); + printf("%s ", period); TAILQ_FOREACH(pp, ph, list) { - if (n == 3) { + if (++n == 4) { TAILQ_REMOVE(ph, pp, list); free(pp); break; } - if (n++) - printf(" \xe2\x94\x82"); - printf(" %s %14jd", pp->str, pp->bytes_read); + printf("\xe2\x94\x82 %s %14ju ", + pp->str, (uintmax_t)pp->bytes_read); } for (; n < 3; n++) { - printf(" \xe2\x94\x82"); - printf(" %5s %14s", "", ""); + printf("\xe2\x94\x82 %5s %14s ", "", ""); } printf("\x1b[K\n"); } @@ -146,27 +145,23 @@ report_periods(void) static void set_verbose(void) { - struct winsize wsz; - if (!isatty(STDIN_FILENO) || ioctl(STDIN_FILENO, TIOCGWINSZ, &wsz)) - return; verbose = 1; } static void -report_header(int eol) +report_header(const char *term) { - printf("%13s %7s %13s %5s %13s %13s %9s", + printf("%13s %7s %13s %5s %13s %13s %9s%s", "start", "size", "block-len", "pass", "done", "remaining", - "% done"); - if (eol) - printf("\x1b[K"); - putchar('\n'); + "% done", + term + ); } #define REPORTWID 
79 @@ -186,20 +181,20 @@ report_hline(const char *how) printf("\x1b[K\n"); } -static off_t hist[REPORTWID]; -static off_t last_done = -1; +static uint64_t hist[REPORTWID]; +static uint64_t prev_done = ~0UL; static void -report_histogram(const struct lump *lp) +report_histogram(uint64_t start) { - off_t j, bucket, fp, fe, k, now; + uint64_t j, bucket, fp, fe, k, now; double a; struct lump *lp2; - bucket = tot_size / REPORTWID; - if (tot_size > bucket * REPORTWID) + bucket = total_size / REPORTWID; + if (total_size > bucket * REPORTWID) bucket += 1; - if (done_size != last_done) { + if (done_size != prev_done) { memset(hist, 0, sizeof hist); TAILQ_FOREACH(lp2, &lumps, list) { fp = lp2->start; @@ -213,9 +208,9 @@ report_histogram(const struct lump *lp) fp += k; } } - last_done = done_size; + prev_done = done_size; } - now = lp->start / bucket; + now = start / bucket; for (j = 0; j < REPORTWID; j++) { a = round(8 * (double)hist[j] / bucket); assert (a >= 0 && a < 9); @@ -228,7 +223,7 @@ report_histogram(const struct lump *lp) } else { putchar(0xe2); putchar(0x96); - putchar(0x80 + (int)a); + putchar(0x80 + (char)a); } if (j == now) printf("\x1b[0m"); @@ -237,34 +232,40 @@ report_histogram(const struct lump *lp) } static void -report(const struct lump *lp, size_t sz) +report(uint64_t sz) { struct winsize wsz; + const struct lump *lp = TAILQ_FIRST(&lumps); int j; - - assert(lp != NULL); + unsigned pass = 0; + uintmax_t start = 0, length = 0; + time_t t_now = time(NULL); + + if (lp != NULL) { + pass = lp->pass; + start = lp->start; + length = lp->len; + } if (verbose) { printf("\x1b[H%s\x1b[K\n", input); - report_header(1); - } else { - putchar('\r'); + report_header("\x1b[K\n"); } - printf("%13jd %7zu %13jd %5d %13jd %13jd %9.4f", - (intmax_t)lp->start, - sz, - (intmax_t)lp->len, - lp->state, - (intmax_t)done_size, - (intmax_t)(tot_size - done_size), - 100*(double)done_size/(double)tot_size + printf("%13ju %7ju %13ju %5u %13ju %13ju %9.4f", + start, + (uintmax_t)sz, + 
length, + pass, + (uintmax_t)done_size, + (uintmax_t)(total_size - done_size), + 100*(double)done_size/(double)total_size ); if (verbose) { printf("\x1b[K\n"); report_hline(NULL); - report_histogram(lp); + report_histogram(start); if (TAILQ_EMPTY(&minute)) { report_hline(NULL); } else { @@ -272,27 +273,36 @@ report(const struct lump *lp, size_t sz) report_periods(); report_hline("\xe2\x94\xb4"); } + printf("Missing: %u", nlumps); + printf(" Success: %.0f/%.0f =", n_good_reads, n_reads); + printf(" %.4f%%", 100 * n_good_reads / n_reads); + printf(" Duration: %.3fs", (t_now - t_first) / n_reads); + printf("\x1b[K\n"); + report_hline(NULL); j = ioctl(STDIN_FILENO, TIOCGWINSZ, &wsz); if (!j) printf("\x1b[%d;1H", wsz.ws_row); + } else { + printf("\n"); } - fflush(stdout); } /**********************************************************************/ static void -new_lump(off_t start, off_t len, int state) +new_lump(uint64_t start, uint64_t len, unsigned pass) { struct lump *lp; + assert(len > 0); lp = malloc(sizeof *lp); if (lp == NULL) err(1, "Malloc failed"); lp->start = start; lp->len = len; - lp->state = state; + lp->pass = pass; TAILQ_INSERT_TAIL(&lumps, lp, list); + nlumps += 1; } /********************************************************************** @@ -306,98 +316,100 @@ save_worklist(void) struct lump *llp; char buf[PATH_MAX]; - if (fdw >= 0 && fdatasync(fdw)) + if (write_fd >= 0 && fdatasync(write_fd)) err(1, "Write error, probably disk full"); - if (wworklist != NULL) { - bprintf(buf, "%s.tmp", wworklist); - (void)fprintf(stderr, "\nSaving worklist ..."); - (void)fflush(stderr); + if (write_worklist_file != NULL) { + snprintf(buf, sizeof(buf), "%s.tmp", write_worklist_file); + fprintf(stderr, "\nSaving worklist ..."); file = fopen(buf, "w"); if (file == NULL) err(1, "Error opening file %s", buf); - TAILQ_FOREACH(llp, &lumps, list) - fprintf(file, "%jd %jd %d\n", - (intmax_t)llp->start, (intmax_t)llp->len, - llp->state); - (void)fflush(file); + TAILQ_FOREACH(llp, 
&lumps, list) { + assert (llp->len > 0); + fprintf(file, "%ju %ju %u\n", + (uintmax_t)llp->start, + (uintmax_t)llp->len, + llp->pass); + } + fflush(file); if (ferror(file) || fdatasync(fileno(file)) || fclose(file)) err(1, "Error writing file %s", buf); - if (rename(buf, wworklist)) - err(1, "Error renaming %s to %s", buf, wworklist); - (void)fprintf(stderr, " done.\n"); + if (rename(buf, write_worklist_file)) + err(1, "Error renaming %s to %s", + buf, write_worklist_file); + fprintf(stderr, " done.\n"); } } /* Read the worklist if -r was given */ -static off_t -read_worklist(off_t t) +static uint64_t +read_worklist(void) { - off_t s, l, d; - int state, lines; + uintmax_t start, length; + uint64_t missing = 0; + unsigned pass, lines; FILE *file; - (void)fprintf(stderr, "Reading worklist ..."); - (void)fflush(stderr); - file = fopen(rworklist, "r"); + fprintf(stderr, "Reading worklist ..."); + file = fopen(read_worklist_file, "r"); if (file == NULL) - err(1, "Error opening file %s", rworklist); + err(1, "Error opening file %s", read_worklist_file); lines = 0; - d = t; for (;;) { ++lines; - if (3 != fscanf(file, "%jd %jd %d\n", &s, &l, &state)) { + if (3 != fscanf(file, "%ju %ju %u\n", &start, &length, &pass)) { if (!feof(file)) - err(1, "Error parsing file %s at line %d", - rworklist, lines); + err(1, "Error parsing file %s at line %u", + read_worklist_file, lines); else break; } - new_lump(s, l, state); - d -= l; + if (length > 0) { + new_lump(start, length, pass); + missing += length; + } } if (fclose(file)) - err(1, "Error closing file %s", rworklist); - (void)fprintf(stderr, " done.\n"); + err(1, "Error closing file %s", read_worklist_file); + fprintf(stderr, " done.\n"); /* - * Return the number of bytes already read - * (at least not in worklist). 
+ * Return the number of bytes outstanding */ - return (d); + return (missing); } /**********************************************************************/ static void -write_buf(int fd, const void *buf, ssize_t len, off_t where) +write_buf(int fd, const void *buf, uint64_t length, uint64_t where) { - ssize_t i; + int64_t i; - i = pwrite(fd, buf, len, where); - if (i == len) + i = pwrite(fd, buf, length, (off_t)where); + if (i > 0 && (uint64_t)i == length) return; - printf("\nWrite error at %jd/%zu\n\t%s\n", - where, i, strerror(errno)); + printf("\nWrite error at %ju/%ju: %jd (%s)\n", + (uintmax_t)where, + (uintmax_t)length, + (intmax_t)i, strerror(errno)); save_worklist(); if (write_errors_are_fatal) exit(3); } static void -fill_buf(char *buf, ssize_t len, const char *pattern) +fill_buf(char *buf, int64_t len, const char *pattern) { - ssize_t sz = strlen(pattern); - ssize_t i, j; + int64_t sz = strlen(pattern); + int64_t i; for (i = 0; i < len; i += sz) { - j = len - i; - if (j > sz) - j = sz; - memcpy(buf + i, pattern, j); + memcpy(buf + i, pattern, MIN(len - i, sz)); } } @@ -406,45 +418,334 @@ fill_buf(char *buf, ssize_t len, const char *pattern) static void usage(void) { - (void)fprintf(stderr, "usage: recoverdisk [-b bigsize] [-r readlist] " + fprintf(stderr, "usage: recoverdisk [-b big_read] [-r readlist] " "[-s interval] [-w writelist] source [destination]\n"); /* XXX update */ exit(1); } static void -sighandler(__unused int sig) +sighandler(int sig) { + (void)sig; aborting = 1; } +/**********************************************************************/ + +static int64_t +attempt_one_lump(time_t t_now) +{ + struct lump *lp; + uint64_t sz; + int64_t retval; + int error; + + lp = TAILQ_FIRST(&lumps); + if (lp == NULL) + return(0); + + if (lp->pass == 0) { + sz = MIN(lp->len, big_read); + } else if (lp->pass == 1) { + sz = MIN(lp->len, medium_read); + } else { + sz = MIN(lp->len, small_read); + } + + assert(sz != 0); + + n_reads += 1; + retval = pread(read_fd, 
work_buf, sz, lp->start); + +#if 0 /* enable this when testing */ + if (!(random() & 0xf)) { + retval = -1; + errno = EIO; + usleep(20000); + } else { + usleep(2000); + } +#endif + + error = errno; + if (retval > 0) { + n_good_reads += 1; + sz = retval; + done_size += sz; + if (write_fd >= 0) { + write_buf(write_fd, work_buf, sz, lp->start); + } + if (log_file != NULL) { + fprintf(log_file, "%jd %ju %ju\n", + (intmax_t)t_now, + (uintmax_t)lp->start, + (uintmax_t)sz + ); + fflush(log_file); + } + } else { + printf("%14ju %7ju read error %d: (%s)", + (uintmax_t)lp->start, + (uintmax_t)sz, error, strerror(error)); + if (error_pause > 1) { + printf(" (Pausing %g s)", error_pause); + } + printf("\n"); + + if (write_fd >= 0 && pattern_buf != NULL) { + write_buf(write_fd, pattern_buf, sz, lp->start); + } + new_lump(lp->start, sz, lp->pass + 1); + retval = -sz; + } + lp->start += sz; + lp->len -= sz; + if (lp->len == 0) { + TAILQ_REMOVE(&lumps, lp, list); + nlumps -= 1; + free(lp); + } + errno = error; + return (retval); +} + + +/**********************************************************************/ + +static void +determine_total_size(void) +{ + struct stat sb; + int error; + + if (total_size != 0) + return; + + error = fstat(read_fd, &sb); + if (error < 0) + err(1, "fstat failed"); + + if (S_ISBLK(sb.st_mode) || S_ISCHR(sb.st_mode)) { +#ifdef DIOCGMEDIASIZE + off_t mediasize; + error = ioctl(read_fd, DIOCGMEDIASIZE, &mediasize); + if (error == 0 && mediasize > 0) { + total_size = mediasize; + printf("# Got total_size from DIOCGMEDIASIZE: %ju\n", + (uintmax_t)total_size); + return; + } +#endif + } else if (S_ISREG(sb.st_mode) && sb.st_size > 0) { + total_size = sb.st_size; + printf("# Got total_size from stat(2): %ju\n", + (uintmax_t)total_size); + return; + } else { + errx(1, "Input must be device or regular file"); + } + fprintf(stderr, "Specify total size with -t option\n"); + exit(1); +} + +static void +determine_read_sizes(void) +{ + int error; + u_int sectorsize; + 
off_t stripesize; + + determine_total_size(); + +#ifdef DIOCGSECTORSIZE + if (small_read == 0) { + error = ioctl(read_fd, DIOCGSECTORSIZE, &sectorsize); + if (error >= 0 && sectorsize > 0) { + small_read = sectorsize; + printf("# Got small_read from DIOCGSECTORSIZE: %ju\n", + (uintmax_t)small_read + ); + } + } +#endif + + if (small_read == 0) { + printf("Assuming 512 for small_read\n"); + small_read = 512; + } + + if (medium_read && (medium_read % small_read)) { + errx(1, + "medium_read (%ju) is not a multiple of small_read (%ju)\n", + (uintmax_t)medium_read, (uintmax_t)small_read + ); + } + + if (big_read != 0 && (big_read % small_read)) { + errx(1, + "big_read (%ju) is not a multiple of small_read (%ju)\n", + (uintmax_t)big_read, (uintmax_t)small_read + ); + } + +#ifdef DIOCGSTRIPESIZE + if (medium_read == 0) { + error = ioctl(read_fd, DIOCGSTRIPESIZE, &stripesize); + if (error < 0 || stripesize < 0) { + // nope + } else if ((uint64_t)stripesize < small_read) { + // nope + } else if (stripesize % small_read) { + // nope + } else if (0 < stripesize && stripesize < (128<<10)) { + medium_read = stripesize; + printf("# Got medium_read from DIOCGSTRIPESIZE: %ju\n", + (uintmax_t)medium_read + ); + } + } +#endif +#if defined(DIOCGFWSECTORS) && defined(DIOCGFWHEADS) + if (medium_read == 0) { + u_int fwsectors = 0, fwheads = 0; + error = ioctl(read_fd, DIOCGFWSECTORS, &fwsectors); + if (error) + fwsectors = 0; + error = ioctl(read_fd, DIOCGFWHEADS, &fwheads); + if (error) + fwheads = 0; + if (fwsectors && fwheads) { + medium_read = fwsectors * fwheads * small_read; + printf( + "# Got medium_read from DIOCGFW{SECTORS,HEADS}: %ju\n", + (uintmax_t)medium_read + ); + } + } +#endif + + if (big_read == 0 && medium_read != 0) { + if (medium_read > (64<<10)) { + big_read = medium_read; + } else { + big_read = 128 << 10; + big_read -= big_read % medium_read; + } + printf("# Got big_read from medium_read: %ju\n", + (uintmax_t)big_read + ); + } + + if (big_read == 0) { + big_read = 128
<< 10; + printf("# Defaulting big_read to %ju\n", + (uintmax_t)big_read + ); + } + + if (medium_read == 0) { + /* + * We do not want to go directly to single sectors, but + * we also dont want to waste time doing multi-sector + * reads with high failure probability. + */ + uint64_t h = big_read; + uint64_t l = small_read; + while (h > l) { + h >>= 2; + l <<= 1; + } + medium_read = h; + printf("# Got medium_read from small_read & big_read: %ju\n", + (uintmax_t)medium_read + ); + } + fprintf(stderr, + "# Bigsize = %ju, medium_read = %ju, small_read = %ju\n", + (uintmax_t)big_read, (uintmax_t)medium_read, (uintmax_t)small_read); + +} + + +/**********************************************************************/ + +static void +monitor_read_sizes(uint64_t failed_size) +{ + + if (failed_size == big_read && medium_read != small_read) { + if (n_reads < n_good_reads + 3) + return; + fprintf( + stderr, + "Too many failures for big reads." + " (%.0f bad of %.0f)" + " Shifting to medium_reads.\n", + n_reads - n_good_reads, n_reads + ); + big_read = medium_read; + medium_read = small_read; + return; + } + + if (failed_size > small_read) { + if (n_reads < n_good_reads + 100) + return; + fprintf( + stderr, + "Too many failures." 
+ " (%.0f bad of %.0f)" + " Shifting to small_reads.\n", + n_reads - n_good_reads, n_reads + ); + big_read = small_read; + medium_read = small_read; + return; + } +} + +/**********************************************************************/ + int main(int argc, char * const argv[]) { int ch; - size_t sz, j; + int64_t sz; int error; - char *buf; - u_int sectorsize; - off_t stripesize; - time_t t1, t2; - struct stat sb; - u_int n, snapshot = 60; - static struct lump *lp; + time_t t_now, t_report, t_save; + time_t snapshot = 60, unsaved; + setbuf(stdout, NULL); + setbuf(stderr, NULL); - while ((ch = getopt(argc, argv, "b:r:w:s:u:v")) != -1) { + while ((ch = getopt(argc, argv, "b:l:p:m:r:w:s:t:u:v")) != -1) { switch (ch) { case 'b': - bigsize = strtoul(optarg, NULL, 0); + big_read = strtoul(optarg, NULL, 0); + break; + case 'l': + log_file = fopen(optarg, "a"); + if (log_file == NULL) { + err(1, "Could not open logfile for append"); + } + break; + case 'p': + error_pause = strtod(optarg, NULL); + break; + case 'm': + medium_read = strtoul(optarg, NULL, 0); break; case 'r': - rworklist = strdup(optarg); - if (rworklist == NULL) + read_worklist_file = strdup(optarg); + if (read_worklist_file == NULL) err(1, "Cannot allocate enough memory"); break; case 's': - snapshot = strtoul(optarg, NULL, 0); + small_read = strtoul(optarg, NULL, 0); + break; + case 't': + total_size = strtoul(optarg, NULL, 0); break; case 'u': unreadable_pattern = optarg; @@ -453,8 +754,8 @@ main(int argc, char * const argv[]) set_verbose(); break; case 'w': - wworklist = strdup(optarg); - if (wworklist == NULL) + write_worklist_file = strdup(optarg); + if (write_worklist_file == NULL) err(1, "Cannot allocate enough memory"); break; default: @@ -469,149 +770,106 @@ main(int argc, char * const argv[]) usage(); input = argv[0]; - fdr = open(argv[0], O_RDONLY); - if (fdr < 0) + read_fd = open(argv[0], O_RDONLY); + if (read_fd < 0) err(1, "Cannot open read descriptor %s", argv[0]); - error = fstat(fdr, 
&sb); - if (error < 0) - err(1, "fstat failed"); - if (S_ISBLK(sb.st_mode) || S_ISCHR(sb.st_mode)) { - error = ioctl(fdr, DIOCGSECTORSIZE, &sectorsize); - if (error < 0) - err(1, "DIOCGSECTORSIZE failed"); - - error = ioctl(fdr, DIOCGSTRIPESIZE, &stripesize); - if (error == 0 && stripesize < sectorsize) - sectorsize = stripesize; + determine_read_sizes(); - minsize = sectorsize; - bigsize = rounddown(bigsize, sectorsize); + work_buf = malloc(big_read); + assert (work_buf != NULL); - error = ioctl(fdr, DIOCGMEDIASIZE, &tot_size); - if (error < 0) - err(1, "DIOCGMEDIASIZE failed"); + if (argc > 1) { + write_fd = open(argv[1], O_WRONLY | O_CREAT, DEFFILEMODE); + if (write_fd < 0) + err(1, "Cannot open write descriptor %s", argv[1]); + if (ftruncate(write_fd, (off_t)total_size) < 0) + err(1, "Cannot truncate output %s to %ju bytes", + argv[1], (uintmax_t)total_size); } else { - tot_size = sb.st_size; + write_fd = -1; } - if (bigsize < minsize) - bigsize = minsize; - - for (ch = 0; (bigsize >> ch) > minsize; ch++) - continue; - medsize = bigsize >> (ch / 2); - medsize = rounddown(medsize, minsize); - - fprintf(stderr, "Bigsize = %zu, medsize = %zu, minsize = %zu\n", - bigsize, medsize, minsize); - - buf = malloc(bigsize); - if (buf == NULL) - err(1, "Cannot allocate %zu bytes buffer", bigsize); + if (strlen(unreadable_pattern)) { + pattern_buf = malloc(big_read); + assert(pattern_buf != NULL); + fill_buf(pattern_buf, big_read, unreadable_pattern); + } - if (argc > 1) { - fdw = open(argv[1], O_WRONLY | O_CREAT, DEFFILEMODE); - if (fdw < 0) - err(1, "Cannot open write descriptor %s", argv[1]); - if (ftruncate(fdw, tot_size) < 0) - err(1, "Cannot truncate output %s to %jd bytes", - argv[1], (intmax_t)tot_size); - } else - fdw = -1; - - if (rworklist != NULL) { - done_size = read_worklist(tot_size); + if (read_worklist_file != NULL) { + done_size = total_size - read_worklist(); } else { - new_lump(0, tot_size, 0); + new_lump(0UL, total_size, 0UL); done_size = 0; } - if 
(wworklist != NULL) + if (write_worklist_file != NULL) signal(SIGINT, sighandler); - t1 = time(NULL); sz = 0; if (!verbose) - report_header(0); + report_header("\n"); else printf("\x1b[2J"); - n = 0; - for (;;) { - lp = TAILQ_FIRST(&lumps); - if (lp == NULL) - break; - while (lp->len > 0) { - if (lp->state == 0) - sz = MIN(lp->len, (off_t)bigsize); - else if (lp->state == 1) - sz = MIN(lp->len, (off_t)medsize); - else - sz = MIN(lp->len, (off_t)minsize); - assert(sz != 0); - - t2 = time(NULL); - if (t1 != t2 || lp->len < (off_t)bigsize) { - t1 = t2; - if (++n == snapshot) { - save_worklist(); - n = 0; - } - report(lp, sz); - } + t_first = time(NULL); + t_report = t_first; + t_save = t_first; + unsaved = 0; + while (!aborting) { + t_now = time(NULL); + sz = attempt_one_lump(t_now); + error = errno; - j = pread(fdr, buf, sz, lp->start); -#if 0 -if (!(random() & 0xf)) { - j = -1; - errno = EIO; -} -#endif - if (j == sz) { - done_size += sz; - if (fdw >= 0) - write_buf(fdw, buf, sz, lp->start); - lp->start += sz; - lp->len -= sz; - if (verbose && lp->state > 2) - report_good_read(t2, sz); - continue; - } - error = errno; - - printf("%jd %zu %d read error (%s)\n", - lp->start, sz, lp->state, strerror(error)); - if (verbose) - report(lp, sz); - if (fdw >= 0 && strlen(unreadable_pattern)) { - fill_buf(buf, sz, unreadable_pattern); - write_buf(fdw, buf, sz, lp->start); + if (sz == 0) { + break; + } + + if (sz > 0) { + unsaved += 1; + } + if (unsaved && (t_save + snapshot) < t_now) { + save_worklist(); + unsaved = 0; + t_save = t_now; + if (!verbose) { + report_header("\n"); + t_report = t_now; } - new_lump(lp->start, sz, lp->state + 1); - lp->start += sz; - lp->len -= sz; - if (error == EINVAL) { - printf("Try with -b 131072 or lower ?\n"); - aborting = 1; - break; + } + if (sz > 0) { + if (verbose) { + account_good_read(t_now, sz); } - if (error == ENXIO) { - printf("Input device probably detached...\n"); - aborting = 1; - break; + if (t_report != t_now) { + report(sz); + 
t_report = t_now; } + continue; } - if (aborting) - save_worklist(); - if (aborting || !TAILQ_NEXT(lp, list)) - report(lp, sz); - if (aborting) + + monitor_read_sizes(-sz); + + if (error == EINVAL) { + printf("Try with -b 131072 or lower ?\n"); + aborting = 1; break; - assert(lp->len == 0); - TAILQ_REMOVE(&lumps, lp, list); - free(lp); + } + if (error == ENXIO) { + printf("Input device probably detached...\n"); + aborting = 1; + break; + } + report(-sz); + t_report = t_now; + if (error_pause > 0) { + usleep((unsigned long)(1e6 * error_pause)); + } } + save_worklist(); + free(work_buf); + if (pattern_buf != NULL) + free(pattern_buf); printf("%s", aborting ? "Aborted\n" : "Completed\n"); - free(buf); - return (0); + report(0UL); + return (0); // XXX } diff --git a/sbin/route/route_netlink.c b/sbin/route/route_netlink.c index 631c2860b547..ba22a2ec1e22 100644 --- a/sbin/route/route_netlink.c +++ b/sbin/route/route_netlink.c @@ -738,6 +738,7 @@ print_nlmsg(struct nl_helper *h, struct nlmsghdr *hdr, struct snl_msg_info *cinf print_nlmsg_generic(h, hdr, cinfo); } + fflush(stdout); snl_clear_lb(&h->ss_cmd); } diff --git a/sbin/savecore/savecore.8 b/sbin/savecore/savecore.8 index 53d2360719dd..1fb79c51f98d 100644 --- a/sbin/savecore/savecore.8 +++ b/sbin/savecore/savecore.8 @@ -25,7 +25,7 @@ .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd April 4, 2022 +.Dd July 16, 2025 .Dt SAVECORE 8 .Os .Sh NAME @@ -69,7 +69,7 @@ Generate output via .Xr libxo 3 in a selection of different human and machine readable formats. See -.Xr xo_parse_args 3 +.Xr xo_options 7 for details on command line arguments. 
.It Fl C Check to see if a dump exists, @@ -193,7 +193,7 @@ is meant to be called near the end of the initialization file .Xr zstd 1 , .Xr getbootfile 3 , .Xr libxo 3 , -.Xr xo_parse_args 3 , +.Xr xo_options 7 , .Xr mem 4 , .Xr textdump 4 , .Xr tar 5 , diff --git a/sbin/swapon/tests/swapon_test.sh b/sbin/swapon/tests/swapon_test.sh index b6d31ecaeed0..a04bb36cc49e 100755 --- a/sbin/swapon/tests/swapon_test.sh +++ b/sbin/swapon/tests/swapon_test.sh @@ -31,7 +31,10 @@ attach_mdX_head() attach_mdX_body() { # if the swapfile is too small (like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x "echo 'md31 none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md31 as swap device" -x "swapon -F fstab.out -a" } @@ -49,7 +52,10 @@ attach_dev_mdX_head() attach_dev_mdX_body() { # if the swapfile is too small (like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x "echo '/dev/md32 none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md32 as swap device" -x "swapon -F fstab.out -a" } @@ -67,7 +73,10 @@ attach_md_head() attach_md_body() { # if the swapfile is too small (like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x 
"echo 'md none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md[0-9][0-9]* as swap device" -x "swapon -F fstab.out -a" } @@ -85,7 +94,10 @@ attach_dev_md_head() attach_dev_md_body() { # if the swapfile is too small (like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x "echo '/dev/md none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md[0-9][0-9]* as swap device" -x "swapon -F fstab.out -a" } @@ -103,7 +115,10 @@ attach_mdX_eli_head() attach_mdX_eli_body() { # if the swapfile is too small (like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x "echo 'md33.eli none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md33.eli as swap device" -x "swapon -F fstab.out -a" } @@ -121,7 +136,10 @@ attach_dev_mdX_eli_head() attach_dev_mdX_eli_body() { # if the swapfile is too small (like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x "echo '/dev/md34.eli none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md34.eli as swap device" -x "swapon -F fstab.out -a" } @@ -139,7 +157,10 @@ attach_md_eli_head() attach_md_eli_body() { # if the swapfile is too small 
(like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x "echo 'md.eli none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md[0-9][0-9]*.eli as swap device" -x "swapon -F fstab.out -a" } @@ -157,7 +178,10 @@ attach_dev_md_eli_head() attach_dev_md_eli_body() { # if the swapfile is too small (like 1k) then mdconfig hangs looking up the md - atf_check -s exit:0 -x "truncate -s 10k swapfile" + # but need a swapfile bigger than one page kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize * 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" atf_check -s exit:0 -o save:fstab.out -x "echo '/dev/md.eli none swap sw,file=swapfile 0 0'" atf_check -s exit:0 -o match:"swapon: adding /dev/md[0-9][0-9]*.eli as swap device" -x "swapon -F fstab.out -a" } @@ -167,6 +191,24 @@ attach_dev_md_eli_cleanup() } ### + +atf_test_case attach_too_small +attach_too_small_head() +{ + atf_set "descr" "should refuse to attach if smaller than one kernel page size" +} +attach_too_small_body() +{ + # Need to use smaller than kernel page size + pagesize=$(sysctl -n hw.pagesize) + minsize=$(( pagesize / 2 )) + atf_check -s exit:0 -x "truncate -s $minsize swapfile" + atf_check -s exit:0 -o save:fstab.out -x "echo 'md35 none swap sw,file=swapfile 0 0'" + atf_check -s exit:1 -e match:"swapon: /dev/md35: NSWAPDEV limit reached" -x "swapon -F fstab.out -a" + atf_check -s exit:0 -x "mdconfig -d -u 35" +} + +### atf_init_test_cases() { atf_add_test_case attach_mdX @@ -178,4 +220,6 @@ atf_init_test_cases() atf_add_test_case attach_dev_mdX_eli atf_add_test_case attach_md_eli atf_add_test_case attach_dev_md_eli + + atf_add_test_case attach_too_small } diff --git 
a/sbin/zfsbootcfg/zfsbootcfg.8 b/sbin/zfsbootcfg/zfsbootcfg.8 index 5e7f02b2578c..3831adfc81bd 100644 --- a/sbin/zfsbootcfg/zfsbootcfg.8 +++ b/sbin/zfsbootcfg/zfsbootcfg.8 @@ -22,7 +22,7 @@ .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd July 22, 2020 +.Dd July 28, 2025 .Dt ZFSBOOTCFG 8 .Os .Sh NAME @@ -44,14 +44,11 @@ is used to set .Xr boot.config 5 Ns -style options to be used by -.Xr zfsboot 8 , .Xr gptzfsboot 8 or .Xr loader 8 the next time the machine is booted. Once -.Xr zfsboot 8 -or .Xr gptzfsboot 8 or .Xr loader 8 @@ -130,8 +127,7 @@ To clear the boot options: .Xr boot.config 5 , .Xr bectl 8 , .Xr gptzfsboot 8 , -.Xr loader 8 , -.Xr zfsboot 8 +.Xr loader 8 .Sh HISTORY .Nm appeared in |