| field | value | date |
|---|---|---|
| author | Conrad Meyer <cem@FreeBSD.org> | 2019-08-08 15:37:56 +0000 |
| committer | Conrad Meyer <cem@FreeBSD.org> | 2019-08-08 15:37:56 +0000 |
| commit | 90f4bdbe917eaf678feca2b0ff9647b5ae8bbb9d | |
| tree | c579df59365bfa4e51bc36e8259711c1f01670de /programs | |
| parent | fa94c7381af469a06d8f9077162c2cc5dee581cb | |
Diffstat (limited to 'programs')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | programs/README.md | 4 |
| -rw-r--r-- | programs/util.c | 1 |
| -rw-r--r-- | programs/zstd.1 | 14 |
| -rw-r--r-- | programs/zstd.1.md | 11 |
| -rw-r--r-- | programs/zstdcli.c | 3 |
| -rw-r--r-- | programs/zstdgrep.1 | 2 |
| -rw-r--r-- | programs/zstdless.1 | 2 |
7 files changed, 27 insertions, 10 deletions
diff --git a/programs/README.md b/programs/README.md
index d9ef5dd2a62e..c3a5590d6de5 100644
--- a/programs/README.md
+++ b/programs/README.md
@@ -157,8 +157,8 @@ Advanced arguments :
 
 Dictionary builder :
 --train ## : create a dictionary from a training set of files
---train-cover[=k=#,d=#,steps=#,split=#] : use the cover algorithm with optional args
---train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#] : use the fastcover algorithm with optional args
+--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args
+--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,shrink[=#],accel=#] : use the fastcover algorithm with optional args
 --train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
  -o file : `file` is dictionary name (default: dictionary)
 --maxdict=# : limit dictionary to specified size (default: 112640)
diff --git a/programs/util.c b/programs/util.c
index 6190bca50270..fb77d1783928 100644
--- a/programs/util.c
+++ b/programs/util.c
@@ -245,6 +245,7 @@ int UTIL_prepareFileList(const char *dirName, char** bufStart, size_t* pos, char
 
         if (!followLinks && UTIL_isLink(path)) {
             UTIL_DISPLAYLEVEL(2, "Warning : %s is a symbolic link, ignoring\n", path);
+            free(path);
             continue;
         }
 
diff --git a/programs/zstd.1 b/programs/zstd.1
index beca9da421d1..b25d353763e0 100644
--- a/programs/zstd.1
+++ b/programs/zstd.1
@@ -1,5 +1,5 @@
 .
-.TH "ZSTD" "1" "July 2019" "zstd 1.4.1" "User Commands"
+.TH "ZSTD" "1" "July 2019" "zstd 1.4.2" "User Commands"
 .
 .SH "NAME"
 \fBzstd\fR \- zstd, zstdmt, unzstd, zstdcat \- Compress or decompress \.zst files
@@ -229,11 +229,11 @@ Split input files in blocks of size # (default: no split)
 A dictionary ID is a locally unique ID that a decoder can use to verify it is using the right dictionary\. By default, zstd will create a 4\-bytes random number ID\. It\'s possible to give a precise number instead\. Short numbers have an advantage : an ID < 256 will only need 1 byte in the compressed frame header, and an ID < 65536 will only need 2 bytes\. This compares favorably to 4 bytes default\. However, it\'s up to the dictionary manager to not assign twice the same ID to 2 different dictionaries\.
 .
 .TP
-\fB\-\-train\-cover[=k#,d=#,steps=#,split=#]\fR
-Select parameters for the default dictionary builder algorithm named cover\. If \fId\fR is not specified, then it tries \fId\fR = 6 and \fId\fR = 8\. If \fIk\fR is not specified, then it tries \fIsteps\fR values in the range [50, 2000]\. If \fIsteps\fR is not specified, then the default value of 40 is used\. If \fIsplit\fR is not specified or split <= 0, then the default value of 100 is used\. Requires that \fId\fR <= \fIk\fR\.
+\fB\-\-train\-cover[=k#,d=#,steps=#,split=#,shrink[=#]]\fR
+Select parameters for the default dictionary builder algorithm named cover\. If \fId\fR is not specified, then it tries \fId\fR = 6 and \fId\fR = 8\. If \fIk\fR is not specified, then it tries \fIsteps\fR values in the range [50, 2000]\. If \fIsteps\fR is not specified, then the default value of 40 is used\. If \fIsplit\fR is not specified or split <= 0, then the default value of 100 is used\. Requires that \fId\fR <= \fIk\fR\. If \fIshrink\fR flag is not used, then the default value for \fIshrinkDict\fR of 0 is used\. If \fIshrink\fR is not specified, then the default value for \fIshrinkDictMaxRegression\fR of 1 is used\.
 .
 .IP
-Selects segments of size \fIk\fR with highest score to put in the dictionary\. The score of a segment is computed by the sum of the frequencies of all the subsegments of size \fId\fR\. Generally \fId\fR should be in the range [6, 8], occasionally up to 16, but the algorithm will run faster with d <= \fI8\fR\. Good values for \fIk\fR vary widely based on the input data, but a safe range is [2 * \fId\fR, 2000]\. If \fIsplit\fR is 100, all input samples are used for both training and testing to find optimal \fId\fR and \fIk\fR to build dictionary\. Supports multithreading if \fBzstd\fR is compiled with threading support\.
+Selects segments of size \fIk\fR with highest score to put in the dictionary\. The score of a segment is computed by the sum of the frequencies of all the subsegments of size \fId\fR\. Generally \fId\fR should be in the range [6, 8], occasionally up to 16, but the algorithm will run faster with d <= \fI8\fR\. Good values for \fIk\fR vary widely based on the input data, but a safe range is [2 * \fId\fR, 2000]\. If \fIsplit\fR is 100, all input samples are used for both training and testing to find optimal \fId\fR and \fIk\fR to build dictionary\. Supports multithreading if \fBzstd\fR is compiled with threading support\. Having \fIshrink\fR enabled takes a truncated dictionary of minimum size and doubles in size until compression ratio of the truncated dictionary is at most \fIshrinkDictMaxRegression%\fR worse than the compression ratio of the largest dictionary\.
 .
 .IP
 Examples:
@@ -253,6 +253,12 @@ Examples:
 .IP
 \fBzstd \-\-train\-cover=k=50,split=60 FILEs\fR
 .
+.IP
+\fBzstd \-\-train\-cover=shrink FILEs\fR
+.
+.IP
+\fBzstd \-\-train\-cover=shrink=2 FILEs\fR
+.
 .TP
 \fB\-\-train\-fastcover[=k#,d=#,f=#,steps=#,split=#,accel=#]\fR
 Same as cover but with extra parameters \fIf\fR and \fIaccel\fR and different default value of split If \fIsplit\fR is not specified, then it tries \fIsplit\fR = 75\. If \fIf\fR is not specified, then it tries \fIf\fR = 20\. Requires that 0 < \fIf\fR < 32\. If \fIaccel\fR is not specified, then it tries \fIaccel\fR = 1\. Requires that 0 < \fIaccel\fR <= 10\. Requires that \fId\fR = 6 or \fId\fR = 8\.
diff --git a/programs/zstd.1.md b/programs/zstd.1.md
index 93c6fa40010e..3ab2667a0483 100644
--- a/programs/zstd.1.md
+++ b/programs/zstd.1.md
@@ -244,13 +244,15 @@ Compression of small files similar to the sample set will be greatly improved.
     This compares favorably to 4 bytes default.
     However, it's up to the dictionary manager to not assign twice the same ID to 2 different dictionaries.
 
-* `--train-cover[=k#,d=#,steps=#,split=#]`:
+* `--train-cover[=k#,d=#,steps=#,split=#,shrink[=#]]`:
     Select parameters for the default dictionary builder algorithm named cover.
     If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
     If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
     If _steps_ is not specified, then the default value of 40 is used.
     If _split_ is not specified or split <= 0, then the default value of 100 is used.
     Requires that _d_ <= _k_.
+    If _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used.
+    If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used.
 
     Selects segments of size _k_ with highest score to put in the dictionary.
     The score of a segment is computed by the sum of the frequencies of all the
@@ -262,6 +264,9 @@ Compression of small files similar to the sample set will be greatly improved.
     If _split_ is 100, all input samples are used for both training and testing
     to find optimal _d_ and _k_ to build dictionary.
     Supports multithreading if `zstd` is compiled with threading support.
+    Having _shrink_ enabled takes a truncated dictionary of minimum size and doubles
+    in size until compression ratio of the truncated dictionary is at most
+    _shrinkDictMaxRegression%_ worse than the compression ratio of the largest dictionary.
 
     Examples:
 
@@ -275,6 +280,10 @@ Compression of small files similar to the sample set will be greatly improved.
 
     `zstd --train-cover=k=50,split=60 FILEs`
 
+    `zstd --train-cover=shrink FILEs`
+
+    `zstd --train-cover=shrink=2 FILEs`
+
 * `--train-fastcover[=k#,d=#,f=#,steps=#,split=#,accel=#]`:
     Same as cover but with extra parameters _f_ and _accel_ and different default value of split
     If _split_ is not specified, then it tries _split_ = 75.
diff --git a/programs/zstdcli.c b/programs/zstdcli.c
index a13c924c5ce0..de286cdf283e 100644
--- a/programs/zstdcli.c
+++ b/programs/zstdcli.c
@@ -294,13 +294,14 @@ static unsigned longCommandWArg(const char** stringPtr, const char* longCommand)
 
 
 #ifndef ZSTD_NODICT
+
+static const unsigned kDefaultRegression = 1;
 /**
  * parseCoverParameters() :
  * reads cover parameters from *stringPtr (e.g. "--train-cover=k=48,d=8,steps=32") into *params
  * @return 1 means that cover parameters were correct
  * @return 0 in case of malformed parameters
  */
-static const unsigned kDefaultRegression = 1;
 static unsigned parseCoverParameters(const char* stringPtr, ZDICT_cover_params_t* params)
 {
     memset(params, 0, sizeof(*params));
diff --git a/programs/zstdgrep.1 b/programs/zstdgrep.1
index d0a0292215a5..0bc6ed7f5838 100644
--- a/programs/zstdgrep.1
+++ b/programs/zstdgrep.1
@@ -1,5 +1,5 @@
 .
-.TH "ZSTDGREP" "1" "July 2019" "zstd 1.4.1" "User Commands"
+.TH "ZSTDGREP" "1" "July 2019" "zstd 1.4.2" "User Commands"
 .
 .SH "NAME"
 \fBzstdgrep\fR \- print lines matching a pattern in zstandard\-compressed files
diff --git a/programs/zstdless.1 b/programs/zstdless.1
index 4e21d5a90637..73e9504406a4 100644
--- a/programs/zstdless.1
+++ b/programs/zstdless.1
@@ -1,5 +1,5 @@
 .
-.TH "ZSTDLESS" "1" "July 2019" "zstd 1.4.1" "User Commands"
+.TH "ZSTDLESS" "1" "July 2019" "zstd 1.4.2" "User Commands"
 .
 .SH "NAME"
 \fBzstdless\fR \- view zstandard\-compressed files
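The shrink behaviour described in the zstd.1 hunk (start from a truncated dictionary of minimum size, double it until the result is at most `shrinkDictMaxRegression`% worse than the largest dictionary) can be sketched as a doubling loop. This is a minimal illustration and not the code in zstd's dictionary builder: `shrink_dict_size`, `compressed_size_fn`, and `toy_compressed_size` are hypothetical names, and the sketch assumes "worse" means growth in total compressed size over the test samples.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: total compressed size of the test samples when
 * only the first dictSize bytes of the dictionary are used. */
typedef size_t (*compressed_size_fn)(size_t dictSize, void* ctx);

/* Returns the smallest size in {minSize, 2*minSize, 4*minSize, ...} whose
 * compressed output is at most maxRegressionPct percent larger than the
 * output obtained with the full maxSize dictionary; falls back to maxSize. */
size_t shrink_dict_size(size_t minSize, size_t maxSize,
                        unsigned maxRegressionPct,
                        compressed_size_fn eval, void* ctx)
{
    double allowed;
    size_t size = minSize;

    assert(eval != NULL && minSize > 0 && minSize <= maxSize);

    /* Assumption: tolerance is applied to compressed size, so a 1%
     * regression allows up to 1.01x the best (largest-dictionary) size. */
    allowed = (double)eval(maxSize, ctx)
            * (1.0 + (double)maxRegressionPct / 100.0);

    while (size < maxSize) {
        if ((double)eval(size, ctx) <= allowed) return size;
        if (size > maxSize / 2) break;   /* next doubling would overshoot */
        size *= 2;
    }
    return maxSize;
}

/* Toy model for demonstration only: a larger dictionary gives a smaller
 * compressed size, with diminishing returns. */
size_t toy_compressed_size(size_t dictSize, void* ctx)
{
    (void)ctx;
    return 10000 + 102400 / dictSize;
}
```

With the toy model, a 4096-byte full dictionary, and a 256-byte minimum, `shrink_dict_size(256, 4096, 1, toy_compressed_size, NULL)` settles on 1024 bytes; raising the tolerance to 2 yields 512 bytes.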
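The `shrink[=#]` syntax added to `--train-cover` and `--train-fastcover` accepts either a bare flag or a flag with a value. The sketch below shows one way such a token could be parsed with the defaults the man page text states (`shrinkDict` of 0 when the flag is absent, `shrinkDictMaxRegression` of 1 when no value is given). It is not the `parseCoverParameters()` implementation from `zstdcli.c`: `shrink_opts_t` and `parse_shrink_token` are hypothetical names, and only `kDefaultRegression` is taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two values named in the man page text. */
typedef struct {
    unsigned shrinkDict;               /* 0 = shrinking disabled (default) */
    unsigned shrinkDictMaxRegression;  /* allowed regression, in percent */
} shrink_opts_t;

static const unsigned kDefaultRegression = 1;  /* same default as the patch */

/* Parses a single option token, either "shrink" or "shrink=<number>".
 * Returns 1 on success and 0 if the token is malformed; opts is only
 * modified on success. */
int parse_shrink_token(const char* token, shrink_opts_t* opts)
{
    static const char key[] = "shrink";
    size_t const keyLen = sizeof(key) - 1;

    assert(token != NULL && opts != NULL);

    if (strncmp(token, key, keyLen) != 0) return 0;
    token += keyLen;

    if (*token == '\0') {              /* bare flag: use the default */
        opts->shrinkDict = 1;
        opts->shrinkDictMaxRegression = kDefaultRegression;
        return 1;
    }

    if (*token != '=') return 0;       /* e.g. "shrinkx" */
    token++;
    if (*token < '0' || *token > '9') return 0;   /* e.g. "shrink=" */

    {   char* end = NULL;
        unsigned long const value = strtoul(token, &end, 10);
        if (*end != '\0') return 0;    /* trailing garbage after the number */
        opts->shrinkDict = 1;
        opts->shrinkDictMaxRegression = (unsigned)value;
    }
    return 1;
}
```

A caller would start from a zero-initialised `shrink_opts_t`, which matches the documented "flag not used" default, and feed it each comma-separated token of the option string.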
