1 files changed, 673 insertions, 0 deletions
diff --git a/manuals/benchmarks.md b/manuals/benchmarks.md
new file mode 100644
index 000000000000..af0593f4e876
--- /dev/null
+++ b/manuals/benchmarks.md
@@ -0,0 +1,673 @@
+# Benchmarks
+
+The results of these benchmarks suggest that building this `bc` with
+optimization at `-O3` with link-time optimization (`-flto`) will result in the
+best performance. However, using `-march=native` can result in **WORSE**
+performance.
+
+*Note*: all benchmarks were run four times, and the fastest run is the one
+shown. Also, `[bc]` means whichever `bc` was being run, and the assumed working
+directory is the root directory of this repository. Also, this `bc` was at
+version `3.0.0` while GNU `bc` was at version `1.07.1`, and all tests were
+conducted on an `x86_64` machine running Gentoo Linux with `clang` `9.0.1` as
+the compiler.
+
+## Typical Optimization Level
+
+These benchmarks were run with both `bc`'s compiled with the typical `-O2`
+optimizations and no link-time optimization.
+
+### Addition
+
+The command used was:
+
+```
+tests/script.sh bc add.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 2.54
+user 1.21
+sys 1.32
+```
+
+For this `bc`:
+
+```
+real 0.88
+user 0.85
+sys 0.02
+```
+
+### Subtraction
+
+The command used was:
+
+```
+tests/script.sh bc subtract.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 2.51
+user 1.05
+sys 1.45
+```
+
+For this `bc`:
+
+```
+real 0.91
+user 0.85
+sys 0.05
+```
+
+### Multiplication
+
+The command used was:
+
+```
+tests/script.sh bc multiply.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 7.15
+user 4.69
+sys 2.46
+```
+
+For this `bc`:
+
+```
+real 2.20
+user 2.10
+sys 0.09
+```
+
+### Division
+
+The command used was:
+
+```
+tests/script.sh bc divide.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 3.36
+user 1.87
+sys 1.48
+```
+
+For this `bc`:
+
+```
+real 1.61
+user 1.57
+sys 0.03
+```
+
+### Power
+
+The command used was:
+
+```
+printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 11.30
+user 11.30
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 0.73
+user 0.72
+sys 0.00
+```
+
+### Scripts
+
+[This file][1] was downloaded, saved at `../timeconst.bc` and the following
+patch was applied:
+
+```
+--- ../timeconst.bc	2018-09-28 11:32:22.808669000 -0600
++++ ../timeconst.bc	2019-06-07 07:26:36.359913078 -0600
+@@ -110,8 +110,10 @@
+ 
+ 		print "#endif /* KERNEL_TIMECONST_H */\n"
+ 	}
+-	halt
+ }
+ 
+-hz = read();
+-timeconst(hz)
++for (i = 0; i <= 50000; ++i) {
++	timeconst(i)
++}
++
++halt
+```
+
+The command used was:
+
+```
+time -p [bc] ../timeconst.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 16.71
+user 16.06
+sys 0.65
+```
+
+For this `bc`:
+
+```
+real 13.16
+user 13.15
+sys 0.00
+```
+
+Because this `bc` is faster when doing math, it might be a better comparison to
+run a script that is not running any math. As such, I put the following into
+`../test.bc`:
+
+```
+for (i = 0; i < 100000000; ++i) {
+	y = i
+}
+
+i
+y
+
+halt
+```
+
+The command used was:
+
+```
+time -p [bc] ../test.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 16.60
+user 16.59
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 22.76
+user 22.75
+sys 0.00
+```
+
+I also put the following into `../test2.bc`:
+
+```
+i = 0
+
+while (i < 100000000) {
+	i += 1
+}
+
+i
+
+halt
+```
+
+The command used was:
+
+```
+time -p [bc] ../test2.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 17.32
+user 17.30
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 16.98
+user 16.96
+sys 0.01
+```
+
+It seems that the improvements to the interpreter helped a lot in certain cases.
+
+Also, I have no idea why GNU `bc` did worse when it is technically doing less
+work.
+
+## Recommended Optimizations from `2.7.0`
+
+Note that, when running the benchmarks, the optimizations used are not the ones
+I recommended for version `2.7.0`, which are `-O3 -flto -march=native`.
+
+This `bc` separates its code into modules that, when optimized at link time,
+removes a lot of the inefficiency that comes from function overhead. This is
+most keenly felt with one function: `bc_vec_item()`, which should turn into just
+one instruction (on `x86_64`) when optimized at link time and inlined. There are
+other functions that matter as well.
+
+I also recommended `-march=native` on the grounds that newer instructions would
+increase performance on math-heavy code. We will see if that assumption was
+correct. (Spoiler: **NO**.)
+
+When compiling both `bc`'s with the optimizations I recommended for this `bc`
+for version `2.7.0`, the results are as follows.
+
+### Addition
+
+The command used was:
+
+```
+tests/script.sh bc add.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 2.44
+user 1.11
+sys 1.32
+```
+
+For this `bc`:
+
+```
+real 0.59
+user 0.54
+sys 0.05
+```
+
+### Subtraction
+
+The command used was:
+
+```
+tests/script.sh bc subtract.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 2.42
+user 1.02
+sys 1.40
+```
+
+For this `bc`:
+
+```
+real 0.64
+user 0.57
+sys 0.06
+```
+
+### Multiplication
+
+The command used was:
+
+```
+tests/script.sh bc multiply.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 7.01
+user 4.50
+sys 2.50
+```
+
+For this `bc`:
+
+```
+real 1.59
+user 1.53
+sys 0.05
+```
+
+### Division
+
+The command used was:
+
+```
+tests/script.sh bc divide.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 3.26
+user 1.82
+sys 1.44
+```
+
+For this `bc`:
+
+```
+real 1.24
+user 1.20
+sys 0.03
+```
+
+### Power
+
+The command used was:
+
+```
+printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 11.08
+user 11.07
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 0.71
+user 0.70
+sys 0.00
+```
+
+### Scripts
+
+The command for the `../timeconst.bc` script was:
+
+```
+time -p [bc] ../timeconst.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 15.62
+user 15.08
+sys 0.53
+```
+
+For this `bc`:
+
+```
+real 10.09
+user 10.08
+sys 0.01
+```
+
+The command for the next script, the `for` loop script, was:
+
+```
+time -p [bc] ../test.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 14.76
+user 14.75
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 17.95
+user 17.94
+sys 0.00
+```
+
+The command for the next script, the `while` loop script, was:
+
+```
+time -p [bc] ../test2.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 14.84
+user 14.83
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 13.53
+user 13.52
+sys 0.00
+```
+
+## Link-Time Optimization Only
+
+Just for kicks, let's see if `-march=native` is even useful.
+
+The optimizations I used for both `bc`'s were `-O3 -flto`.
+
+### Addition
+
+The command used was:
+
+```
+tests/script.sh bc add.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 2.41
+user 1.05
+sys 1.35
+```
+
+For this `bc`:
+
+```
+real 0.58
+user 0.52
+sys 0.05
+```
+
+### Subtraction
+
+The command used was:
+
+```
+tests/script.sh bc subtract.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 2.39
+user 1.10
+sys 1.28
+```
+
+For this `bc`:
+
+```
+real 0.65
+user 0.57
+sys 0.07
+```
+
+### Multiplication
+
+The command used was:
+
+```
+tests/script.sh bc multiply.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 6.82
+user 4.30
+sys 2.51
+```
+
+For this `bc`:
+
+```
+real 1.57
+user 1.49
+sys 0.08
+```
+
+### Division
+
+The command used was:
+
+```
+tests/script.sh bc divide.bc 1 0 1 1 [bc]
+```
+
+For GNU `bc`:
+
+```
+real 3.25
+user 1.81
+sys 1.43
+```
+
+For this `bc`:
+
+```
+real 1.27
+user 1.23
+sys 0.04
+```
+
+### Power
+
+The command used was:
+
+```
+printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 10.50
+user 10.49
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 0.72
+user 0.71
+sys 0.00
+```
+
+### Scripts
+
+The command for the `../timeconst.bc` script was:
+
+```
+time -p [bc] ../timeconst.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 15.50
+user 14.81
+sys 0.68
+```
+
+For this `bc`:
+
+```
+real 10.17
+user 10.15
+sys 0.01
+```
+
+The command for the next script, the `for` loop script, was:
+
+```
+time -p [bc] ../test.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 14.99
+user 14.99
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 16.85
+user 16.84
+sys 0.00
+```
+
+The command for the next script, the `while` loop script, was:
+
+```
+time -p [bc] ../test2.bc > /dev/null
+```
+
+For GNU `bc`:
+
+```
+real 14.92
+user 14.91
+sys 0.00
+```
+
+For this `bc`:
+
+```
+real 12.75
+user 12.75
+sys 0.00
+```
+
+It turns out that `-march=native` can be a problem. As such, I have removed the
+recommendation to build with `-march=native`.
+
+## Recommended Compiler
+
+When I ran these benchmarks with my `bc` compiled under `clang` vs. `gcc`, it
+performed much better under `clang`. I recommend compiling this `bc` with
+`clang`.
+
+[1]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc