Combinatorial fast implementations for GMP.

Factorial implemented at mpz level

Implements three algorithms: two almost-naive for small sizes (less than a few hundred), and Peter Luschny's Swing-Divide&Conquer[1] for larger operands. On my laptop the new code is ~20% faster than current GMP-5.0[2] implementation for all sizes, see the graphs below.

factorial Cycles for n! computation (1 000<n<300 000, logscale)
Timings graph Cycles for n! computation (n<400, logscale)

Behaviour can be deeply different on different architectures. If you try the code and obtain much different results, please let me know: <>.

The code contains an implementation of the Eratosthenes' sieve, the (odd) swing factorial, and some other service functions (e.g. a function to compute the product of an unordered list of numbers).

To test the code you can directly replace the file mpz/fac_ui.c with the new one, and recompile the GMP library. Otherwise you can use it stand-alone, compiling it with the -DMAIN option from within the GMP source directory adding all required options (after make&&(cd tune;make speed)). I use:
gcc -O2 -DMAIN -I. -Itune fac_ui.c .libs/libgmp.a tune/.libs/libspeed.a -lrt -o fac-main .

The syntax is: fac-main [limit] [option [step]] , where

The code contains two thresholds: FAC_DSC_THRESHOLD and FAC_ODD_THRESHOLD, the first is determined by the size where bc_oddfac_1 start being slower than dsc_oddfac_1; the second one by the comparing bc_fac with any of the two oddfac implementations.
TODO: tune both thresholds with tune/speed.

Licence: GNU GPLv3+.

Download: gmp-5.0.1/mpz/fac_ui.c.

References:
[1]Divide, Swing, and Conquer the Factorial and the lcm{1, 2, …, N} by Peter Luschny, preprint (April 2008)
[2]GMP - The GNU Multi Precision library

Newton's Binomial implemented at mpz level

Implements four algorithms: a small update to current GMP-5.0[2] implementation and another naïve algorithm, two Divide&Conquer experiments (to be generalised) for medium-sizes, and the asymptotically fast algorithm by P. Goetgheluck[1] using factorisation.

This code heavily depend on the code above... Precisely, it is the same file. It can be tested by replacing mpz/fac_ui.c with the new file, replace mpz/bin_uiui.c with the two lines below, and recompile the GMP.

#define OPERATION_bin_uiui
#include "fac_ui.c"

WARNING! This code is not ready for inclusion in the library yet! There are many couples (n,k) where it is slower than current implementation, because NO threshold mechanism implement yet. Only the algoritm marked new is used, but this is good, as you can see from the pictures below, only for big values of k.

The graphs below show the regions (depending on values (n,k)) where each algorithm is the fastest.

binomial, best algorithm Fastest algorithm for binomial(10x,10y) with 10x < 8040
Timings graph Fastest algorithm for binomial(x,y) with x < 3150

A very short description of the algorithms referred to by the graphs:

GMP-5.0
Current implementation.
basecase
Small modification of the current GMP-5.0 code.
new
Goetgheluck's algorithm[1].
div&con 1
Based on the observation that bin(n,2k)=bin(n,k)*bin(n-k,k)/k$, where n$ is the swing factorial.
div&con 2
Same as above, one more recursion level.
odd_pro
A naïve product of odd components...
The two div&con algorithms are higly experimental, very naïvely implemented and, as far as I know, new.
TODO: implement a general div&con algorithm for any recursion level.

Licence: GNU GPLv3+.

Download: gmp-5.0.1/mpz/fac_ui.c.

References:
[1]Computing Binomial Coefficients by P. Goetgheluck, The American Mathematical Monthly, Vol. 94, No. 4 (April 1987), pp. 360-365.
[2]GMP - The GNU Multi Precision library

Double factorial, and primorial implemented at mpz level

The new code for was inserted again in the same file. It can be tested by compiling it, I use the following line
gcc -O2 -DMAIN -Impz -Itune fac_ui.c .libs/libgmp.a tune/.libs/libspeed.a -lrt -o fac-main .

When invoked by fac-main -d, the program will test the correctness of the code for the double factorial.
When invoked by fac-main -p, the program will test the correctness of the code for the primorial.
TODO: the code is in an early developement state, nevertheless it should be asymptotically fast.

graph comparing times and sizesLog/log scale time - size comparison
graph showing size of the resultsLog/log scale size of the results
graph showing timings for various functions related to factorialLog/log scale timings for the functions

Licence: GNU AGPLv3+.

Download: gmp-5.0.2/mpz/fac_ui.c.

References:
[*]GMP - The GNU Multi Precision library


Marco Bodrato - 20 gennaio 2012