**M4RI** is a library for fast arithmetic with dense matrices over $\mathbb{F}_2$. It was started by Gregory Bard, is maintained by Martin Albrecht. Several people contributed to it (see below). The name M4RI comes from the first implemented algorithm: The “Method of the Four Russians” inversion algorithm published by Gregory Bard. This algorithm in turn is named after the “Method of the Four Russians” multiplication algorithm which is probably better referred to as Kronrod's method. M4RI is used by the Sage mathematics software and the PolyBoRi library. M4RI is available under the General Public License Version 2 or later (GPLv2+).

**M4RIE** is a library for fast arithmetic with dense matrices over $\mathbb{F}_{2^e}$ for $2 \leq e \leq 10$. It was started and is currently maintained by Martin Albrecht. The name stems from the fact that is relies heavily on M4RI. M4RI is will be included in the Sage mathematics software in the near future. M4RIE is available under the General Public License Version 2 or later (GPLv2+).

Features of the **M4RI** library include:

- basic arithmetic with dense matrices over $\mathbb{F}_2$ (addition, equality testing, stacking, augmenting, sub-matrices, randomisation, etc.),
- asymptotically fast $\mathcal{O}(n^{\log_27})$ matrix multiplication via the “Method of the Four Russians” (M4RM) & the Strassen-Winograd algorithm,
- fast row echelon form computation and matrix inversion via the “Method of the Four Russians” (M4RI, average case $\mathcal{O}(n^3/\log n)$),
- asymptotically fast $\mathcal{O}(n^{\log_27})$ PLE factorisation by reduction to matrix multiplication and a fast base-case implementation;
- asymptotically fast TRiangular System solving with Matrices (upper left, lower left, upper right, lower right),
- support for the x86/x86_64 SSE2 instruction set where available,
- preliminary support for parallisation on shared memory systems via OpenMP and
- support for Linux and OS X, Solaris 10 (all GCC).

Features of the **M4RIE** library include:

- basic arithmetic with dense matrices over $\mathbb{F}_{2^e}$ (addition, equality testing, stacking, augmenting, sub-matrices, etc.),
- asymptotically fast $\mathcal{O}(n^{\log_27})$ matrix multiplication over $\mathbb{F}_{2^e}$ with $2 \leq e \leq 8$ using the Karatsuba trick due to Boothby and Bradshaw,
- asymptotically fast $\mathcal{O}(n^{\log_27})$ matrix multiplication using “Newton-John Table” based multiplication & the Strassen-Winograd algorithm,
- fast row echelon form computation and matrix inversion via a “Newton-John Table” based algorithm using only $\mathcal{O}(n^2)$ field multiplications and $\mathcal{O}(n^3)$ field additions,
- asymptotically fast row echelon form computation and matrix inversion via PLE decomposition over $\mathbb{F}_{2^e}$ with $2 \leq e \leq 8$,
- asymptotically fast TRiangular System solving with Matrices (upper left, lower left),
- support for Linux, OpenSolaris and OS X (all GCC).

At least the following people have contributed to the M4RI library.

**Tim Abbott**: Debian-isation & advice on correct libtool versioning;**Martin Albrecht**: maintainer, release manager, peformance tuning (M4RM, M4RI, Strassen, PLE), initial M4RM implementation, parallelisation, PLE factorisation (MMPF algorithm);**Gregory Bard**: initial author, M4RI algorithm and initial implementation;**Marco Bodrato**: new Strassen-like sequence for matrix multiplication and squaring which improves performance for squaring;**Michael Brickenstein**: PolyBoRi author, standard conformity contributions for ANSIC, test data, discussion/suggestion of performance improvements, fast vector-matrix products;**Alexander Dreyer**: PolyBoRi author, standard conformity contributions for ANSIC;**Jean-Guillaume Dumas**: linear system resolution;**William Hart**: many performance improvements for matrix multiplication and in general;**David Harvey**: parallel parity function used in classical multiplication;**Jerry James**: bug fixes, dealing with compiler warnings, Fedora Linux packaging;**David Kirkby**: portability issues (Solaris, HP Unix);**Clément Pernet**: PLE factorisation, triangular system solving (TRSM);**Wael Said**: test cases, feedback;**Carlo Wood**: bit-level optimisation (transpose, column swaps), refactoring, benchmark(et)ing framework, test code, build system clean-up;

We are thankful to **William Stein** for providing our hosting and general infrastructure.

Since M4RI and M4RIE are included in Sage, here is an online demo of the two libraries

See Further Reading.

Please contact our mailinglist if there are bugs, questions, comments.

2014/09/14
A new version of M4RI and M4RIE is available for download. The biggest change is that `A->offset` was dropped. Also, various small (multicore) performance improvements were implemented. The update for M4RIE is to maintain compatibility with M4RI. A few improvements were implemented for the mzd_poly module as well.

2013/04/16 A new version of M4RI is available for download. A detailed changlog is available here for M4RI.

2012/12/21 A new version of M4RI is available for download. A detailed changlog is available here for M4RI. See also this blog post for details.

2012/06/13 New versions of both M4RI and M4RIE are available for download. A detailed changlog are available here for M4RI.

2012/04/13 New versions of both M4RI and M4RIE are available for download. Detailed changlogs are available here for M4RI and here for M4RIE.

2011/12/04 New versions of both M4RI and M4RIE are available for download. The highlight of this version for M4RI is support for reading and writing 1-bit PNG images. The highlight of this release of M4RIE is much improved performance for $4 < e \leq 8$. Detailed changlogs are available here for M4RI and here for M4RIE.

2011/11/30
A technical report by Martin R. Albrecht is available describing the M4RIE library. In particular, Newton-John tables are introduced and our implementation of Karatsuba based matrix-matrix multiplication is described:

**The M4RIE library for dense linear algebra over small fields with even characteristic**

**Abstract:** In this work, we present the M4RIE library which implements efficient algorithms for linear algebra with dense matrices over $\mathbb{F}_{2^e}$ for $2 \leq e \leq 10$. As the name of the library indicates, it makes heavy use of the M4RI library both directly (i.e., by calling it) and indirectly (i.e., by using its concepts). We provide an open-source GPLv2+ C library for efficient linear algebra over $\mathbb{F}_{2^e}$ for $e$ small. In this library we implemented an idea due to Bradshaw and Boothby which reduces matrix multiplication over $\mathbb{F}_{p^k}$ to a series of matrix multiplications over $\mathbb{F}_p$. Furthermore, we propose a caching technique - Newton-John tables - to avoid finite field multiplications which is inspired by Kronrod's method ("M4RM") for matrix multiplication over $\mathbb{F}_2$. Using these two techniques we provide asymptotically fast triangular solving with matrices (TRSM) and PLE-based Gaussian elimination. As a result, we are able to significantly improve upon the state of the art in dense linear algebra over $\mathbb{F}_{2^e}$ with $2 \leq e \leq 10$.

2011/11/29
A technical report by Martin R. Albrecht, Gregory Bard and Clément Pernet is available describing the Gaussian elimination machinery (PLE decomposition) in the M4RI library:

**Efficient Dense Gaussian Elimination over the Finite Field with Two Elements.**

**Abstract:** In this work we describe an efficient implementation of a hierarchy of algorithms for Gaussian elimination upon dense matrices over the field with two elements. We discuss both well-known and new algorithms as well as our implementations in the M4RI library, which has been adopted into Sage. The focus of our discussion is a block iterative algorithm for PLE decomposition which is inspired by the M4RI algorithm. The implementation presented in this work provides considerable performance gains in practice when compared to the previously fastest implementation. We provide performance figures on x86_64 CPUs to demonstrate the alacrity of our approach.

2011/10/10 A new release of M4RI is available for download. See the release notes for the list of changes. Also, a new release of M4RIE is also available for download. See the release notes for the list of changes.

2011/07/14 A new release of M4RI is available for download. See the release notes for the list of changes. Also, a new release of M4RIE is also available for download. M4RIE now relies on M4RI for cache size and other hardware feature detection.

2011/06/10 A new release of M4RI is available for download. This version fixes various issues when M4RI is built with OpenMP enabled.

2011/06/01 A new release of M4RI is available for download. See the release notes for the list of changes. Also, a new release of M4RIE is also available for download. The only changes to M4RIE are to ensure compatibility with M4RI version 20110601 and up.

2011/04/13 We now have a mailinglist.

2010/08/14 A new release of M4RI is available for download. The main changes are improved automatic cache size detection and some clean ups necessary for M4RIE. A first official release of M4RIE is also available for download.

2010/07/13 A new release is available for download. See the release notes for details.

2009/11/04 A new release is available for download. See the release notes for details.

2009/04/09 A new release is available for download. It heavily breaks backward compatibility but supports much bigger matrices than before. See the release notes for details.

2009/01/05 A new release is available for download. It contains new features, performance enhancements and bug fixes. Release notes are available in the wiki.

2008/11/12 A paper describing our matrix multiplication implementation is available as pre-print on the ArXiv. Also, M4RI is being packaged for Fedora Core. Finally, we updated the peformance data for GAP and Magma on the Core 2 Duo with improved timings.

2008/10/28 A new release is available for download. It contains mainly bugfixes but starting with this release triangular solving with matrices (TRSM) is fully supported. Also LUP factorisation (i.e. on full rank matrices) seems to be working now but it is not optimised at all.

2008/10/22 The slides for the Sage Days 10 talk about matrix multiplication in the M4RI library are available online.

2008/09/22 A new release is available. It is identical to the version of M4RI shipped with Sage 3.1.2 and contains many build fixes for a wide range of platforms. Sage (and thus M4RI) supportes x86 Linux, x86_64 Linux, ia64 Linux, x86 OSX and ppc OSX. M4RI also supports Windows and Solaris 10.

2008/08/26 This release is a pure bugfix release. Before this bugfix, if the input matrices were very non-square either wrong results or SIGSEGVs could be observed.

2008/08/21 A new release is available. This release contains Clément Pernet's latest LQUP and TRSM development code. LQUP still lacks a basecase but TRSM should be fairly complete. No attempts were made so far to optimise things. Furthermore, this release contains an improved strategy for choosing $k$ in M4RM which improves performance on the Core2Duo.

2008/08/17 A new release is available. This release adds a simple memory manager for systems with slow malloc/free syscalls. Also, the initialisation (m4ri_init) and finalisation (m4ri_fini) routines are now called automatically when the library is loaded/unloaded. This is tested with GCC and SunCC but not with MSVC. Matrix elimination got slightly faster across plattforms, multi-core support was extended to elimination and improved for multiplication. The README contains instruction how to enable multi-core support. This release does not contain Clément Pernet's latest LQUP patch.

2008/06/24 A new release is available. This release uses the libtool -release mechanism to ensure binary (in)compatibility between releases since - again - the API changed: since the project is quite young do not expect the API to be stable anytime soon. Also the new version attempts to detect the L1 and L2 cache sizes and uses a Strassen-Winograd cutoff by default such that both source matrices fit in L2 (this is not optimal but a good compromise). This new version has some scratch/experimental code which is the beginning of asymptotically fast LQUP factorisation. Finally, elimination got slightly faster.

2008/06/20 Thanks to Tim Abbott libM4RI is now in Debian/unstable.

2008/06/13 It turns out our comparison with Magma on the Core2Duo was strongly biased, since we compared with a version of Magma that was optimised for AMD64 rather than Intel64. The correct times are given now and we apologise for this mix-up.

2008/06/03 This release is a small bugfix release. Matrices are now printed correctly and a bug in mzd_gauss_delayed was reported and fixed by Wael Said and Mohammed Saied.

2008/06/01 This release greatly improves the performance of M4RI: the reduction of a given matrix to (reduced) row echelon form. The speed-up over the last release can be as much as ten, we will provide performance data for this in the near future. However, the new implementation still isn't asymptotically fast. Also mzd_transpose is much faster now due to improved data locality.

2008/05/21 Today's release fixes a severe bug found by Bill Hart, disables SSE2 on all CPUs except those manufactured by Intel (for performance reasons), improves performance on the Core2Duo and introduces a configure switch to enable OpenMP support.

2008/05/20
A new release is available with massive speed improvements for matrix multiplication. These improvements were discussed and tested in this thread on the [sage-devel] mailing list. This release has also experimental and preliminary support for OpenMP. To activate it compile with GCC 4.2 and `CFLAGS="-fopenmp -DHAVE_OPENMP"` Note however, that this release is still a developer preview since some automatic tuning is still not implemented, the performance on the Opteron isn't acceptable yet, and the parallel implementation is naive.

2008/05/16 Release early, release often. This release fixes the unconditional use of _mm_free even when it is not available.

2008/05/15 A new minor release is available which improves performance on Opterons. Also, the website has moved to http://m4ri.sagemath.org.

2008/05/10 Launch of this website.