WARNING: Release 4.0 has minor binary incompatibilities with previous releases, mainly due to the move from the interface it.unimi.dsi.util.LongBigList to the now standard it.unimi.dsi.fastutil.longs.LongBigList. Release 4.0 is part of a parallel release of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc., all of which were modified to fit the new interface. MG4J comes in two versions: the standard version and the big version, which supports more than 2^31 terms and documents. Please read our (short) "Moving Java to Big Data" document for details.
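The 2^31 boundary comes from Java itself: arrays and the standard collections are indexed by int, so a single array holds at most about 2^31 elements. The sketch below is a hypothetical class, not fastutil's LongBigList or any of its implementations. It shows the usual workaround of splitting a 64-bit index into a segment number and an offset within that segment. The segment size of 2^12 elements is an arbitrary illustrative choice, and the method name size64() only mimics the naming convention of fastutil's big collections.

```java
import java.util.Arrays;

// Hypothetical sketch (not fastutil's LongBigList): a list of longs addressed by
// 64-bit indices, stored as an array of fixed-size segments.
class SegmentedLongList {
    // Illustrative segment size of 2^12 elements; real code would use larger segments.
    static final int SEGMENT_SHIFT = 12;
    static final int SEGMENT_SIZE = 1 << SEGMENT_SHIFT;
    static final long SEGMENT_MASK = SEGMENT_SIZE - 1;

    private long[][] segments = new long[0][];
    private long size = 0;

    // Segment that contains a given 64-bit index.
    static long segment(long index) {
        return index >>> SEGMENT_SHIFT;
    }

    // Position of a given 64-bit index inside its segment.
    static int offset(long index) {
        return (int) (index & SEGMENT_MASK);
    }

    // Appends a value; for simplicity the segment table grows one slot at a time.
    void add(long value) {
        int seg = (int) segment(size);
        if (seg == segments.length) {
            segments = Arrays.copyOf(segments, seg + 1);
            segments[seg] = new long[SEGMENT_SIZE];
        }
        segments[seg][offset(size)] = value;
        size++;
    }

    // Returns the element at a 64-bit index.
    long get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(Long.toString(index));
        }
        return segments[(int) segment(index)][offset(index)];
    }

    // Number of elements as a long, mimicking the size64() naming of big collections.
    long size64() {
        return size;
    }

    public static void main(String[] args) {
        SegmentedLongList list = new SegmentedLongList();
        for (long i = 0; i < 10_000; i++) {
            list.add(i * i);
        }
        System.out.println(list.size64());   // prints 10000
        System.out.println(list.get(9_999)); // prints 99980001
        // An index beyond Integer.MAX_VALUE still maps to a (segment, offset) pair.
        long big = 3L << 31;
        System.out.println(segment(big) + " " + offset(big)); // prints 1572864 0
    }
}
```

Since the segment table is itself an int-indexed array, this layout addresses up to 2^31 segments, far beyond the 2^31-element limit of a single array.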

Introduction

MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. MG4J is a highly customisable, high-performance, full-fledged search engine providing state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms.
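For readers unfamiliar with BM25, the following is a minimal sketch of the textbook Okapi BM25 formula for a bag-of-words query. It is not MG4J's own scorer, and it ignores BM25F's per-field weighting. The IDF variant (with +1 inside the logarithm, which keeps weights positive) and the typical parameter values k1 = 1.2, b = 0.75 are assumptions chosen for illustration; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Sketch of textbook Okapi BM25; not MG4J's scorer. Parameter values and the
// IDF variant are illustrative assumptions.
class Bm25 {
    final double k1; // term-frequency saturation
    final double b;  // strength of document-length normalization

    Bm25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    // IDF with +1 inside the logarithm, so it stays positive even for very common terms.
    static double idf(long numDocs, long docFreq) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5) + 1.0);
    }

    // Contribution of one query term occurring termFreq times in a document.
    double termScore(int termFreq, int docLength, double avgDocLength, long numDocs, long docFreq) {
        double lengthNorm = k1 * (1 - b + b * docLength / avgDocLength);
        return idf(numDocs, docFreq) * termFreq * (k1 + 1) / (termFreq + lengthNorm);
    }

    // Document score: sum of term contributions over the query terms present in the document.
    double score(List<String> query, Map<String, Integer> docTermFreqs, int docLength,
                 double avgDocLength, long numDocs, Map<String, Long> docFreqs) {
        double total = 0;
        for (String term : query) {
            int tf = docTermFreqs.getOrDefault(term, 0);
            if (tf == 0) continue; // absent terms contribute nothing
            total += termScore(tf, docLength, avgDocLength, numDocs, docFreqs.getOrDefault(term, 0L));
        }
        return total;
    }

    public static void main(String[] args) {
        Bm25 bm25 = new Bm25(1.2, 0.75);
        // A term in an average-length document, appearing in 10 of 1000 documents.
        System.out.println(bm25.termScore(1, 100, 100.0, 1000, 10));
        System.out.println(bm25.termScore(5, 100, 100.0, 1000, 10)); // higher, but not 5x higher
    }
}
```

A larger k1 delays the saturation of repeated occurrences, while b controls how strongly documents longer than average are penalized (b = 0 disables length normalization).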

The main points of MG4J are:

The starting point for understanding MG4J is the tutorial, which explains how to index a sample collection and query the newly constructed index from the command line or using a browser. The Javadoc class documentation then provides further insight.
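As a conceptual companion to the tutorial, here is a toy in-memory inverted index: each term maps to the increasing list of document numbers containing it, and a conjunctive query intersects those lists with a linear merge. The class name, the naive tokenizer, and the and() method are hypothetical and unrelated to MG4J's on-disk index format or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory inverted index (hypothetical; not MG4J's index format or API).
class ToyInvertedIndex {
    // term -> increasing list of document numbers containing it
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int numDocs = 0;

    // Indexes a document and returns its number. Tokenization is a naive
    // lowercase split on runs of non-letter, non-digit characters.
    int addDocument(String text) {
        int doc = numDocs++;
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue;
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Documents arrive in increasing order, so a repeated term only needs
            // to be checked against the last entry.
            if (list.isEmpty() || list.get(list.size() - 1) != doc) list.add(doc);
        }
        return doc;
    }

    // Conjunctive (AND) query: documents containing every term.
    List<Integer> and(String... terms) {
        List<Integer> result = null;
        for (String term : terms) {
            List<Integer> list = postings.getOrDefault(term.toLowerCase(), List.of());
            result = (result == null) ? new ArrayList<>(list) : intersect(result, list);
        }
        return result == null ? new ArrayList<>() : result;
    }

    // Linear merge of two increasing lists, keeping common elements.
    private static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }
            else if (x < y) i++;
            else j++;
        }
        return out;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Managing Gigabytes for Java");
        index.addDocument("Java search engine");
        index.addDocument("A search engine for large collections");
        System.out.println(index.and("search", "engine")); // prints [1, 2]
        System.out.println(index.and("java", "search"));   // prints [1]
    }
}
```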

MG4J is free software distributed under the GNU Lesser General Public License. If you find MG4J useful, we kindly ask you to cite the following reference:

@INPROCEEDINGS{BoVTREC2005,
        title = "{M}{G}4{J} at {T}{R}{E}{C} 2005",
        author="Paolo Boldi and Sebastiano Vigna",
        year = 2005,
        booktitle = "The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings",
        editor = "Ellen M. Voorhees and Lori P. Buckland",
        publisher = "NIST",
        series = "Special Publications",
        number = "SP 500-266",
        note = "\texttt{\small http://mg4j.dsi.unimi.it/}",
}

Why Java?

Writing code that (essentially) has to roll bits over and over in Java may seem a Bad Thing™. However, one should take into consideration the following points:

Installation

You can grab MG4J from Maven Central. Otherwise, you just need to install the .jar file that comes with the distribution, together with its dependencies, which are gathered for your convenience in a tarball.

Citations

Here you can find (in no particular order) research papers that have been written using MG4J. The list is not exhaustive, and we will be happy to include works that are missing.