Introduction

MG4J (Managing Gigabytes for Java) is a free full-text search engine, written in Java, for large document collections. MG4J is a highly customisable, high-performance, full-fledged search engine that provides state-of-the-art features (such as BM25/BM25F scoring) as well as new research algorithms.
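To illustrate what BM25 scoring computes, here is a minimal, self-contained Java sketch of the textbook Okapi BM25 formula. It is not MG4J's implementation. The class and method names are hypothetical, and so are the collection statistics in `main`. The non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)) and the defaults k1 = 1.2, b = 0.75 are common choices, assumed here for illustration.

```java
import java.util.List;
import java.util.Map;

/** Minimal Okapi BM25 sketch (hypothetical class, not MG4J's scorer). */
public class Bm25Sketch {
    // Commonly used default parameters (assumption: MG4J may use other values).
    static final double K1 = 1.2;  // controls term-frequency saturation
    static final double B = 0.75;  // controls document-length normalisation

    /** Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of a single term occurring tf times in a document. */
    static double termScore(int tf, int docLength, double avgDocLength,
                            long numDocs, long docFreq) {
        // Longer-than-average documents get a larger normaliser, hence a lower score.
        double norm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf(numDocs, docFreq) * tf * (K1 + 1.0) / (tf + norm);
    }

    /** Sums the per-term contributions of the query terms present in the document. */
    static double score(List<String> query, Map<String, Integer> termFreqs, int docLength,
                        double avgDocLength, long numDocs, Map<String, Long> docFreqs) {
        double total = 0.0;
        for (String term : query) {
            int tf = termFreqs.getOrDefault(term, 0);
            if (tf == 0) continue; // terms absent from the document contribute nothing
            long df = docFreqs.getOrDefault(term, 1L); // a present term occurs in at least one document
            total += termScore(tf, docLength, avgDocLength, numDocs, df);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up statistics for a hypothetical collection of 1000 documents.
        Map<String, Long> docFreqs = Map.of("search", 100L, "engine", 10L);
        Map<String, Integer> doc = Map.of("search", 3, "engine", 1);
        double s = score(List.of("search", "engine"), doc, 120, 100.0, 1000, docFreqs);
        System.out.printf("BM25 score: %.4f%n", s);
    }
}
```

Because each term's contribution is bounded by IDF × (k1 + 1), repeating a term many times yields diminishing returns. BM25F extends this idea by combining per-field frequencies before saturation.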

Warning: Release 2.1 moved several utility classes to the DSI utilities. As of release 3.0, the old deprecated versions have been removed.

The main points of MG4J are:

The best starting point for understanding MG4J is the tutorial, which explains how to index a sample collection and query the newly built index from the command line or through a browser. After that, the Javadoc class documentation provides further insight.

MG4J is free software distributed under the GNU Lesser General Public License. If you find MG4J useful, we kindly ask you to cite the following reference:

@INPROCEEDINGS{BoVTREC2005,
        title = "{M}{G}4{J} at {T}{R}{E}{C} 2005",
        author="Paolo Boldi and Sebastiano Vigna",
        year = 2005,
        booktitle = "The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings",
        editor = "Ellen M. Voorhees and Lori P. Buckland",
        publisher = "NIST",
        series = "Special Publications",
        number = "SP 500-266",
        note = "\texttt{\small http://mg4j.dsi.unimi.it/}",
}

Why Java?

Writing code that (essentially) has to roll bits over and over in Java may seem a Bad Thing™. However, one should take the following points into consideration:

Installation

For a quick start, you just have to install the .jar file that comes with the distribution, together with its dependencies, which are gathered for your convenience in a tarball.

A more detailed list of the dependencies can be found in the overview of the Javadoc documentation. JPackage RPMs are also available.