| Class | Description |
|---|---|
| AbstractSimpleTikaDocumentFactory | An abstract document factory that provides an implementation for AbstractSimpleTikaDocumentFactory.getDocument(InputStream, Reference2ObjectMap) and AbstractSimpleTikaDocumentFactory.fields(). |
| AbstractTikaDocumentFactory | An abstract document factory that provides the mapping from field names to field indices. |
| AutoDetectDocumentFactory | A document factory that automatically detects the type of the document content. |
| EPUBDocumentFactory | A document factory for the EPUB format. |
| GreedyTikaField | The set of all Tika metadata represented as a single field inside MG4J. |
| HtmlDocumentFactory | A document factory for the HTML format. |
| MSOfficeDocumentFactory | A document factory for the Microsoft Office format. |
| OOXMLDocumentFactory | A document factory for the OOXML format. |
| OpenDocumentDocumentFactory | A document factory for the Open Document format. |
| PdfDocumentFactory | A document factory for the PDF format. |
| RTFDocumentFactory | A document factory for the RTF format. |
| TextDocumentFactory | A document factory for the text format; the character set will be autodetected. |
| TikaField | A Tika field represented inside MG4J. |
| XMLDocumentFactory | A document factory for XML. |

When using AutoDetectDocumentFactory, or any other factory in which
metadata fields are user-definable or otherwise variable, it is impossible to
provide a static listing of all available fields, because they depend on the
actual factory used to parse the document. In this case, an instance of
GreedyTikaField is used to return some useful data to the caller
by (essentially) concatenating the string representations of all metadata fields.
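
The idea of a "greedy" field can be illustrated with a minimal, library-free sketch. This is not the GreedyTikaField implementation: the class name GreedyFieldSketch, the method concatenate, the map-of-lists stand-in for the parsed metadata, and the single-space separator are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: collapses a variable set of metadata fields into one string.
final class GreedyFieldSketch {

    // Joins every non-empty value of every metadata field, in iteration order,
    // separated by a single space. Field names are ignored (an assumption).
    static String concatenate(Map<String, List<String>> metadata) {
        StringJoiner joined = new StringJoiner(" ");
        for (List<String> values : metadata.values()) {
            for (String value : values) {
                if (value != null && !value.isEmpty()) {
                    joined.add(value);
                }
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // The field names here are made up; a real parser decides which ones exist.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("Annual Report"));
        metadata.put("author", List.of("Ann", "Bob"));
        System.out.println(concatenate(metadata));
    }
}
```

Because the set of keys is never enumerated ahead of time, the same code works whatever fields the underlying parser happens to produce, which is the property the paragraph above relies on.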