public class HtmlDocumentFactory extends AbstractSimpleTikaDocumentFactory
The only metadata field this factory attempts to parse is Metadata.TITLE.
Inherited nested classes: PropertyBasedDocumentFactory.MetadataKeys, DocumentFactory.FieldType

Inherited fields: defaultMetadata

| Constructor and Description |
|---|
| HtmlDocumentFactory() |
| HtmlDocumentFactory(Properties properties) |
| HtmlDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata) |
| HtmlDocumentFactory(String[] property) |

| Modifier and Type | Method and Description |
|---|---|
| protected org.apache.tika.parser.Parser | getParser(): The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe. |
| protected List<TikaField> | metadataFields(): The list of Tika fields (apart from content) that this factory provides; it returns the empty list, so most subclasses may want to override this method. |

Inherited methods:

copy, fields, getDocument, parseProperty

fieldIndex, fieldName, fieldType, numberOfFields

ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey

ensureFieldIndex, toString

public HtmlDocumentFactory()

public HtmlDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)

public HtmlDocumentFactory(Properties properties) throws ConfigurationException
Throws: ConfigurationException

public HtmlDocumentFactory(String[] property) throws ConfigurationException
Throws: ConfigurationException

protected org.apache.tika.parser.Parser getParser()
getParser in class AbstractSimpleTikaDocumentFactory

protected List<TikaField> metadataFields()
metadataFields in class AbstractSimpleTikaDocumentFactory
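
The getParser() description asks subclasses to return the same parser instance on every call, and the metadataFields() description notes that the default is the empty list. A minimal sketch of that pattern follows. It does not use MG4J or Tika: `Parser`, `HtmlParserStandIn`, `FactoryStandIn` and `HtmlFactoryStandIn` are hypothetical stand-ins invented for illustration, and the `"title"` field name is an assumption that mirrors Metadata.TITLE rather than the library's actual field type.

```java
import java.util.Collections;
import java.util.List;

public class SharedParserSketch {
    /** Hypothetical stand-in for org.apache.tika.parser.Parser. */
    interface Parser {}

    /** Hypothetical stand-in for an immutable, thread-safe HTML parser. */
    static final class HtmlParserStandIn implements Parser {}

    /** Hypothetical stand-in for AbstractSimpleTikaDocumentFactory. */
    abstract static class FactoryStandIn {
        /** Subclasses are expected to return the same instance on every call. */
        protected abstract Parser getParser();

        /** Default: no metadata fields besides the content. */
        protected List<String> metadataFields() {
            return Collections.emptyList();
        }
    }

    /** Hypothetical stand-in for an HTML factory that exposes a title field. */
    static final class HtmlFactoryStandIn extends FactoryStandIn {
        // One parser shared by all factory instances; safe only because the
        // parser is assumed to be immutable and thread-safe.
        private static final Parser PARSER = new HtmlParserStandIn();

        @Override
        protected Parser getParser() {
            return PARSER;
        }

        @Override
        protected List<String> metadataFields() {
            // Assumption: a plain string stands in for the real field type.
            return Collections.singletonList("title");
        }
    }

    public static void main(String[] args) {
        HtmlFactoryStandIn a = new HtmlFactoryStandIn();
        HtmlFactoryStandIn b = new HtmlFactoryStandIn();
        // Both factories hand out the very same parser object.
        System.out.println(a.getParser() == b.getParser()); // prints true
        System.out.println(a.metadataFields());             // prints [title]
    }
}
```

Keeping the parser in a static final field avoids allocating a new parser per document or per factory copy, which is the point of the "same instance" advice; it would not be appropriate for a parser that carries mutable state.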