public class PdfDocumentFactory extends AbstractSimpleTikaDocumentFactory
The metadata that this factory will attempt to parse are
Metadata.TITLE, MSOffice.AUTHOR, Metadata.CREATOR,
MSOffice.KEYWORDS, Metadata.SUBJECT, producer, created,
trapped, and HttpHeaders.LAST_MODIFIED.
Inherited nested classes: PropertyBasedDocumentFactory.MetadataKeys, DocumentFactory.FieldType

Inherited fields: defaultMetadata

| Constructor and Description |
|---|
| PdfDocumentFactory() |
| PdfDocumentFactory(Properties properties) |
| PdfDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata) |
| PdfDocumentFactory(String[] property) |

| Modifier and Type | Method and Description |
|---|---|
| protected org.apache.tika.parser.Parser | getParser(): The parser used to parse this kind of document; subclasses should always return the same instance, as Tika parsers are immutable and thread-safe. |
| protected List<TikaField> | metadataFields(): The list of Tika fields (apart from content) that this factory provides; the default implementation returns the empty list, so most subclasses may want to override this method. |
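
The advice above (return one shared parser instance, and override `metadataFields()` to expose metadata) can be illustrated with a minimal, self-contained sketch. It does not use the real MG4J or Tika classes: `SketchParser`, `SketchTikaFactory`, `SketchPdfFactory`, and the field names are hypothetical stand-ins chosen only so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for org.apache.tika.parser.Parser (not the real interface).
interface SketchParser {
    String name();
}

// Hypothetical stand-in for the abstract base factory described above.
abstract class SketchTikaFactory {
    // Subclasses supply the parser for their kind of document.
    protected abstract SketchParser getParser();

    // Default: no metadata fields apart from content.
    protected List<String> metadataFields() {
        return Collections.emptyList();
    }
}

// Hypothetical subclass following the "always the same instance" advice.
class SketchPdfFactory extends SketchTikaFactory {
    // One parser shared by every factory instance; safe only because the
    // parser is assumed to be immutable and thread-safe.
    private static final SketchParser PARSER = () -> "pdf";

    // Illustrative field names, not the actual fields of PdfDocumentFactory.
    private static final List<String> FIELDS =
        Collections.unmodifiableList(Arrays.asList("title", "author", "keywords"));

    @Override
    protected SketchParser getParser() {
        return PARSER; // never allocate a new parser per call
    }

    @Override
    protected List<String> metadataFields() {
        return FIELDS; // overrides the empty default
    }
}
```

Keeping the parser in a `static final` field means that copies of the factory share it at no extra cost, which is only appropriate under the stated assumption that the parser holds no mutable state.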

Inherited methods:

copy, fields, getDocument, parseProperty

fieldIndex, fieldName, fieldType, numberOfFields

ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey

ensureFieldIndex, toString

public PdfDocumentFactory()

public PdfDocumentFactory(Properties properties) throws ConfigurationException
Throws: ConfigurationException

public PdfDocumentFactory(Reference2ObjectMap<Enum<?>,Object> defaultMetadata)

public PdfDocumentFactory(String[] property) throws ConfigurationException
Throws: ConfigurationException

protected org.apache.tika.parser.Parser getParser()
getParser in class AbstractSimpleTikaDocumentFactory

protected List<TikaField> metadataFields()
metadataFields in class AbstractSimpleTikaDocumentFactory