Class OldExcelExtractor

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, POITextExtractor

    public class OldExcelExtractor
    extends java.lang.Object
    implements POITextExtractor
    A text extractor for Excel files that are too old for HSSFWorkbook to handle. This includes Excel 95 files and very old (pre-OLE2) files, such as those written by Excel 4.

    Returns much (but not all) of the textual content of the file. The output is suitable for indexing by something like Apache Lucene, or for use by Apache Tika, but is not intended for display to the user.
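
To make the OLE2 / pre-OLE2 distinction concrete, here is a minimal sketch of how a caller might tell the two apart before choosing an extractor. It relies only on the well-known 8-byte signature that opens every OLE2 compound document. The class and method names (Ole2Sniffer, looksLikeOle2) are hypothetical and not part of this API, and nothing here claims to be how OldExcelExtractor performs the check internally.

```java
// Hypothetical helper, not part of the library: checks for the OLE2 signature.
final class Ole2Sniffer {
    // The 8-byte signature that opens every OLE2 compound document.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private Ole2Sniffer() {
    }

    /**
     * Returns true if the given leading bytes of a file start with the
     * OLE2 signature. A file that fails this check may be a raw
     * (pre-OLE2) stream, or something else entirely.
     */
    static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < OLE2_SIGNATURE.length; i++) {
            if (header[i] != OLE2_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would read the first eight bytes of the file and pass them in. A negative result only says the file is not an OLE2 container, not that it is a valid old Excel file.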

    • Constructor Detail

      • OldExcelExtractor

        public OldExcelExtractor(java.io.InputStream input)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • OldExcelExtractor

        public OldExcelExtractor(java.io.File f)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • OldExcelExtractor

        public OldExcelExtractor(POIFSFileSystem fs)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • OldExcelExtractor

        public OldExcelExtractor(DirectoryNode directory)
                          throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • main

        public static void main(java.lang.String[] args)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • getBiffVersion

        public int getBiffVersion()
        The BIFF (Binary Interchange File Format) version, which largely corresponds to the Excel version.
        Returns:
        the Biff version
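
As an illustration of how the BIFF version relates to the Excel version, the sketch below maps the integer to a human-readable label. The mapping reflects the commonly documented BIFF numbering. Which values getBiffVersion() can actually return is not stated on this page, so the set of cases is an assumption, and the BiffVersions class is a hypothetical helper rather than part of this API.

```java
// Hypothetical helper, not part of the library: labels a BIFF version number.
final class BiffVersions {
    private BiffVersions() {
    }

    /** Returns a display label for a BIFF version such as the one reported by getBiffVersion(). */
    static String describe(int biffVersion) {
        switch (biffVersion) {
            case 2:
                return "Excel 2.x";
            case 3:
                return "Excel 3.0";
            case 4:
                return "Excel 4.0";
            case 5:
            case 7:
                // Excel 5.0 and Excel 95 share a near-identical format;
                // Excel 95 is sometimes labelled BIFF7.
                return "Excel 5.0 / 95";
            case 8:
                return "Excel 97-2003";
            default:
                // Assumption: anything else is reported rather than rejected.
                return "unknown BIFF version " + biffVersion;
        }
    }
}
```

With an extractor instance in hand, a call such as BiffVersions.describe(extractor.getBiffVersion()) would produce the label.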
      • getText

        public java.lang.String getText()
        Retrieves the text contents of the file, as far as that is possible for these old file formats.
        Specified by:
        getText in interface POITextExtractor
        Returns:
        the text contents of the file
      • handleNumericCell

        protected void handleNumericCell(java.lang.StringBuilder text,
                                         double value)
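
This page gives no description for handleNumericCell. Its signature suggests that it appends the text form of a numeric cell value to the output buffer, and because it is protected, a subclass can presumably override it. The sketch below shows one formatting routine such an override might delegate to. The NumericCellText class, the whole-number rule, and the trailing newline are all assumptions made for illustration, not the library's documented behaviour.

```java
// Hypothetical helper, not part of the library: formats a numeric cell value.
final class NumericCellText {
    private NumericCellText() {
    }

    /**
     * Appends the value followed by a newline. Whole numbers of moderate
     * size are written without a fractional part (42 rather than 42.0);
     * everything else uses Java's default double formatting.
     */
    static void appendNumber(StringBuilder text, double value) {
        boolean wholeNumber = value == Math.rint(value)
                && !Double.isInfinite(value)
                && Math.abs(value) < 1e15;
        if (wholeNumber) {
            text.append((long) value);
        } else {
            text.append(value);
        }
        text.append('\n');
    }
}
```

A subclass could then implement its override as a single call to NumericCellText.appendNumber(text, value).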
      • getMetadataTextExtractor

        public POITextExtractor getMetadataTextExtractor()
        Description copied from interface: POITextExtractor
        Returns another text extractor, which is able to output the textual content of the document metadata / properties, such as author and title.
        Specified by:
        getMetadataTextExtractor in interface POITextExtractor
        Returns:
        the metadata and text extractor
      • getFilesystem

        public java.io.Closeable getFilesystem()
        Specified by:
        getFilesystem in interface POITextExtractor
        Returns:
        The underlying resources/filesystem
      • getDocument

        public java.lang.Object getDocument()
        Specified by:
        getDocument in interface POITextExtractor
        Returns:
        the processed document