ICEpdf / PDF-317

Page.getText() is not returning all page text

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.1
    • Fix Version/s: 4.2.2
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      any

    Description

    A forum user identified a bug in which Page.getText() does not return all of the text on a page. The RI uses this method in its TextExtractionTask. The cause is that the "optimized" text-extraction call does not initialize PDF XForm objects, so a substantial amount of content is missed during extraction.
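
    To make the failure mode concrete, here is a minimal sketch that is not ICEpdf code. It assumes a hypothetical model in which a content stream is a list of pre-tokenized `Op` records and XObject resources map a name to a stream; only the `Tj` (show text) and `Do` (invoke XObject) operators are modeled. A text-only pass that never follows `Do` silently drops any text drawn inside a form XObject, even though that content is available in the resources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the ICEpdf API: a content stream is a list of
// pre-tokenized operations, and XObject resources map names to streams.
public class NaiveTextScan {

    /** One operation, e.g. ("Tj", "Hello") or ("Do", "Fm1"). */
    record Op(String operator, String operand) {}

    /**
     * "Optimized" text-only pass: collects Tj operands but never follows Do,
     * so the xobjects map is ignored and any text inside a form XObject is lost.
     */
    static List<String> extract(List<Op> stream, Map<String, List<Op>> xobjects) {
        List<String> text = new ArrayList<>();
        for (Op op : stream) {
            if (op.operator().equals("Tj")) {
                text.add(op.operand());
            }
            // A "Do" operation falls through here: the referenced form is never parsed.
        }
        return text;
    }

    public static void main(String[] args) {
        // Hypothetical page: the heading is drawn by form XObject Fm1.
        Map<String, List<Op>> xobjects = Map.of("Fm1", List.of(new Op("Tj", "Quarterly Report")));
        List<Op> page = List.of(new Op("Do", "Fm1"), new Op("Tj", "Body text"));
        System.out.println(extract(page, xobjects)); // prints [Body text]
    }
}
```

    Real content streams also show text with `TJ`, `'` and `"`, and `Do` can invoke image XObjects as well as forms; the model leaves those out to keep the missing-XForm effect visible.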

    The ContentParser method parseTextBlocks() needs to be updated to ensure that XForm objects are correctly initialized and parsed.
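
    The issue does not show the actual parseTextBlocks() change. As a rough sketch of the general technique only, using hypothetical `Op` and `XForm` records rather than ICEpdf classes, a text-only pass can handle `Do` by looking up the named form, parsing its stream recursively with the form's own resources, and refusing to re-enter a form that is already being parsed. Falling back to the caller's resources when a form has none is an assumption of this model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not ICEpdf classes: streams are pre-tokenized operations.
public class XFormAwareTextScan {

    /** One operation, e.g. ("Tj", "Hello") or ("Do", "Fm1"). */
    record Op(String operator, String operand) {}

    /** A form XObject: its own content stream and its own XObject resources. */
    record XForm(List<Op> stream, Map<String, XForm> resources) {}

    static List<String> extract(List<Op> stream, Map<String, XForm> resources) {
        List<String> text = new ArrayList<>();
        // Identity-based set: never calls record hashCode(), which could
        // recurse forever on a cyclic resource graph.
        Set<XForm> active = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(stream, resources, active, text);
        return text;
    }

    private static void walk(List<Op> stream, Map<String, XForm> resources,
                             Set<XForm> active, List<String> out) {
        for (Op op : stream) {
            switch (op.operator()) {
                case "Tj" -> out.add(op.operand());
                case "Do" -> {
                    XForm form = resources.get(op.operand());
                    // Skip unknown names and any form already on the current
                    // parse path (a form that invokes itself, directly or not).
                    if (form != null && active.add(form)) {
                        // Assumption: a form with no resources uses the caller's.
                        Map<String, XForm> formRes =
                                form.resources().isEmpty() ? resources : form.resources();
                        walk(form.stream(), formRes, active, out);
                        active.remove(form);
                    }
                }
                default -> { } // other operators carry no text in this model
            }
        }
    }

    public static void main(String[] args) {
        XForm header = new XForm(List.of(new Op("Tj", "Quarterly Report")), Map.of());
        List<Op> page = List.of(new Op("Do", "Fm1"), new Op("Tj", "Body text"));
        System.out.println(extract(page, Map.of("Fm1", header)));
        // prints [Quarterly Report, Body text]
    }
}
```

    Tracking only the forms on the current parse path, rather than every form ever seen, still lets a legitimately reused form (for example a header drawn twice) contribute its text each time it is invoked.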

    Activity

    • Repository: ICEsoft Public SVN Repository
      Revision: #25282
      Date: Fri Aug 19 11:53:50 MDT 2011
      User: patrick.corless
      Message: PDF-317 updated the text extraction specific context parser to include xobject content extraction.
      Files changed: MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java

    People

    • Assignee:
      Patrick Corless
    • Reporter:
      Patrick Corless
    • Votes:
      0
    • Watchers:
      0

    Dates

    • Created:
    • Updated:
    • Resolved: