ICEpdf / PDF-317

Page.getText() is not returning all page text

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.1
    • Fix Version/s: 4.2.2
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      any

      Description

      A forum user has identified a bug where text is missing from the result of page.getText(). This method is used in the RI for the TextExtractionTask. It turns out that the "optimized" text extraction call does not initialize PDF XForm objects and therefore misses quite a bit of content during extraction.

      The content parser method parseTextBlocks() needs to be updated to ensure the XForm objects are correctly initialized and parsed.
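
      The idea behind the fix can be sketched roughly as follows. This is not the ICEpdf ContentParser code: the class XFormTextSketch, its Content and Image types, and the extractText method are hypothetical stand-ins, and the content stream is reduced to simple operand/operator pairs. Only the PDF operator names Tj (show text) and Do (paint XObject) come from the PDF format itself. The sketch assumes a text-only pass that skips images but must still descend into form XObjects to find their text.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class XFormTextSketch {

    /** Hypothetical content stream: operand/operator pairs plus named XObject resources. */
    static final class Content {
        final List<String> tokens;
        final Map<String, Object> xObjects = new HashMap<>();

        Content(String... tokens) {
            this.tokens = Arrays.asList(tokens);
        }
    }

    /** Hypothetical image XObject; a text-only pass never needs to decode it. */
    static final class Image {
    }

    /** Collects text shown with Tj, optionally descending into form XObjects. */
    static String extractText(Content content, boolean followXForms) {
        StringBuilder out = new StringBuilder();
        collect(content, followXForms, out, 0);
        return out.toString().trim();
    }

    private static void collect(Content content, boolean followXForms,
                                StringBuilder out, int depth) {
        if (depth > 16) {
            return; // assumed guard against self-referencing forms
        }
        // Tokens are read as (operand, operator) pairs in this simplified model.
        for (int i = 0; i + 1 < content.tokens.size(); i += 2) {
            String operand = content.tokens.get(i);
            String operator = content.tokens.get(i + 1);
            if (operator.equals("Tj")) {
                out.append(operand).append(' ');
            } else if (operator.equals("Do") && followXForms) {
                Object xObject = content.xObjects.get(operand);
                if (xObject instanceof Content) {
                    // A form XObject has its own content stream, so parse it too.
                    collect((Content) xObject, followXForms, out, depth + 1);
                }
                // Image XObjects are skipped, which keeps the text-only pass cheap.
            }
        }
    }

    public static void main(String[] args) {
        Content form = new Content("World", "Tj");
        Content page = new Content("Hello", "Tj", "/Fm0", "Do", "/Im0", "Do");
        page.xObjects.put("/Fm0", form);
        page.xObjects.put("/Im0", new Image());

        System.out.println(extractText(page, false)); // form text is missed
        System.out.println(extractText(page, true));  // form text is included
    }
}
```

      In this model, calling extractText with followXForms set to false corresponds to the reported behaviour, where any text held inside a form XObject is silently dropped. Setting it to true corresponds to the intended behaviour, while images are still never parsed.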

        Activity

        Patrick Corless created issue -
        Repository: ICEsoft Public SVN Repository
        Revision: #25282
        Date: Fri Aug 19 11:53:50 MDT 2011
        User: patrick.corless
        Message: PDF-317 updated the text extraction specific context parser to include xobject content extraction.
        Files Changed:
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java
        Patrick Corless added a comment -

        Updated the content parser's parseTextBlocks method to include xobject processing. This feels like deja vu, as I'm sure I've fixed this in the past. Regardless, text extraction is now working correctly while keeping the optimization that not all page objects are parsed, such as images and other content unrelated to text.

        Patrick Corless made changes -
        Field         Original Value    New Value
        Status        Open [ 1 ]        Resolved [ 5 ]
        Resolution                      Fixed [ 1 ]
        Ken Fyten made changes -
        Field         Original Value    New Value
        Status        Resolved [ 5 ]    Closed [ 6 ]

          People

          • Assignee:
            Patrick Corless
            Reporter:
            Patrick Corless
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: