Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 4.2.1
- Fix Version/s: 4.2.2
- Component/s: Core/Parsing
- Labels: None
- Environment: any
- ICEsoft Forum Reference:
- Workaround Exists: Yes
- Workaround Description: Use page.getViewText() instead of page.getText(); getViewText() ensures the page is fully parsed before text is extracted.
Description
A forum user has identified a bug where text is missing when calling page.getText(). This method is used in the RI for the TextExtractionTask. It turns out that the "optimized" text extraction call does not initialize PDF XForm objects and therefore misses quite a bit of content during extraction.
The Content parser method parseTextBlocks() needs to be updated to ensure that XForm objects are correctly initialized and parsed.
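To illustrate why skipping XForm objects drops text, here is a minimal toy sketch. It is not ICEpdf code: XFormTextDemo, Item, and extractText are hypothetical names invented for this example, and the real parseTextBlocks() change may look quite different. The sketch only assumes that a page's content can contain text runs as well as nested form objects that carry their own text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are ICEpdf classes.
final class XFormTextDemo {

    /** A content item: either a run of text or a nested form (XForm) object. */
    static final class Item {
        final String text;       // non-null for a text run
        final List<Item> xform;  // non-null for a form object holding nested content

        private Item(String text, List<Item> xform) {
            this.text = text;
            this.xform = xform;
        }

        static Item text(String s) {
            return new Item(s, null);
        }

        static Item xform(Item... children) {
            return new Item(null, Arrays.asList(children));
        }
    }

    /**
     * Collects text runs in content order. When descendIntoXForms is false,
     * nested forms are skipped, which mimics the missing-text symptom.
     */
    static List<String> extractText(List<Item> content, boolean descendIntoXForms) {
        List<String> out = new ArrayList<>();
        for (Item item : content) {
            if (item.text != null) {
                out.add(item.text);
            } else if (descendIntoXForms) {
                // Recurse so text inside nested forms is collected as well.
                out.addAll(extractText(item.xform, true));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> page = Arrays.asList(
                Item.text("Invoice"),
                Item.xform(Item.text("Total:"), Item.xform(Item.text("42.00"))),
                Item.text("Thank you"));

        System.out.println(extractText(page, false)); // prints [Invoice, Thank you]
        System.out.println(extractText(page, true));  // prints [Invoice, Total:, 42.00, Thank you]
    }
}
```

In this toy model the first call plays the role of the "optimized" path that never visits form content, and the second plays the role of a full parse, which is roughly what the getViewText() workaround relies on.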
Activity
- Subversion
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #25282 | Fri Aug 19 11:53:50 MDT 2011 | patrick.corless | |
Files Changed
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java