[PDF-317] Page.getText() is not returning all page text - ICEsoft JIRA Issue Tracker

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 4.2.1
Fix Version/s: 4.2.2
Component/s: Core/Parsing
Labels:
None
Environment:
any

ICEsoft Forum Reference:
http://www.icefaces.org/JForum/posts/list/0/19507.page
Workaround Exists:

Yes
Workaround Description:
Use page.getViewText() instead of page.getText(); getViewText() ensures the page is fully parsed before text is extracted.

Description

A forum user has identified a bug in which text is missing from the result of page.getText(). This method is used in the RI by the TextExtractionTask. It turns out that the "optimized" text extraction call does not initialize PDF XForm objects, so a significant amount of content is missed during extraction.

The content parser method parseTextBlocks() needs to be updated to ensure that XForm objects are correctly initialized and parsed.
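
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not ICEpdf code: the class XFormTextDemo, the Op type, and the methods extractShallow() and extractDeep() are hypothetical names invented for this example, and the content stream is reduced to two operators ("Tj" to show text, "Do" to paint a named XObject). It only models the idea that a text-only fast path which skips form XObjects loses whatever text lives inside them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class XFormTextDemo {

    /** One simplified content-stream operation (hypothetical model, not an ICEpdf type). */
    static final class Op {
        final String operator; // "Tj" shows text, "Do" paints a named XObject
        final String operand;  // the text for "Tj", the XObject name for "Do"

        Op(String operator, String operand) {
            this.operator = operator;
            this.operand = operand;
        }
    }

    /** Models the buggy fast path: only top-level "Tj" operators are read. */
    static List<String> extractShallow(List<Op> content) {
        List<String> text = new ArrayList<>();
        for (Op op : content) {
            if ("Tj".equals(op.operator)) {
                text.add(op.operand);
            }
            // "Do" is ignored here, so text inside form XObjects is lost.
        }
        return text;
    }

    /** Models the fix: a "Do" that names a known form is parsed recursively. */
    static List<String> extractDeep(List<Op> content, Map<String, List<Op>> forms) {
        List<String> text = new ArrayList<>();
        for (Op op : content) {
            if ("Tj".equals(op.operator)) {
                text.add(op.operand);
            } else if ("Do".equals(op.operator) && forms.containsKey(op.operand)) {
                // No cycle guard: this sketch assumes forms never reference themselves.
                text.addAll(extractDeep(forms.get(op.operand), forms));
            }
        }
        return text;
    }

    /** Hypothetical page content: one direct text run plus one form invocation. */
    static List<Op> samplePage() {
        return Arrays.asList(new Op("Tj", "Page header"), new Op("Do", "Fm0"));
    }

    /** Hypothetical form dictionary: the form "Fm0" carries the body text. */
    static Map<String, List<Op>> sampleForms() {
        Map<String, List<Op>> forms = new HashMap<>();
        forms.put("Fm0", Arrays.asList(new Op("Tj", "Body text inside form")));
        return forms;
    }

    public static void main(String[] args) {
        System.out.println("shallow: " + extractShallow(samplePage()));
        System.out.println("deep:    " + extractDeep(samplePage(), sampleForms()));
    }
}
```

In this model the shallow pass returns only the header line, while the recursive pass also returns the text held in the form. The real parser change may differ in detail; the sketch only shows why form content has to be initialized and parsed for the extraction to be complete.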

Activity

Patrick Corless created issue - 18/Jul/11 8:23 AM

Patrick Corless made changes - 19/Aug/11 2:01 PM

Field	Original Value	New Value
Status	Open [ 1 ]	Resolved [ 5 ]
Resolution		Fixed [ 1 ]

Ken Fyten made changes - 29/Mar/12 11:42 AM

Status	Resolved [ 5 ]	Closed [ 6 ]

People

Assignee:

Patrick Corless

Reporter:

Patrick Corless

Votes:

0

Watchers:

0

Dates

Created:

18/Jul/11 8:23 AM

Updated:

29/Mar/12 11:42 AM

Resolved:

19/Aug/11 2:01 PM