ICEpdf / PDF-336

Improve read support for handling files with incorrect xref offsets

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.2
    • Fix Version/s: 5.0.0 alpha1, 5.0
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      any

      Description

      The PDF in question has several errors in the byte offsets defined in the xref table. Generally speaking, ICEpdf should detect the error and start a full document parse, locating all the objects in the file. For some reason the fallback code isn't being executed and page loading fails. It should be noted that Acrobat displays a few errors when loading the document.

      Tasks for this bug:
      - figure out why the fallback code isn't executing
      - consider a parser enhancement: instead of parsing the whole file into memory on failure, we should consider updating the xref table with the new byte offsets found while parsing the file, but without keeping the objects in memory.
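      The second task (recording corrected byte offsets instead of keeping parsed objects in memory) could look roughly like the Java sketch below. It is a hypothetical helper, not ICEpdf's parser API: `XrefRebuilder` and `scan` are illustrative names. It assumes the whole file is available as a byte array and simply looks for `N G obj` headers, so it ignores generation numbers and can be fooled by header-like text inside stream data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of ICEpdf): rebuilds an xref-style map of
 * object number -> byte offset by scanning raw PDF bytes for "N G obj" headers.
 * Only offsets are kept; no objects are parsed or retained.
 */
public final class XrefRebuilder {
    // Object number, generation number, then the "obj" keyword.
    // The trailing \b keeps the keyword from matching a longer word.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    private XrefRebuilder() {}

    public static Map<Integer, Long> scan(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so a char index
        // in the string equals the byte offset in the file.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<Integer, Long> offsets = new TreeMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            try {
                int objNum = Integer.parseInt(m.group(1));
                // Later definitions (e.g. from incremental updates) overwrite earlier ones.
                offsets.put(objNum, (long) m.start());
            } catch (NumberFormatException ignored) {
                // Digit run too large for an int: not a plausible object number.
            }
        }
        return offsets;
    }
}
```

      Because only a map of integers to longs is built, memory use stays proportional to the number of objects rather than their contents; the objects themselves can be parsed later on demand from the recorded offsets.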
      Attachments
      1. linearized.pdf (3.07 MB, Patrick Corless)
      2. graph.pdf (638 kB, Patrick Corless)

        Activity

        Patrick Corless made changes:
        Status: Resolved to Closed
        Patrick Corless made changes:
        Fix Version/s: added 5.0.0 alpha1
        Patrick Corless made changes:
        Status: Open to Resolved
        Resolution: Fixed
        Patrick Corless added a comment:

        Updated catalog initialization to try and force any object offset errors to show up while we still have a chance to fall back to a linear traversal and subsequent reindexing of the cross-reference table.

        Repository: ICEsoft Public SVN Repository
        Revision: #32602
        Date: Tue Dec 04 09:56:34 MST 2012
        User: patrick.corless
        Message: PDF-336 updated catalog initialization to try and force any potential xref errors upfront so that the linear traversal code can properly execute.
        Files Changed
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/Parser.java
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/Catalog.java
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/ImageReference.java
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/Document.java
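        The idea behind this commit, forcing offset errors to surface during catalog initialization while the fallback path is still available, can be illustrated with an eager check of every xref entry. This Java sketch is hypothetical (`XrefValidator` is not an ICEpdf class); it assumes the xref has already been read into a map from object number to byte offset, and it compares object numbers only, not generations.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical validator (not an ICEpdf class): eagerly checks that every
 * xref offset points at the matching "N G obj" header, so a bad table is
 * detected up front instead of when a page is first loaded.
 */
public final class XrefValidator {
    // Optional leading whitespace (tolerates an offset that lands just before
    // the header), then object number, generation number and the "obj" keyword.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("\\s*(\\d+)\\s+(\\d+)\\s+obj\\b");

    private XrefValidator() {}

    /** True if an object header for objNum starts at offset in text. */
    static boolean offsetPointsAt(String text, long offset, int objNum) {
        if (offset < 0 || offset >= text.length()) {
            return false;
        }
        Matcher m = OBJ_HEADER.matcher(text);
        m.region((int) offset, text.length());
        if (!m.lookingAt()) {
            return false;
        }
        try {
            return Integer.parseInt(m.group(1)) == objNum;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Returns false at the first xref entry whose offset is wrong. */
    public static boolean allOffsetsValid(byte[] pdf, Map<Integer, Long> xref) {
        // ISO-8859-1: one char per byte, so string indices are byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        for (Map.Entry<Integer, Long> entry : xref.entrySet()) {
            if (!offsetPointsAt(text, entry.getValue(), entry.getKey())) {
                return false;
            }
        }
        return true;
    }
}
```

        A `false` result is the signal to discard the table and fall back to a linear traversal and reindex, which is only possible because the check runs before any page has been built from the bad offsets.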
        Patrick Corless added a comment:

        Quite a bit of work has already been done to improve how we handle files that have incorrect xrefs. Under the new model, if the catalog cannot be found, we fall back to a linear read that also keeps track of the actual object offsets and rebuilds the xref table. Once the xref table is rebuilt, it's possible to re-fetch objects lazily as the Page references are garbage collected.

        Of the attached documents, graph.pdf now loads correctly, but there are some logged exceptions that need to be cleaned up. The second PDF (linearized.pdf) still does not load, so I'll use it to review our fallback logic.
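        The lazy re-fetch described above, where objects are reloaded from the rebuilt xref after their references have been garbage collected, could be approximated with a cache of `SoftReference`s. This is a hypothetical Java sketch, not ICEpdf's actual object cache; `LazyObjectCache` is an illustrative name and `parseAt` stands in for whatever parser reads a single object at a byte offset.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical lazy object store (not ICEpdf's implementation): objects are
 * parsed on demand from a rebuilt xref and held only through SoftReferences,
 * so the JVM may reclaim them and they are re-parsed on the next request.
 */
public final class LazyObjectCache<T> {
    private final Map<Integer, Long> offsets;   // rebuilt xref: object number -> byte offset
    private final LongFunction<T> parseAt;      // parses one object starting at a byte offset
    private final Map<Integer, SoftReference<T>> cache = new HashMap<>();

    public LazyObjectCache(Map<Integer, Long> offsets, LongFunction<T> parseAt) {
        this.offsets = new HashMap<>(offsets);
        this.parseAt = parseAt;
    }

    /** Returns the object, re-parsing it if it was never loaded or was collected. */
    public synchronized T get(int objNum) {
        SoftReference<T> ref = cache.get(objNum);
        T obj = (ref == null) ? null : ref.get();
        if (obj == null) {
            Long offset = offsets.get(objNum);
            if (offset == null) {
                return null; // object number not present in the xref
            }
            obj = parseAt.apply(offset);
            cache.put(objNum, new SoftReference<>(obj));
        }
        return obj;
    }
}
```

        A `SoftReference` keeps recently used objects around until memory is tight, whereas a `WeakReference` would be cleared at the next collection once no strong reference remains; either way the rebuilt offsets make reloading cheap.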

        Patrick Corless added a comment:

        Changes made for PDF-410 have corrected the load issues with graph.pdf. The document linearized.pdf is still a problem.

        Patrick Corless made changes:
        Salesforce Case: (none)
        Fix Version/s: added 5.0
        Fix Version/s: removed 4.3
        Patrick Corless added a comment:

        This represents quite a big block of work and would be a really nice feature. Moving to 5.0.

        Patrick Corless made changes:
        Attachment: added graph.pdf
        Patrick Corless made changes:
        Attachment: added linearized.pdf
        Patrick Corless added a comment:

        Xref problems.

        Patrick Corless created the issue.

          People

          • Assignee:
            Patrick Corless
            Reporter:
            Patrick Corless
          • Votes:
            0
            Watchers:
            1
