ICEpdf / PDF-336

Improve read support for handling files with incorrect xref offsets

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.2
    • Fix Version/s: 5.0.0 alpha1, 5.0
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      any

      Description

      The PDF in question has several errors in the byte offsets defined in the xref table. Generally speaking, ICEpdf should detect the error and start a full document parse, locating all the objects in the file. For some reason the fallback code isn't getting executed and page loading fails. It should be noted that Acrobat displays a few errors when loading the document.

      Tasks for this bug:
      - Figure out why the fallback code isn't executing.
      - Consider a parser enhancement: instead of parsing the whole file into memory on failure, update the xref table with the new byte offsets found by parsing the file, without keeping the objects in memory.
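
To illustrate the second task, here is a minimal sketch of an offset-only scan. The class `XrefOffsetScanner` and its `scan` method are hypothetical names, not part of the ICEpdf API. It assumes object headers of the form `N G obj` can be found by pattern matching, which may produce false hits inside binary stream data, and it takes the file as a byte array for brevity where a real implementation would more likely stream it.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not an ICEpdf class. */
final class XrefOffsetScanner {

    // Matches an object header such as "12 0 obj" that does not start mid-number.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Returns "objectNumber generation" -> byte offset of the object header. */
    static Map<String, Long> scan(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so char positions equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            // Only the offset is recorded; the object body is never parsed or retained.
            // A later definition of the same object replaces the earlier offset.
            offsets.put(m.group(1) + " " + m.group(2), (long) m.start());
        }
        return offsets;
    }
}
```

The resulting map could then replace the offsets declared in the damaged xref table, so objects are still loaded on demand rather than all at once.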
      Attachments
      1. graph.pdf (638 kB, Patrick Corless)
      2. linearized.pdf (3.07 MB, Patrick Corless)

        Activity

        Patrick Corless added a comment -

        Xref problems.

        Patrick Corless added a comment -

        This represents quite a big block of work and would be a really nice feature. Moving to 5.0.

        Patrick Corless added a comment -

        Changes made for PDF-410 have corrected the load issues with graph.pdf. The document linearized.pdf is still a problem.

        Patrick Corless added a comment -

        Quite a bit of work has already been done to improve how we handle files that have incorrect xrefs. Under the new model, if the catalog cannot be found we fall back to a linear read that also keeps track of the actual object offsets and rebuilds the xref table. Once the xref table is rebuilt, it's possible to re-fetch the objects lazily as the Page references are garbage collected.

        Of the attached documents, graph.pdf now loads correctly, but there are some logged exceptions that need to be cleaned up. The second PDF still does not load, so I'll use it to review our fallback logic.
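
The fallback model described in this comment can be sketched roughly as follows. This is an illustration only: `LazyObjectStore`, `fetch`, and `wasRebuilt` are hypothetical names rather than ICEpdf classes or methods, the whole file is held as a string for simplicity, and "object" here just means the raw text between the `N G obj` header and `endobj`.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of "trust the xref, rebuild on failure, fetch lazily". */
final class LazyObjectStore {
    // Object header "N G obj"; the object number is capped at 9 digits so parseInt cannot overflow.
    private static final Pattern HEADER =
            Pattern.compile("(?<![0-9])(\\d{1,9})\\s+\\d+\\s+obj\\b");

    private final String text;                 // file bytes decoded 1:1 as ISO-8859-1
    private final Map<Integer, Integer> xref;  // object number -> byte offset
    private boolean rebuilt = false;

    LazyObjectStore(byte[] pdf, Map<Integer, Integer> declaredXref) {
        this.text = new String(pdf, StandardCharsets.ISO_8859_1);
        this.xref = new HashMap<>(declaredXref);
    }

    /** Reads one object body on demand; nothing is cached, so it can be re-fetched later. */
    String fetch(int objNum) {
        String body = readAt(objNum, xref.get(objNum));
        if (body == null && !rebuilt) {
            rebuildXref();                      // declared offset was wrong: do one linear read
            body = readAt(objNum, xref.get(objNum));
        }
        return body;
    }

    boolean wasRebuilt() {
        return rebuilt;
    }

    private String readAt(int objNum, Integer offset) {
        if (offset == null || offset < 0 || offset >= text.length()) {
            return null;
        }
        Matcher m = HEADER.matcher(text);
        m.useTransparentBounds(true);           // let the lookbehind see text before the offset
        m.region(offset, text.length());
        if (!m.lookingAt() || Integer.parseInt(m.group(1)) != objNum) {
            return null;                        // offset does not point at this object's header
        }
        int end = text.indexOf("endobj", m.end());
        return end < 0 ? null : text.substring(m.end(), end).trim();
    }

    private void rebuildXref() {
        xref.clear();
        Matcher m = HEADER.matcher(text);
        while (m.find()) {
            xref.put(Integer.parseInt(m.group(1)), m.start());
        }
        rebuilt = true;
    }
}
```

Because only offsets are stored after the rebuild, a caller can drop an object and call `fetch` again later without re-scanning the file.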

        Patrick Corless added a comment -

        Updated catalog initialization to try and force any object offset errors to show up while we still have a chance to fall back to a linear traversal and subsequent reindex of the cross-reference table.
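
One way to surface offset errors early, sketched under the assumption that every declared offset should point directly at an `N G obj` header. `XrefValidator` and `findBadOffsets` are hypothetical names and do not reflect the actual change made in ICEpdf.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical early check that could run during catalog initialization. */
final class XrefValidator {

    /** Returns, in ascending order, the object numbers whose declared offset is wrong. */
    static List<Integer> findBadOffsets(byte[] pdf, Map<Integer, Integer> xref) {
        // ISO-8859-1 keeps char positions identical to byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        List<Integer> bad = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(xref).entrySet()) {
            int offset = e.getValue();
            // Expect exactly this object's header, not starting in the middle of a number.
            Matcher m = Pattern
                    .compile("(?<![0-9])" + e.getKey() + "\\s+\\d+\\s+obj\\b")
                    .matcher(text);
            m.useTransparentBounds(true);   // lookbehind may inspect the char before the offset
            boolean ok = offset >= 0 && offset < text.length()
                    && m.region(offset, text.length()).lookingAt();
            if (!ok) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }
}
```

If the returned list is non-empty at initialization time, the loader can switch to the linear traversal and reindex before any page is requested, instead of failing later during page loading.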


          People

          • Assignee:
            Patrick Corless
            Reporter:
            Patrick Corless
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: