Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.2
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      any

      Description

      What seems to be happening is that we are incorrectly substituting the fonts for the OCR layer with a font that doesn't have the same widths as the one used to generate the PDF. I've attached a screenshot that introduces an alpha value into the rendering stack so you can see the OCR text behind the image text.
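      For context, the PDF specification (ISO 32000-1, section 9.4.4) gives the horizontal displacement of each glyph as shown below. Here w0 is the glyph width, Tj the TJ position adjustment, Tfs the font size, Tc the character spacing, Tw the word spacing (applied only to single-byte code 32), and Th the horizontal scaling set by the Tz operator. A substituted font changes w0, and a mis-tracked Tz changes Th. Either one shifts the selectable OCR text relative to the rendered image.

```latex
t_x = \left( \left( w_0 - \frac{T_j}{1000} \right) T_{fs} + T_c + T_w \right) T_h
```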

        Activity

        Patrick Corless added a comment:

        Finally had a chance to take a close look at this selection issue. The PDF in question exposes a small bug in the content parser, where we were concatenating (multiplying) the horizontal text scaling value against the previous value. So if more than one "Tz" operator was specified per text block, the text would gradually shrink.

        For example:

        81 Tz
        65 Tz

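        With the bug, the first operator scaled the text to 81% of the font width and the second to 65% of that already-scaled value (about 52.65% overall). The correct handling is to treat each Tz as a separate scale that replaces the previous one, so the second operator yields 65%. Once the logic was adjusted, the text selection corresponded more directly with the original graphic/OCR capture.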
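        The difference between the two behaviours can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's actual code: the class TzScalingDemo, the TextState holder, and the methods applyTzBuggy, applyTzFixed and glyphAdvance are all hypothetical names. The advance calculation uses the spec formula with Tj, Tc and Tw set to zero.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not the library's actual classes) of how a
 * content-stream parser might apply the PDF "Tz" operator to the text state.
 */
public class TzScalingDemo {

    /** Minimal stand-in for the text state; only horizontal scaling is modelled. */
    static final class TextState {
        // Th as a fraction: 1.0 means 100%, the PDF default.
        double horizontalScaling = 1.0;
    }

    /** The bug: each Tz operand is multiplied onto the previous scaling. */
    static void applyTzBuggy(TextState ts, double operand) {
        ts.horizontalScaling *= operand / 100.0;
    }

    /** The fix: each Tz operand replaces the previous scaling outright. */
    static void applyTzFixed(TextState ts, double operand) {
        ts.horizontalScaling = operand / 100.0;
    }

    /**
     * Horizontal advance of one glyph with Tj, Tc and Tw all zero:
     * t_x = w0 * Tfs * Th, where the width comes from a /Widths entry in thousandths.
     */
    static double glyphAdvance(double widthThousandths, double fontSize, double th) {
        return widthThousandths / 1000.0 * fontSize * th;
    }

    public static void main(String[] args) {
        double[] operands = {81, 65}; // "81 Tz" followed by "65 Tz" in one text block

        TextState buggy = new TextState();
        TextState fixed = new TextState();
        for (double op : operands) {
            applyTzBuggy(buggy, op);
            applyTzFixed(fixed, op);
        }

        // Advance of a hypothetical 500-unit-wide glyph at 12 pt under each interpretation.
        System.out.printf(Locale.ROOT, "buggy Th=%.4f advance=%.3f%n",
                buggy.horizontalScaling, glyphAdvance(500, 12, buggy.horizontalScaling));
        System.out.printf(Locale.ROOT, "fixed Th=%.4f advance=%.3f%n",
                fixed.horizontalScaling, glyphAdvance(500, 12, fixed.horizontalScaling));
    }
}
```

        With the buggy path, Th ends up at roughly 0.5265 instead of 0.65. Each glyph therefore advances about 19% less than it should (0.5265 / 0.65 = 0.81), and the selection boxes drift progressively away from the rendered text along each line.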

        Took a while to find this one.

          People

          • Assignee:
            Patrick Corless
          • Reporter:
            Patrick Corless
          • Votes:
            0
          • Watchers:
            0

            Dates

            • Created:
            • Updated:
            • Resolved: