Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0 - Beta
    • Fix Version/s: 4.0
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      Windows, Mac

      Description

      In the PDF standard, strings inside dictionaries can be encoded either in PDFDocEncoding or in 16-bit big-endian (BE) Unicode. To complicate things, certain dictionary strings may be in UTF-8; that has to be investigated. Right now our parser just builds strings from the raw bytes, which means we only handle ASCII correctly. Accented characters that use the high (8th) bit are not necessarily handled right. Java defaults to the platform encoding, so WinAnsi on Windows and MacRoman on the Mac; we still have to check what it is on Linux. Some documentation shows PDFDocEncoding to be similar to, if not the same as, Latin1. We have to investigate whether the specification provides a way to override the PDFDocEncoding default with a specific encoding. Then the parser needs to use the correct encoding when creating the Java strings, so that we're not corrupting the inputs.
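
      A minimal sketch of the decoding step described above, not the fix that actually shipped. It assumes a 16-bit BE Unicode text string is marked by a leading FE FF byte order mark, and it uses ISO-8859-1 (Latin1) as a stand-in for PDFDocEncoding. That stand-in is only an approximation: the two differ at some code points (for example in the 0x80-0x9F range), so a complete version would need its own mapping table. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the ICEpdf API.
final class PdfTextStringSketch {
    private PdfTextStringSketch() {}

    /** Decodes raw PDF text-string bytes without relying on the platform default charset. */
    static String decode(byte[] raw) {
        // A leading FE FF byte order mark signals 16-bit big-endian Unicode.
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            // Skip the two BOM bytes and decode the remainder as UTF-16BE.
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // ASSUMPTION: Latin1 approximates PDFDocEncoding. It maps each byte to the
        // Unicode code point of the same value, which is not correct for every
        // PDFDocEncoding byte.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }
}
```

      Naming the charset explicitly is the point of the sketch: the same bytes then decode identically on Windows, Mac, and Linux, instead of following each platform's default.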

          Activity

          Mark Collette added a comment -

          Refer to D.2 and D.3 in the PDF 1.7 spec for info on PDFDocEncoding.

          Patrick Corless added a comment -

          Closing, fixed as part of 4.0 incremental updater.


            People

            • Assignee:
              Patrick Corless
              Reporter:
              Mark Collette
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: