ICEpdf
PDF-130

CPU optimisations for numerous shapes

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0 - Beta
    • Fix Version/s: 4.0
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      Win XP SP3, Sun jdk1.5.0_19 and jdk1.6.0_14, Intel Core 2 Duo 2.13 GHz, 2 GB RAM.

      Description

      After doing the PDF-127 memory optimisations, now would be a good time to do some CPU optimisations. I noticed that the Android website gives a list of optimisation recommendations, so I figured I'd try out the one where you access class instance fields via a cached stack reference. Patrick pointed out that in PDFs with a large number of shapes, we spend a disproportionate amount of time growing the shapes vector.

      All testing was done on \\iceads1\Public\ICEpdf Development\QA\transitmap_2004.pdf, which contains over 566,000 shapes. We parse it into a single shapes vector. I ran all the tests under a modified version of the tagging code from PDF-114, where it would only load that one PDF, go through the pages (of which there is only one), and init() the page. The file is loaded locally, from my hard-drive, and the reported durations run from load to dispose.

      Commencing tagging file 1439 of 1532
      D:\ICEpdf\iceads1__Public__ICEpdf_Development\QA\transitmap_2004.pdf

      I ran the test 4x in a row, and discarded the first run, since that would reflect system caching more than anything. This is the baseline:

      Duration: 54.531 seconds
      Duration: 54.640 seconds
      Duration: 54.890 seconds

        Activity

        Mark Collette added a comment - edited

        For the first optimisation attempt, I wanted to alter Parser and ContentParser, to use stack references instead of class field references.

        After altering Parser, we can see absolutely zero difference:

        Duration: 54.671 seconds
        Duration: 54.499 seconds
        Duration: 54.765 seconds
        Duration: 54.577 seconds

        After also altering ContentParser, we still see absolutely zero difference:

        Duration: 54.734 seconds
        Duration: 54.718 seconds

        So, that Android optimisation just doesn't apply to a Sun Java 1.6 Intel Core 2 Duo system. I didn't bother with any other Android suggestions after that.
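        For reference, the pattern under test looks roughly like this; a minimal sketch with hypothetical class and field names, not ICEpdf code:

```java
// Sketch of the Android-recommended optimisation tested above: inside a hot
// loop, copy an instance field into a local so it is read from the stack
// instead of re-loading the field on every access.
public class FieldCaching {
    private int[] data;

    FieldCaching(int n) {
        data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i;
    }

    int sumViaField() {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i]; // field reference on every access
        }
        return total;
    }

    int sumViaLocal() {
        int[] local = this.data; // cached stack reference
        int total = 0;
        for (int i = 0; i < local.length; i++) {
            total += local[i];
        }
        return total;
    }
}
```

        On a desktop HotSpot JVM the JIT typically hoists such field loads itself, which is consistent with the flat timings above.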

        Mark Collette added a comment - edited

        I didn't do this optimisation next, but I'll report it next to simplify the numbers. I noticed in the profiler a certain amount of time going to creating StringBuffers. I decided that I would replace all StringBuffers with StringBuilders, and see if the removal of synchronisation overhead would make a difference.

        Duration: 53.140 seconds

        So, about a second and a half of improvement. With the next optimisation, of changing the shapes vector growth, in place, I saw that this StringBuilder change averages close to 2 seconds of improvement. That's enough that the next step will be to try the same with changing all Vectors to ArrayLists, and then maybe all Hashtables to HashMaps. And by all, I mean all the places it makes sense, given our threading strategy.
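        The swap itself is mechanical, since StringBuilder mirrors StringBuffer's API without per-call synchronisation; an illustrative before/after, not ICEpdf code:

```java
// StringBuffer synchronises every append; StringBuilder (Java 5+) has the
// identical API without locks, so single-threaded parsing code can swap the
// type directly.
public class NameJoiner {
    static String joinSynchronised(String[] parts) {
        StringBuffer sb = new StringBuffer(); // pays monitor overhead per call
        for (String p : parts) sb.append(p);
        return sb.toString();
    }

    static String joinUnsynchronised(String[] parts) {
        StringBuilder sb = new StringBuilder(); // same calls, no locking
        for (String p : parts) sb.append(p);
        return sb.toString();
    }
}
```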

        Mark Collette added a comment - edited

        The numbers get a little confusing, because some tests were done before the StringBuilder change and some after, so I'll report them in separate blocks.

        First off, I investigated how many shapes there were, which is 566,693. Then I found that we were using an initialCapacity of 1000 and a capacityIncrement of 50. That means that the capacity would start at 1000, and grow by 50 at a time. The typical behaviour of Vector is to double, and of ArrayList to grow to 1.5x the previous capacity, if no capacityIncrement is specified. So we were reallocating the backing array, and copying the contents, around 10,000 times. I investigated 3 main scenarios: a theoretical best case of immediately setting capacity to 567,000, so as to never reallocate and recopy; removing the capacityIncrement parameter, so as to get the default doubling behaviour, and reallocate and copy 10 times; and setting the capacityIncrement to 1000, so that we'd reallocate and copy around 566 times, but would avoid over-shooting the desired capacity by any more than 1000.
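        The reallocation counts for those scenarios can be checked with a small sketch (the shape count is this PDF's; the helper methods are hypothetical):

```java
// Counts how many backing-array reallocations a Vector performs before
// reaching 566,693 elements under the growth policies compared above.
public class GrowthCount {
    static int growthsWithIncrement(int initialCapacity, int increment, int target) {
        int capacity = initialCapacity, count = 0;
        while (capacity < target) { capacity += increment; count++; }
        return count;
    }

    static int growthsWithDoubling(int initialCapacity, int target) {
        int capacity = initialCapacity, count = 0;
        while (capacity < target) { capacity *= 2; count++; }
        return count;
    }

    public static void main(String[] args) {
        int shapes = 566693;
        System.out.println(growthsWithIncrement(1000, 50, shapes));   // capacityIncrement 50: 11,314 copies
        System.out.println(growthsWithDoubling(1000, shapes));        // default doubling: 10 copies
        System.out.println(growthsWithIncrement(1000, 1000, shapes)); // capacityIncrement 1000: 566 copies
    }
}
```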

        These results were before the StringBuilder change, and are from an average base-line of 54.687 seconds:

        ensureCapacity(567000):
        Duration: 11.859 seconds

        No capacityIncrement, so would double:
        Duration: 11.844 seconds

        Set capacityIncrement to 1000:
        Duration: 15.078 seconds

        A huge difference, with the doubling strategy being indistinguishable from the theoretically optimal strategy. The overcapacity-avoiding strategy of using a capacityIncrement of 1000 is still a great improvement, but lags just far enough behind to make the trade-offs debatable.

        Further testing of capacityIncrement values of 2000, 4000, and 8000 showed a linear shift towards the doubling time. This showed that to get the optimal speed, one would have to accept larger and larger potential over-capacity, while simultaneously making me feel that any significant over-capacity would be undesirable. It made me consider setting the initialCapacity down to 500.

        After making the StringBuilder change, the times were still about 3 seconds apart:

        No capacityIncrement, so would double:
        Duration: 9.734 seconds

        Set capacityIncrement to 1000:
        Duration: 12.984 seconds

        Finally, I adopted a strategy of no capacityIncrement, so the Vector doubles in capacity, but after the Shapes are done being parsed, if the capacity exceeds the size by at least 200, we recreate the Vector with the exact necessary capacity.

        Duration: 9.625 seconds

        This is as good as the optimal time. Memory might spike with temporary over-capacity, but by no more than 1 MB, and afterwards the Vector has optimal capacity, all with very simple code.
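        That strategy can be sketched as follows (the helper name and threshold placement are illustrative, not the committed ICEpdf code):

```java
import java.util.Vector;

// Sketch of the adopted strategy: let the Vector double while parsing, then,
// once parsing is done, rebuild it with exact capacity if the slack is large.
public class ShapesTrimmer {
    static <T> Vector<T> trimmed(Vector<T> shapes, int slackThreshold) {
        // Vector.capacity() exposes the backing array length
        if (shapes.capacity() - shapes.size() >= slackThreshold) {
            Vector<T> exact = new Vector<T>(shapes.size());
            exact.addAll(shapes);
            return exact;
        }
        return shapes;
    }
}
```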

        Mark Collette added a comment -

        Taking the StringBuffer to StringBuilder conversion concept, and expanding on it, I decided to investigate changing our use of Vector to ArrayList. This was more complicated though, since ArrayList doesn't support Vector's elementAt, removeElementAt, elements(), capacity(), and setSize methods. Some had equivalents in ArrayList, and others required alternative approaches.

        After the conversion, I ran the same test as in the rest of this jira, and found a negligible difference in execution time: less than a 3% improvement. The changes affected a lot of public APIs, so breaking backwards compatibility for our users did not seem worth the minuscule improvement. So, I reverted the ArrayList changes back out. I did keep some of the small changes, where I optimised redundant Vector accesses, and where I simplified iteration by using the Java 1.5 for loop.
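        For reference, the rough equivalents weighed during such a conversion (illustrative only; the actual ICEpdf diff was reverted):

```java
import java.util.ArrayList;
import java.util.List;

// ArrayList equivalents of the Vector-only methods named above.
public class VectorMigration {
    // Returns the final list so each step can be checked.
    static List<String> demo() {
        List<String> list = new ArrayList<String>();
        list.add("a");                        // addElement(e)      -> add(e)
        list.add("b");
        String first = list.get(0);           // elementAt(i)       -> get(i)
        list.remove(0);                       // removeElementAt(i) -> remove(i)
        StringBuilder joined = new StringBuilder(first);
        for (String s : list) {               // elements() -> iterator(), or
            joined.append(s);                 // the Java 1.5 for-each loop
        }
        list.add(joined.toString());
        // capacity() has no ArrayList accessor; setSize(n) must be emulated
        // by padding with nulls or shrinking via subList(n, size()).clear().
        return list;
    }
}
```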

        Mark Collette added a comment -

        StringBuilder and Shapes changes in the core
        Subversion 20266

        StringBuilder changes in the font library
        Subversion 22240
        Subversion 22241

        Mark Collette added a comment -

        More investigative profiling showed that a bunch of time was being spent reading from the input stream during parsing, so I experimented with several buffering schemes to try to reduce that. One was to load the document using a ByteArraySeekableInput instead of the RandomAccessFileSeekableInput, to effectively have it load completely from RAM instead of disk. The intent was to determine whether the best-case scenario of reading fully from RAM would make enough of a difference in the parsing times to make buffering the RandomAccessFileSeekableInput worthwhile. It didn't improve times much at all, probably due to the effective use of a BufferedInputStream in the parsing code. I tried further tuning the BufferedInputStream's size, but that didn't help. I looked into our SequenceInputStream, which is used for the parsing, but couldn't identify any way of speeding it up.
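        The RAM side of that comparison amounts to reading the whole file into a byte array up front; a minimal sketch (the SeekableInput class names above are ICEpdf's, while this helper is hypothetical):

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Loads an entire file into memory, as feeding a ByteArraySeekableInput would,
// for comparison against buffered reads from a RandomAccessFile on disk.
public class LoadFully {
    static byte[] readAll(File f) throws IOException {
        byte[] data = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(f)));
        try {
            in.readFully(data); // blocks until the whole file is in memory
        } finally {
            in.close();
        }
        return data;
    }
}
```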

        Mark Collette added a comment - edited

        With the profiling, I noticed that we spend a lot of time creating StringBuilder objects, so I thought that if I passed one in and re-used it, that might help. The major allocation from StringBuilder is the backing char array, and that couldn't really be helped: the returned String shares the char array, so further modification by the StringBuilder allocates a new one, meaning that allocation will always happen. But I thought that the fewer times we allocate the StringBuilder itself, the better.

        Before the change:

        Duration: 9.547 seconds
        Duration: 9.531 seconds
        Duration: 9.453 seconds

        After the change to Parser, ContentParser, and all the classes that invoke them:

        Duration: 9.484 seconds
        Duration: 9.407 seconds
        Duration: 9.406 seconds

        As we can see, marginal to non-existent benefit.
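        The reuse pattern tried here can be sketched as follows (class and method names are hypothetical, not the ICEpdf parser):

```java
// Sketch of the experiment: the caller owns one StringBuilder that the parser
// clears and refills per token, rather than allocating a new builder each
// time. The characters behind each returned String still get their own
// allocation eventually, so only the builder object itself is saved.
public class TokenScanner {
    static String readToken(String input, int start, StringBuilder scratch) {
        scratch.setLength(0); // reuse the builder object across calls
        for (int i = start; i < input.length() && input.charAt(i) != ' '; i++) {
            scratch.append(input.charAt(i));
        }
        return scratch.toString();
    }
}
```

        Reusing the object saves only the small StringBuilder allocation itself, which is consistent with the marginal timings above.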

        Mark Collette added a comment -

        That's enough for this release.

        Mark Collette added a comment - edited

        jdk1.5.0_19

        Duration: 9.406 seconds
        Duration: 9.437 seconds
        Duration: 9.375 seconds

        jdk1.6.0_14

        Duration: 8.343 seconds
        Duration: 8.390 seconds
        Duration: 8.375 seconds


          People

          • Assignee: Mark Collette
          • Reporter: Mark Collette