Kevin Wheatley: This week is mainly an update from me on what I've done to optimize OCIO CPU performance in my fork. It's definitely not a final set of commits. I was initially rearranging things to put anything that changed output values into later commits, but gave up on that to prioritize progress. I'm still working on it and have more I'll push tomorrow. I haven't yet worked on the table lookups and gamut mapper performance. I did some investigation into whether there were possible lookup optimizations that could be done without changing the hue distribution.
[Kevin showed a graph of the actual hue distribution and the error from assuming the distribution was uniform]
Kevin Wheatley: The error is always in the same direction, so we could narrow the search window without changing the distribution. But we would need to check whether that assumption holds for a range of gamuts. The alternative is to change the distribution to be linear, as I did, which removes iteration from the search. We need to weigh the performance difference against the rendering difference. If we decide to redistribute the hues we should do it for all the lookups and combine them.
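The two lookup options discussed here can be sketched as follows. This is a minimal illustration, not the OCIO or CTL code: the function names, the 0 to 360 hue range, and the assumption that the true index never falls below the uniform guess are all hypothetical stand-ins for the behaviour seen in the graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary search for i with table[i] <= h < table[i + 1], restricted to [lo, hi].
// Precondition: table is strictly increasing and table[lo] <= h < table[hi].
inline std::size_t search_interval(const std::vector<float>& table, float h,
                                   std::size_t lo, std::size_t hi) {
    assert(lo < hi && hi < table.size());
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (table[mid] <= h) lo = mid; else hi = mid;
    }
    return lo;
}

// Uniformly spaced table of n intervals over [0, 360): the interval index is
// one divide and one truncation, with no iteration.
inline std::size_t uniform_interval(float h, std::size_t n) {
    assert(n > 0 && h >= 0.0f && h < 360.0f);
    const float step = 360.0f / static_cast<float>(n);
    const std::size_t i = static_cast<std::size_t>(h / step);
    return i < n ? i : n - 1;  // guard against rounding at the top edge
}

// Non-uniform table with an assumed one-sided error: the true index is never
// below the uniform guess and lies fewer than max_offset entries above it
// (hypothetical bound, to be measured per gamut).
inline std::size_t windowed_interval(const std::vector<float>& table, float h,
                                     std::size_t max_offset) {
    assert(table.size() >= 2 && max_offset >= 1);
    const std::size_t n = table.size() - 1;  // number of intervals
    const std::size_t lo = uniform_interval(h, n);
    const std::size_t hi = lo + max_offset < n ? lo + max_offset : n;
    return search_interval(table, h, lo, hi);
}
```

The windowed variant keeps the existing hue distribution and only shortens the search, so it should not change pixel values as long as the one-sided bound really holds for every gamut. The uniform variant needs the table itself to be resampled, which is the change that can alter the rendering.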
Scott Dyer: If we do change the pixels, it does make sense to make all those changes in one go. What effect on picture and speed do the changes you've already made have?
Kevin Wheatley: I tried combining the RGB to LMS matrix with the white balancing weightings, and was surprised to see it made a difference to the result. I need to look into that. I also noticed that in the JMh conversion one direction is written as equations, while the inverse direction expresses those equations as a matrix. Making both a matrix seems to change the result. I need a way of comparing the effects of those against color shifts from other optimizations. I've also looked at eliminating some scaling factors in the J <> Y conversion in the tone scale. Every little helps.
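One plausible contributor to the difference (not confirmed in the meeting) is that floating-point multiplication and addition are not associative, so folding per-channel weights into a matrix can round differently from applying them afterwards. The sketch below compares the two paths and reports the largest channel difference. The matrix and weight values used with it are arbitrary placeholders, not the real RGB to LMS matrix or white balance weights, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>;

inline Vec3 mul(const Mat3& m, const Vec3& v) {
    Vec3 out{};
    for (std::size_t r = 0; r < 3; ++r)
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2];
    return out;
}

// Two-step path: matrix first, then per-channel weights.
inline Vec3 two_step(const Mat3& m, const Vec3& w, const Vec3& rgb) {
    Vec3 out = mul(m, rgb);
    for (std::size_t r = 0; r < 3; ++r) out[r] *= w[r];
    return out;
}

// Folded path: pre-multiply each matrix row by its weight once, up front.
inline Mat3 fold_weights(const Mat3& m, const Vec3& w) {
    Mat3 out = m;
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t c = 0; c < 3; ++c) out[r][c] *= w[r];
    return out;
}

// Largest per-channel absolute difference between two results.
inline float max_abs_diff(const Vec3& a, const Vec3& b) {
    float d = 0.0f;
    for (std::size_t i = 0; i < 3; ++i)
        d = std::fmax(d, std::fabs(a[i] - b[i]));
    return d;
}

inline bool within_tolerance(const Vec3& a, const Vec3& b, float tol) {
    assert(tol >= 0.0f);
    return max_abs_diff(a, b) <= tol;
}
```

Calling `max_abs_diff(two_step(m, w, rgb), mul(fold_weights(m, w), rgb))` over a set of test pixels gives one number per optimization, which is one simple way to rank such rounding effects against larger color shifts from other changes.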
[Kevin showed the output of a performance analysis tool]
Kevin Wheatley: Why things are slow is not always obvious from the C++. Looking at the compiled assembly helps. The analysis confirms what Rémi saw: pow is called many times, and each call is relatively costly, so it is a significant bottleneck. I looked into alternate pow functions. We know a bit about the values going in. They are not negative and there are no infinities. The exponent or base is known. The C implementations are optimized for the general case, not necessarily our specific one. I've done some rearrangements to reduce the number of pow calls. Nick had noted on the OCIO Slack that if you grab achromatic A before going to J you can use that in the J_to_Y and remove a pow, and that would also apply to the GPU. When I really look at the GPU code, I'll need help from Rémi or somebody else from OCIO. My CPU improvements have brought the timing down from 9000ms to 6000ms. I've tried to avoid repeated lookups by looking up once and passing values down. We can simplify when we don't need the back and forth to reference state that was needed in V60, which still had a lot of options. The CTL may not suffer from all the slowdowns because it has a static initialization stage and memoizes the results. In C++ you have to hand build that. I'm continuing the optimizations that Rémi started.
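The idea of keeping the achromatic response A so that a later stage does not have to recover it from J can be sketched as below. The power-law relation between A and J is a CAM16-style form used here as a simplified stand-in, and `Params`, `JA`, and the function names are hypothetical; the actual OCIO and CTL code may be structured differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical viewing-condition constants, computed once at setup.
struct Params {
    float A_w;    // achromatic response of the reference white
    float J_max;  // lightness assigned to the reference white
    float cz;     // combined exponent
};

// Forward: one pow per pixel.
inline float J_from_A(float A, const Params& p) {
    assert(A >= 0.0f);
    return p.J_max * std::pow(A / p.A_w, p.cz);
}

// Inverse: another pow per pixel, needed only if A was thrown away.
inline float A_from_J(float J, const Params& p) {
    assert(J >= 0.0f);
    return p.A_w * std::pow(J / p.J_max, 1.0f / p.cz);
}

// Carry A alongside J so later stages can use it directly.
struct JA {
    float J;
    float A;
};

inline JA forward_keep_A(float A, const Params& p) {
    return JA{J_from_A(A, p), A};
}
```

A later stage that receives a `JA` can read `A` directly instead of calling `A_from_J`, which saves one pow per pixel. The saving only applies while J is unmodified between the two stages, since any change to J invalidates the cached A.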
Scott Dyer: I can see from your commits generally what you're doing. But I can't really help.
Kevin Wheatley: If a company or person could investigate e.g. an optimized pow(0.42) or compacting the J <> Y and tone scale, that would help. There are hacky ways of doing pow functions with the binary representations of floats directly. My laptop is old, so it's a worst case scenario. I'll try newer faster CPUs on Linux next week.
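As an illustration of the kind of bit-level trick mentioned, the sketch below uses the well-known observation that the integer bit pattern of a positive IEEE 754 float is roughly a scaled and offset log2 of its value. It is a coarse approximation with errors of several percent, so it is shown only to make the idea concrete, not as a candidate replacement. The function names are hypothetical, and it assumes 32-bit IEEE 754 floats, non-negative finite inputs, and an exponent in (0, 1].

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Approximate x^p by scaling the float's bit pattern around that of 1.0f.
// Assumes IEEE 754 binary32, x >= 0 and finite, 0 < p <= 1.
inline float approx_pow(float x, float p) {
    assert(x >= 0.0f && std::isfinite(x));
    assert(p > 0.0f && p <= 1.0f);
    if (x == 0.0f) return 0.0f;  // the bit trick does not map zero to zero

    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun

    const float one_bits = 1065353216.0f;  // 0x3f800000, bit pattern of 1.0f
    const float scaled = p * (static_cast<float>(bits) - one_bits) + one_bits;

    const std::uint32_t out_bits = static_cast<std::uint32_t>(scaled);
    float result;
    std::memcpy(&result, &out_bits, sizeof result);
    return result;
}

// Relative error against the standard library, for x > 0.
inline float relative_error(float x, float p) {
    const float ref = std::pow(x, p);
    return std::fabs(approx_pow(x, p) - ref) / ref;
}
```

Any real candidate would need a correction step (for example a polynomial refinement) and a measured error budget against the rendering, which is the kind of investigation being asked for here.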
Rémi Achard: Kevin's stuff looks really good. In my previous tests I got 3x slower than ACES 1 on the CPU. Previously I was testing on an old LibC, and recent updates make the math library much faster. Eric found about 2x slower than ACES 1 on the GPU. This is the ACES main branch and can be improved.
Scott Dyer: We'll have these meetings at the same time each Wednesday to meet OCIO's end of February deadline.
Nick Shaw: I responded on the OCIO Slack to Gary Demos's comments.
Kevin Wheatley: I looked quickly at that. A lot of his points we had discussed before. It would have been good if we'd had input six months ago.
Nick Shaw: Things to consider for ACES 3.0!
Scott Dyer: Next week hopefully we'll have Doug and Carol. It will be my last week for a bit, but you can do what's needed to get the next OCIO release. I'll work that back into the CTL later.
ACES Output Transforms VWG
Meeting #175, January 8th 2025, 1pm PT
[Meeting Recording]
Attendees
Meeting Notes
Meeting #174, December 11th, 1pm PT
[Meeting Recording]
Attendees