Meeting Summaries 

ACES Output Transforms VWG 

Meeting #176, January 15th 2025, 1pm PT

Attendees

Alex Fry
Kevin Wheatley
Scott Dyer
Nick Shaw
Rémi Achard
Carol Payne
Pekka Riikonen
Doug Walker

Others TBC

Meeting Notes

  • Kevin Wheatley: I've been working on my optimizations with input from Rémi, Nick and Pekka. I've added Pekka's suggestion as a comment in my code for now. He pointed out that we use a power of 1/0.55, and if we changed that to 1/0.5 the power would become a simple square, which is more efficient (a sketch of this is included after these notes). It would change the look slightly. Nick and I were also discussing another optimization.
[Nick showed a Desmos plot]
  • Nick Shaw: As a reminder, the idea behind the invertible gamut compression is to base the compression slope not on the source JM values, but only on the intersection of the compression vector with the J axis, solving for the intersection which produces a line passing through the source (J, M). This means any point on that line solves to the same intersection, and so produces the same slope, so having compressed along the line you can find the same line to invert back along. That got changed slightly at the top, to smooth the path to white for bright saturated colors, so the slope is altered depending on the source J value. In the way the Blink code was structured, with all the options, the source JM was not available in the solve for the reach boundary. Rather than pass that value along, I simply re-ran the J intersection solve using the gamut boundary intersection, which was available. Because I assumed that was on the same line, it should produce the same J intersection and thus the same slope. But that assumption is broken above the threshold where the focus gain comes into play: there, solving from the source JM and from the boundary JM doesn't give quite the same intersection and slope, because their J values are different. In Kevin's code everything is inlined, so the original intersection and slope are available. Running the solve_J_intersect function once instead of twice will improve speed, but slightly change the result for some pixels (see the sketch after these notes). Using a different solve was an accident, and I don't know whether it actually contributes to the smoothing of the path to white there.
  • Pekka Riikonen: I don't think it matters, because for the inverse we have to use an approximation above the threshold. But in the forward direction it will affect the look.
  • Nick Shaw: Because it only happens right at the top where the slope is pretty horizontal already, hopefully the effect is small because the J values are similar. Kevin found the largest effect on extreme magentas.
  • Kevin Wheatley: It does help the performance measurably.
  • Nick Shaw: I believe it is also the correct thing to do.
  • Kevin Wheatley: Yes because otherwise the ratios aren't quite what you assume they are, and it all adds up. I propose we include it, because other changes will have a larger effect. I can't make the binary searching any more efficient without redistributing the hues.
  • Nick Shaw: Because some parameters were fine tuned to only just make the round trip, we may have to tune them again after changes.
  • Kevin Wheatley: We should re-tune after all optimizations.
  • Pekka Riikonen: Do you have this in a Blink version?
  • Kevin Wheatley: Not this exact thing, but the changes I made were pretty much the same as what I did in my previous Blink version. The next thing for me to do is redistribute the hues. I've already made sub-functions where parts of different equations were actually doing the same thing. The gamut compress is still a large part of the time taken, but it's now broken down so we can see which parts contribute most. The cusp lookups are significant, and if I do the hue lookup separately, the other lookups become more efficient because I already know which interval to lerp within (sketched after these notes). Chroma compression is still heavy, but I don't yet know why. I need to test other architectures. Matrix ops could be SSE optimized. There are some branches, but a lot of the code could use SIMD ops. We may also be able to remove some checks, e.g. for division by zero, if we know that can't happen. My current code is CPU, but I'll also have a look at the GPU shaders next week.
  • Nick Shaw: Some traps might have been needed when we had many options and adjustable parameters, but not all of them may be needed with the chosen options and the clamp to AP1.
  • Kevin Wheatley: Currently the AP1 clamp is external to the fixed functions, so if you use them in isolation, perhaps for tests, you could feed in data they were not designed to handle. That's an OCIO question.
  • Doug Walker: The cost of the clamp is so small it may be worth adding it to the fixed functions.
  • Kevin Wheatley: I also wanted to remove the radians to degrees conversions. Degrees are useful for humans, but in code radians are simpler. There is currently an implicit assumption of degrees somewhere, which I haven't found yet. Changing the table size from 360 may help find it. We probably need some tests in OCIO of the individual components.
  • Doug Walker: Maybe we could help with that.
  • Kevin Wheatley: Testing forward and inverse of individual components rather than the whole transform would help. The current tests sometimes still passed when I introduced an error that produced bad-looking images. I've added more TODOs in the code, but most are minor. I also tested my own power function built from log and exp, and it was faster than powf. I don't know why; it may be architecture specific (a sketch follows these notes).
  • Rémi Achard: Before looking at that we should vectorize the code.
  • Doug Walker: That would be CPU specific. We should prioritize things that would help CPU and GPU.
  • Kevin Wheatley: So far I have a ~30% speed up on the CPU.
  • Doug Walker: What can we help with?
  • Kevin Wheatley: I could do with help on the GPU code, and with adding finer grained testing. I was testing with the gamut cube, and I suspect some of Rémi's test values aren't on the edge of the gamut, where issues might show up. It would be good to plot the speed-ups for each of my commits on different machines, and check that the trend was always in the right direction.
  • Doug Walker: We'll create some tests and send them to you to look at. We can also help you port your CPU commits to the GPU.
  • Kevin Wheatley: Although having the GPU run the previous version is a useful comparison.
  • Doug Walker: Eric posted on the Slack about the GPU profiling results he got. We also want to try profiling Metal using Xcode.
  • Rémi Achard: Something odd happened when I tried that. I got exactly the same result every time when I expected variation.
  • Doug Walker: When we've ported Kevin's finished optimizations to the GPU we can engage with Eric again. And also check he's testing what we think he's testing.
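
Sketch 1 (the exponent change Pekka suggested): the snippet below is illustrative only; the function names and the variable x are placeholders, not the actual CAM DRT code. It simply shows why changing the 1/0.55 exponent to 1/0.5 removes the pow call entirely.

    #include <cmath>

    // Illustrative only: the real code applies this power inside the
    // chroma compression; names here are hypothetical.
    float apply_power_current(float x)
    {
        return std::pow(x, 1.0f / 0.55f); // general pow call, comparatively slow
    }

    float apply_power_proposed(float x)
    {
        // With the exponent changed to 1/0.5 == 2, the power collapses to a
        // plain square, which is much cheaper (but slightly changes the look).
        return x * x;
    }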
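Sketch 2 (the single J-intersection solve Nick and Kevin discussed): this is a deliberately simplified, self-contained illustration; solve_J_intersect here is a linear stand-in and the numbers are made up, not the real CAM DRT solve. It only shows why re-solving from a boundary point that is not exactly on the source line gives a slightly different intersection, and why reusing the source-side solve is both cheaper and consistent.

    #include <cstdio>

    struct JM { float J; float M; };

    // Hypothetical stand-in: project the line through (J, M) with a given
    // slope back to the J axis (M = 0) and return the intersection J.
    static float solve_J_intersect(const JM& p, float slope)
    {
        return p.J - slope * p.M;
    }

    int main()
    {
        const float slope  = 0.4f;             // illustrative compression slope
        const JM source    = { 0.70f, 0.30f };
        const JM boundary  = { 0.66f, 0.20f }; // not exactly on the same line

        // Before: the reach-boundary code re-ran the solve from the boundary
        // JM, assuming it sat on the same line as the source. Above the
        // focus-gain threshold the J values differ, so the two solves disagree.
        float from_source   = solve_J_intersect(source, slope);
        float from_boundary = solve_J_intersect(boundary, slope);

        // After: run the solve once from the source JM and reuse the result.
        std::printf("from source %f, from boundary %f\n", from_source, from_boundary);
        return 0;
    }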
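Sketch 3 (reusing one hue-interval lookup for all the per-hue tables, as Kevin described): table names and layout are assumptions for illustration; the real tables live inside the OCIO fixed function.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct HueInterval { std::size_t lo; float t; };

    // Find the interval in an ascending hue table that brackets 'hue', and
    // the interpolation weight within it. Done once per pixel.
    static HueInterval find_hue_interval(const std::vector<float>& hues, float hue)
    {
        std::size_t hi = std::upper_bound(hues.begin(), hues.end(), hue) - hues.begin();
        hi = std::clamp<std::size_t>(hi, 1, hues.size() - 1);
        const std::size_t lo = hi - 1;
        const float t = (hue - hues[lo]) / (hues[hi] - hues[lo]);
        return { lo, t };
    }

    // Every per-hue table (cusp J, cusp M, reach M, ...) can then be sampled
    // with a plain lerp, with no further searching.
    static float sample_table(const std::vector<float>& table, const HueInterval& iv)
    {
        return table[iv.lo] + iv.t * (table[iv.lo + 1] - table[iv.lo]);
    }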
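Sketch 4 (the log/exp power experiment Kevin mentioned): recorded here only as the idea; whether it is actually faster than powf depends on the architecture and math library, and it is only valid for positive x.

    #include <cmath>

    // pow(x, y) rewritten as exp(y * log(x)). Only valid for x > 0, and any
    // speed difference versus std::pow / powf is architecture dependent.
    static inline float pow_via_log_exp(float x, float y)
    {
        return std::exp(y * std::log(x));
    }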

Meeting #175, January 8th 2025, 1pm PT