Compile time issues

Problem Specification

  • O(100) stencils
  • Most of them compile in 10-20 seconds
  • Our example needs to run on 6 nodes, so all compilation has to happen on compute nodes (which have slightly slower file system access)
  • Big stencils (O(100) stages, O(100) fields) take much longer to compile: up to 8 minutes on CPU and up to hours on GPU

Anton’s Findings:

  • Compile time should scale as O(fields_no * stencils_no)
  • Some examples might be worse because they consume a lot of memory
  • Under this assumption, fusing stencils should not affect compile times, but our empirical results show different behaviour. Is this purely a memory effect?
  • Anton found a way to reduce compile time by 2-3%, but this does not solve the big problem on our side
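
A minimal sketch of why fusion should be neutral under the scaling assumption above. It reads the assumption as "each stencil costs time proportional to its number of fields", which is one possible interpretation, not a confirmed model. The function names and the seconds_per_field constant are made up for illustration and are not measured values.

```python
def modeled_compile_time(fields_per_stencil, seconds_per_field=0.15):
    """Total modeled compile time, linear in the field count of every stencil.

    seconds_per_field is a placeholder constant, not a measurement.
    """
    return seconds_per_field * sum(fields_per_stencil)


def fuse(fields_per_stencil, i, j):
    """Fuse stencils i and j, assuming they share no fields."""
    fused = fields_per_stencil[i] + fields_per_stencil[j]
    rest = [n for k, n in enumerate(fields_per_stencil) if k not in (i, j)]
    return rest + [fused]


before = [40, 60, 100]        # three stencils with 40, 60 and 100 fields
after = fuse(before, 0, 1)    # two stencils with 100 fields each

# Under the linear model the total is the same before and after fusion.
print(modeled_compile_time(before) == modeled_compile_time(after))
```

If measured times for fused stencils grow faster than this, the linear model is missing a term, which would fit the memory question raised above.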

Current Solution for CI:

  • Create a .gt_cache nightly (takes around 2h)
  • CI copies the cache over (if the gt4py versions match) and therefore only compiles the stencils that changed
  • This gives us response times of O(10) minutes [with the daint queue]
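
The copy step above could be sketched roughly as follows. This is an illustration only, not the actual CI script: the GT4PY_VERSION marker file and the restore_cache helper are hypothetical, and it assumes the nightly job records the gt4py version it built with inside the cache directory.

```python
import shutil
from pathlib import Path

# Hypothetical marker file, assumed to be written by the nightly job.
VERSION_MARKER = "GT4PY_VERSION"


def restore_cache(nightly_cache: Path, local_cache: Path, installed_version: str) -> bool:
    """Copy the nightly cache into place if its gt4py version matches.

    Returns True if the cache was copied, False if CI has to compile
    everything from scratch.
    """
    marker = nightly_cache / VERSION_MARKER
    if not marker.is_file():
        return False  # no nightly cache, or no version information in it
    if marker.read_text().strip() != installed_version:
        return False  # version mismatch: cached builds may be stale
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(nightly_cache, local_cache, dirs_exist_ok=True)
    return True
```

After a successful restore, only stencils whose cache entries are missing or outdated should need recompiling, which is what keeps the CI turnaround short.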