Partitions
Design Decisions
How are partitions represented to solids?
config / environment dict as configured by user
context resource
How do we group runs for the same partition?
tags
How do we group runs across partitions for the same pipeline?
tags
Where do we need to specify the set of possible partitions?
pipeline definition vs schedule definition vs standalone
Aim for standalone. Get the schedule definition right first, then work on hacking around the dagit UI.
How do we specify execution of a partition?
tags?
presets?
Where do we do partition selection?
explicit selector on schedule definitions
presets
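As a rough illustration of the tag-based grouping above, the sketch below groups runs by partition and by pipeline using plain dictionaries. The tag keys `pipeline` and `partition`, the run records, and `group_runs` are all hypothetical; the notes only say "tags" and do not fix any names or storage API.

```python
from collections import defaultdict

# Hypothetical tag keys; the notes only say "tags", not what they are called.
PIPELINE_TAG = "pipeline"
PARTITION_TAG = "partition"

# Hypothetical run records standing in for whatever the run storage returns.
runs = [
    {"run_id": "r1", "tags": {PIPELINE_TAG: "ingest", PARTITION_TAG: "2019-01-01"}},
    {"run_id": "r2", "tags": {PIPELINE_TAG: "ingest", PARTITION_TAG: "2019-01-01"}},
    {"run_id": "r3", "tags": {PIPELINE_TAG: "ingest", PARTITION_TAG: "2019-01-02"}},
    {"run_id": "r4", "tags": {PIPELINE_TAG: "train", PARTITION_TAG: "model_a"}},
]


def group_runs(runs, *tag_keys):
    """Group run ids by the tuple of values found under the given tag keys."""
    groups = defaultdict(list)
    for run in runs:
        key = tuple(run["tags"].get(k) for k in tag_keys)
        groups[key].append(run["run_id"])
    return dict(groups)


# Runs for the same partition of the same pipeline.
by_partition = group_runs(runs, PIPELINE_TAG, PARTITION_TAG)
# Runs across all partitions of the same pipeline.
by_pipeline = group_runs(runs, PIPELINE_TAG)
```

Both questions reduce to the same lookup with a different set of tag keys, which is one argument for answering both with tags.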
TODO
Figure out how to resolve config/environment dict based on partition
Look at presets?
Figure out execution API for partition
for now, using tags
Figure out whether we should allow non-partition jobs for partitioned pipelines, and where partition selection should happen
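One possible shape for the first TODO item is a pure function from a partition to the config/environment dict, paired with a function that produces the tags used for execution. This is only a sketch: the solid name `load_events`, the `date` config key, the `partition` tag key, and both function names are assumptions for illustration, not an existing API.

```python
from datetime import date


def environment_dict_for_partition(partition_date):
    """Build the environment dict for one date partition (hypothetical shape)."""
    return {
        "solids": {
            # "load_events" is a made-up solid name used only for illustration.
            "load_events": {"config": {"date": partition_date.isoformat()}}
        }
    }


def tags_for_partition(partition_date):
    """Tags attached to the run so it can be found by partition later."""
    return {"partition": partition_date.isoformat()}


env = environment_dict_for_partition(date(2019, 1, 1))
tags = tags_for_partition(date(2019, 1, 1))
```

Keeping both functions keyed on the same partition value means the config a solid sees and the tag a run carries cannot drift apart.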
Requirements
Partition requirements:
Support time-based partitions
(variable-sized)
Support fixed partitions
(ML-style)
Execution requirements:
Support execution of a partition through dagit
Support execution of a partition through a scheduler
Support execution of a batched set of partitions
(backfill)
Run requirements:
Take in config to designate partition
UI requirements:
View job/run status by partition
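To make the partition and backfill requirements concrete, here is a minimal sketch of a time-based partition generator, a fixed partition list, and a backfill over a batch of partitions. The names `date_partitions`, `fixed_partitions`, `backfill`, and `launch_run` are hypothetical, and the configurable step is just one guess at how differently sized time partitions might be expressed.

```python
from datetime import date, timedelta


def date_partitions(start, end, step=timedelta(days=1)):
    """Yield partition start dates in [start, end), spaced by a configurable step."""
    current = start
    while current < end:
        yield current
        current += step


def fixed_partitions():
    """A fixed, non-temporal set of partitions (for example, ML dataset splits)."""
    return ["train", "validation", "test"]


def backfill(partitions, launch_run):
    """Launch one run per partition and collect whatever launch_run returns."""
    return [launch_run(p) for p in partitions]


# launch_run is a stand-in; a real launcher would go through dagit or a scheduler.
launched = backfill(
    date_partitions(date(2019, 1, 1), date(2019, 1, 4)),
    lambda p: f"run-{p.isoformat()}",
)
weekly = list(date_partitions(date(2019, 1, 1), date(2019, 1, 20), step=timedelta(days=7)))
```

Because both kinds of partition are just iterables of keys, the same `backfill` works for either one.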
Approaches
Pipeline partitions
Add partition definition function on a PipelineDefinition
Pros:
Natural for pipeline author to already know how to partition data
Cons:
Potentially overfitting by making Partition too prominent
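A sketch of what this approach might look like follows. `PartitionedPipeline` is a stand-in class written for this example and is not Dagster's `PipelineDefinition`; the `partition_fn` parameter is likewise an assumed name.

```python
class PartitionedPipeline:
    """Stand-in for a pipeline definition that carries a partition function."""

    def __init__(self, name, partition_fn=None):
        self.name = name
        # Optional callable returning the partitions the pipeline author knows about.
        self.partition_fn = partition_fn

    def partitions(self):
        """Return the pipeline's partitions, or an empty list if it has none."""
        return list(self.partition_fn()) if self.partition_fn else []


pipeline = PartitionedPipeline(
    "ingest", partition_fn=lambda: ["2019-01-01", "2019-01-02"]
)
unpartitioned = PartitionedPipeline("adhoc")
```

Making `partition_fn` optional is one way to soften the overfitting concern, since unpartitioned pipelines never have to mention partitions at all.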
ScheduleType
Put partition definition function on a ScheduleDefinition
Pros: