Developers Guide
Code Organization
In this section, we describe the structure of the codebase.
Representations
In saturn/core, we define the following templates and structures. In executors/Technique.py, we provide the abstract skeleton for implementing parallelism techniques. In representations/Strategy.py, we define a data class that aggregates per-model decisions (e.g., how many GPUs to use and which technique to apply). In representations/Task.py, we provide a data class that collects data on user-submitted models.
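As a rough illustration of how these three pieces might relate, the sketch below pairs an abstract technique base class with two small data classes. Every class, method, and field name in it (BaseTechnique, search, execute, Strategy, Task, and their fields) is a hypothetical placeholder chosen for illustration and is not taken from the Saturn source; refer to the files named above for the actual definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Strategy:
    """Hypothetical record of one per-model decision."""
    technique_name: str
    gpus: int
    est_runtime_s: float


@dataclass
class Task:
    """Hypothetical record of a user-submitted model."""
    name: str
    get_model: Callable[[], Any]
    # Candidate strategies keyed by (technique name, GPU count).
    strategies: Dict[Tuple[str, int], Strategy] = field(default_factory=dict)

    def fastest(self) -> Optional[Strategy]:
        """Return the candidate with the lowest estimated runtime, if any."""
        if not self.strategies:
            return None
        return min(self.strategies.values(), key=lambda s: s.est_runtime_s)


class BaseTechnique(ABC):
    """Hypothetical abstract skeleton for a parallelism technique."""

    @abstractmethod
    def search(self, task: Task, gpus: int) -> Optional[float]:
        """Return an estimated runtime in seconds, or None if infeasible."""

    @abstractmethod
    def execute(self, task: Task, gpus: int) -> None:
        """Run the task on the given number of GPUs."""
```

Keeping the per-model decision (Strategy) separate from the model description (Task) lets several candidate decisions be attached to one task and compared later.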
Library
In saturn/library/library.py, we define the basic functions for registering and accessing parallelism techniques. These functions are exposed to the user as part of our API. We welcome contributions toward a “default” library that implements popular parallelisms (e.g., FSDP and GPipe).
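A registration and lookup pair of this kind is often implemented as a name-to-class mapping. The following is a minimal sketch under that assumption; the function names register, retrieve, and list_registered, and the overwrite flag, are hypothetical and may not match the functions that library.py actually exposes.

```python
from typing import Dict, List

# Hypothetical in-memory registry mapping a technique name to its class.
_REGISTRY: Dict[str, type] = {}


def register(name: str, technique_cls: type, overwrite: bool = False) -> None:
    """Store a technique class under a name, refusing silent replacement."""
    if name in _REGISTRY and not overwrite:
        raise KeyError(f"technique {name!r} is already registered")
    _REGISTRY[name] = technique_cls


def retrieve(name: str) -> type:
    """Look up a previously registered technique class by name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "none"
        raise KeyError(f"unknown technique {name!r}; registered: {known}") from None


def list_registered() -> List[str]:
    """Return the registered technique names in sorted order."""
    return sorted(_REGISTRY)
```

Raising on duplicate names by default is a design choice made for this sketch: it prevents one contribution from silently shadowing another that uses the same name.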
Solver
In saturn/solver/milp.py, we define the MILP for our joint optimization problem of parallelism selection, resource apportioning, and scheduling.
Alternative solvers (e.g., an RL-based one) must follow the same specification, i.e., accept the same inputs and produce the same outputs.
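To make the idea of a shared input/output specification concrete, the sketch below defines one possible solver interface and a deliberately naive solver that satisfies it. The input shape (per-task runtime estimates keyed by technique and GPU count), the output shape (a technique, GPU count, and start time per task), and all names are assumptions made for illustration; the actual specification is whatever milp.py consumes and returns. The baseline shown is not the MILP: it simply runs tasks one after another.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Protocol, Tuple

Option = Tuple[str, int]  # (technique name, GPU count); hypothetical shape


@dataclass(frozen=True)
class Assignment:
    """Hypothetical solver output for one task."""
    technique: str
    gpus: int
    start_s: float


class Solver(Protocol):
    """Hypothetical interface that every solver would implement."""

    def solve(
        self,
        runtimes: Mapping[str, Mapping[Option, float]],
        total_gpus: int,
    ) -> Dict[str, Assignment]: ...


class SequentialBaseline:
    """Naive stand-in: run tasks back to back, each with its fastest option."""

    def solve(
        self,
        runtimes: Mapping[str, Mapping[Option, float]],
        total_gpus: int,
    ) -> Dict[str, Assignment]:
        plan: Dict[str, Assignment] = {}
        clock = 0.0
        for task, options in runtimes.items():
            # Keep only the options that fit on the available GPUs.
            feasible = {o: t for o, t in options.items() if o[1] <= total_gpus}
            if not feasible:
                raise ValueError(f"no feasible option for task {task!r}")
            (technique, gpus), runtime = min(feasible.items(), key=lambda kv: kv[1])
            plan[task] = Assignment(technique, gpus, clock)
            clock += runtime  # the next task starts when this one finishes
        return plan
```

Because both solvers would accept and return the same shapes, the rest of the system could swap one for the other without changes.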
Trial Runner
In saturn/trial_runner/PerformanceEvaluator.py, we define the empirical profiler system.
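Empirical profiling generally means running a few real steps and timing them rather than estimating cost analytically. The sketch below shows that general idea only; the function name profile_step and its parameters are hypothetical, and it does not reproduce the logic in PerformanceEvaluator.py. Note that GPU work is launched asynchronously, so a real profiler would also need to synchronize the device before reading the clock.

```python
import statistics
import time
from typing import Callable, List


def profile_step(step_fn: Callable[[], None], warmup: int = 1, trials: int = 3) -> float:
    """Return the median wall-clock time, in seconds, of one call to step_fn."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    # Warm-up calls absorb one-time costs such as allocation and caching.
    for _ in range(warmup):
        step_fn()
    timings: List[float] = []
    for _ in range(trials):
        start = time.perf_counter()
        step_fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to a single slow trial than the mean.
    return statistics.median(timings)
```

A caller would pass a closure that runs one training step for a given technique and GPU count, then record the returned estimate for that combination.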
Executor
In saturn/executor/executor.py, we define the execution engine and introspection scheme.
Tests & Examples
In the examples folder, we provide example training pipelines that use Saturn. If you contribute new parallelisms, you must register them with the library before use. You can model your tests after our example WikiText job.