An idempotent operation produces the same result no matter how many times you execute it. Because idempotent operations can be retried after a failure without causing unwanted side effects, they are important building blocks for reliable, self-healing systems.
If an ML pipeline is idempotent, you can re-run it after a failure without worrying about side effects: you will always get the same output.
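A minimal illustration of the difference: overwriting a value is idempotent, while incrementing it is not, so only the former can be blindly retried.

```python
state = {"count": 0}

def set_count(value: int) -> None:
    """Idempotent: running this twice leaves the same state as running it once."""
    state["count"] = value

def increment_count() -> None:
    """Not idempotent: a blind retry after a failure double-counts."""
    state["count"] += 1

set_count(5)
set_count(5)            # retry is harmless: state is still {"count": 5}
increment_count()
increment_count()       # retry changed the result: count is now 7, not 6
```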
An inference pipeline takes features and a model as input and produces predictions as output. The predictions are either returned to the client (online inference) or written to stable storage (batch inference).
An idempotent batch inference pipeline reads the same input features from the feature store and produces the same output every time it runs. For example, a non-idempotent batch inference pipeline might compute its own input date range as "the new features that arrived today", so re-running it on a later day would read different features. An idempotent version of the same pipeline instead requires the pipeline executor to supply the date range as a parameter. This way, an orchestration engine, such as Airflow, can run the idempotent pipeline on a schedule (e.g., daily) and, for each run, provide the range of data to perform inference on (e.g., 24th April 2023).
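To make this concrete, here is a minimal sketch of an idempotent batch inference pipeline in Python; `read_features` and `write_predictions` are hypothetical placeholders for whatever your feature store client provides. The key point is that the date range is an explicit parameter, so re-running the pipeline for the same range reads exactly the same input.

```python
from datetime import date, timedelta

def batch_inference(model, feature_store, start: date, end: date) -> None:
    """Idempotent: the input range is an explicit parameter, so re-running
    the pipeline with the same (start, end) always reads the same features."""
    # read_features / write_predictions are hypothetical stand-ins for
    # your feature store client's API.
    features = feature_store.read_features(start=start, end=end)
    predictions = model.predict(features)
    # Overwrite the partition for this range rather than appending, so a
    # retry does not produce duplicate predictions.
    feature_store.write_predictions(predictions, partition=(start, end),
                                    mode="overwrite")

# An orchestrator such as Airflow supplies the range for each scheduled run,
# e.g. the run's logical date rather than "today":
#   batch_inference(model, fs, start=run_date,
#                   end=run_date + timedelta(days=1))
```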
Online inference pipelines are harder to make idempotent if they retrieve precomputed features from the feature store, as the precomputed features can change between online inference requests.
A training pipeline that takes a training dataset as input and produces a model as output can, in principle, be considered idempotent. However, unless extreme care is taken to make model training deterministic (e.g., using the same seeds for all random number generation), the resulting models may differ between runs. For many ML algorithms it is practically impossible to reproduce an exact copy of a model, due to randomness in the algorithms themselves (e.g., deep learning algorithms) and non-determinism in the hardware (e.g., GPUs).
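As a sketch of the "extreme care" involved, the snippet below pins the common sources of randomness for a training run, assuming PyTorch (other frameworks have analogous switches). Even with all of these set, some GPU kernels have no deterministic implementation, which is why bit-exact reproduction often remains impractical.

```python
import os
import random

import numpy as np
import torch

def make_training_deterministic(seed: int = 42) -> None:
    """Pin the usual sources of randomness before training.
    Even so, some GPU ops have no deterministic variant."""
    random.seed(seed)                    # Python's built-in RNG
    np.random.seed(seed)                 # NumPy RNG
    torch.manual_seed(seed)              # PyTorch CPU and CUDA RNGs
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
    # Raise an error if the model uses an op without a deterministic implementation:
    torch.use_deterministic_algorithms(True)
    # Required by some CUDA ops when deterministic algorithms are enforced:
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```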
For a feature pipeline, an idempotent write to an offline feature store typically involves dropping the feature group and recreating it. This is often undesirable, both because of the overhead involved and because it loses the history of feature data. For an online feature store that stores only the latest feature values, a feature pipeline that correctly orders writes from clients can safely make idempotent updates to feature values (as is done by Hopsworks, see the figure below). To enable reliable incremental updates instead, offline feature stores typically support ACID operations that ensure data integrity and allow operations to be retried when failures occur.
In the figure above, we can see how Hopsworks receives writes through Kafka, which can introduce duplicates. Consistent offline/online features are nonetheless guaranteed: updates to the online feature store are idempotent, while updates to the offline store are ACID and discard duplicates (through Apache Hudi support).
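As an illustration of such an idempotent update (a toy sketch, not the Hopsworks implementation), the snippet below models an online store where each write carries an event time: replaying a duplicate or out-of-order write leaves the stored state unchanged, so the writer can safely retry.

```python
from dataclasses import dataclass

@dataclass
class FeatureValue:
    value: float
    event_time: int   # event timestamp, used to order writes

class OnlineStore:
    """Toy online feature store that keeps only the latest value per key."""

    def __init__(self) -> None:
        self._rows: dict[str, FeatureValue] = {}

    def upsert(self, key: str, update: FeatureValue) -> None:
        current = self._rows.get(key)
        # Idempotent: replaying the same write, or an older one, is a no-op.
        if current is None or update.event_time > current.event_time:
            self._rows[key] = update

    def get(self, key: str) -> FeatureValue:
        return self._rows[key]

store = OnlineStore()
store.upsert("user_42_avg_spend", FeatureValue(10.5, event_time=100))
store.upsert("user_42_avg_spend", FeatureValue(10.5, event_time=100))  # duplicate: no-op
store.upsert("user_42_avg_spend", FeatureValue(9.0, event_time=90))    # stale write: discarded
assert store.get("user_42_avg_spend").value == 10.5
```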