Remote run clones data repo (Mantik API + Compute Backend)
Summary
As a machine learning expert, I want mantik to fetch specific versioned data for me during a training run so I can reproduce runs without manually loading the data myself.
Acceptance Criteria
- remote run clones the relevant data
Given I have a training script and data (in git)
And my training script contains a command to download the training data
When I execute the run on HPC
Then the data (with version) gets cloned
And is available to the training script
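The cloning step described above could be sketched as follows. `clone_data_repo` and its parameters are illustrative assumptions, not the actual mantik interface; the point is only that the repository is cloned and then pinned to the requested version so the training script sees reproducible data:

```python
import subprocess


def clone_data_repo(url: str, version: str, target_dir: str) -> None:
    """Clone a data repository and check out a specific version.

    ``url``, ``version`` and ``target_dir`` are hypothetical parameter
    names; ``version`` may be a tag, branch, or commit hash.
    """
    # Clone the full repository so all tags/commits are available.
    subprocess.run(["git", "clone", url, target_dir], check=True)
    # Pin the working tree to the requested version for reproducibility.
    subprocess.run(["git", "checkout", version], cwd=target_dir, check=True)
```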
Testing
- acceptance criterion: Mantik API passes relevant information to compute backend
Given I have a data repository stored with Mantik
When I trigger a remote run through Mantik API
And the run contains data details (id, project id, version)
Then that information gets passed to the compute backend.
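To illustrate the information flow, the Mantik API could collect the data details into a payload like the one below before forwarding it to the compute backend. The function and field names are assumptions for the sketch, not the actual API schema:

```python
def build_data_details(data_id: str, project_id: str, version: str) -> dict:
    """Collect the data-related run details to forward to the compute backend.

    The key names are hypothetical; the real schema may differ.
    """
    return {
        "data_repository_id": data_id,
        "project_id": project_id,
        "data_version": version,
    }
```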
- compute backend passes relevant variables to HPC
Given I triggered a remote run through Mantik API
And the run contains data information
When the run information / configuration is passed on to HPC
Then the relevant data-related variables are passed to HPC
And the training script has access to them.
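Inside the training script, access to the passed variables could look like this minimal sketch. The environment variable names are assumptions; mantik may use different ones:

```python
import os


def read_data_details_from_env() -> dict:
    """Read the data-related variables inside the training script.

    The variable names are hypothetical placeholders.
    """
    return {
        "data_id": os.environ.get("MANTIK_DATA_REPOSITORY_ID"),
        "project_id": os.environ.get("MANTIK_PROJECT_ID"),
        "version": os.environ.get("MANTIK_DATA_VERSION"),
    }
```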
Additional Notes / Information
See the Additional Notes of the previous ticket.
Technical Information
Suggested Implementation
- extend the Mantik API trigger-run function to pass the data details (data id, project id, data version, ...) to the compute backend
- extend the compute backend to export the passed variables into the environment where the training script will be executed /cc @rico.berner
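The export step on the compute backend side could be sketched as follows, assuming hypothetical environment variable names and a plain dict as run configuration; the actual backend code will differ:

```python
import os

# Hypothetical environment variable names mapped to run-config keys;
# the actual names mantik uses may differ.
DATA_ENV_VARS = {
    "MANTIK_DATA_REPOSITORY_ID": "data_id",
    "MANTIK_PROJECT_ID": "project_id",
    "MANTIK_DATA_VERSION": "data_version",
}


def export_data_details(run_config: dict) -> dict:
    """Export the data-related details of a run into the process
    environment so the training script can read them later."""
    exported = {}
    for env_var, key in DATA_ENV_VARS.items():
        if key in run_config:
            value = str(run_config[key])
            os.environ[env_var] = value
            exported[env_var] = value
    return exported
```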
Edited by Omar Ahmed