Hi BroadMole98
What I understand about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but it doesn't replace kubeflow's or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?
Long story short, yes, you are correct. kubeflow, and snakemake for that matter, are all about DAGs where each node runs a docker (bash) for you. The missing parts (for both) are:
- How do I create the docker / env for each node?
- Data locations (in/out); artifacts are (I think) mostly mounted volumes
- Tracking the run results of each node in the DAG / comparing them (quick sketch below)
- Scheduling of the node jobs on a cluster (I mean, K8s has that, but priorities are missing)
- Logic-based DAGs (process branch A if the result is X)
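To make the tracking point concrete, this is roughly what it looks like inside a node's script with trains (a minimal sketch; the project/task names and the artifact content are made-up examples):

```python
from trains import Task

# Each DAG node calls Task.init() at the top of its script; arguments,
# console output, and framework models are then tracked automatically.
task = Task.init(project_name='snakemake-pipeline', task_name='preprocess')

# Store a node output as an artifact so you can compare it across runs
# in the web UI (names/values here are illustrative only).
task.upload_artifact(name='stats', artifact_object={'rows': 10_000, 'nulls': 3})
```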
The idea is not to replace them but to help with the missing parts. Trains can help you build the dockers (i.e. statically or online), it can track the runtime outputs, it can help with data movement / tracking, it can add scheduling on top of k8s/bare-metal, and with it you can add logic into the DAGs (albeit that actually needs some code, see below).
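This is the kind of glue code I mean for the logic-based DAG part, a rough sketch rather than a built-in feature; the task IDs, the queue name, and the 'validation/accuracy' metric are all placeholders:

```python
from trains import Task

# Inspect a finished node's reported scalars to decide which branch runs next.
finished = Task.get_task(task_id='<preprocess-task-id>')
metrics = finished.get_last_scalar_metrics()  # nested dict: title -> series -> values
accuracy = metrics.get('validation', {}).get('accuracy', {}).get('last', 0)

# Pick the branch template, then clone it and enqueue the clone on the cluster.
branch_template_id = '<branch-a-task-id>' if accuracy > 0.9 else '<branch-b-task-id>'
next_task = Task.clone(source_task=branch_template_id, name='branch step')
Task.enqueue(next_task, queue_name='default')
```

The clone + enqueue pattern is what gives you the conditional branching on top of the scheduler, since each branch is just another tracked task waiting in a queue.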
What do you think is missing for you in your snakemake workflow? Maybe we can think together about how to build something (I would love to see that combination)