Anyone using Trains with Snakemake?

Answered

Anyone using trains with Snakemake? I am running my workflow with Snakemake in a Docker container, and it can output to the trains server of course, but executing a task from the trains UI tries to run the script in its own container: it downloads an Ubuntu container. I'm not sure what I really want it to do yet; I'm still exploring what trains is for.

  				
Posted 5 years ago by BroadMole98


Answers 9

BroadMole98, as one might expect, a long answer as well 🙂

I have a workflow with 19000 job nodes in it.

Wow, 19k job nodes? As in a single pipeline with 19k steps?

The main idea of the trains-agent is to allow multi-node workloads and to let you create pipelines on top of a scheduler without worrying about docker packaging (done automatically for you), with a proper scheduler that supports priority (which is missing from k8s).

If the first step is just "logging" all the steps, you can easily add Task.init at the beginning of any script and get the ability to upload artifacts, access other tasks' artifacts, create graphs, etc.
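A minimal sketch of that logging step, assuming the `trains` package is installed and configured to reach a trains-server. The project name, task name, and file path are hypothetical placeholders. The helper only relies on the task's `upload_artifact` method, so it can be tried with a stand-in object before wiring it to a real task.

```python
def attach_outputs(task, outputs):
    """Upload each named output file as an artifact of `task`.

    `task` is expected to be a trains Task; anything exposing
    `upload_artifact(name, artifact_object=...)` works.
    """
    uploaded = []
    for name, path in sorted(outputs.items()):
        task.upload_artifact(name=name, artifact_object=path)
        uploaded.append(name)
    return uploaded


def run_logged_step():
    """Hypothetical entry point for one script in the workflow."""
    # Imported here so the helper above stays usable without trains installed.
    from trains import Task

    # Project and task names are placeholders, not taken from the thread.
    task = Task.init(project_name="snakemake-demo", task_name="train_model")
    # ... the script's real work would happen here ...
    return attach_outputs(task, {"model": "results/model.pkl"})
```

Calling `run_logged_step()` at the top of an existing script would register the run on the server and attach the listed file to it.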

The trains automl is all about multi-node automl (single node is actually relatively easy to do with optuna and the like). Setting up the environment on multiple machines, reading back performance metrics in real time, and controlling the flow is the real challenge. Trains covers this because everything is logged into the trains-server with a full pythonic interface.

Regarding invoking jobs, think of the trains-server as a server that holds all the configurations for all the jobs. Then trains-agent execute basically pulls the configuration and sets up the environment (whether inside a docker or as a virtual environment). On top of it, trains-agent daemon pulls jobs from the execution queue and runs trains-agent execute to actually launch the job.
Makes sense?
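The division of labour described above can be pictured with a toy model. This is only an illustration in plain Python: `ToyServer`, `execute`, and `daemon` are made-up names, and none of this is the actual trains-server or trains-agent code.

```python
from collections import deque


class ToyServer:
    """Stand-in for the server: stores job configurations and a queue of job ids."""

    def __init__(self):
        self.configs = {}
        self.queue = deque()

    def enqueue(self, job_id, config):
        self.configs[job_id] = config
        self.queue.append(job_id)


def execute(server, job_id):
    """Mimics the 'execute' step: pull the config, pick the environment, run."""
    config = server.configs[job_id]
    env = "docker:" + config["image"] if config.get("image") else "virtualenv"
    return f"{job_id} ran {config['script']} in {env}"


def daemon(server):
    """Mimics the 'daemon' step: drain the queue, calling execute for each job."""
    results = []
    while server.queue:
        results.append(execute(server, server.queue.popleft()))
    return results


server = ToyServer()
server.enqueue("job-1", {"script": "train.py", "image": "ubuntu:18.04"})
server.enqueue("job-2", {"script": "analyze.py"})
for line in daemon(server):
    print(line)
```

The point of the split is that the job definition lives on the server, so any machine running the daemon loop can pick it up and rebuild the environment locally.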

  				
Posted 5 years ago by AgitatedDove14

Thank you for your help so far, the responsiveness of this community has been a great feature of trains

  				
Posted 5 years ago by BroadMole98

Hmm, I think so. It doesn't sound exactly compatible with Snakemake, more kind of a replacement, though the pipelining trains does is quite different. Snakemake really is all about the DAG. I just tell it what output I want to get and it figures out what jobs to run to get that, does it massively parallel, and, very importantly, it NEVER repeats work it has already done and has artifacts for already (well, unless you force it to, but that's a conscious choice you have to make). This is super important for big, expensive jobs. Does trains handle that? I haven't seen that in the docs yet.
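The skip-if-already-done behaviour can be sketched roughly as a make-style check on files. This is a simplified illustration in plain Python, not Snakemake's actual implementation; the function name `needs_run` and the file names are hypothetical.

```python
import os
import tempfile


def needs_run(inputs, outputs):
    """Return True if a job must run: an output is missing or older than an input."""
    if any(not os.path.exists(path) for path in outputs):
        return True
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    newest_input = max((os.path.getmtime(path) for path in inputs), default=0.0)
    return newest_input > oldest_output


with tempfile.TemporaryDirectory() as workdir:
    raw = os.path.join(workdir, "raw.csv")
    model = os.path.join(workdir, "model.pkl")
    open(raw, "w").close()
    print(needs_run([raw], [model]))  # output missing, so the job must run
    open(model, "w").close()
    os.utime(raw, (100, 100))    # force the input to look older
    os.utime(model, (200, 200))  # than the output
    print(needs_run([raw], [model]))  # output is newer, so the job is skipped
```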

I read the trains pipeline example but it confused me https://github.com/allegroai/trains/blob/0.15.1/examples/automation/task_piping_example.py and https://allegro.ai/docs/examples/automation/task_piping/ . Is this the analog to the DAG somehow? It looks like it's showing how to enqueue a task, unless I'm misunderstanding it.

What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?

  				
Posted 5 years ago by BroadMole98

I think we'll have to play with it for a while to really solidify what it is we need--we're just starting to implement trains experiment tracking. We can report what happens 🙂

  				
Posted 5 years ago by BroadMole98

BroadMole98

I'm still exploring what trains is for.

I guess you can think of Trains as Experiment manager + MLOps tied together.

The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.

In some ways it is like Snakemake: it sets up your environment and executes the code. Snakemake also allows you to set up data, which in Trains is done via code (StorageManager); pipelines are also done via code in Trains. Lastly, Trains comes with a built-in agent (trains-agent) and scheduler (including UI) that lets you connect any machine to your cluster (it can also run on top of K8s).

I'm pretty sure we can marry the two of them, but I need more information on the specific use case to come up with a clean solution :)

  				
Posted 5 years ago by AgitatedDove14

There are a few things on my mind... Sorry if this is long. 🙃 I'm just running Snakemake in a docker container on my desktop to isolate dependencies. Inside the container, jobs are run in the normal way with the snakemake command. My snakemake jobs are a variety of python and shell scripts. Snakemake works by using files as intermediaries between jobs. I have a workflow with 19000 job nodes in it.

I have some trains task code right now just in my model training jobs, and that works great, although I am looking at how to amend jobs with further analysis artifacts, since I have a few branching jobs in my workflow that analyze the model outside of the training job.
Snakemake has a kubernetes integration, so I am not sure whether that conflicts with trains if I try to clone a trains experiment and run it from the trains UI with some new arguments.
It looks like trains does automl, but my Snakemake job basically assumes any automl is going to originate from within a single snake job. I wonder how that would work with trains if I have a snake job that exposes hyperparameters as input arguments and doesn't do automl internally (this could be desirable). That leaves the question: how does trains properly invoke a snakemake job?
I'm sure I'll come up with more... 😉
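One possible shape for that last question, as a sketch only and not something proposed in the thread: a small wrapper script that registers the hyperparameters on a trains task and then shells out to snakemake, passing them with `--config`. The project name, task name, target, and parameter names are hypothetical, and it assumes both `trains` and `snakemake` are installed.

```python
import subprocess


def build_snakemake_command(target, params, cores=1):
    """Build a snakemake invocation that passes hyperparameters via --config."""
    # The target goes before --config, because --config accepts
    # several KEY=VALUE items and would otherwise swallow the target.
    command = ["snakemake", "--cores", str(cores), target]
    if params:
        command.append("--config")
        command.extend(f"{key}={value}" for key, value in sorted(params.items()))
    return command


def run_from_trains(target, default_params):
    """Hypothetical wrapper: trains tracks the parameters, snakemake runs the job."""
    from trains import Task  # assumes trains is installed and configured

    task = Task.init(project_name="snakemake-demo", task_name=f"snakemake {target}")
    # connect() registers the dict as the task's hyperparameters, so a
    # cloned task can be given different values before it is executed.
    params = task.connect(dict(default_params))
    subprocess.run(build_snakemake_command(target, params), check=True)
```

Inside the Snakefile, values passed this way are read from the `config` dictionary, for example `config["lr"]`.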

  				
Posted 5 years ago by BroadMole98

Hi BroadMole98 ,
what's the current setup you have? And how do you launch jobs to Snakemake?

  				
Posted 5 years ago by AgitatedDove14

Hi BroadMole98

What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?

Long story short, yes, you are correct. kubeflow, and snakemake for that matter, are all about DAGs where each node runs a docker (bash) for you. The missing portions (for both) are:
- How do I create the docker / env for each node
- Data locations (in/out); artifacts are (I think) mostly mounted volumes
- Tracking the run results of each node in the DAG / comparing them
- Scheduling of the node jobs on a cluster (I mean K8s has that, but priority is missing)
- Logic-based DAGs (process branch A if the result is X)
The idea is not to replace but to help with the missing parts. Trains can help you build the dockers (i.e. statically or online), it can track the runtime outputs, it can help with moving and tracking data, it can add scheduling on top of k8s/bare-metal, and with it you can add logic into the DAGs (albeit that actually needs some code).

What do you think is missing for you in your snakemake workflow? Maybe we can think together about how to build something (I would love to see that combination).

  				
Posted 5 years ago by AgitatedDove14

BroadMole98 Awesome, can't wait for your findings 🙂

  				
Posted 5 years ago by AgitatedDove14
