Hi SoreSparrow36
Of course, it's fully integrated. Here's a link to the docs: https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_deployment/#slurm
The main advantage is the ability to launch and control jobs from outside the SLURM cluster: from simple pipelines, to logging console output and performance metrics, to aborting jobs directly from ClearML, as well as storing outputs.
Wdyt?
I'm helping train my friend on ClearML to assist with his astrophysics research,
If that's the case, what you can do is use the agent inside your sbatch script (fully open source). This means the sbatch script essentially becomes "clearml-agent execute --id <task_id_here>". This will set up the environment and monitor the job, and still allow you to launch it from SLURM. Wdyt?
Ah, that's a shame it's under Enterprise only. No wonder I missed it.
I'm helping train my friend FlutteringSeahorse49 on ClearML to assist with his astrophysics research, and his university has a SLURM cluster. So we're trying to figure out if we can launch an agent process on the cluster to pull work from the ClearML queue (fwiw: containers are not supported on their cluster).
FlutteringSeahorse49 wants to start HPO though, so the desire is to deploy agents that listen to queues on the SLURM cluster (perhaps with the controller running on his laptop).
would that still make sense?
Sorry SmallTurkey79, just noticed your reply.
Hmm, so I know the Enterprise version has built-in support for SLURM, which would remove the need to deploy agents on the SLURM cluster.
What you can do is, on the SLURM login server (i.e. a machine that can run sbatch), write a simple script that pulls the next Task ID from the queue and calls sbatch with "clearml-agent execute --id <task_id_here>". Would this be a good solution?
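Just to make it concrete, here's a rough sketch of such a login-node script. The queue name "hpo_queue" is a placeholder, and the dequeue call (queues.get_next_task via the Python APIClient) should be double-checked against the ClearML API reference:
```bash
#!/bin/bash
# Rough sketch for the SLURM login node (a machine that can call sbatch).
# Assumptions: clearml + clearml-agent are installed and clearml.conf holds
# valid credentials; "hpo_queue" is a placeholder queue name.

QUEUE_NAME="hpo_queue"

# Ask the ClearML server for the next pending Task ID in the queue
# (queues.get_next_task removes the entry from the queue); verify the exact
# client call and response shape against the API reference.
TASK_ID=$(python3 - "$QUEUE_NAME" <<'EOF'
import sys
from clearml.backend_api.session.client import APIClient

client = APIClient()
queues = client.queues.get_all(name=sys.argv[1])   # resolve queue name -> queue object
if queues:
    resp = client.queues.get_next_task(queue=queues[0].id)
    entry = getattr(resp, "entry", None)
    if entry:
        print(entry.task)
EOF
)

if [ -n "$TASK_ID" ]; then
    # Submit a SLURM job that recreates the environment and runs the task via the agent
    sbatch --job-name="clearml_${TASK_ID}" --wrap="clearml-agent execute --id ${TASK_ID}"
else
    echo "No pending tasks in queue '${QUEUE_NAME}'"
fi
```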
But isn't that just the same as running the agent in daemon mode? That's what I was hoping James could do.
I think he's saying you'd want an intermediary layer that acts like the daemon.
Why not run the daemon directly, I'm not sure, but I suspect it's because the daemon doesn't have an "end time" for execution (it stays up).
Would this be equivalent to an automated job submission from ClearML to the cluster?
yes exactly
I am looking for a setup which allows me to essentially create the workers and start the tasks from a SLURM script
Hmm I see, basically the SLURM admins are afraid you will create a script that clogs the SLURM cluster, hence no automated job submission. So you want to use SLURM for the "time on cluster" allocation, and then, when your time is allocated, use ClearML for the job submission. Is that correct?
If so, then do exactly as SmallTurkey79 suggested: run the clearml-agent daemon as a SLURM batch job. Basically the daemon can run your jobs automatically, but from a SLURM perspective you are still limited to the time slot allocated for you. Also notice you can spin up multiple clearml-agent daemons, so that you can run multiple jobs on the same node.
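For example, a minimal sbatch script for that could look like the sketch below; the queue name and resource directives are placeholders, and --foreground just keeps the agent attached to the SLURM job so SLURM can account for it and kill it when the allocation ends:
```bash
#!/bin/bash
#SBATCH --job-name=clearml-daemon
#SBATCH --time=12:00:00
#SBATCH --gres=gpu:1

# Run the agent daemon itself as the SLURM job: for the duration of the
# allocation it keeps pulling Tasks from the queue and running them on this
# node; when the time slot ends, SLURM terminates it.
clearml-agent daemon --queue hpo_queue --foreground
```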
AgitatedDove14 Would this be equivalent to an automated job submission from ClearML to the cluster? My cluster security rules do not allow for automated job submission. I am looking for a setup which allows me to essentially create the workers and start the tasks from a SLURM script, with ClearML simply receiving the information about the workers and sending information to the cluster regarding allotment of the tasks, but without ClearML explicitly sending the work to the cluster. Let me know if this makes sense, or maybe I am misunderstanding what you're saying above.
The difference is that running the agent in daemon mode means the "daemon" itself is a job in SLURM.
What I was saying is: pull jobs from the ClearML queue and then push them as individual SLURM jobs. Does that make sense?