Hi UnsightlyLion90
from my understanding the agent does the job of SLURM,
That is kind of correct (they overlap in some ways 🙂 )
Any guide on how to integrate the two?
The easiest way is to just add the "Task.init()" call to your code, and use SLURM to schedule the job. This will make sure all jobs are fully logged (this can also include automatically uploading the models, artifact support, etc.)
Full SLURM support (i.e. similar to the k8s glue support) is currently out of scope, but I'm pretty sure the enterprise version includes it.
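For example, something along these lines (a minimal sketch; the project/task names and the training code are placeholders):
```
# at the top of the script your SLURM job launches
from clearml import Task

# this one call registers the run with clearml:
# it records the git commit, uncommitted diff, installed packages, console output, etc.
task = Task.init(project_name="my_project", task_name="slurm_run")

# ... the rest of your training script stays unchanged ...
```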
wdyt?
Hi Martin, thank you for your reply! Although I have not dug into the docs yet, I can imagine three ways to run the jobs:
1. slurm (get access to a compute node) -> clearml (bash script that makes a copy of the source code and builds a virtual environment) -> python script
2. clearml (bash script) -> slurm -> python script
3. slurm -> python script with the clearml API
May I say you suggested the third way? If so, would I still get the benefit of clearml taking care of my project (logging the git commit and a copy of the source code)?
I think so. When you say "clearml (bash script...)" you basically mean "put my code + packages together and run it", correct?
Yes, for now I have a bash script that makes a snapshot of the source code and all the config files at the time I submit a slurm job, and when the job runs sometime later it uses that snapshot. I hope clearml can do that for me.
I see, so in that way I do not use clearml's queue; instead I ask clearml to run the code immediately when the slurm job begins. Correct?
That should work 🙂
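For the snapshot part, a rough sketch of what that could look like (assuming a single config file at ./config.yaml; the path and names are placeholders):
```
from clearml import Task

# Task.init already snapshots the source code (git commit + uncommitted diff)
task = Task.init(project_name="my_project", task_name="slurm_run")

# attach the config file to the same task so it is versioned alongside the code snapshot
config_path = task.connect_configuration("config.yaml", name="train_config")

# ... load config_path and run the training as usual ...
```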
BTW, you might play around with "clearml-agent execute --id <task_id_here>"
This will basically clone the code, create a venv with the python packages, apply the uncommitted changes, and run the actual code. This could be a replacement for your bash script. (Notice it means you need to clone the Task in the UI, then you can change parameters, then run the agent manually in SLURM and it will take the params from the UI.)
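If you prefer to do the clone + parameter change from code instead of the UI, something like this should also work (a rough sketch; the project/task names and the parameter key are placeholders):
```
from clearml import Task

# find the original task and clone it
base = Task.get_task(project_name="my_project", task_name="slurm_run")
cloned = Task.clone(source_task=base, name="slurm_run (clone)")

# override hyperparameters on the clone ("Args/..." assumes argparse-based params)
cloned.set_parameters({"Args/learning_rate": 0.001})

# pass this id to the agent inside the SLURM allocation:
#   clearml-agent execute --id <task_id_here>
print(cloned.id)
```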