Hey All

Hey all 🙂
I'm having trouble using the clearml-agent command. I am executing an experiment from a code repository and I am using a requirements.txt file to install dependencies.

Here is what happens:
When the task is initialized (using clearml-agent execute --id <TASKID>), it fails after some time with the error message clearml_agent: ERROR: Could not install task requirements! (in my case because of ERROR: No matching distribution found for numpy==1.23.4). This is OK, it is just a dependency problem. But in the ClearML UI, my task is stuck in the RUNNING status. It seems that my worker fails to inform the server that the task failed (or that installing the dependencies failed).
Any ideas what is going wrong?
Thanks in advance 😉

  				
Posted 2 years ago by StoutElephant16


Answers 7

You can send a REST API call using cURL to the tasks.failed endpoint:
curl -XPUT -u "<key>:<secret>" <server-address>/tasks.failed?task=<task-id>
Additional parameters other than task can be status_message and force.
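
If you would rather do this from Python than from cURL, here is a minimal sketch of the same call using only the standard library. The function names build_failed_request and mark_task_failed are made up for illustration, and encoding force as "true" in the query string is an assumption about how the server parses booleans. The method defaults to PUT to match the cURL line; it is left configurable in case your server expects something else.

```python
import base64
import urllib.parse
import urllib.request


def build_failed_request(server, key, secret, task_id,
                         status_message=None, force=False, method="PUT"):
    """Build (but do not send) a request to the tasks.failed endpoint."""
    params = {"task": task_id}
    if status_message:
        params["status_message"] = status_message
    if force:
        # Assumption: the server accepts "true" as a boolean query value.
        params["force"] = "true"
    url = f"{server.rstrip('/')}/tasks.failed?{urllib.parse.urlencode(params)}"
    # Same HTTP Basic credentials that `curl -u "<key>:<secret>"` sends.
    token = base64.b64encode(f"{key}:{secret}".encode()).decode()
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Basic {token}"}
    )


def mark_task_failed(*args, **kwargs):
    """Send the request and return the HTTP status code."""
    request = build_failed_request(*args, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Calling mark_task_failed("<server-address>", "<key>", "<secret>", "<task-id>", status_message="requirements install failed") should then be roughly equivalent to the cURL line with an added status_message.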

  				
Posted 2 years ago by SuccessfulKoala55

See here: https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksfailed

  				
Posted 2 years ago by SuccessfulKoala55

Hey SuccessfulKoala55. I use my own custom daemon that in turn runs clearml-agent execute. For some complicated reasons (other correlated processes), I want to be able to fetch and execute only a certain task ID instead of pulling one from the queue.

  				
Posted 2 years ago by StoutElephant16

OK. Currently, the clearml-agent execute mode only takes care of the task status once it actually starts running the task (i.e. once environment installation is completed). It will then set the status to started (and eventually to completed or failed). I'll see if we can change this behaviour.
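
In the meantime, a custom daemon could cover the installation phase itself. The following is only a rough sketch: run_and_report is a made-up name, report_failure is a hypothetical callback you would supply (for example, one that calls the tasks.failed endpoint), and it assumes that clearml-agent exits with a non-zero code when it cannot install the requirements, which is worth verifying on your setup.

```python
import subprocess


def run_and_report(task_id, report_failure, run=subprocess.run):
    """Run `clearml-agent execute --id <task_id>` and report a non-zero exit.

    `report_failure(task_id, message)` is a hypothetical callback supplied by
    the caller. `run` is injectable so the wrapper can be tried without the agent.
    """
    result = run(["clearml-agent", "execute", "--id", task_id])
    if result.returncode != 0:
        # Assumption: a failed requirements install makes the agent exit non-zero.
        report_failure(task_id, f"clearml-agent exited with code {result.returncode}")
    return result.returncode
```

Note that this reports on any non-zero exit, so a task that the agent already marked as failed may be reported a second time.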

  				
Posted 2 years ago by SuccessfulKoala55

OK, thanks a lot for the info! For now (as simple error handling), is there any way I can tell the ClearML Server from the shell that the experiment should be cancelled?

  				
Posted 2 years ago by StoutElephant16

Hi StoutElephant16, any reason why you're running clearml-agent execute instead of using clearml-agent daemon to pull the task from a queue and start the execution automatically?

  				
Posted 2 years ago by SuccessfulKoala55

Fantastic, thanks a lot 🙂

  				
Posted 2 years ago by StoutElephant16
