I'm trying to set up some initial experiments within our stack, but when I use the execute_remotely task, I get this error:
clearml_agent: ERROR: Failed getting token (error 401 from http://<server ip>:8008): Unauthorized (invalid credentials) (failed to locate provided credentials)

Docker logs reveal:

clearml_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server? (along with some pip warnings)

Sometimes the terminal also shows:
2021-07-05 13:22:34,565 [WARNING] [urllib3.connectionpool]: Retrying (Retry(total=238, connect=238, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6006687a50>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
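Both the 401 and the "api_server is misconfigured" message point at the api section of the agent's clearml.conf. As a hedged illustration only (the ports shown are the ClearML defaults matching the 8008 in the log above; `<server ip>` and the key placeholders are stand-ins, not real values), the section the agent reads looks roughly like:

```
api {
    web_server: http://<server ip>:8080
    api_server: http://<server ip>:8008
    files_server: http://<server ip>:8081
    credentials {
        "access_key" = "<key generated in the web UI>"
        "secret_key" = "<matching secret>"
    }
}
```

If the credentials here were generated against a different server, or api_server points at the web port instead of 8008, the agent fails with exactly this kind of 401.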

  
  
Posted 2 years ago

Answers 30


From the log you shared, the task is picked up by the worker_d1bd92a3b039400cbafc60a7a5b1e52b_4e831c4cbaf64e02925b918e9a3a1cf6_<hostname>:gpu0,1 worker

I can try and target the default one if it helps..?

  
  

Stop and re-run the agent

  
  

Is there a preferred way to stop the agent?

Same agent command + --stop
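"Same agent command + --stop" can be sketched concretely. Assuming the agent was started with the dual_gpu daemon command quoted later in this thread, the stop invocation would be:

```shell
# Same daemon invocation used to start the agent, with --stop appended
# (queue/GPU flags mirror the create-queue command quoted later in the thread)
clearml-agent daemon --queue dual_gpu --gpus 0,1 --stop
```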

  
  

Don't do it in detached mode - do it in another console window

  
  

Okay trying again without detached

  
  

It should dump a log to stdout

  
  

I'll kill the agent and try again but with the detached mode 🤔

  
  

A follow up question (instead of opening a new thread), is there a way I could signal some files/directories to be copied to the execute_remotely task?

  
  

It failed on some missing files in my remote_execution, but otherwise seems fine now

  
  

Hah. Now it worked.

  
  

Strange...

  
  

A follow up question (instead of opening a new thread), is there a way I could signal some files/directories to be copied to the execute_remotely task?

For that you'll need to use a Git repository - the repository will be automatically cloned when running the task remotely.

  
  

Hm, that seems less than ideal. I was hoping I could pass some CSV locations. I'll try and find a workaround for that. Thanks!

  
  

Seemed to work fine again in detached mode, what went wrong there 🤯

  
  

You can always upload using the StorageManager and download if the file is not there
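The upload/download round trip might look like this. A hedged sketch only: the file paths and the files-server URI are hypothetical placeholders, not values from this thread:

```python
from clearml import StorageManager

# Before calling execute_remotely: push the CSV somewhere the agent can reach
# (the file server, S3, etc.). Returns the remote URI.
remote_url = StorageManager.upload_file(
    local_file="data/train.csv",
    remote_url="http://<server ip>:8081/my_project/train.csv",
)

# On the remote side: fetch a cached local copy if the file is not there.
local_path = StorageManager.get_local_copy(remote_url=remote_url)
```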

  
  

This follows the standard ClearML remote execution practice - an agent runs the task, and either uses the actual python code file (stored entirely in the server under the uncommitted changes section), or clones a git repository

  
  

I guess following the example https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py , it's not clear to me how the server has access to the data loaders location when it hits execute_remotely

  
  

The idea is that the features would be copied/accessed by the server, so we can transition slowly and not use the available storage manager for data monitoring

  
  

Or store it as a configuration item (if it's not a lot of data)
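Storing file locations as a configuration item might be sketched like this. The keys, paths, and names are hypothetical, and this is unverified illustration rather than a recipe from the thread:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config item demo")

# Attach small data (or data locations) to the task as a configuration
# object; the remote run reads the stored values back via the same call.
config = task.connect_configuration(
    configuration={"train_csv": "data/train.csv", "val_csv": "data/val.csv"},
    name="data files",
)
```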

  
  

I was thinking of using the --volume settings in clearml.conf to mount the relevant directories for each user (so it's somewhat customizable). Would that work?

It would be amazing if one can specify specific local dependencies for remote execution, and those would be uploaded to the file server and downloaded before the code starts executing

  
  

What about setting the working_directory to the user working directory using Task.init or Task.create?

The working_directory is simply one of the parameters used when cloning a git repository, so it won't work...
You can rely on a fixed mount point, for example, but that requires more setup
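A fixed mount point could be set up through the agent's default docker arguments in clearml.conf. A hedged sketch, with hypothetical host/container paths and an example image (not values from this thread):

```
agent {
    default_docker {
        image: "nvidia/cuda:11.0-runtime-ubuntu20.04"
        # Mount a shared host directory into every task container
        arguments: ["-v", "/mnt/shared/data:/data"]
    }
}
```

Every user's task would then find the data under the same /data path, at the cost of having to provision that host directory on each agent machine.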

  
  

So the ..data referenced in the example above are part of the git repository?
What about setting the working_directory to the user working directory using Task.init or Task.create ?

  
  

So the ..data referenced in the example above are part of the git repository?

Yup 🙂

  
  

Thanks for your help SuccessfulKoala55 ! Appreciate the patience 🙏

  
  

Not likely

  
  

I just used this to create the dual_gpu queue:
clearml-agent daemon --queue dual_gpu --create-queue --gpus 0,1 --detached

  
  

Nope, no other config files

  
  

seems OK

  
  

That will come at a later stage

  
  