$ clearml-agent -d daemon --gpus 1 --foreground
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): DIFFERENT_IP_ADDRESS:8008
DEBUG:urllib3.util.retry:Incremented Retry for (url='/auth.login'): Retry(total=239, connect=3, read=240, redirect=240, status=240)
WARNING:urllib3.connectionpool:Retrying (Retry(total=239, connect=3, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49318dd10>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
DEBUG:urllib3.connectionpool:Starting new HTTP connection (2): DIFFERENT_IP_ADDRESS:8008
DEBUG:urllib3.util.retry:Incremented Retry for (url='/auth.login'): Retry(total=238, connect=2, read=240, redirect=240, status=240)
WARNING:urllib3.connectionpool:Retrying (Retry(total=238, connect=2, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49318d6d0>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
DEBUG:urllib3.connectionpool:Starting new HTTP connection (3): DIFFERENT_IP_ADDRESS:8008
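[Errno 111] Connection refused means nothing accepted the connection on that address and port. A quick sanity check that the API server is actually up, as a hedged sketch assuming the default port 8008 and the ClearML server's `debug.ping` health endpoint:
`# is anything listening on the API port?
nc -zv DIFFERENT_IP_ADDRESS 8008
# hit the ClearML API server's health endpoint directly
curl http://DIFFERENT_IP_ADDRESS:8008/debug.ping`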
here’s the file with the keys and IP redacted: https://clearml.slack.com/files/U01PN0S6Y67/F0231N0GZ19/clearml.conf
okay, they are somehow set as environment variables. let me figure out how they were set.
i’ll try changing the IP and look for a different error.
yes, i can do this again. i did use `clearml-agent init` to generate `clearml.conf` after generating a fresh set of keys
Can you just try `clearml-agent config`?
Seems correct.
I'm assuming something is wrong with the key/secret quoting?!
Could you generate another one and test it?
(you can have multiple key/secret pairs on the same user)
Yeah... I see it's quoted there, so it shouldn't be a problem...
of course, `SERVER_IP_ADDRESS` is the actual IP address of the server, AND i made sure that `CLEARML_HOST_IP` was set correctly before issuing the `docker-compose` command
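for reference, a minimal sketch of that sequence, assuming the default `/opt/clearml/docker-compose.yml` path from the server install docs:
`# export the host IP so the containers advertise the right address
export CLEARML_HOST_IP=SERVER_IP_ADDRESS
docker-compose -f /opt/clearml/docker-compose.yml up -d`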
hmm, it was confusing to me, but it’s kind of an edge case: I was taking over a computer after a colleague left, which seems like it might not be a common scenario
well, as generated by `clearml-agent init`: i pasted the text directly from the web app into the CLI interface, and it generated `clearml.conf`
Can you share the `clearml.conf`? Maybe something will pop?
NastyFox63 try using the credentials in the `curl` request, just to make sure you can authenticate them with the server... do: `curl -u <key>:<secret>`
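a fuller hedged sketch of that check, hitting the `users.get_current_user` endpoint that appears in the response later in this thread (the host and port are assumptions based on the agent log above):
`curl -u '<key>':'<secret>' http://SERVER_IP_ADDRESS:8008/users.get_current_user`
quoting the key and secret avoids the shell mangling any special characters they contain.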
except for the IP address and the actual keys, it’s the vanilla config generated by `clearml-agent init`
also, i’m noticing the “last used” field does not update when I try to start an agent, but does change when I issue the `curl` command you gave earlier
I think we should just add some kind of a warning in these cases
that seems like a good solution 🙂
thank you SuccessfulKoala55 and AgitatedDove14 for your help! Martin identified the problem early on, but I only checked my `.bashrc` 😞
`$ clearml-agent config
Current configuration (clearml_agent v1.0.0, location: /home/username/clearml.conf):
agent.worker_id =
agent.worker_name = computer
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = ~/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = ~/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = ~/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = ~/.clearml/pip-cache
agent.docker_apt_cache = ~/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.git_user =
agent.default_python = 3.7
agent.cuda_version = 112
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server =
api.web_server =
api.files_server =
api.credentials.access_key = 7L******
api.host = `
OK, so we know these credentials are good... how exactly do they appear in the config file?
should `api.credentials.access_key` be the same as the `access_key` in `clearml.conf`?
Env vars always win over a config file, but explicit CLI params also trump env vars. Here the CLI param is itself a config file (`--config-file`), so it's a close call; maybe we'll need to change that?
yes, that call appeared to be successful; i had to wrap the key and secret in quotes because of their contents:
`$ curl -u 'J9*****':'R2*****'
{"meta":{"id":"6db9ae72249f417fa2b6b8705b44f38a","trx":"6db9ae72249f417fa2b6b8705b44f38a","endpoint":{"name":"users.get_current_user","requested_version":"2.13","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{}},"data":{"user":{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b","name":"trains"},"family_name":"andrew","given_name":"andrew","id":"214ae8a2b7b04abe802d35b8d1c39c0c","name":"andrew","role":"user"}}}`
Could it be you have an old OS environment variable overriding the configuration file?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e., that the error changes)?
looks like a previous user set `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY` in `/etc/environment` and then disabled the keys in the web app. I removed the two items from `/etc/environment` and was able to successfully start a worker.
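for anyone hitting the same thing, a hedged sketch of the check and cleanup; note that edits to `/etc/environment` only take effect after logging out and back in:
`# look for leftover ClearML credential overrides
grep -i clearml /etc/environment
env | grep '^CLEARML_'
# clear them in the current shell as well
unset CLEARML_API_ACCESS_KEY CLEARML_API_SECRET_KEY`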
it seems, though, that the env vars take precedence even when a `--config-file` is explicitly specified?
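a quick hedged way to reproduce that precedence, using a deliberately bogus key (flag placement per `clearml-agent --help`):
`# the bogus env var should win even though the config file is passed explicitly,
# so expect an authentication failure rather than a clean start
CLEARML_API_ACCESS_KEY=BOGUS clearml-agent --config-file ~/clearml.conf daemon --gpus 1 --foreground`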
> after generating a fresh set of keys
when you have a new set, copy-paste them directly into the `clearml.conf` (should be at the top, can't miss it)
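for reference, that top-of-file block looks roughly like this (a sketch; the hosts and keys are placeholders, ports per the default server setup):
`api {
    web_server: http://SERVER_IP_ADDRESS:8080
    api_server: http://SERVER_IP_ADDRESS:8008
    files_server: http://SERVER_IP_ADDRESS:8081
    credentials {
        "access_key" = "7L******"
        "secret_key" = "********"
    }
}`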
> should `api.credentials.access_key` be the same as the `access_key` in `clearml.conf`?
Yes, it should. Isn't it?
looking in the web app, under the “App Credentials” section, it lists those credentials as “used” when I attempted the `curl` command.