$ clearml-agent -d daemon --gpus 1 --foreground
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): DIFFERENT_IP_ADDRESS:8008
DEBUG:urllib3.util.retry:Incremented Retry for (url='/auth.login'): Retry(total=239, connect=3, read=240, redirect=240, status=240)
WARNING:urllib3.connectionpool:Retrying (Retry(total=239, connect=3, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49318dd10>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
DEBUG:urllib3.connectionpool:Starting new HTTP connection (2): DIFFERENT_IP_ADDRESS:8008
DEBUG:urllib3.util.retry:Incremented Retry for (url='/auth.login'): Retry(total=238, connect=2, read=240, redirect=240, status=240)
WARNING:urllib3.connectionpool:Retrying (Retry(total=238, connect=2, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49318d6d0>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
DEBUG:urllib3.connectionpool:Starting new HTTP connection (3): DIFFERENT_IP_ADDRESS:8008
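[Errno 111] Connection refused means nothing accepted the connection on that address and port. A quick sanity check that the API server is actually up, as a hedged sketch assuming the default port 8008 and the ClearML server's `debug.ping` health endpoint:
`# is anything listening on the API port?
nc -zv DIFFERENT_IP_ADDRESS 8008
# hit the ClearML API server's health endpoint directly
curl http://DIFFERENT_IP_ADDRESS:8008/debug.ping`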
here’s the file with the keys and IP redacted: https://clearml.slack.com/files/U01PN0S6Y67/F0231N0GZ19/clearml.conf
okay, they are somehow set as environment variables. let me figure out how they were set.
i’ll try changing the IP and look for a different error.
yes, i can do this again. i did use `clearml-agent init` to generate `clearml.conf` after generating a fresh set of keys
Can you just try `clearml-agent config`?
Seems correct.
I'm assuming something is wrong with the key/secret quoting?!
Could you generate another one and test it?
(you can have multiple key/secret pairs on the same user)
Yeah... I see it's quoted there, so it shouldn't be a problem...
of course, `SERVER_IP_ADDRESS` is the actual IP address of the server, AND i made sure that `CLEARML_HOST_IP` was set correctly before issuing the `docker-compose` command
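for reference, a minimal sketch of that sequence, assuming the default `/opt/clearml/docker-compose.yml` path from the server install docs:
`# export the host IP so the containers advertise the right address
export CLEARML_HOST_IP=SERVER_IP_ADDRESS
docker-compose -f /opt/clearml/docker-compose.yml up -d`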
hmm, it was confusing to me, but it’s kind of an edge case: I was taking over a computer after a colleague left, which seems like it might not be a common scenario
well, as generated by `clearml-agent init`: i pasted the text directly from the web app into the CLI interface, and it generated `clearml.conf`
Can you share the `clearml.conf`? Maybe something will pop?
NastyFox63 try using the credentials in the `curl` request, just to make sure you can authenticate them with the server... do: `curl -u <key>:<secret>`
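a fuller hedged sketch of that check, hitting the `users.get_current_user` endpoint that appears in the response later in this thread (the host and port are assumptions based on the agent log above):
`curl -u '<key>':'<secret>' http://SERVER_IP_ADDRESS:8008/users.get_current_user`
quoting the key and secret avoids the shell mangling any special characters they contain.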
except for the IP address and the actual keys, it’s the vanilla config generated by `clearml-agent init`
also, i’m noticing the “last used” field does not update when I try to start an agent, but does change when I issue the `curl` command you gave earlier
I think we should just add some kind of a warning in these cases
that seems like a good solution 🙂
thank you SuccessfulKoala55 and AgitatedDove14 for your help! Martin identified the problem early on, but I only checked my `.bashrc` 😞
`$ clearml-agent config
Current configuration (clearml_agent v1.0.0, location: /home/username/clearml.conf):
agent.worker_id =
agent.worker_name = computer
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = ~/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = ~/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = ~/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = ~/.clearml/pip-cache
agent.docker_apt_cache = ~/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.git_user =
agent.default_python = 3.7
agent.cuda_version = 112
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server =
api.web_server =
api.files_server =
api.credentials.access_key = 7L******
api.host = `
OK, so we know these credentials are good... how exactly do they appear in the config file?
should `api.credentials.access_key` be the same as the `access_key` in `clearml.conf`?
Env vars always win over a config file, but explicit CLI params also trump env vars. Here the CLI param is itself a config file (`--config-file`), so it's a close call; maybe we'll need to change that?
yes, that call appeared to be successful; i had to wrap the key and secret in quotes because of their contents:
`$ curl -u 'J9*****':'R2*****'
{"meta":{"id":"6db9ae72249f417fa2b6b8705b44f38a","trx":"6db9ae72249f417fa2b6b8705b44f38a","endpoint":{"name":"users.get_current_user","requested_version":"2.13","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{}},"data":{"user":{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b","name":"trains"},"family_name":"andrew","given_name":"andrew","id":"214ae8a2b7b04abe802d35b8d1c39c0c","name":"andrew","role":"user"}}}`
Could it be you have an old OS environment variable overriding the configuration file?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e., that the error changes)?
looks like a previous user set `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY` in `/etc/environment` and then disabled the keys in the web app. I removed the two items from `/etc/environment` and was able to successfully start a worker.
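for anyone hitting the same thing, a hedged sketch of the check and cleanup; note that edits to `/etc/environment` only take effect after logging out and back in:
`# look for leftover ClearML credential overrides
grep -i clearml /etc/environment
env | grep '^CLEARML_'
# clear them in the current shell as well
unset CLEARML_API_ACCESS_KEY CLEARML_API_SECRET_KEY`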
it seems, though, that the env vars take precedence even when a `--config-file` is explicitly specified?
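a quick hedged way to reproduce that precedence, using a deliberately bogus key (flag placement per `clearml-agent --help`):
`# the bogus env var should win even though the config file is passed explicitly,
# so expect an authentication failure rather than a clean start
CLEARML_API_ACCESS_KEY=BOGUS clearml-agent --config-file ~/clearml.conf daemon --gpus 1 --foreground`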
> after generating a fresh set of keys
when you have a new set, copy-paste them directly into the `clearml.conf` (should be at the top, can't miss it)
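for reference, that top-of-file block looks roughly like this (a sketch; the hosts and keys are placeholders, ports per the default server setup):
`api {
    web_server: http://SERVER_IP_ADDRESS:8080
    api_server: http://SERVER_IP_ADDRESS:8008
    files_server: http://SERVER_IP_ADDRESS:8081
    credentials {
        "access_key" = "7L******"
        "secret_key" = "********"
    }
}`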
> should `api.credentials.access_key` be the same as the `access_key` in `clearml.conf`?
Yes, it should. Isn't it?
looking in the web app, under the “App Credentials” section, it lists those credentials as “used” when I attempted the `curl` command.