I think we should just add some kind of a warning in these cases
Can you share the clearml.conf? Maybe something will pop up?
Env vars always win over a config file, but explicit CLI params trump env vars as well. In this case it's a close call - maybe we'll need to change that?
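For example, a sketch of that precedence (the key value is a placeholder; the flags are the ones used elsewhere in this thread):
```
# the env var wins over clearml.conf, even when the file is passed explicitly
export CLEARML_API_ACCESS_KEY='STALE_KEY'
clearml-agent --config-file ~/clearml.conf daemon --foreground   # authenticates with STALE_KEY

# only after unsetting it does the agent fall back to the file
unset CLEARML_API_ACCESS_KEY
clearml-agent --config-file ~/clearml.conf daemon --foreground   # uses the access_key from clearml.conf
```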
should api.credentials.access_key be the same as the access_key in clearml.conf?
Yes, it should be. Isn't it?
of course, SERVER_IP_ADDRESS is the actual IP address of the server, and i made sure that CLEARML_HOST_IP was set correctly before issuing the docker-compose command
here’s the file with the keys and IP redacted: https://clearml.slack.com/files/U01PN0S6Y67/F0231N0GZ19/clearml.conf
OK, so we know these credentials are good... how exactly do they appear in the config file?
okay, they are somehow set as environment variables. let me figure out how they were set.
NastyFox63 try using the credentials in the curl request, just to make sure you can authenticate against the server... do: curl -u <key>:<secret>
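something like this in full, assuming the default API server port 8008 and the users.get_current_user endpoint (substitute your server's address):
```
curl -u '<key>':'<secret>' http://SERVER_IP_ADDRESS:8008/users.get_current_user
```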
well, as generated by clearml-agent init. i pasted the text directly from the web app into the CLI, and it generated clearml.conf
hmm, it was confusing to me, but it's kind of an edge case: I was taking over a computer after a colleague left, which probably isn't a common scenario
when you have a new set, copy-paste them directly into clearml.conf (should be at the top, can't miss it)
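for reference, it should look roughly like this (placeholder values; 8008 is the API port seen in this thread, 8080/8081 are the usual web/files defaults):
```
api {
    web_server: http://SERVER_IP_ADDRESS:8080
    api_server: http://SERVER_IP_ADDRESS:8008
    files_server: http://SERVER_IP_ADDRESS:8081
    credentials {
        "access_key" = "YOUR_ACCESS_KEY"
        "secret_key" = "YOUR_SECRET_KEY"
    }
}
```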
i’ll try changing the IP and look for a different error.
yes, that call appeared to be successful (had to wrap the key in quotes because of its contents):
```
$ curl -u 'J9*****':'R2*****'
{"meta":{"id":"6db9ae72249f417fa2b6b8705b44f38a","trx":"6db9ae72249f417fa2b6b8705b44f38a","endpoint":{"name":"users.get_current_user","requested_version":"2.13","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{}},"data":{"user":{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b","name":"trains"},"family_name":"andrew","given_name":"andrew","id":"214ae8a2b7b04abe802d35b8d1c39c0c","name":"andrew","role":"user"}}}
```
```
$ clearml-agent -d daemon --gpus 1 --foreground
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): DIFFERENT_IP_ADDRESS:8008
DEBUG:urllib3.util.retry:Incremented Retry for (url='/auth.login'): Retry(total=239, connect=3, read=240, redirect=240, status=240)
WARNING:urllib3.connectionpool:Retrying (Retry(total=239, connect=3, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49318dd10>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
DEBUG:urllib3.connectionpool:Starting new HTTP connection (2): DIFFERENT_IP_ADDRESS:8008
DEBUG:urllib3.util.retry:Incremented Retry for (url='/auth.login'): Retry(total=238, connect=2, read=240, redirect=240, status=240)
WARNING:urllib3.connectionpool:Retrying (Retry(total=238, connect=2, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49318d6d0>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
DEBUG:urllib3.connectionpool:Starting new HTTP connection (3): DIFFERENT_IP_ADDRESS:8008
```
yes, i can do this again. i did use clearml-agent init to generate clearml.conf after generating a fresh set of keys
Yeah... I see it's quoted there, so it shouldn't be a problem...
looks like a previous user set CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY in /etc/environment and then disabled the keys in the web app. I removed the two items from /etc/environment and was able to successfully start a worker.
it seems, though, that the env vars take precedence even when a --config-file is explicitly specified?
except for the IP address and the actual keys, it’s the vanilla config generated by clearml-agent init
Can you just try clearml-agent config?
Could it be you have old OS environment variables overriding the configuration file?
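You can check with something like:
```
# list any ClearML-related variables in the current environment
env | grep -i clearml

# and look at the files that might set them system-wide or per-user
grep -i clearml /etc/environment ~/.bashrc ~/.profile 2>/dev/null
```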
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e. the error changed)?
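e.g. something along these lines (assumes the api_server line uses the address format shown earlier):
```
# point api_server at a deliberately wrong address, then rerun the agent;
# the connection error should now mention the new address
sed -i 's|http://SERVER_IP_ADDRESS:8008|http://DIFFERENT_IP_ADDRESS:8008|' ~/clearml.conf
clearml-agent -d daemon --gpus 1 --foreground
```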
looking in the web app, under the “App Credentials” section, it lists those credentials as “used” when I attempted the curl command.
```
$ clearml-agent config
Current configuration (clearml_agent v1.0.0, location: /home/username/clearml.conf):
agent.worker_id =
agent.worker_name = computer
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = ~/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = ~/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = ~/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = ~/.clearml/pip-cache
agent.docker_apt_cache = ~/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.git_user =
agent.default_python = 3.7
agent.cuda_version = 112
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server =
api.web_server =
api.files_server =
api.credentials.access_key = 7L******
api.host =
```
Seems correct.
I'm assuming something is wrong with the key/secret quoting?!
Could you generate another one and test it?
(you can have multiple key/secrets on the same user)
that seems like a good solution 🙂
thank you SuccessfulKoala55 and AgitatedDove14 for your help! Martin identified the problem early on, but I only checked my .bashrc 😞
also, i’m noticing the “last used” field does not update when I try to start an agent, but does change when I issue the curl command you gave earlier