Thanks Shay, good to know I just hadn't configured something correctly!
Ok great! I've actually provided a key and secret so I guess it should be working. Would you have any suggestions about where I could look to debug? Maybe the docker logs of the web server?
Thanks AgitatedDove14! Just to make sure I'm understanding correctly, do you mean that the ClearML Web server in https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server issues a delete command to the ClearML API server, which is then responsible for trying to delete the files in S3? And that I need to enter an AWS key/secret in the profile page of the web app here?
Hi UnevenDolphin73 sorry for the slow reply, been on leave!
We don't have a solution right now, but if there's no fix to the frontend in the near future we'll probably try to write a script that queries the ClearML API for all artefacts, queries S3 for all artefacts, and figures out orphaned artefacts to delete.
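For illustration, a rough sketch of what that clean-up script could look like (the bucket name, prefix and the use of boto3 are assumptions, and anything it flags should be reviewed by hand before deleting):
```python
# Sketch only: finds S3 objects that no ClearML task artifact or model still references.
# BUCKET/PREFIX are placeholders; review the "orphans" list manually before deleting anything.
import boto3
from clearml import Task
from clearml.backend_api.session.client import APIClient

BUCKET = "my-artifact-bucket"   # placeholder
PREFIX = "clearml/"             # placeholder

# 1. Every URI ClearML still knows about (task artifacts + models)
known_uris = set()
for task in Task.get_tasks():   # consider filtering by project on large servers
    for artifact in (task.artifacts or {}).values():
        if artifact.url:
            known_uris.add(artifact.url)

client = APIClient()
for model in client.models.get_all():
    if getattr(model, "uri", None):
        known_uris.add(model.uri)

# 2. Everything actually stored in the bucket
s3 = boto3.client("s3")
orphans = []
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        uri = "s3://{}/{}".format(BUCKET, obj["Key"])
        if uri not in known_uris:
            orphans.append(obj["Key"])

print("{} candidate orphaned objects".format(len(orphans)))
```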
Oh wow thanks SuccessfulKoala55, so sorry I didn't think to check the agent docs!
EDIT: Turns out in that AMI, the docker-compose file has:
```
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  restart: unless-stopped
  privileged: true
  environment:
    CLEARML_HOST_IP: ${CLEARML_HOST_IP}
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
    CLEARML_API_HOST:
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
```
So I changed `# CLEARML_API_HOST:` to `CLEARML_API_HOST: ${CLEARML_A...`
Oh that's cool, I assumed the DevOps project was just examples!
There's a `jupyter_url` property there that is `http://{instance's_private_ip_address}:8888?token={jupyter_token}`
There's also:
```
external_address          {instance_public_ip_address}
internal_ssh_port         10022
internal_stable_ssh_port  10023
jupyter_port              8888
jupyter_token             {jupyter_token}
vscode_port               9000
```
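Purely as an illustration of how those fields fit together, this is roughly the kind of SSH tunnel clearml-session needs to open (the root user, local port choices and authentication are assumptions, not taken from the clearml-session source):
```python
# Illustration only: how the connection details above could map onto an SSH tunnel.
# The remote user, local ports and key handling are assumptions.
external_address = "1.2.3.4"   # instance_public_ip_address (placeholder)
internal_ssh_port = 10022
jupyter_port = 8888
vscode_port = 9000

tunnel_cmd = (
    "ssh -p {ssh_port} root@{host} "
    "-L 8878:127.0.0.1:{jupyter} "   # local 8878 -> remote Jupyter (local ports arbitrary)
    "-L 8898:127.0.0.1:{vscode}"     # local 8898 -> remote VS Code server
).format(ssh_port=internal_ssh_port, host=external_address,
         jupyter=jupyter_port, vscode=vscode_port)

print(tunnel_cmd)   # running this by hand is a quick way to test the tunnel outside clearml-session
```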
Maybe this is something stupid to do with VPCs that I should understand bet...
Update: I see that by default it uses 10022 as the remote SSH port, so I've opened that as well (still getting the "tunneling failed" message though).
I've also noticed this log in the agent machine:
```
2021-07-09 05:38:37,766 - clearml - WARNING - Could not retrieve remote configuration named 'SSH' Using default configuration: {'ssh_host_ecdsa_key': '-----BEGIN EC PRIVATE KEY-----\{private key here}
```
Hi AgitatedDove14, thanks for your help and sorry I missed this! I've had this on hold for the last few days, but I'm going to try firing up a new ClearML server running version 1.0.2 (I've been using the slightly older Allegro Trains image from the AWS marketplace) and have another try from there. Thanks for your help on GitHub too ❤ I'm so blown away by the quality of everything you folks are doing, and have been championing it hard at my workplace.
Unfortunately no dice. I've opened every port from 0-11000, and am using the command `clearml-session --public-ip true` on the client, but still getting the timeout message, only now it says:
```
Setting up connection to remote session
Starting SSH tunnel
Warning: Permanently added '[<IP address>]:10022' (ECDSA) to the list of known hosts.
SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel
Warning: Permanently added '[<IP address>]:10022' (ECDSA) to the list of kn...
```
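As a quick sanity check from the client side, something like the following will tell you whether port 10022 is reachable at all, i.e. whether the failure is networking (security group / VPC) rather than clearml-session itself (the host is a placeholder for the agent instance's public IP):
```python
# Simple TCP reachability test for the remote SSH port used by clearml-session.
# Replace the host placeholder with the agent instance's public IP.
import socket

host, port = "<agent-public-ip>", 10022
try:
    with socket.create_connection((host, port), timeout=5):
        print("TCP connection to {}:{} succeeded".format(host, port))
except OSError as err:
    print("Cannot reach {}:{} -> {}".format(host, port, err))
```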
Not sure if that gives you the answer? Otherwise if you can tell me which of the 7 containers to exec into and how to check, happy to do that
And thanks for the consistently speedy responses with getting onto issues when they pop up!
@AgitatedDove14 head of master branch
No errors getting an existing dataset @EnthusiasticShrimp49
File "/Users/david/dataset_builder.py", line 619, in save
clearml_ds.finalize()
File "/Users/david/miniconda3/envs/ml/lib/python3.9/site-packages/clearml/datasets/dataset.py", line 796, in finalize
self._task.mark_completed()
File "/Users/david/miniconda3/envs/ml/lib/python3.9/site-packages/clearml/backend_interface/task/task.py", line 688, in mark_completed
tasks.CompletedRequest(
File "/Users/david/miniconda3/envs/ml/lib/python3.9/site-packages/clearml/backend_api/...
(It seems like the web server doesn't log the call to AWS, I just see this:
```
{SERVER IP} - - [22/Dec/2021:23:58:37 +0000] "POST /api/v2.13/models.delete_many HTTP/1.1" 200 348 "
ID}/models/{MODEL ID}/general?{QUERY STRING PARAMS THAT DETERMINE TABLE APPEARANCE} {BROWSER INFO} "-"
```
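One way to confirm whether that delete ever reached S3 is to check the object directly (bucket and key are placeholders, copied from the model's URI):
```python
# Check whether the model file that the web app "deleted" is still in S3.
# Bucket and key are placeholders; take them from the model's URI.
import boto3
from botocore.exceptions import ClientError

bucket, key = "my-artifact-bucket", "path/to/model.pkl"   # placeholders
s3 = boto3.client("s3")
try:
    s3.head_object(Bucket=bucket, Key=key)
    print("Object still exists, so the delete never reached S3")
except ClientError as err:
    if err.response["Error"]["Code"] in ("404", "NoSuchKey"):
        print("Object is gone; the delete did go through")
    else:
        raise
```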