Thanks AgitatedDove14 ! Just to make sure I’m understanding correctly, do you mean that the ClearML Web server in https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server issues a delete command to the ClearML API server, which is then responsible for trying to delete the files in S3? And that I need to enter an AWS key/secret in the profile page of the web app here?
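(For reference, the SDK-side S3 credentials live in `clearml.conf` under `sdk.aws.s3`, and the web app’s profile-page form asks for essentially the same bucket/key/secret fields. A minimal sketch, with placeholder bucket name and keys:)
```
sdk {
    aws {
        s3 {
            # global defaults (used when no per-bucket entry matches)
            key: ""
            secret: ""
            region: ""
            credentials: [
                {
                    # per-bucket credentials; values below are placeholders
                    bucket: "my-clearml-bucket"
                    key: "AWS_ACCESS_KEY_ID"
                    secret: "AWS_SECRET_ACCESS_KEY"
                }
            ]
        }
    }
}
```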
Hi UnevenDolphin73 sorry for the slow reply, been on leave!
We don’t have a solution right now, but if there’s no fix to the frontend in the near future we’ll probably try to write a script that queries the ClearML API for all artefacts, queries S3 for all artefacts, and figures out orphaned artefacts to delete.
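A rough sketch of what that cleanup script could look like, using the clearml SDK plus boto3. The bucket name, the empty prefix, and the choice to only walk task artefacts (ignoring models and debug samples) are assumptions for illustration:
```python
# Sketch only: list S3 objects that no ClearML task artifact still references.
import boto3
from clearml import Task

BUCKET = "my-clearml-bucket"   # placeholder bucket name
PREFIX = ""                    # optionally restrict to the upload prefix

# 1. Every artefact URI ClearML still knows about (consider filtering by project).
referenced = set()
for task in Task.get_tasks():
    for artifact in task.artifacts.values():
        url = artifact.url or ""
        if url.startswith(f"s3://{BUCKET}/"):
            referenced.add(url[len(f"s3://{BUCKET}/"):])

# 2. Every object actually stored in the bucket.
s3 = boto3.client("s3")
stored = set()
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        stored.add(obj["Key"])

# 3. Candidates for deletion: stored but never referenced.
orphans = stored - referenced
print(f"{len(orphans)} orphaned objects out of {len(stored)}")
# for key in sorted(orphans):
#     s3.delete_object(Bucket=BUCKET, Key=key)   # uncomment once the list looks sane
```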
Ok great! I’ve actually provided a key and secret so I guess it should be working. Would you have any suggestions about where I could look to debug? Maybe the docker logs of the web server?
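(For what it’s worth, assuming the default container names from the ClearML server docker-compose haven’t been changed, the logs can be tailed with something like:)
```bash
# assumed default container names from the ClearML server docker-compose
docker logs --tail 200 -f clearml-webserver
docker logs --tail 200 -f clearml-apiserver
```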
Thanks Shay, good to know i just hadn't configured something correctly!
EDIT: Turns out in that AMI, the docker-compose file has:
```
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  restart: unless-stopped
  privileged: true
  environment:
    CLEARML_HOST_IP: ${CLEARML_HOST_IP}
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
    CLEARML_API_HOST:
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
```
So I changed
```
# CLEARML_API_HOST:
```
to
```
CLEARML_API_HOST: ${CLEARML_A...
```
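(For completeness: an edited environment value only takes effect once the service container is recreated. Roughly, from the directory holding the compose file, a sketch rather than the exact commands used:)
```bash
# re-create just the agent-services container so it picks up the new environment
docker-compose up -d --force-recreate agent-services
```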
(It seems like the web server doesn’t log the call to AWS, I just see this:
```
{SERVER IP} - - [22/Dec/2021:23:58:37 +0000] "POST /api/v2.13/models.delete_many HTTP/1.1" 200 348 "…ID}/models/{MODEL ID}/general?{QUERY STRING PARAMS THAT DETERMINE TABLE APPEARANCE} {BROWSER INFO} "-"
```
)
Oh wow thanks SuccessfulKoala55 , so sorry I didn’t think to check the agent docs! 😅
```
  File "/Users/david/dataset_builder.py", line 619, in save
    clearml_ds.finalize()
  File "/Users/david/miniconda3/envs/ml/lib/python3.9/site-packages/clearml/datasets/dataset.py", line 796, in finalize
    self._task.mark_completed()
  File "/Users/david/miniconda3/envs/ml/lib/python3.9/site-packages/clearml/backend_interface/task/task.py", line 688, in mark_completed
    tasks.CompletedRequest(
  File "/Users/david/miniconda3/envs/ml/lib/python3.9/site-packages/clearml/backend_api/...
```
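(For context, the call pattern that leads into that traceback is just the standard dataset flow. A minimal sketch with made-up project and path names:)
```python
from clearml import Dataset

# made-up names, just to show the sequence that ends in finalize()
ds = Dataset.create(dataset_name="my-dataset", dataset_project="my-project")
ds.add_files("/path/to/files")
ds.upload()
ds.finalize()   # the call that raises in the traceback above
```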
No errors getting an existing dataset @<1537605940121964544:profile|EnthusiasticShrimp49>
And thanks for the consistently speedy responses with getting onto issues when they pop up!
Not sure if that gives you the answer? Otherwise if you can tell me which of the 7 containers to exec into and how to check, happy to do that
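(If it helps, listing the running container names and hopping into one is just:)
```bash
docker ps --format '{{.Names}}'        # see which of the containers are running
docker exec -it clearml-apiserver sh   # assumed container name; swap in the one of interest
```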
@<1523701205467926528:profile|AgitatedDove14> head of master branch
Unfortunately no dice 😕 I’ve opened every port from 0-11000, and am using the command `clearml-session --public-ip true` on the client, but still getting the timeout message, only now it says:
```
Setting up connection to remote session
Starting SSH tunnel
Warning: Permanently added '[<IP address>]:10022' (ECDSA) to the list of known hosts.
SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel
Warning: Permanently added '[<IP address>]:10022' (ECDSA) to the list of kn...
```
Update: I see that by default it uses 10022 as the remote SSH port, so I’ve opened that as well (still getting the “tunneling failed” message though).
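(On EC2 that amounts to adding an inbound rule to the instance’s security group, e.g. with a placeholder group id:)
```bash
# placeholder security-group id; restrict the CIDR to your own IP in practice
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 10022 \
    --cidr 0.0.0.0/0
```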
I’ve also noticed this log on the agent machine:
```
2021-07-09 05:38:37,766 - clearml - WARNING - Could not retrieve remote configuration named 'SSH' Using default configuration: {'ssh_host_ecdsa_key': '-----BEGIN EC PRIVATE KEY-----\{private key here}
```
Hi AgitatedDove14 thanks for your help and sorry I missed this! I’ve had this on hold for the last few days, but I’m going to try firing up a new ClearML server running Version 1.02 (I’ve been using the slightly older Allegro Trains image from the AWS marketplace) and have another try from there. Thanks for your help on GitHub too ❤ I’m so blown away by the quality of everything you folks are doing, and have been championing it hard at my workplace
Oh that’s cool, I assumed the DevOps project was just examples!
There’s a `jupyter_url` property there that is `http://{instance's_private_ip_address}:8888?token={jupyter_token}`
There’s also:
```
external_address           {instance_public_ip_address}
internal_ssh_port          10022
internal_stable_ssh_port   10023
jupyter_port               8888
jupyter_token              {jupyter_token}
vscode_port                9000
```
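(Piecing those values together, the tunnel clearml-session is trying to build looks roughly like this; the `root` login user and the local port choices are my assumptions:)
```bash
# manual equivalent of the clearml-session tunnel, using the values reported above
ssh -p 10022 root@{instance_public_ip_address} \
    -L 8888:localhost:8888 \
    -L 9000:localhost:9000
# Jupyter would then be at http://localhost:8888?token={jupyter_token}
```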
Maybe this is something stupid to do with VPCs that I should understand bet...