it would be great to get logs from the apiserver and fileserver pods when deleting a file from the UI so we can see what is going on. I'm saying this because, at first glance, I don't see any issue in your config
the apiserver doesn't say much:
[2024-03-18 13:17:45,323] [12] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 3ms
[2024-03-18 13:17:46,141] [12] [INFO] [clearml.service_repo] Returned 200 for tasks.delete_many in 791ms
[2024-03-18 13:17:46,360] [12] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 10ms
as for the fileserver, is that really needed if I'm storing things in the S3 bucket? This is the only log I get:
Loading config from /opt/clearml/fileserver/config/default
Loading config from file /opt/clearml/fileserver/config/default/logging.conf
Loading config from file /opt/clearml/fileserver/config/default/fileserver.conf
Loading config from /opt/clearml/config
* Serving Flask app 'fileserver'
* Debug mode: off
ok this is weird, in the apiserver we should see a call for the deletion request. I need to consult with some people, because I don't think this is infra config related.
@<1673863788857659392:profile|HomelyRabbit25> can you confirm the apiserver loads configuration from the mounted services.conf file?
hey, I can confirm that I have a file in the correct location according to this doc
root@clearml-apiserver-5cb4495f9f-2p7wg:/opt/clearml# cat /opt/clearml/config/services.conf
storage_credentials {
    aws {
        s3 {
            use_credentials_chain: false
            credentials: [
                {
                    host: "machine-learningbla.com:443"
                    bucket: "machine-learning-bucket"
                    key: "UdifdasfBS"
                    secret: "---6HAE----O"
                    region: "on-prem"
                    secure: true
                    multipart: false
                },
            ]
        }
    }
}
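For context, I'm feeding that file in through the chart values, roughly like this (a minimal sketch, credentials redacted; it assumes the chart's apiserver.additionalConfigs value, which gets rendered into a ConfigMap mounted at /opt/clearml/config):

apiserver:
  additionalConfigs:
    # rendered into /opt/clearml/config/services.conf inside the pod
    services.conf: |
      storage_credentials {
        aws {
          s3 {
            use_credentials_chain: false
            credentials: [
              {
                host: "machine-learningbla.com:443"
                bucket: "machine-learning-bucket"
                key: "..."       # redacted
                secret: "..."    # redacted
                region: "on-prem"
                secure: true
                multipart: false
              },
            ]
          }
        }
      }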
Not sure how to check whether it's loading from that file; I don't see the CLEARML_CONFIG_DIR env variable in my pod. This is what I see when the apiserver initializes:
[2024-03-18 13:50:27,317] [18] [INFO] [clearml.service_repo] Loading services from /opt/clearml/apiserver/services
[2024-03-18 13:50:50,395] [18] [INFO] [clearml.service_repo] Returned 200 for debug.ping in 0ms
...
Inside the directory /opt/clearml/apiserver/services I have this:
root@clearml-apiserver-5cb4495f9f-276wp:/opt/clearml# ls apiserver/services
__init__.py __pycache__ auth.py debug.py events.py login models.py organization.py pipelines.py projects.py queues.py reports.py server tasks.py users.py utils.py workers.py
I just confirmed that it's indeed loading the config from this file
@<1673863788857659392:profile|HomelyRabbit25> What happens when you delete the files from the UI? Can you please share the logs from the async_delete service? That is the service actually responsible for deleting the files, and the S3 configuration that you prepared should be mapped into that service (not the apiserver)
@<1523701994743664640:profile|AppetizingMouse58> hi, when I delete a file I don't get any error; the linked task gets deleted from the ClearML DB, but the artifact is still in my bucket. I see the async_delete service in the docker-compose file, but I can't find where it's executed in the helm chart
@<1523701994743664640:profile|AppetizingMouse58> @<1523701087100473344:profile|SuccessfulKoala55> I've been looking and I can't find any call to the async_urls_delete job in the helm chart. Can you confirm this is the case, or am I confused? Thanks!
@<1673863788857659392:profile|HomelyRabbit25> We are planning to release a new version, v1.15, in a few days that will support this job in the helm charts. Currently this option does not exist in the K8s deployment, and the apiserver does not delete task artifacts from external storage
awesome, thanks! I'll wait for this new version :)
hi @<1523701994743664640:profile|AppetizingMouse58>! I saw that the new release is out, and I just wanted to confirm that this problem should be solved so I can make the change. I don't see anything in the changelog mentioning this issue.
Hi @<1673863788857659392:profile|HomelyRabbit25>, yes, it should include support for the async_delete service. Please provide the storage_credentials configuration to this service instead of the apiserver. To check whether the deletion works or has any issues with the provided configuration, please inspect the logs from the async_delete pod.
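In values.yaml terms this would look roughly like the sketch below; the exact key path is an assumption on my side, so please check the chart's values.yaml for where the asyncdelete deployment picks up its configuration:

# sketch only -- the exact values path for the asyncdelete deployment may
# differ, check the chart's values.yaml before using this
apiserver:
  asyncDelete:
    additionalConfigs:
      services.conf: |
        storage_credentials {
          aws {
            s3 {
              # same credentials block you currently give the apiserver
              use_credentials_chain: false
              credentials: [
                {
                  host: "machine-learningbla.com:443"
                  bucket: "machine-learning-bucket"
                  key: "..."      # redacted
                  secret: "..."   # redacted
                  region: "on-prem"
                  secure: true
                  multipart: false
                },
              ]
            }
          }
        }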
I see it now, awesome thanks! I'll give it a try 🙂
correct me if I'm wrong, but the chart is missing the following

volumeMounts:
  - name: apiserver-config
    mountPath: /opt/clearml/config

in the clearml-apiserver definition
https://github.com/allegroai/clearml-helm-charts/blob/4ca4bc82c48a403060c1d43b93ab[…]/charts/clearml/templates/apiserver-asyncdelete-deployment.yaml
I added it and now it is working
this one should not be needed for asyncdelete; what is the error you are getting?
I was getting an error saying credentials couldn't be found to delete objects in my s3 bucket
it's weird, can you please open a bug in the clearml-helm-charts repo?
if it's not needed, then why is the apiserver-config volume in the volumes section of the asyncdelete deployment?
https://github.com/allegroai/clearml-helm-charts/blob/4ca4bc82c48a403060c1d43b93ab[…]/charts/clearml/templates/apiserver-asyncdelete-deployment.yaml
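i.e. the deployment template defines the volume but, as far as I can tell, never mounts it; with my addition the pod spec looks roughly like this (abbreviated sketch, container/configmap names illustrative):

# abbreviated pod spec from the asyncdelete deployment, not the full template
spec:
  containers:
    - name: clearml-apiserver-asyncdelete   # container name illustrative
      volumeMounts:
        - name: apiserver-config            # the mount I had to add
          mountPath: /opt/clearml/config
  volumes:
    - name: apiserver-config                # already defined in the chart
      configMap:
        name: clearml-apiserver-config      # configmap name illustrative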
for sure, I'll open a bug