ClearML results page:
ClearML dataset page:
2025-02-11 13:43:01,001 - clearml.Metrics - ERROR - Action failed <500/100: events.add_batch/v1.0 (General data error: err=2 document(s) failed to index., extra_info=[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [2] requests and a refresh])>
not compressing
Generating SHA2 hash for 3...
if it's a helm chart this is the component that does that job
I think the chart is pretty self-explanatory. I don't think there's any documentation other than this. Happy to help though, I am using both of those charts (different versions) in an on-prem cluster.
hey @<1523701827080556544:profile|JuicyFox94> , don't want to bother, I just checked on the progress of that pull request and it seems like it passed all the checks, but it was marked as draft and not merged
the api server doesn't say much
[2024-03-18 13:17:45,323] [12] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 3ms
[2024-03-18 13:17:46,141] [12] [INFO] [clearml.service_repo] Returned 200 for tasks.delete_many in 791ms
[2024-03-18 13:17:46,360] [12] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 10ms
as for the file server, is that really needed if I'm storing things in the s3 bucket??? This is the only log I get
Loading config from /op...
@<1523701994743664640:profile|AppetizingMouse58> hi, when I delete a file I don't get any error, the linked task gets deleted from the clearml db, but the artifact is still in my bucket. I see the async_delete service in the docker compose file, but I can't find where it's executed in the helm chart
awesome, thanks! I'll wait for this new version :)
hey, I can confirm that I have a file in the correct location according to this doc
root@clearml-apiserver-5cb4495f9f-2p7wg:/opt/clearml# cat /opt/clearml/config/services.conf
storage_credentials {
  aws {
    s3 {
      use_credentials_chain: false
      credentials: [
        {
          host: "machine-learningbla.com:443"
          bucket: "machine-learning-bucket"
          ...
if it's not needed, then why is the apiserver-config in the volumes section of the asyncdelete deployment??
https://github.com/allegroai/clearml-helm-charts/blob/4ca4bc82c48a403060c1d43b93ab[…]/charts/clearml/templates/apiserver-asyncdelete-deployment.yaml
for sure, I'll open a bug
hi @<1523701994743664640:profile|AppetizingMouse58> ! I saw that the new release is out, just wanted to confirm that this problem should be solved so I can make the change. I don't see anything in the changelist mentioning this issue.
I see it now, awesome thanks! I'll give it a try 🙂
@<1523701994743664640:profile|AppetizingMouse58> @<1523701087100473344:profile|SuccessfulKoala55> I've been looking and I can't find any call to the async_urls_delete job in the helm chart, can you confirm this is the case? or am I confused? Thanks!
I was getting an error saying credentials couldn't be found to delete objects in my s3 bucket
correct me if I'm wrong, the chart is missing the following
volumeMounts:
  - name: apiserver-config
    mountPath: /opt/clearml/config
in the clearml-apiserver definition
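for context, a rough sketch of where that fragment would sit in a Deployment's pod spec. The volume name apiserver-config and the mount path come from the snippet above; the container name, the ConfigMap name, and the assumption that the config is supplied through a ConfigMap at all are placeholders of mine, not values taken from the chart.

```yaml
# Sketch only: container and ConfigMap names are hypothetical placeholders.
spec:
  template:
    spec:
      containers:
        - name: clearml-apiserver            # placeholder container name
          volumeMounts:
            - name: apiserver-config         # must match a volume declared below
              mountPath: /opt/clearml/config
      volumes:
        - name: apiserver-config
          configMap:
            name: example-apiserver-config   # hypothetical; use whatever the chart defines
```

the point is just that a volumeMounts entry only takes effect if a volume with the same name is declared in the same pod spec.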
https://github.com/allegroai/clearml-helm-charts/blob/4ca4bc82c48a403060c1d43b93ab[…]/charts/clearml/templates/apiserver-asyncdelete-deployment.yaml
hey @<1523701087100473344:profile|SuccessfulKoala55> that seems to work, thanks! One thing that's not yet clear to me is: what would be the recommended way of running agents in Kubernetes? As I understand it, there are two options: the ClearML Agent Helm Chart, which uses the k8s glue code, and running a clearml-agent daemon inside a pod (that already has the GPUs assigned to it). Which one is the preferred way? I see issues with both approaches, and personally I believe that the Helm Chart is the co...
hey @<1523701087100473344:profile|SuccessfulKoala55> , wondering if you had any ideas around this??
I just confirmed that it's indeed loading the config from this file