
Hey Martin, do you know how to connect the agent to multiple queues?
Hey @<1729671499981262848:profile|CooperativeKitten94> yes, it did! 🙂
I thank you for the support.
I should add that I do not see anything in the Helm chart about using templateOverrides.
lol I ended up just buying the domain. It is much easier to pay the $10 : ))
And as it turns out, I cannot specify multiple queues:
-- ClearML queue this agent will consume. Multiple queues can be specified with the following format: queue1,queue2,queue3
queue: default
gives an error
@<1729671499981262848:profile|CooperativeKitten94> Running the following conf:
queue:
  services-tasks:
    templateOverrides:
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
  services:
    templateOverrides:
      resources:
        requests:
          nvidia.com/gpu: "0"
        limits:
          nvidia.com/gpu: "0"
apiServerUrlReference: "
"
fileServerUrlReference: "
``...
@<1729671499981262848:profile|CooperativeKitten94> thank you! I will try and will update : ))
Hey @<1523701070390366208:profile|CostlyOstrich36>
About versioning: if I have dataset A with 100 pictures, say 10 MB,
and I create a new dataset B with 50 images, say 5 MB, with dataset A as its parent,
will dataset B add 15 MB to the PVC, or only 5 MB?
Same question if it is a new version of the same dataset.
Ignore the Q about restoring. thx
Hey, ignore the above please -> I had a typo in the secret key name
@<1689446563463565312:profile|SmallTurkey79> are you using the community edition with helm?
I am having the same issue here.
sure, here:
clearml:
  defaultCompany: "bialek"
  cookieDomain: "bialek.dev"
nameOverride: "clearml"
fullnameOverride: "clearml"
apiserver:
  existingAdditionalConfigsSecret: "eso-clearml-users"
  additionalConfigs:
    clearml.conf: |
      agent {
        file_server_url:
      }
  service:
    type: ClusterIP
  ingress:
    enabled: true
    ingressClassName: "bialek-on-prem"
    hostName: "clearml-api.bialek.dev"
    tlsSecretName: "tls-clearml-apiserver"
    anno...
Hey @<1523701087100473344:profile|SuccessfulKoala55>
The relevant log from the pipeline shows:
error: pathspec 'origin/${pipeline.branch}' did not match any file(s) known to git
Repository cloning failed: Command '['git', 'checkout', 'origin/$%7Bpipeline.branch%7D', '--force']' returned non-zero exit status 1.
and I believe this is because the params cannot be injected in such a way, not sure if this is a bug or normal behaviour.
I am able to use it like so:
pipe.add_par...
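For reference, the odd-looking ref in the second log line seems to be just the unresolved placeholder with its braces percent-encoded. A quick check in plain Python (this only decodes the string from the log, it assumes nothing about how ClearML builds the git command):

```python
from urllib.parse import quote, unquote

# Ref exactly as it appears in the "Repository cloning failed" log line.
ref_in_log = "origin/$%7Bpipeline.branch%7D"

# Decoding the percent-escapes gives back the literal, unsubstituted placeholder.
decoded = unquote(ref_in_log)
print(decoded)  # origin/${pipeline.branch}

# Re-encoding with "/" and "$" left untouched reproduces the logged string,
# so the braces were escaped but the parameter was never replaced by a value.
assert quote(decoded, safe="/$") == ref_in_log
```

So git was asked to check out a branch literally named `${pipeline.branch}`, which matches the first error line.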
Hey @<1798887585121046528:profile|WobblyFrog79> , yes, testing this locally, it does seem to solve the issue, thank you.
I will test it in our env.
On a different issue, do you have any solution for making the agent listen to multiple queues?
In the Helm chart it says:
# -- ClearML queue this agent will consume. Multiple queues can be specified with the following format: queue1,queue2,queue3
But this does not work, as the agent reads them all as one queue.
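To make the difference concrete, here is a minimal sketch of the two readings of that value. It is not the agent's or the chart's actual code; `split_queues` is a hypothetical helper, used only to show what the chart comment implies versus what I observe:

```python
def split_queues(value: str) -> list[str]:
    """Split a comma-separated queue string into individual queue names."""
    return [name.strip() for name in value.split(",") if name.strip()]


raw = "queue1,queue2,queue3"

# What the chart comment suggests: three separate queues.
expected = split_queues(raw)

# What I see in practice: the whole string treated as a single queue name.
observed = [raw]

print(expected)
print(observed)
```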
Hey @<1523701070390366208:profile|CostlyOstrich36> , thx for the reply.
Can you advise whether this is the right path to take:
- MongoDB: backup via mongodump to S3,
- Elasticsearch: install the S3 repository plugin and create a snapshot.
The above can be triggered with a cronjob once a week.
On restore to a fresh cluster, I will install the ClearML Helm chart, and:
- MongoDB will download the file from S3 and use mongorestore
- Elasticsearch will download the snapshot from S3 and use:
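The backup and restore plan in the list above could be sketched roughly as follows. This is only an outline under assumptions: the bucket name, repository name, snapshot naming, and file paths are hypothetical placeholders, the Elasticsearch S3 repository plugin is assumed to be installed as in the plan, and the code only builds the commands and REST requests without running anything.

```python
from typing import Optional

# Hypothetical placeholders, not values from my setup.
BUCKET = "my-clearml-backups"
ES_REPO = "clearml_s3_repo"


def mongo_backup_commands(mongo_uri: str, stamp: str) -> list[list[str]]:
    """Dump MongoDB to a gzip archive, then copy the archive to S3."""
    archive = f"/tmp/mongo-{stamp}.archive.gz"
    return [
        ["mongodump", f"--uri={mongo_uri}", f"--archive={archive}", "--gzip"],
        ["aws", "s3", "cp", archive, f"s3://{BUCKET}/mongo/mongo-{stamp}.archive.gz"],
    ]


def mongo_restore_commands(mongo_uri: str, stamp: str) -> list[list[str]]:
    """Fetch the archive from S3, then load it with mongorestore."""
    archive = f"/tmp/mongo-{stamp}.archive.gz"
    return [
        ["aws", "s3", "cp", f"s3://{BUCKET}/mongo/mongo-{stamp}.archive.gz", archive],
        ["mongorestore", f"--uri={mongo_uri}", f"--archive={archive}", "--gzip"],
    ]


Request = tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body)


def es_register_repo() -> Request:
    """Register the S3 bucket as a snapshot repository."""
    return ("PUT", f"/_snapshot/{ES_REPO}", {"type": "s3", "settings": {"bucket": BUCKET}})


def es_backup_requests(stamp: str) -> list[Request]:
    """Register the repository and create a snapshot in it."""
    return [
        es_register_repo(),
        ("PUT", f"/_snapshot/{ES_REPO}/snap-{stamp}?wait_for_completion=true", None),
    ]


def es_restore_requests(stamp: str) -> list[Request]:
    """On a fresh cluster: register the same repository, then restore the snapshot."""
    return [
        es_register_repo(),
        ("POST", f"/_snapshot/{ES_REPO}/snap-{stamp}/_restore", None),
    ]


if __name__ == "__main__":
    stamp = "2025-05-04"
    for cmd in mongo_backup_commands("mongodb://localhost:27017", stamp):
        print(" ".join(cmd))
    for method, path, body in es_backup_requests(stamp):
        print(method, path, body)
```

A weekly cronjob would run the backup half. The restore half would run once after installing the chart on the new cluster.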
root@master:/home/bialek# kubectl -n clearml describe po clearml-webserver-847d7c947b-hfk57
Name: clearml-webserver-847d7c947b-hfk57
Namespace: clearml
Priority: 0
Service Account: clearml-webserver
Node: clearml-server/secretip
Start Time: Sun, 04 May 2025 08:42:17 +0300
Labels: app.kubernetes.io/instance=clearml-webserver
app.kubernetes.io/name=clearml
pod-template-hash=847d7c947b
Annotation...
Hey @<1523701070390366208:profile|CostlyOstrich36>
Can you explain this point a bit more?
In the agent's Helm chart I configure it like so:
...
agentk8sglue:
  extraEnvs:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: aws-access-key-id
          key: AWS_ACCESS_KEY_ID
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: aws-secret-access-key
          key: AWS_SECRET_ACCESS_KEY
    - name: K8S_GLUE_MAX_PODS
      value: '1'
    - name:...
Hey @<1729671499981262848:profile|CooperativeKitten94> , but this is an internal domain, which causes an SSL issue when trying to upload data to the server:
2025-05-07 18:36:22,421 - clearml.storage - ERROR - Exception encountered while uploading HTTPSConnectionPool(host='clearml-file.bialek.dev', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certif...
@<1523701087100473344:profile|SuccessfulKoala55> is this feature maybe blocked in the community edition?