Sorry if it was confusing. I was asking whether people have set up pipelines that are automatically triggered on updates to datasets
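For reference, a minimal sketch of one way to do this with the clearml SDK, assuming a version that ships TriggerScheduler; the project name, task ID, and queue below are hypothetical:
```
from clearml.automation import TriggerScheduler

# Poll the backend periodically for dataset changes.
trigger = TriggerScheduler(pooling_frequency_minutes=5)

# When a new dataset version appears in the (hypothetical) "datasets" project,
# clone the given task/pipeline and enqueue it for execution.
trigger.add_dataset_trigger(
    schedule_task_id="<pipeline-controller-task-id>",  # hypothetical ID
    schedule_queue="default",
    trigger_project="datasets",
    name="retrain-on-dataset-update",
)
trigger.start()
```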
For different workloads, I need to have different cluster scaler rules and account for different GPU needs
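A common pattern for this (a sketch, assuming each resource profile gets its own queue, with separately configured agents/scalers serving each one) is to route tasks by queue; the queue name below is hypothetical:
```
from clearml import Task

task = Task.init(project_name="examples", task_name="train large model")

# Send this run to a queue served by agents on multi-GPU machines; other
# workloads can target queues with different scaling rules and GPU types.
task.execute_remotely(queue_name="4xgpu", exit_process=True)
```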
AgitatedDove14 that sounds almost like what might be needed, will give it a shot. Thanks, as always 🙂
It completed after the max_job limit (10)
How is clearml-session intended to be used?
Thanks AgitatedDove14 - I get, overall, what you are saying. Have to get the glue set up, which I couldn’t fully understand, so that’s a different topic 🙂
AlertBlackbird30 - I don’t understand why it can’t be a focus though. I’m probably missing some context.
Ok, so that’s nothing more than what I would configure in the clearml config then
AlertBlackbird30:
```
--remote-gateway [REMOTE_GATEWAY]
                        Advanced: Specify gateway ip/address to be passed to
                        interactive session (for use with k8s ingestion / ELB)
```
I see this in clearml-session - what’s the intent here?
But ok, I guess the summary is that it doesn’t work in a k8s env
I would like to create a notebook instance and start using it without having to do anything on a dev box
Ah ok. Kind of getting it, will have to try the glue mode
AgitatedDove14 - are these instructions out of date? https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_kubernetes_helm.html
All right got it, will try it out. Thanks for the quick response.
The helm chart installs an agentservice - how is that related, if at all?
AlertBlackbird30 - got it running. A few comments:
1. NodePort is set by default despite being a parameter in values.yml. For example:
```
webserver:
  extraEnvs: []
  service:
    type: NodePort
    port: 80
```
2. Ingress was using port 8080 for the webserver, but the service was on 80
3. Had to change the path in the ingress to “/*” instead of “/” to get it working for me
Beyond this I have the UI running, and have to start playing with it. Any suggestions for agents with k8s?
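For context, the k8s glue mentioned in this thread is a daemon from the clearml-agent repo that maps queued tasks to pods. A minimal sketch based on its k8s_glue_example.py script - the exact constructor arguments may differ between versions, and the namespace and queue name here are hypothetical:
```
from clearml_agent.glue.k8s import K8sIntegration

# One glue daemon per queue: it pulls pending tasks from the ClearML queue
# and creates a Kubernetes pod for each one.
k8s = K8sIntegration(
    namespace="clearml",   # assumed target namespace
    ports_mode=False,
)
k8s.k8s_daemon("k8s_gpu_queue")  # hypothetical queue name
```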
Thanks! Is there GPU support? It’s not clear from the README, AgitatedDove14
sure, will do AlertBlackbird30
Updating to 1.1.0 gives this error:
```
ERROR: Could not push back task [e55e0f0ea228407a921e004f0d8f7901] to k8s pending queue [c288c73b8c434a6c8c55ebb709684b28], error: Invalid task status (Task already in requested status): current_status=queued, new_status=queued
```
Do people generally update the same model “entry”? That feels so wrong to me… how do you reproduce an older model version or do a rollback, etc.?
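For what it’s worth, the alternative to updating a single entry in place is registering each training output as its own model entry and pinning a specific one for rollback. A sketch with hypothetical project/model names, assuming a clearml version where Model.query_models is available:
```
from clearml import InputModel, Model

# Each training run registers its own model entry, so all versions show up
# here rather than being overwritten in place.
versions = Model.query_models(project_name="NER", model_name="spacy-ner")
for m in versions:
    print(m.id, m.name, m.tags)

# "Rollback" = pin an older entry explicitly by its ID.
older = InputModel(model_id=versions[-1].id)
weights_path = older.get_local_copy()
```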
Is this a known issue?
Hey SuccessfulKoala55, like I mentioned, I have a spaCy NER model that I need to serve for inference.
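A minimal sketch of the first step (registering the trained spaCy pipeline as a ClearML model so an inference service can fetch it later); the project/task names and the packaged weights file are hypothetical:
```
from clearml import Task, OutputModel

task = Task.init(project_name="NER", task_name="register spacy model")

# Attach the packaged spaCy pipeline to the task as an output model; this
# uploads the weights and creates a model entry in the registry.
output_model = OutputModel(task=task, framework="spaCy")
output_model.update_weights(weights_filename="model-best.zip")
```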