We also might have some other steps incorporated for other tools. We intend to have Label Studio upstream, so we definitely need some orchestrator tool.
One use case now:
1. Load data from Label Studio (manager to manually approve)
2. Push data to clearml-data
3. Run training (manager to manually publish)
4. Push model URI to the next step
5. Seldon deploys it
Later, if Seldon detects a data drift, it will automatically re-run steps 2-5.
At this point, we haven't drilled down into all of it yet.
RoughTiger69
So the Prefect tasks:
1. Load data into clearml-data
2. Run training in ClearML
3. Publish model (manual trigger required, so the user publishes the model) and return the model URL
4. Seldon deploys the model (model URL passed in)
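A rough sketch of how such a Prefect flow could drive ClearML is below. This is only an illustration under assumptions: the project name "PROJ_1", the "train_template" task, the "default" queue, and the data path are placeholders, and the Seldon step is left as a stub.
```python
# Hypothetical Prefect 2 flow wrapping ClearML steps; names are placeholders.
from prefect import flow, task
from clearml import Dataset, Task

@task
def load_into_clearml_data(local_path: str) -> str:
    # Step 1: create and upload a new dataset version
    ds = Dataset.create(dataset_project="PROJ_1", dataset_name="dataset")
    ds.add_files(local_path)
    ds.upload()
    ds.finalize()
    return ds.id

@task
def run_training(dataset_id: str) -> str:
    # Step 2: clone an existing training task template and enqueue it for an agent
    template = Task.get_task(project_name="PROJ_1", task_name="train_template")
    train = Task.clone(source_task=template)
    train.set_parameter("General/dataset_id", dataset_id)
    Task.enqueue(train, queue_name="default")
    train.wait_for_status()  # block until the agent finishes the run
    return train.id

@task
def get_model_url(task_id: str) -> str:
    # Step 3: after a user manually publishes the model in the UI,
    # pick up the output model URI from the training task
    train = Task.get_task(task_id=task_id)
    return train.models["output"][-1].url

@task
def deploy_with_seldon(model_url: str):
    # Step 4: hand the model URI to Seldon (deployment details omitted)
    print(f"deploying {model_url}")

@flow
def training_pipeline(local_path: str = "./data"):
    ds_id = load_into_clearml_data(local_path)
    task_id = run_training(ds_id)
    model_url = get_model_url(task_id)
    deploy_with_seldon(model_url)
```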
In our local setup, we use MinIO though?
CostlyOstrich36 :
They mentioned that they already have a Nexus backend, so I was just wondering if we could use it for storage purposes.
We can always use the latest ClearML.
We were thinking of a use case for a client who has Sonatype Nexus in their environment. Could we leverage it, or would we need MinIO instead?
This is where I downloaded the log. Seems like some Docker issue, though I can't seem to figure it out. As an alternative, I spawned a clearml-agent outside the k8s environment and it was able to execute well.
Hi, will proceed to close this thread. We found some issue with the underlying Docker on our machines. We have now shifted to another k8s cluster of EC2 instances in AWS.
This is my example: iteration 10, so there are 10 runs. Looking at the 4th run: 60% of the jobs, 91% iteration, 94% time. What does it mean?
MagnificentSeaurchin79 How do we do this? Can it be done via ClearML itself?
Sounds like you need to run a service that monitors for new commits in PROJ_1 and triggers the pipeline.
TimelyPenguin76 :
from clearml import Dataset
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
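Building on that Dataset.get call, a minimal polling sketch of such a monitoring service could look like the following. It assumes the pipeline is an existing ClearML task that can be cloned and enqueued; the project, task, and queue names here are placeholders, and recent ClearML versions may offer a built-in TriggerScheduler as an alternative to hand-rolled polling.
```python
# Hypothetical polling monitor; "PROJ_1", "pipeline_controller" and "services"
# are placeholders for whatever exists in your setup.
import time
from clearml import Dataset, Task

last_seen_id = None
while True:
    # Fetch the latest dataset version in PROJ_1
    ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
    if ds.id != last_seen_id:
        last_seen_id = ds.id
        # New dataset version detected: clone the pipeline task and enqueue it
        template = Task.get_task(project_name="PROJ_1", task_name="pipeline_controller")
        pipeline_run = Task.clone(source_task=template)
        Task.enqueue(pipeline_run, queue_name="services")
    time.sleep(300)  # poll every 5 minutes
```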
For me too, I had this issue. I realised that the k8s glue wasn't using the GPU resource compared to running it as a clearml-agent. TimelyPenguin76 suggested using the latest CUDA 11.0 images, though that also didn't work.
More than the documentation, my main issue was that the name executed is far too vague. Maybe something like executed_task_id or something along that line is more appropriate. 👍
Is there any documentation on how we can use this ports mode? I didn't seem to find any. Thanks.
The use case is, let's say I run:
python k8s_glue_example.py --queue glue_q
Then someone pushes a hyperparameter optimization job with over 100 experiments to glue_q; one minute later, I push a simple training job to glue_q, but I will be forced to wait for the 100 experiments to finish.
AgitatedDove14 I am confused now. Is this feature not available in the k8s glue, or is it going to be implemented?
Nothing changed; the clearml.conf is still as is (empty).
Hi Martin, I just un-templated it with helm template clearml-server-chart-0.17.0+1.tgz
I found these lines inside:
- name: CLEARML_AGENT_DOCKER_HOST_MOUNT
  value: /opt/clearml/agent:/root/.clearml
Upon SSH-ing into the folders on both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), there are some files there, so it seems the mounting worked.
I am not sure I get your answer. Should I change the values to something else?
Thanks
Ah ok, the laptop:0 worker is no more now. But with regard to our original question, I can see the agent (worker) in the clearml-server UI.
Something is weird: it is showing workers which are not running now...
Yup, tried that. Same error as well.
Hi, sorry for the delayed response. Btw, all the pods are running fine.
When I push a job to an agent node, I get this error:
"Error response from daemon: network None not found"