
This is my example. Iteration 10, so there are 10 runs. Looking at the 4th run: 60% of the jobs, 91% iteration, 94% time.. What does it mean?
Nice tutorial.. Though personally, I prefer a more clean-cut presentation (without the Yays and mwahs or the turtle). 😄 But usually, as long as the content is there, it shouldn't matter...
nice.. this looks a bit friendly.. 🙂 .. Let me try it.. Thanks
Just to add on, I am using minikube now.
Yup, I used the values file for the agent. However, I manually edited it for the agentservices (as there was no example for it in the GitHub repo).. Also I am not sure what CLEARML_HOST_IP is (I left it empty)
Something is weird.. It is showing workers which are not running anymore...
Yeah, I restarted the deployment and SSHed into the host machine as well.. (Img below)
I just checked the /root/clearml.conf file and it just contains sdk { }
Ah kk, the ---laptop:0 worker is gone now.. But with regard to our original question, I can see the agent (worker) in the clearml-server UI..
Nothing changed.. the clearml.conf is still as is (empty)
Hi, some workaround I thought of.. Btw, I haven't tried your comments yet, AnxiousSeal95 .
1) Attach a clearml-task id to each new dataset-id.
So in the future, when new data comes in, get the last data commit from the project (Dataset) and get the clearml-task for it. Then clone the clearml-task and pass in the new data (rough sketch after option 2 below). The only downside is the need to clone the clearml-task.
Or alternatively
2) Attach a gitsha-id of the processing code to each new dataset-id.
This can't give the exact code ...
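To make option 1 concrete, a rough sketch of the "future" flow, assuming the processing task's id was attached to the dataset when it was created; the lookup and the parameter names here are just placeholders, not actual ClearML conventions:

from clearml import Dataset, Task

# 1) get the last data commit for the project
latest_ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")

# 2) look up the processing task that created it
#    (hypothetical: the task id would have been stored with the dataset, e.g. as a tag)
processing_task_id = "<task id attached to latest_ds>"
template = Task.get_task(task_id=processing_task_id)

# 3) clone the processing task and point the clone at the new data
new_task = Task.clone(source_task=template, name="process incoming data")
new_task.set_parameter("General/input_path", "/data/incoming")  # assumed parameter name
Task.enqueue(new_task, queue_name="default")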
The use case is, let's say I run python k8s_glue_example.py --queue glue_q
And some guy pushes a hyperparameter optimization job with over 100 experiments to glue_q; one minute later, I push a simple training job to glue_q.. But I will be forced to wait for the 100 experiments to finish.
TimelyPenguin76 :
from clearml import Dataset
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
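In case it helps, continuing from that snippet, this is how I then pull the files locally (a plain get_local_copy() call on the same ds object):

# cached, read-only local copy of the dataset's files
local_path = ds.get_local_copy()
print(local_path)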
Hi, for the values.yaml, is there some reference for it, especially if we want to assign more memory to the webserver service etc.? I tried googling around but so far no luck.
I just downloaded the logs from the failed task. It seems I have set agent.package_manager.system_site_packages: true in the agent as well.
RoughTiger69
So the prefect tasks:
1) Load data into clearml-data
2) Run training in ClearML
3) Publish the model (manual trigger required, so the user publishes the model) and return the model URL
4) Seldon deploys the model (model URL passed in)
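Just to make the shape of this concrete, a rough sketch of how I picture the first two steps, written as plain Python functions that would each be wrapped as a Prefect task (project, queue and parameter names are placeholders):

from clearml import Dataset, Task

def load_data(path: str) -> str:
    # prefect task 1: register the new data as a clearml-data dataset
    ds = Dataset.create(dataset_project="PROJ_1", dataset_name="dataset")
    ds.add_files(path)
    ds.upload()
    ds.finalize()
    return ds.id

def run_training(dataset_id: str) -> str:
    # prefect task 2: clone an existing training task and queue it on the new dataset
    template = Task.get_task(project_name="PROJ_1", task_name="train")
    task = Task.clone(source_task=template, name="train on new data")
    task.set_parameter("General/dataset_id", dataset_id)  # assumed parameter name
    Task.enqueue(task, queue_name="default")
    return task.id

# publishing the model stays a manual step in the UI (as noted above); once published,
# the model URL is passed on to Seldon for deployment.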
When I push a job to an agent node, I get this error:
"Error response from daemon: network None not found"
Hi, sorry for the delayed response. Btw, all the pods are up and running fine.
Hi AgitatedDove14 , now we prefer to run dynamic agents instead, using python3 k8s_glue_example.py
In this case, is it still possible to pass --order-fairness at the queue level, or is this more of an Enterprise edition feature?
Hi AgitatedDove14 , this isn't the issue. With or without specifying the queue, I get this error when I use the "Create version" as opposed to the "Init version".
I wonder whether this is some issue with using the Create version together with execute_remotely() ..
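For reference, this is roughly what I mean by the two versions (arguments from memory; project, queue and script names are placeholders):

from clearml import Task

# "Init version" -- this one works for me
task = Task.init(project_name="PROJ_1", task_name="train")
task.execute_remotely(queue_name="glue_q")

# "Create version" -- this is the one that errors out
task = Task.create(project_name="PROJ_1", task_name="train", script="train.py")
task.execute_remotely(queue_name="glue_q")  # this combination is what I suspect is the issue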
Hi Martin, I just untemplate-ed the helm template clearml-server-chart-0.17.0+1.tgz
I found these lines inside:
- name: CLEARML_AGENT_DOCKER_HOST_MOUNT
  value: /opt/clearml/agent:/root/.clearml
Upon SSH-ing into the folders on both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there.. So the mounting worked, it seems.
I am not sure I get your answer. Should I change the values to something else?
Thanks
sure, I'll post some questions once I wrap my mind around it..
Could it be another application's "elasticsearch-pv" and not clearml's?
Hi, using the pipeline examples, with step1_dataset_artifact.py, step2_data_processing.py, step3_train_model.py ==> pipeline_controller.py
In the above example, the pipeline_controller is stringing together 3 Python files; could it string together 3 containers instead? Of course, we can manually compile each into a docker image, but does clearml have some similar approach baked in?
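For reference, this is roughly how the example's pipeline_controller.py strings the three steps together; my question is whether each base task could just point at its own docker image (e.g. the base docker image field / Task.set_base_docker), which would effectively make each step its own container:

from clearml.automation.controller import PipelineController

# clone-and-enqueue pipeline: each step is based on an existing task in the "examples" project
pipe = PipelineController(default_execution_queue="default", add_pipeline_tags=False)

pipe.add_step(name="stage_data",
              base_task_project="examples",
              base_task_name="pipeline step 1 dataset artifact")
pipe.add_step(name="stage_process", parents=["stage_data"],
              base_task_project="examples",
              base_task_name="pipeline step 2 process dataset")
pipe.add_step(name="stage_train", parents=["stage_process"],
              base_task_project="examples",
              base_task_name="pipeline step 3 train model")

pipe.start()
pipe.wait()
pipe.stop()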
Hi AgitatedDove14 , imho links are def better, unless someone decides to archive their Tasks.. Just wondering about the possibility only..