
Hello, recently I pushed to adopt ClearML company-wide, as it is a great tool.
We have our own data center and all nodes are managed by Rancher, meaning everything we use is purely Kubernetes.
I deployed the ClearML server in our Kubernetes cluster, and we have been able to run experiments.
I want to go further and use clearml-agent and clearml-serving, but honestly I didn't quite understand how to integrate clearml-agent with k8s: https://github.com/allegroai/clearml-agent#kubernetes-integration-optional

Can anyone give me an example that illustrates this in more detail and shows how to use it? Many thanks.

  				
Posted 3 years ago by AgitatedTurtle16

Answers 7

Hi AgitatedTurtle16,
The clearml-server-k8s repository (https://github.com/allegroai/clearml-server-k8s) has examples of ClearML Agent deployment in two forms: as a simple, single service that is part of the clearml-server-chart (https://github.com/allegroai/clearml-server-k8s/tree/master/clearml-server-chart; see https://github.com/allegroai/clearml-server-k8s/blob/master/clearml-server-chart/templates/clearml-agent-deployment.yaml), or as a more scalable Agent Group that is part of clearml-server-cloud-ready (https://github.com/allegroai/clearml-server-k8s/tree/master/clearml-server-cloud-ready; see https://github.com/allegroai/clearml-server-k8s/blob/master/clearml-server-cloud-ready/templates/deployment-agent.yaml). Is that what you were looking for?
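For orientation, here is a rough sketch of what a single static agent deployment boils down to, written as a Python function that emits a Kubernetes Deployment manifest. It is not the chart's template: the image name, the Secret name `clearml-agent-credentials` and the in-cluster service URLs are hypothetical placeholders. Only the `CLEARML_*` environment variables and the `clearml-agent daemon --queue` command come from ClearML itself. Without `--docker`, the agent runs each task in a virtual environment inside this container.

```python
import json


def agent_deployment(name, image, queue, api_host, web_host, files_host,
                     secret_name="clearml-agent-credentials"):
    """Return a Deployment manifest (plain dict) running one clearml-agent daemon.

    `secret_name` is a hypothetical Secret holding the agent's API credentials.
    """

    def secret_env(var):
        # Read a credential from the Secret instead of hard-coding it.
        return {"name": var,
                "valueFrom": {"secretKeyRef": {"name": secret_name, "key": var}}}

    container = {
        "name": "clearml-agent",
        "image": image,  # assumed to have clearml-agent installed
        # Without --detached the daemon stays in the foreground, so the pod keeps
        # running and executes tasks pulled from `queue` one after another.
        "command": ["clearml-agent", "daemon", "--queue", queue],
        "env": [
            {"name": "CLEARML_API_HOST", "value": api_host},
            {"name": "CLEARML_WEB_HOST", "value": web_host},
            {"name": "CLEARML_FILES_HOST", "value": files_host},
            secret_env("CLEARML_API_ACCESS_KEY"),
            secret_env("CLEARML_API_SECRET_KEY"),
        ],
    }
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    manifest = agent_deployment(
        name="clearml-agent-cpu",
        image="my-registry/clearml-agent:latest",  # hypothetical image
        queue="cpu",
        api_host="http://clearml-apiserver:8008",  # hypothetical service names
        web_host="http://clearml-webserver:8080",
        files_host="http://clearml-fileserver:8081",
    )
    # kubectl also accepts JSON: python agent_deployment.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

Scaling out is then a matter of one such Deployment per queue, or raising `replicas`.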

  				
Posted 3 years ago by SuccessfulKoala55

Sorry, but no. I already have the ClearML agent running as a pod. My question is how to use it to manage my experiments (Docker containers). Simply put, let's say:
I have an experiment (some TensorFlow code). I containerized the code in a Docker container, and inside the container I already set the credentials for my ClearML server (I can see logs, plots, artifacts, etc.).
Right now I use TFJobs to run my experiments in the cluster (https://www.kubeflow.org/docs/components/training/tftraining/). My question is how I can use the ClearML agent in this situation to schedule these experiments with queues etc., because we have hundreds of experiments from different teams and multiple resources (CPUs, DGX A100, MIGs, etc.). I want to use the ClearML agent to manage all of that if possible, but I couldn't really understand how to do it.

  				
Posted 3 years ago by AgitatedTurtle16

Hi AgitatedTurtle16

My question is how to use it to manage my experiments (docker containers). Simply put, let's say:

So basically, once you see an experiment in the UI, you can launch it on an agent.
There is no need to containerize your experiment (that is actually kind of the idea: removing the need to always containerize everything).
The agent will clone the code, apply uncommitted changes and install the packages inside the base container image at runtime.
This lets you use off-the-shelf containers without worrying about anything else.
Makes sense?
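To make the "one queue per kind of hardware" setup concrete, the sketch below builds the `clearml-agent daemon` command lines for a few resource pools. `--queue`, `--docker` and `--gpus` are real daemon flags; the queue names, the TensorFlow images and the GPU indices are hypothetical. Docker mode assumes the agent runs directly on a host with Docker available (for example a DGX machine), not inside a Kubernetes pod, and MIG slices are not covered here.

```python
import shlex


def agent_command(queue, docker_image, gpus=None):
    """Build the argv for a clearml-agent daemon serving a single queue.

    In --docker mode the agent starts each task inside `docker_image`,
    then clones the task's code and installs its packages at runtime.
    """
    cmd = ["clearml-agent", "daemon", "--queue", queue, "--docker", docker_image]
    if gpus is not None:
        cmd += ["--gpus", gpus]  # GPU indices, e.g. "0" or "1,2"
    return cmd


# Hypothetical layout: one queue (and one agent) per kind of resource.
POOLS = [
    ("cpu", "tensorflow/tensorflow:latest", None),
    ("a100", "tensorflow/tensorflow:latest-gpu", "0"),
    ("a100-x2", "tensorflow/tensorflow:latest-gpu", "1,2"),
]

if __name__ == "__main__":
    for queue, image, gpus in POOLS:
        print(shlex.join(agent_command(queue, image, gpus)))
```

Teams then choose hardware simply by picking the queue they enqueue into.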

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 Hello, actually no. A concrete example of how to do it would be helpful.
For instance:

"So basically once you see an experiment in the UI, it means you can launch it on an agent."

But once I see it in the UI, it means it has already been launched somewhere, so I didn't quite get you.
Also, I want to launch my experiments on a Kubernetes cluster and I don't have any docs on how to do that, so an example would be helpful here. My use case is that anyone on my team, sitting at their laptop, can submit jobs to a remote Kubernetes cluster; I want an agent to take all these jobs and launch them on the cluster. I could use GitLab CI for that, for example.

  				
Posted 3 years ago by AgitatedTurtle16

But once I see it in the UI, it means it has already been launched somewhere, so I didn't quite get you.

The idea is that you run it locally once (think of debugging or testing your code).
While the code runs, the Task is created automatically; once it is in the system, you can clone it and launch it on an agent.
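A minimal sketch of the clone-and-launch step with the ClearML Python SDK, assuming the script already called `Task.init(project_name=..., task_name=...)` during the local run. `Task.get_task`, `Task.clone` and `Task.enqueue` are real SDK calls; the `relaunch` helper and its `task_api` parameter are hypothetical, and exist only so the logic can be exercised without a server.

```python
def relaunch(template_task_id, queue_name, task_api=None):
    """Clone an existing ClearML task and push the copy onto an agent queue.

    The template is the task created automatically when the script first ran
    locally with Task.init(...). `task_api` defaults to clearml.Task.
    """
    if task_api is None:
        # Real SDK; requires `pip install clearml` and a configured clearml.conf.
        from clearml import Task as task_api
    template = task_api.get_task(task_id=template_task_id)
    # The clone is a new draft task with the same code, packages and parameters.
    cloned = task_api.clone(source_task=template, name=f"{template.name} (relaunch)")
    # Any agent listening on `queue_name` will pick the clone up and execute it.
    task_api.enqueue(cloned, queue_name=queue_name)
    return cloned
```

For example, `relaunch("<task id copied from the UI>", "a100")`. The UI's Clone and Enqueue actions do the same thing without any code.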

Also, I want to launch my experiments on a Kubernetes cluster and I don't have any docs on how to do that, so an example would be helpful here.

We are working on documenting the full process; I'm hoping to have something out in the next week or so.
Are you running Kubernetes as a service, or on-prem Kubernetes?

My use case is that anyone on my team, sitting at their laptop, can submit jobs to a remote Kubernetes cluster, ...

Yes, this is exactly the scenario ClearML supports 🙂
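One way a laptop (or a GitLab CI job) could submit work without running it locally first is the `clearml-task` CLI, which registers a script from a repository as a task and enqueues it. The sketch below only builds the command line; `--project`, `--name`, `--repo`, `--branch`, `--script`, `--queue` and `--docker` are real `clearml-task` options, while the repository URL, queue name and image are hypothetical.

```python
import shlex


def submit_command(project, name, repo, script, queue, branch=None, docker=None):
    """Build a `clearml-task` call that creates a task from a repo and enqueues it."""
    cmd = ["clearml-task", "--project", project, "--name", name,
           "--repo", repo, "--script", script, "--queue", queue]
    if branch:
        cmd += ["--branch", branch]  # otherwise the repository's default branch
    if docker:
        cmd += ["--docker", docker]  # base image the agent should run the task in
    return cmd


if __name__ == "__main__":
    print(shlex.join(submit_command(
        project="team-a", name="resnet50 baseline",
        repo="https://gitlab.example.com/team-a/models.git",  # hypothetical repo
        script="train.py", queue="a100",
        docker="tensorflow/tensorflow:latest-gpu",
    )))
```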

  				
Posted 3 years ago by AgitatedDove14

We use both: we have our on-prem cluster, and we have old clusters on GKE. Having it documented would be a big help for me.

  				
Posted 3 years ago by AgitatedTurtle16

For on-prem, you can check the k8s Helm charts; they can spin up agents for you (static agents).
For GKE, the best solution is the k8s glue:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
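To make the glue idea concrete: instead of a long-running agent per node, the glue watches a queue and creates one Kubernetes pod per task it pulls. The function below is a heavily simplified, hypothetical illustration of such a pod, not the glue's actual code. It assumes an image with clearml-agent installed and credentials already configured (environment variables omitted for brevity). `clearml-agent execute --id` is the agent command that runs one specific task; `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin.

```python
import json


def task_pod(task_id, image, namespace="clearml", gpu_count=0):
    """Return a Pod manifest (plain dict) that executes one ClearML task, then exits."""
    container = {
        "name": "clearml-task",
        "image": image,
        # Run exactly this task instead of polling a queue.
        "command": ["clearml-agent", "execute", "--id", task_id],
    }
    if gpu_count:
        # Let the Kubernetes scheduler place the pod on a node with free GPUs.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpu_count}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"clearml-id-{task_id}",  # task ids are lowercase hex
            "namespace": namespace,
            "labels": {"clearml-task-id": task_id},
        },
        # The pod is done when the task is done; do not restart it.
        "spec": {"restartPolicy": "Never", "containers": [container]},
    }


if __name__ == "__main__":
    print(json.dumps(task_pod("0123abcd", "my-registry/clearml-agent:latest", gpu_count=1), indent=2))
```

Because scheduling is left to Kubernetes, this pattern suits autoscaled clusters such as GKE, while the static agents from the Helm charts suit fixed on-prem nodes.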

  				
Posted 3 years ago by AgitatedDove14
