Thank you for the quick response. Should I then mount the clearml.conf file
inside the container?
Hello again, and sorry for the delay.
I tried what you told me and got it to work, but one issue is that I want to use SSH when cloning the repo:
clearml-task --project name --name task_name --repo git@gitlab.com:username/project.git --commit commit_sha --script path/to/script.py --queue queue
This doesn't work; it fails with:
Error: Script entrypoint file '/home/usename/git@gitlab.com:username/project.git/path/to/script.py'
could not be found.
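One possible workaround, as a rough sketch: the error suggests that `clearml-task` treated the scp-style address (`git@host:path`) as a local folder instead of a remote repository. Git itself treats `git@gitlab.com:username/project.git` and `ssh://git@gitlab.com/username/project.git` as the same remote, so rewriting the address into an explicit URL before passing it to `--repo` might avoid the problem. Whether `clearml-task` accepts the `ssh://` form is an assumption I have not verified, and the helper names below are hypothetical.

```python
import re

# Matches scp-style Git addresses such as git@gitlab.com:username/project.git
_SCP_STYLE = re.compile(r"^(?P<user>[^@/\s]+)@(?P<host>[^:/\s]+):(?P<path>\S+)$")


def scp_to_ssh_url(repo: str) -> str:
    """Rewrite user@host:path as ssh://user@host/path; other inputs pass through."""
    match = _SCP_STYLE.match(repo)
    if match is None:
        return repo  # already a URL (https://, ssh://) or a local path
    path = match.group("path").lstrip("/")
    return f"ssh://{match.group('user')}@{match.group('host')}/{path}"


def scp_to_https_url(repo: str) -> str:
    """Rewrite user@host:path as https://host/path; other inputs pass through."""
    match = _SCP_STYLE.match(repo)
    if match is None:
        return repo
    path = match.group("path").lstrip("/")
    return f"https://{match.group('host')}/{path}"


print(scp_to_ssh_url("git@gitlab.com:username/project.git"))
print(scp_to_https_url("git@gitlab.com:username/project.git"))
```

If the HTTPS form is what ends up in `--repo`, I believe the agent can still be told to clone over SSH through the `force_git_ssh_protocol` setting in the agent section of clearml.conf, but please double-check that against the agent documentation.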
I did read that documentation, but more specifically I need an example of how to do it, if possible. As I mentioned, I tried to run clearml-session, but it takes forever and nothing happens.
I am basically saving checkpoints (so things like torch.save(foo.pt)), and the output_uri is
s3://bucket/folder
SuccessfulKoala55, yes, absolutely. I have this section:
` "sdk": {
    aws: {
        s3: {
            credentials: [
                {
                    bucket: xxxx,
                    key: "xxx",
                    secret: xxxx,
                    host: "xxxxx",
                    secure: true,
                },
            ]
        }
    }
} `
Thank you, it worked, so I am halfway through.
Is there a solution where I can use the clearml-task command directly? It would help to kick off the experiment from GitLab CI.
clearml-task --project ++
--name ++
--docker ++
-- --script ++
I don't think it is possible, right? Given that I would have to mount the config file every time?
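A rough sketch of how this could look from a GitLab CI job without mounting clearml.conf: as far as I know, the ClearML SDK also reads its credentials from the environment variables `CLEARML_API_HOST`, `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY`, so these could be defined as CI/CD variables instead. The script below only assembles the `clearml-task` call from the flags already used in this thread plus GitLab's predefined `CI_PROJECT_URL` and `CI_COMMIT_SHA`. The project, task name, script path and queue are placeholders, and `build_clearml_task_cmd` and `main` are hypothetical helpers, not part of ClearML.

```python
import os
import subprocess

# Credentials the ClearML SDK is assumed to pick up from the environment.
REQUIRED_ENV = ("CLEARML_API_HOST", "CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY")


def build_clearml_task_cmd(env, project, name, script, queue, docker=None):
    """Assemble the clearml-task argument list from GitLab CI variables."""
    missing = [key for key in REQUIRED_ENV if not env.get(key)]
    if missing:
        raise RuntimeError("missing ClearML credentials: " + ", ".join(missing))
    cmd = [
        "clearml-task",
        "--project", project,
        "--name", name,
        "--repo", env["CI_PROJECT_URL"] + ".git",  # HTTPS address of the project
        "--commit", env["CI_COMMIT_SHA"],          # commit that triggered the pipeline
        "--script", script,
        "--queue", queue,
    ]
    if docker:
        cmd += ["--docker", docker]  # optional container image for the agent
    return cmd


def main() -> int:
    """Entry point for the CI job; all argument values here are placeholders."""
    command = build_clearml_task_cmd(
        os.environ,
        project="name",
        name="task_name",
        script="path/to/script.py",
        queue="queue",
    )
    return subprocess.run(command).returncode
```

The CI job would then call `main()` (for example via `python -c`), with the three ClearML variables set as masked CI/CD variables in the GitLab project settings.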
Sorry, but no. I already have the ClearML agent running as a pod. My question is how to use it to manage my experiments (Docker containers). Simply put, let's say:
I have an experiment (some code in TensorFlow). I containerized my code in a Docker container, and inside the container I have already set the credentials for my ClearML server (I can see logs, plots, artifacts, etc.).
Now I am using TFJobs to run my experiment in the cluster ( https://www.kubeflow.org/docs/components/tra...
How can I do that? Honestly, I am confused about how to make it work in my case.
AgitatedDove14 Hello, actually no. A concrete example of how to do it would be helpful.
For instance:
"So basically once you see an experiment in the UI, it means you can launch it on an agent."
But once I see it in the UI, it means it has already been launched somewhere, so I didn't quite get you.
Also, I want to launch my experiments on a Kubernetes cluster and I don't have any docs on how to do that, so an example would be helpful here. So my use case is anyone of my...
We use both: we have our on-prem cluster, and we have old clusters on GKE. Having it documented would be a big help for me.
The file is in the right path... but the web UI is still the same.
Thank you, I will try it and ping you later. Many thanks!