Hello again, and sorry for the delay.
I tried what you told me and got it to work, but one issue remains: I want to use SSH when cloning the repo:
clearml-task --project name --name task_name --repo git@gitlab.com:username/project.git --commit commit_sha --script path/to/script.py --queue queue
This doesn't work; it fails with:
Error: Script entrypoint file '/home/username/git@gitlab.com:username/project.git/path/to/script.py' could not be found.
Thank you for the quick response. Should I then mount the clearml.conf file inside the container?
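If it helps, mounting the config is typically a one-line bind mount. A minimal sketch, assuming a hypothetical image name and a container running as root (the SDK looks for clearml.conf in the user's home directory):

```shell
# Mount the host's clearml.conf read-only into the container user's home.
# "my-training-image" and the script path are placeholders.
docker run --rm \
  -v "$HOME/clearml.conf":/root/clearml.conf:ro \
  my-training-image:latest \
  python path/to/script.py
```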
Thank you, it worked, so I am halfway through.
Is there a solution where I can use the clearml-task command directly? It would help to kick off the experiment from the GitLab CI.
clearml-task --project ++
--name ++
--docker ++
-- --script ++
I don't think it is possible, right? Given that I would have to mount the config file every time?
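For what it's worth, the ClearML SDK can also read server credentials from environment variables, which avoids mounting clearml.conf in a CI job. A sketch, assuming placeholder host URLs and masked GitLab CI variables for the keys:

```shell
# Placeholder hosts; in GitLab CI, set the keys as masked CI/CD variables.
export CLEARML_API_HOST="https://api.clearml.example.com"
export CLEARML_WEB_HOST="https://app.clearml.example.com"
export CLEARML_FILES_HOST="https://files.clearml.example.com"
export CLEARML_API_ACCESS_KEY="$CI_CLEARML_ACCESS_KEY"
export CLEARML_API_SECRET_KEY="$CI_CLEARML_SECRET_KEY"

# With credentials in the environment, clearml-task runs with no config file.
clearml-task --project my_project --name ci_run \
  --repo git@gitlab.com:username/project.git \
  --script path/to/script.py \
  --queue default
```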
Thank you, I will try it and ping you later. Many thanks!
I actually read that documentation, but more specifically I need an example of how to do it, if possible. As I mentioned, I tried to run clearml-session, but it takes forever and nothing happens.
How can I do that? Honestly, I am confused about how to make it work in my case.
The file is in the right path... but the web UI is still the same.
AgitatedDove14 Hello, actually no. A concrete example of how to do it would be helpful.
For instance:
"So basically once you see an experiment in the UI, it means you can launch it on an agent."
But once I see it in the UI, that means it has already been launched somewhere, so I didn't quite get you.
Also, I want to launch my experiments on a Kubernetes cluster, and I don't actually have any docs on how to do that, so an example would be helpful here. So my use case is anyone of my...
Sorry, but no; I already have a ClearML agent running as a pod. My question is how to use it to manage my experiments (Docker containers). Simply put, let's say:
I have an experiment (some code in TensorFlow). I containerized my code inside a Docker container; inside the container I have already set the credentials for my ClearML server (I can see logs, plots, artifacts, etc.).
Now I am using TFJobs to run my experiment in the cluster ( https://www.kubeflow.org/docs/components/tra...
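In case it's useful context, ClearML publishes Helm charts for running the agent on Kubernetes, so tasks pulled from a queue get launched as pods. A rough sketch (the chart repo URL, release name, and queue value here are assumptions; check the current charts before relying on them):

```shell
# Assumed chart repository and values; verify against the current
# clearml-helm-charts documentation.
helm repo add clearml https://clearml.github.io/clearml-helm-charts
helm repo update

# Install an agent that serves the "default" queue as Kubernetes pods.
helm install clearml-agent clearml/clearml-agent \
  --set agentk8sglue.queue=default
```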
We use both: we have our on-prem cluster, and we have old clusters on GKE. Having it documented would be a great help for me.
I am basically saving checkpoints (things like torch.save(..., "foo.pt")) and the output_uri=s3://bucket/folder
SuccessfulKoala55, yes, absolutely, I have this section:
` "sdk":{
aws: {
s3: {
credentials: [
{
bucket: xxxx,
key: "xxx",
secret: xxxx,
host: "xxxxx",
secure: true,
},
]
}
}
} `