
I do change both the task name and the project name; the task name change works fine, but the project name change silently fails
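Roughly what I mean by changing them (a sketch; assuming an SDK version that has Task.move_to_project, older releases may not):
from clearml import Task

task = Task.get_task(task_id="<task-id>")              # or Task.current_task() inside the running job
task.set_name("new task name")                         # this part works for me
task.move_to_project(new_project_name="new/project")   # this is the part that seems to fail silently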
I want the script to be agnostic to whether it is run using ClearML or not, and to whether a particular queue is used or not
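Something like this guarded init is what I mean by agnostic (just a sketch; the project/task names are made up, and queue selection would stay outside the script):
# fall back gracefully when clearml is not installed / not configured
try:
    from clearml import Task
    task = Task.init(project_name="my-project", task_name="my-task")
except Exception:
    task = None  # plain local run, no clearml

def log_scalar(title, series, value, iteration):
    if task is not None:
        task.get_logger().report_scalar(title, series, value=value, iteration=iteration)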
Also @<1523701070390366208:profile|CostlyOstrich36> - are these actions available for on prem OSS clearml-server deployments too?
We have scenarios where a group of ClearML experiments might represent one logical experiment. We then want to use all the trained models in a pipeline to generate some output.
With that output, we probably want to go to some third party like Mechanical Turk, do some custom evaluations - and sometimes more than once. We then want to connect (and present) these evaluations alongside the ClearML experiments.
we have various services internally to do this --> however, we have to manually link it up w...
Would I also be able to change the task name from within the subprocess?
@<1537605940121964544:profile|EnthusiasticShrimp49> , now that I have run the task on remote, can I copy the artefacts/files it creates back to my local fs?
Let's say the artefacts are something like: artefacts = [checkpoint.pth, dvc.lock, some_other_dynamically_generated_file]
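For the ones that are registered as ClearML artifacts, a rough sketch of pulling them back locally (assuming they were uploaded via upload_artifact; plain files only written to the remote disk are not covered by this):
from clearml import Task

remote_task = Task.get_task(task_id="<remote-task-id>")          # hypothetical id
for name in remote_task.artifacts:
    local_path = remote_task.artifacts[name].get_local_copy()    # downloads to a local cache folder
    print(name, "->", local_path)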
Nice! I was wondering whether we can trigger it from the UI, like "on publishing" an experiment
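From what I understand there is a TriggerScheduler in clearml.automation that can fire on status changes, so something along these lines might cover the "on publishing" case (a rough sketch; exact parameter names may differ between SDK versions):
from clearml.automation import TriggerScheduler

def on_publish(task_id):
    # callback is expected to receive the id of the published task
    print("experiment published:", task_id)

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_task_trigger(
    name="on-publish",                 # assumed parameter name
    trigger_project="my-project",      # hypothetical project
    trigger_on_status=["published"],
    schedule_function=on_publish,
)
trigger.start()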
As mentioned above, I've tried both (env and clearml.conf). Here are my configs (I've blacked out urls and creds)
conf file
api {
    web_server:
    api_server:
    files_server:
    credentials {
        "access_key" = "xyz"
        "secret_key" = "xyz"
    }
}
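For completeness, the environment-variable equivalent I mean (assuming the standard CLEARML_* variable names; values blacked out the same way) would be roughly:
CLEARML_WEB_HOST=<web server url>
CLEARML_API_HOST=<api server url>
CLEARML_FILES_HOST=<files server url>
CLEARML_API_ACCESS_KEY=xyz
CLEARML_API_SECRET_KEY=xyz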
Relevant log (it uploads to S3, I can see the artefact fine on clearml's experiment tracker, but it still causes the job to hang)
2023-12-11 16:06:44,008 - clearml.sto...
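One thing worth trying to narrow it down is flushing explicitly before the process exits (a sketch, assuming the hang comes from background upload threads):
from clearml import Task

task = Task.current_task()
task.flush(wait_for_uploads=True)   # block until pending uploads (e.g. the S3 artifact) finish
task.close()                        # shut down background threads before the process exits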
where is it persisted? if I have multiple sessions I want to persist, is that possible?
@<1523701070390366208:profile|CostlyOstrich36> , as written above, I've done that. It still tries to send to 8081
Thanks, I can have docker + poetry execution modes then?
That makes sense, but that would mean that each client/user has to manage the upload themselves, right?
(I'm trying to use clearml to create an abstraction over the compute / cloud)
In the end I forked the clearml-session library and removed mechanisms to access the interactive terminal. I added ipc=host.
There's one identifiable issue with clearml-session+tailscale though - while it does launch the daemon properly, it registers the wrong ip address to the task (sometimes the external ip address even when --external is not passed). At the end of the day, if we know which machine it was launched on, we're able to replace that ip address with a tailscale equivalent and st...
I need to mock it because I'm writing some unit tests
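A minimal sketch of what I mean by mocking it (just patching Task.init so the tests never talk to a server; the module under test is hypothetical):
from unittest import mock

def test_training_entrypoint():
    with mock.patch("clearml.Task.init") as mock_init:
        mock_init.return_value = mock.MagicMock()   # stand-in task object, no server calls
        from my_package.train import main           # hypothetical module under test
        main()
        mock_init.assert_called_once()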
So I am deploying clearml-server on an on-prem server, and the checkpoints etc. are quite large for the experiments I will do.
Instead I want to periodically upload / back up this data to s3, and free up local disk space. Is that something that is supported?
I see that in my docker-compose installation, most of the big files are in /opt/clearml/data
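One possible direction is pointing the experiments' output straight at S3 instead of the local files server, so large checkpoints never land on the local disk in the first place - a rough sketch (bucket name is made up):
from clearml import Task

task = Task.init(
    project_name="my-project",
    task_name="big-checkpoints",
    output_uri="s3://my-backup-bucket/clearml",   # hypothetical bucket; models/artifacts upload here
)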
With respect to unstructured data, do hyperdatasets work well with audio data (and associated metadata) ?
I set it up like this: clearml-agent daemon --detached --gpus 0,1,2 --queue single-gpu-24 --docker
but when I create the session: clearml-session --docker xyz --git-credentials
and I run nvidia-smi
I only see one gpu
Thanks! So it seems like the key is Task.connect and bubbling the params up to the original task, correct?
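i.e. something like this, where values edited in the UI/parent override the dict when it runs remotely (just a sketch, names are made up):
from clearml import Task

task = Task.init(project_name="my-project", task_name="connect-demo")
params = {"lr": 1e-3, "batch_size": 32}
params = task.connect(params)   # when executed by an agent, overridden values are injected back into the dict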
is the agent execution dependent on some CMD in my Dockerfile?
How does it work with k8s? How can I request that the two pods sit on the same GPU?
It's a simple training loop that trains models for 2-3 epochs for a total of 200-300 iterations, saves a few checkpoints, and saves a final model at the end
Hmmm, my only issue there is that not all of my "artefacts" are clearml artefacts.
The files I need are models and other locally modified files that get generated by the ClearML task on the remote machine
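One option would be to register those files explicitly as artifacts so they can be fetched later, roughly like this (a sketch; the file names are from the example above):
from clearml import Task

task = Task.current_task()
for path in ["checkpoint.pth", "dvc.lock", "some_other_dynamically_generated_file"]:
    task.upload_artifact(name=path, artifact_object=path)   # makes the local file a retrievable artifact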
This doesn't interrupt jobs, but it slows them down, and it takes a lot of time to quit (it adds ~2 hours for the process to end)