So I am deploying clearml-server on an on-prem server, and the checkpoints etc. are quite large for the experiments I will do.
Instead I want to periodically upload / back up this data to s3, and free up local disk space. Is that something that is supported?
I see that in my docker-compose installation, most of the big files are in /opt/clearml/data
I've also overridden CLEARML_FILES_HOST=None and configured it in the clearml.conf file. Don't know where it's picking up 8081 😕
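For context, 8081 is the default clearml-server fileserver port. A minimal sketch of forcing task output straight to S3 from the SDK side instead (project, task, and bucket names are placeholders):
from clearml import Task

# output_uri overrides the files_server destination for this task's
# artifacts and model checkpoints
task = Task.init(
    project_name="my-project",            # placeholder
    task_name="s3-output-check",          # placeholder
    output_uri="s3://my-bucket/clearml",  # placeholder bucket
)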
this doesn't interrupt jobs, but it slows them down, and it takes a long time to quit (adds ~2 hours for the process to end)
It's a simple training loop that trains models for 2-3 epochs for a total of 200-300 iterations, saves a few checkpoints, and saves a final model at the end
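A minimal sketch of a loop like the one described (the model, names, and bucket are placeholders); with output_uri set, ClearML auto-captures each framework save, and task.close() is where pending uploads get flushed, i.e. where a slow shutdown would show up:
import torch
from clearml import Task

task = Task.init(project_name="my-project", task_name="short-run",
                 output_uri="s3://my-bucket/clearml")  # placeholders

model = torch.nn.Linear(10, 1)  # stand-in for the real model
for epoch in range(3):
    for step in range(100):  # ~300 iterations total
        pass  # training step goes here
    torch.save(model.state_dict(), f"checkpoint_{epoch}.pt")  # auto-captured

torch.save(model.state_dict(), "final_model.pt")
task.close()  # blocks until uploads finish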
it worked. The env variables definitely do not work! Had to use clearml.conf along with use_credential_chain=True
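A quick sanity check, on the assumption that the credentials-chain setting hands resolution to boto3's default credential chain, so the same machine should resolve credentials here too:
import boto3

# Prints a credentials object if the default chain (env vars, profile,
# instance role, ...) resolves, or None otherwise
print(boto3.Session().get_credentials())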
That makes sense, but that would mean that each client/user has to manage the upload themselves, right?
(I'm trying to use clearml to create an abstraction over the compute / cloud)
Sorry false alarm
is the agent execution dependent on some CMD in my docker file?
Also @<1523701070390366208:profile|CostlyOstrich36> - are these actions available for on prem OSS clearml-server deployments too?
@<1523701070390366208:profile|CostlyOstrich36> , as written above, I've done that. It still tries to send to 8081
We have some scenario where a group of clearml experiments might represent a logical experiment. We then want to use all the trained models in a pipeline to generate some output.
With that output, we probably want to send it to some third party like Mechanical Turk, do some custom evaluations - and sometimes more than once. We then want to connect (and present) these evaluations along with the ClearML experiments.
we have various services internally to do this --> however, we have to manually link it up w...
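One possible shape for this, as a sketch (the tag name, project, and companion-task idea are assumptions, not an established pattern): group the member experiments with a shared tag, and report the third-party evaluation results onto a dedicated task next to them:
from clearml import Task

# Query all experiments belonging to one "logical experiment" via a tag
members = Task.get_tasks(project_name="my-project", tags=["logical-exp-42"])

# Attach external (e.g. Mechanical Turk) results to a companion task
eval_task = Task.init(project_name="my-project", task_name="mturk-eval-1",
                      task_type=Task.TaskTypes.qc)
eval_task.get_logger().report_scalar(
    title="mturk", series="agreement", value=0.87, iteration=0)  # placeholder value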
Found out the command swaps singular and plural: it's --gpus 0 and --gpu 0,1,2
I do change the task and the project name; the task name change works fine, but the project name change silently fails
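For reference, a hedged sketch of one way a silent failure like this can happen when cloning: Task.clone's project argument expects a project ID rather than a name (the names below are placeholders):
from clearml import Task

source = Task.get_task(project_name="old-project", task_name="baseline")
project_id = Task.get_project_id("new-project")  # resolve name -> ID first
clone = Task.clone(source_task=source, name="baseline-v2", project=project_id)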
can I combine docker and poetry mode?
I'm thinking of using s3fs on the entire /opt/clearml/data folder. What do you think?
As mentioned above, I've tried both (env and clearml.conf). Here are my configs (I've blacked out urls and creds)
conf file
api {
    web_server:
    api_server:
    files_server:
    credentials {
        "access_key" = "xyz"
        "secret_key" = "xyz"
    }
}
Relevant log (it uploads to S3, I can see the artefact fine on clearml's experiment tracker, but it still causes the job to hang)
2023-12-11 16:06:44,008 - clearml.sto...
With respect to unstructured data, do hyperdatasets work well with audio data (and associated metadata)?
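Hyper-Datasets are an enterprise feature, but as a rough open-source analogue for versioning audio plus metadata, a sketch with clearml.Dataset (paths and names are placeholders, and set_metadata is assumed here to accept a plain dict):
from clearml import Dataset

ds = Dataset.create(dataset_project="audio", dataset_name="clips-v1")
ds.add_files("clips/")                     # local folder of audio files
ds.set_metadata({"sample_rate": [16000]})  # attach arbitrary metadata
ds.upload()
ds.finalize()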
Hey @<1577106212921544704:profile|WickedSquirrel54> , I would definitely be interested in this. A gist would be cool too
nice! I was wondering whether we can trigger it from the UI, like "on publishing" an experiment
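There is a programmatic route for this; a minimal sketch with clearml.automation.TriggerScheduler (the project name and callback are placeholders, and trigger_on_status is assumed to accept "published"):
from clearml.automation import TriggerScheduler

def on_publish(task_id):
    print(f"experiment {task_id} was published")  # kick off follow-up work here

# note: "pooling" is the actual parameter spelling in the SDK
trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_task_trigger(
    name="on-publish-hook",
    schedule_function=on_publish,
    trigger_project="my-project",  # placeholder
    trigger_on_status=["published"],
)
trigger.start_remotely(queue="services")  # or trigger.start() to run locally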
Thanks, I can have docker + poetry execution modes then?
No, at the time it was fixed by restarting clearml and some services. But currently we've given up, and we use debug=True so we don't use the services queue
where is it persisted? If I have multiple sessions I want to persist, is that possible?
I tried that earlier - that checks out, it matches the s3 path I provide in the conf