GentleSwallow91 what you are looking for is here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L149
you can run md5 on the file as stored in the remote storage (nfs or s3)
s3 is implementation specific (i.e. MinIO, Weka, Wasabi etc. might not support it), and I'm actually not sure regarding nfs (I mean you can run it, but it actually means you are reading the data; that said, I'm assuming nfs by definition is relatively fast access)
wdyt?
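A minimal sketch of the md5 idea, assuming the remote storage is reachable as a regular path (for example an NFS mount). The helper name `file_md5` and the chunk size are illustrative, not part of ClearML. As noted above, this does read the whole file; it only avoids loading it into memory at once.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as stream:
        # Read in chunks so large files never have to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest of the stored copy with the digest of the local copy tells you whether the content changed. For s3-style backends the equivalent check depends on what the specific implementation exposes.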
Hi RobustRat47
What do you mean by "log space for hyperparameter"? What would be the difference? (Notice that on the graph itself you can switch to log scale when viewing in the UI.)
Or are you referring to the hyperparameter optimization, and allowing you to add log space there?
Ohh SubstantialElk6 please use agent RC3 (the latest RC is somewhat broken, sorry, we will pull it)
Hmm, you either need to run with sudo or make sure the running user has docker run permissions
Thanks JuicyFox94 for letting us know.
I'm checking what's the status with it
I think it should be treated as failed,
I'm not sure where I stand on the default behavior, it could easily be an argument for the pipeline controller
JitteryCoyote63 I think this only holds for the conda distribution.
(Actually quite interesting, I wonder what happens if you already installed cudatoolkit...)
Thanks TrickyRaccoon92
I think it's about time we remove the survey link anyhow
I'll make sure it happens...
let's call it an applicative project, which has experiments, and an abstract/parent project (or some other name) that groups applicative projects.
That was my way of thinking, the guys argued it will soon "deteriorate" into the first option :)
Sorry that was a reply to:
Otherwise I could simply create these tasks with Task.init,
Then running by using the
, am I right?
yep
I have set `--save-period` while running YOLOv5, and ClearML does not save the weights for each epoch that I have trained. Why is this happening?
But do you still see it in the clearml UI? Do you see the models logged in the clearml UI?
Hi, is there a possibility to use one GPU card with 2 agents concurrently?
RoundMosquito25 / EnviousPanda91
You need to change the WORKER_ID (no two workers can share the same ID):
` CLEARML_WORKER_ID="machine:gpu01" clearml-agent daemon .... `
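As a rough sketch of the same idea for several agents, assuming you launch them from Python: build one environment per agent so that each gets its own `CLEARML_WORKER_ID`. The helper `worker_envs` and the `machine:gpuN-slot` naming are made up for illustration; the only requirement stated above is that the IDs differ.

```python
import os


def worker_envs(machine, gpu_index, n_agents, base_env=None):
    """Build one environment per agent, each with its own CLEARML_WORKER_ID."""
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for slot in range(n_agents):
        env = dict(base)
        # The ID format is an arbitrary choice; it only has to be unique per agent.
        env["CLEARML_WORKER_ID"] = f"{machine}:gpu{gpu_index}-{slot}"
        envs.append(env)
    return envs
```

Each returned dict can then be passed as `env=` to `subprocess.Popen` together with the `clearml-agent daemon` command line you already use.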
When you have a bit of experience, please suggest a path forward, it will be great to integrate
GreasyPenguin66 you can pass: `AZURE_STORAGE_ACCOUNT` `AZURE_STORAGE_KEY`
As the default azure access/secret
Hi RoundMosquito25
Hmm, I remember this is tricky... What's the clearml version? Also, where is the line you had to hack?
gm folks, really liking ClearML so far as my top choice (after looking at dvc, mlflow), and thank you for your help here!
Thanks HurtWoodpecker30 !
Is there a recommended workflow to be able to "drop into" the exact env (code, venv, data) of a previous experiment (which may have been several commits ago), to reproduce that experiment?
You can use clearml-agent on your local machine to build the env of any Task,
` clearml-agent build --id <ta...
but can it NOT use /tmp for this, i'm merging about 100GB
You mean to configure your temp folder for squashing?
You can hack it like this:
` import tempfile
tempfile.tempdir = "/my/new/temp"
# run the Dataset squash here
tempfile.tempdir = None `
But regardless, I think this is worth a GitHub issue with a feature request to set the temp folder.
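The same hack can be wrapped in a context manager so the setting is put back even if the squash fails. This is a sketch that assumes the squash goes through Python's `tempfile` module, as the workaround above implies; `temporary_tempdir` is a hypothetical helper name. Unlike the snippet above, it restores whatever value was there before instead of resetting to `None`.

```python
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_tempdir(path):
    """Point the tempfile module at `path` for the duration of the block."""
    previous = tempfile.tempdir
    tempfile.tempdir = path
    try:
        yield path
    finally:
        # Restore the previous setting even if the body raises.
        tempfile.tempdir = previous
```

Usage would look like `with temporary_tempdir("/my/new/temp"):` followed by the squash call inside the block. The target folder has to exist and have enough free space for the merge.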
I am asking this because my NGINX server is giving Gateway Timeouts for delete calls sometimes.
Sync... it might make sense if you have a lot of load. It might also be that the server is preoccupied with other requests
Perhaps it is the imports at the start of the script only being assigned to the first task that is created?
Correct!
However, when I split the experiment task out completely, it seems to have built the cloned task correctly.
Nice!!
Hi VexedCat68
can you supply more details on the issue? (probably best to open a GitHub issue and put all the details there, so we have better visibility)
wdyt?
but I don't see any change... where is the link to the file removed from?
In the meta data section, check the artifacts "state" object
How are these two datasets different?
Like comparing two experiments :)
Hi RattyBat71
Do you tend to create separate experiments for each fold?
If you really want to parallelize the workload, then splitting it into multiple executions (i.e. passing the index of the CV fold as an argument) makes sense; you can then compare / sort the results based on a specific metric. That said, if speed is not important, just having a single script with multiple CVs might be easier to implement?!
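A minimal sketch of the "pass the fold index as an argument" approach, using only the standard library. The names `fold_indices`, `parse_args`, `--fold` and `--n-folds` are illustrative assumptions, and the interleaved split is just one simple way to partition the samples.

```python
import argparse


def fold_indices(n_samples, n_folds, fold):
    """Return (train, validation) index lists for one fold of a k-fold split."""
    if not 0 <= fold < n_folds:
        raise ValueError("fold must be in [0, n_folds)")
    indices = list(range(n_samples))
    # Every n_folds-th sample, offset by the fold number, goes to validation.
    validation = indices[fold::n_folds]
    held_out = set(validation)
    train = [i for i in indices if i not in held_out]
    return train, validation


def parse_args(argv=None):
    """Parse the fold selection so each execution handles exactly one fold."""
    parser = argparse.ArgumentParser(description="Run a single CV fold")
    parser.add_argument("--fold", type=int, default=0, help="index of the fold to run")
    parser.add_argument("--n-folds", type=int, default=5, help="total number of folds")
    return parser.parse_args(argv)
```

Each execution would call `parse_args()` and then `fold_indices(...)` with its own `--fold` value, so the runs differ only in that one argument and can be compared on a shared metric afterwards.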
repeat it until they are all dead