Generally, really. I've struggled recently (and in the past), because the documentation seems:
Very complete w.r.t. the available SDK (though the formatting is sometimes off)
Very lacking w.r.t. how things interact with one another
A lot of what I need I actually find by plugging into the source code.
I think ClearML would benefit a lot if it adopted a documentation structure similar to the numpy ecosystem's (numpy, pandas, scipy, scikit-image, scikit-bio, scikit-learn, etc.)
I created a new task with the project name internal tests, and no task name (so it's derived by ClearML).
The task was a simple print out.
The project does not appear in the project space and does not turn up in searches (the task does).
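For reference, the whole thing is essentially this (a minimal sketch of what I ran; the project name is the one I used):

```
from clearml import Task

# Project name given, no task name, so ClearML derives the task
# name (from the script name, as far as I can tell).
task = Task.init(project_name="internal tests")
print("hello from the test task")
task.close()
```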
Yeah, and just thinking out loud about what I like in the numpy/pandas documentation
Seems like Task.create is the correct use-case then, since again this is about testing flows using e.g. pytest, so the task is not the current process.
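Something like this is what I have in mind (a sketch; the test and task names are made up, and Task.create just registers the task instead of attaching to the current process like Task.init does):

```
from clearml import Task

def test_flow():
    # Registers a task without hijacking the running pytest process.
    task = Task.create(
        project_name="internal tests",
        task_name="pytest flow",  # made-up name
    )
    assert task.id is not None
    task.close()
```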
I've at least seen references in dataset.py's code that seem to apply to offline mode (e.g. in Dataset.create there is if output_uri and not Task._offline_mode:, so someone did consider datasets in offline mode)
Now I tried setting the pip version to <22.3 (both in the config and in the autoscaler's "extra config parameters"), but it still uses the latest?
added seed packages: pip==22.3.1, setuptools==65.5.1, wheel==0.38.4
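For reference, this is what I set in clearml.conf (assuming agent.package_manager.pip_version is the right key; I put the same thing in the autoscaler's extra config parameters):

```
agent {
    package_manager {
        # Pin pip below 22.3 -- yet the agent still seeds pip==22.3.1.
        pip_version: "<22.3"
    }
}
```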
Could also be related to K8s, so pinging JuicyFox94 just in case 😉
One more UI question TimelyPenguin76, if I may -- it seems one cannot simply report single integers. The report_scalar feature creates a plot of a single data point (or single iteration).
For example if I want to report a scalar "final MAE" for easier comparison, it's kinda impossible 😞
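To illustrate, the closest I get is something like this (a sketch with made-up task names; reporting at iteration 0 still renders as a one-point plot, not a plain number):

```
from clearml import Task

task = Task.init(project_name="examples", task_name="final metrics")  # made-up names
logger = task.get_logger()

# Shows up as a plot with a single point instead of a plain scalar.
logger.report_scalar(title="final MAE", series="test", value=0.123, iteration=0)
```

(If I recall correctly, newer ClearML versions added Logger.report_single_value for exactly this use case.)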
This could be relevant SuccessfulKoala55 ; might entail some serious bug in ClearML multiprocessing too - https://stackoverflow.com/questions/45665991/multiprocessing-returns-too-many-open-files-but-using-with-as-fixes-it-wh
My current approach with pipelines basically looks like a GitHub CI/CD YAML config btw, so I give the user a lot of control over which steps to run, why, and how, and the default simply caches all results so as to minimize the number of reruns.
The user can then override and choose exactly what to do (or not do).
It could be related to the ClearML agent or server then. We temporarily upload a given .env file to an internal S3 bucket (cache), then switch to remote execution. When the remote execution starts, it first looks for this .env file, downloads it using StorageManager, loads it with dotenv, and then continues the execution normally.
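Roughly, the flow is this (a sketch; the bucket path is made up, StorageManager and python-dotenv are what we actually use):

```
from clearml import StorageManager
from dotenv import load_dotenv

ENV_REMOTE = "s3://internal-bucket/cache/some-folder/.env"  # made-up path

# Before switching to remote execution: upload the .env file.
StorageManager.upload_file(local_file=".env", remote_url=ENV_REMOTE)

# At the start of the remote execution: fetch and load it,
# then continue normally.
local_env = StorageManager.get_local_copy(ENV_REMOTE)
load_dotenv(local_env)
```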
Or if it wasn't clear, that chunk of code is from clearml's dataset.py
That's what I found as well, but it did not like it after all (boto is fine with it, but the underlying urllib and requests were not?)
It's fine -- I see the added benefit in making sure the users set up their clearml.conf and I've made a script to edit it to our needs as part of the installation process 🙂 Thanks Martin!
Just because it's handy to compare differences and see how the data changed between iterations, but I guess we'll work with that 🙂
We'll probably do something like:
When creating a new dataset with a parent (or parents), look at the immediate parents for identically-named files.
If those exist, load those with the matching framework (pyarrow, pandas, etc.), and log differences to the new dataset 🙂
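As a rough sketch (made-up IDs and paths; assuming Dataset.list_files and get_local_copy work as documented):

```
from pathlib import Path

import pandas as pd
from clearml import Dataset

parent = Dataset.get(dataset_id="<parent_id>")  # immediate parent
parent_root = Path(parent.get_local_copy())
local_root = Path("data")  # where the new version of the files lives

for file_name in parent.list_files():
    new_file = local_root / file_name
    if new_file.exists() and new_file.suffix == ".csv":
        # Identically-named file exists in the parent: load both with
        # the matching framework and log a simple difference.
        old_df = pd.read_csv(parent_root / file_name)
        new_df = pd.read_csv(new_file)
        print(file_name, "rows:", len(old_df), "->", len(new_df))
```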
It's given as the second form you suggested in the mini config (http://${...}:8080). The quotation marks are added later by pyhocon.
Maybe. When the container spins up, are there any identifiers regarding the task etc. available? I create a folder on the bucket per python train.py invocation so that the environment variables file doesn't get overwritten if two users execute almost simultaneously.
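If the agent does expose the task ID inside the container (I believe it sets CLEARML_TASK_ID, but treat that as an assumption), the folder could be derived from it instead:

```
import os
import uuid

# CLEARML_TASK_ID should be set by the agent inside the container
# (assumption); fall back to a random suffix for local runs.
task_id = os.environ.get("CLEARML_TASK_ID") or uuid.uuid4().hex
env_remote = f"s3://internal-bucket/cache/{task_id}/.env"  # made-up bucket
```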
I'll try with 1.1.5 first, then 1.1.6rc0
Of course now it's not there anymore 😆 If/when it happens again I'll ping you here 🙂
Would be good if that's mentioned explicitly in the docs 😄 Thanks!
Let me test it out real quick.
Since the additional credentials are available to the autoscaler when it boots up (via the config file), I thought it could use those natively?
Sorry for the late reply Jake -- I was away on holidays -- it works perfectly now, thanks!
Right, so this is checksum-based? Are there plans to only store delta changes for files (i.e. store the changed bytes instead of the entire file)?
QuaintPelican38 did you have a workaround for this then? Some cleanup service or similar?
The documentation is messy, I've complained about it in the past too 🙈
Will try later today TimelyPenguin76 and report back, thanks! Does this revert the behavior to the 1.3.x one?
Thanks for the reply CostlyOstrich36 !
Does the task read/use the cache_dir directly? It's fine for it to be a cache and then removed from the fileserver; if users want the data to stay they will use the ClearML Dataset 🙂
The S3 solution is bad for us since we have to create a folder for each task (before the task is created), and hope it doesn't get overwritten by the time it executes.
Argument augmentation - say I run my code with python train.py my_config.yaml -e admin.env...
In which repo? :)