That could be a solution for the regex search; my comment on the pop-up (in the previous reply) was a bit more generic - just that it should potentially include some information on what failed while fetching experiments 😄
I also tried switching to dockerized mode now, getting the same issue 🤔
First bullet point - yes, exactly
Second bullet point - all of it, really. The SDK documentation and the examples.
For example, the Task object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
Any example linked to GitHub is welcome, but some visualization/inline code with explanations is also very much welcome.
The overall flow I currently have is e.g.
1. Start an internal task (not a ClearML Task; MLOps not initialized yet)
2. Call some pre_init function with args, so I can upload the environment file via StorageManager to S3
3. Call some start_run function with the configuration dictionary loaded, so I can upload the relevant CSV files and configuration file
4. Finally initialize the MLOps (ClearML), start a task, and execute remotely
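In code, the flow is roughly like the following sketch (the bucket, paths, and the csv_files config key are made up for illustration):

```python
import yaml
from clearml import StorageManager, Task

# 1. Start our internal (non-ClearML) task; MLOps not initialized yet
#    ... internal bookkeeping happens here ...

# 2. pre_init: upload the environment file via StorageManager to S3
#    (bucket/path are made up)
StorageManager.upload_file(
    local_file=".env", remote_url="s3://internal-bucket/cache/.env"
)

# 3. start_run: upload the relevant CSV files and the configuration file
config = yaml.safe_load(open("my_config.yaml"))
for path in config["csv_files"] + ["my_config.yaml"]:
    StorageManager.upload_file(
        local_file=path, remote_url=f"s3://internal-bucket/cache/{path}"
    )

# 4. Only now initialize ClearML, start a task, and execute remotely
task = Task.init(project_name="example", task_name="train")
task.execute_remotely(queue_name="default")
```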
I can play around with steps 3/4 (so e.g. upload CSVs and configuratio...
BTW AgitatedDove14 following this discussion I ended up doing the regex way myself to sync these, so our code has something like the following. We abuse the object description here to store the desired file path.
```python
config_path = task.connect_configuration(configuration=config_path, name=config_fname)
included_files = find_included_files_in_source(config_path)
while included_files:
    file_to_include = included_files.pop()
    sub_config = task.connect_configuration(
        configurat...
```
StaleButterfly40 what use case are you looking for? I've used environment variables in the config file and then I can overwrite them in os.environ before ClearML loads the config
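A minimal sketch of what I mean (the config key and variable name are just examples; clearml.conf is HOCON, which can reference environment variables):

```python
import os

# clearml.conf can reference an environment variable, e.g.:
#   sdk.aws.s3.key: ${MY_S3_KEY}
# Setting it before ClearML first reads the config makes the value stick:
os.environ["MY_S3_KEY"] = "abc123"

from clearml import Task  # the config is parsed once clearml is first used

task = Task.init(project_name="example", task_name="env-config")
```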
You mean at the container level or at clearml?
Yes, the container level (when these docker shell scripts run).
The per user ID would be nice, except I upload the .env file before the Task is created (it's only available really early in the code).
@<1523701087100473344:profile|SuccessfulKoala55> could you provide some instructions?
Thanks SuccessfulKoala55 and AgitatedDove14 ! We'll go through the hoops of setting up mongo on AWS then.
We're working to decouple the data from the helm chart; it seems like a dangerous idea to store long-term data on k8s in case of failure 😅
That's fine for the current use-case I believe.
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
Generally, really. I've struggled recently (and in the past) because the documentation seems:

- Very complete wrt the available SDK (though the formatting is sometimes off)
- Very lacking wrt how things interact with one another

A lot of what I need I actually find by digging into the source code.
I think ClearML itself would benefit a lot if it adopted a documentation structure similar to the numpy ecosystem (numpy, pandas, scipy, scikit-image, scikit-bio, scikit-learn, etc.)
I created a new task with the project name `internal tests`, and no task name (so it's derived by ClearML).
The task was a simple print out.
The project does not appear in the project space and does not turn up on searches (the task does)
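For reference, the repro is essentially this (project name as above, everything else default):

```python
from clearml import Task

# No task_name given, so ClearML derives one from the script
task = Task.init(project_name="internal tests")
print("hello")
task.close()
```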
Now I tried setting the pip version to <22.3 (both in the config and in the scaler's "extra config parameters"), but it still uses the latest?
added seed packages: pip==22.3.1, setuptools==65.5.1, wheel==0.38.4
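For reference, this is roughly the setting I mean; as far as I understand, the relevant key is agent.package_manager.pip_version:

```
# clearml.conf / the autoscaler's "extra config parameters" (HOCON)
agent.package_manager.pip_version: "<22.3"
```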
One more UI question TimelyPenguin76 , if I may -- it seems one cannot simply report single integers. The report_scalar feature creates a plot of a single data point (or single iteration).
For example if I want to report a scalar "final MAE" for easier comparison, it's kinda impossible 😞
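What I tried looks roughly like this; the result shows up as a one-point plot rather than a plain number:

```python
from clearml import Task

task = Task.init(project_name="example", task_name="final-metrics")
logger = task.get_logger()
# Renders as a scalar plot with a single data point at iteration 0
logger.report_scalar(title="final MAE", series="value", value=0.123, iteration=0)
```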
My current approach with pipelines basically looks like a GH CICD yaml config btw, so I give the user a lot of control on which steps to run, why, and how, and the default simply caches all results so as to minimize the number of reruns.
The user can then override and choose exactly what to do (or not do).
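To illustrate, our config reads something like the sketch below (the field names are our own, hypothetical ones, not a ClearML API):

```yaml
# Sketch of our pipeline config; every step caches its results by default
steps:
  preprocess:
    enabled: true
    cache: true    # reuse previous outputs when inputs are unchanged
  train:
    enabled: true
    cache: true
  evaluate:
    enabled: false # the user opted out of this step for this run
```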
It could be related to ClearML agent or server then. We temporarily upload a given .env file to internal S3 bucket (cache), then switch to remote execution. When the remote execution starts, it first looks for this .env file, downloads it using StorageManager, uses dotenv, and then continues the execution normally
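The remote side does roughly the following (the bucket and path are made up; StorageManager and python-dotenv are what we actually use):

```python
from clearml import StorageManager
from dotenv import load_dotenv

# Executed first thing when the remote run starts
env_path = StorageManager.get_local_copy(
    remote_url="s3://internal-bucket/cache/some-run-id/.env"
)
load_dotenv(env_path)  # populate os.environ, then continue as usual
```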
That's what I found as well, but it did not like it after all (boto is fine with it, but underlying urllib and requests were not?)
It's fine -- I see the added benefit in making sure the users set up their clearml.conf and I've made a script to edit it to our needs as part of the installation process 🙂 Thanks Martin!
Just because it's handy to compare differences and see how the data changed between iterations, but I guess we'll work with that 🙂
We'll probably do something like:
1. When creating a new dataset with a parent (or parents), look at the immediate parents for identically-named files
2. If those exist, load them with the matching framework (pyarrow, pandas, etc.) and log the differences to the new dataset 🙂
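As a sketch (load_with_framework and diff_summary are hypothetical helpers of ours):

```python
from clearml import Dataset

def log_parent_diffs(new_dataset: Dataset, parent_ids: list) -> None:
    """Compare identically-named files against each immediate parent."""
    new_files = set(new_dataset.list_files())
    local_new = new_dataset.get_local_copy()
    for parent_id in parent_ids:
        parent = Dataset.get(dataset_id=parent_id)
        local_parent = parent.get_local_copy()
        for fname in new_files & set(parent.list_files()):
            # load_with_framework / diff_summary are our own (hypothetical) helpers
            old_frame = load_with_framework(f"{local_parent}/{fname}")
            new_frame = load_with_framework(f"{local_new}/{fname}")
            new_dataset.get_logger().report_text(
                f"{fname} vs parent {parent_id}: {diff_summary(old_frame, new_frame)}"
            )
```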
Maybe. When the container spins up, are there any identifiers regarding the task etc. available? I create a folder on the bucket per `python train.py` invocation, so that the environment variables file doesn't get overwritten if two users execute almost simultaneously.
I'll try with 1.1.5 first, then 1.1.6rc0
Of course now it's not there anymore 😆 If/when it happens again I'll ping you here 🙂
Would be good if that's mentioned explicitly in the docs 😄 Thanks!
Let me test it out real quick.
Since the additional credentials are available to the autoscaler when it boots up (via the config file), I thought it could use those natively?
Sorry for the late reply Jake -- I was away on holidays -- it works perfectly now, thanks!
QuaintPelican38 did you have a workaround for this then? Some cleanup service or similar?
Will try later today TimelyPenguin76 and report back, thanks! Does this revert the behavior to the 1.3.x one?
Thanks for the reply CostlyOstrich36 !
Does the task read/use the cache_dir directly? It's fine for it to be a cache and then removed from the fileserver; if users want the data to stay they will use the ClearML Dataset 🙂
The S3 solution is bad for us since we have to create a folder for each task (before the task is created), and hope it doesn't get overwritten by the time it executes.
Argument augmentation - say I run my code with `python train.py my_config.yaml -e admin.env`...