Hi ScatteredClams84
Is there any parameter that adjusts the "number of files that can be stored in the cache"? I am using clearml python version 1.0.3 to upload artifacts and get the artifacts back from a task.
Yes you are correct, the default value is 100 entries.
You can configure it in the clearml.conf, just add:
`sdk.storage.cache.default_cache_manager_size = 1000`
or from code:
```python
from clearml.storage.cache import CacheManager
CacheManager.get_cache_manager(cache_file_...
```
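A guess at what the truncated call looks like — a minimal sketch assuming `cache_file_limit` is the clipped keyword argument:

```python
from clearml.storage.cache import CacheManager

# Assumption: cache_file_limit is the keyword truncated above; it sets the
# maximum number of file entries the cache keeps before evicting old ones.
CacheManager.get_cache_manager(cache_file_limit=1000)
```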
Interesting!
I would also add that the Task name is not unique, and you can use it to describe the "process / goal etc.", which makes it pretty obvious to search / review from the UI.
Regarding models and branches, I would use the Task tags (you can have as many as you like) to tag the specific model type (or dev branch if the algorithm is different). This means you can also easily filter based on the tags in the UI.
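For instance, a minimal sketch (project / task names and tag values are made up):

```python
from clearml import Task

# Tags are free-form and not unique, so they work well for
# model-type / dev-branch filtering in the UI
task = Task.init(
    project_name='examples',            # hypothetical project
    task_name='resnet50 baseline run',  # describe the process / goal here
    tags=['resnet50', 'dev-branch'],
)
task.add_tags(['ablation'])  # tags can also be added after init
```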
can you use the Web UI to compare the artifacts from two separate subprojects?
Yes comp...
I can add files to the dataset, even after I finish the experiment?
Correct
https://clear.ml/docs/latest/docs/clearml_data#creating-a-dataset
https://clear.ml/docs/latest/docs/guides/data%20management/data_man_cifar_classification
https://github.com/allegroai/clearml/blob/master/docs/datasets.md#create-dataset-from-code
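A minimal sketch of adding files after the fact, following the linked docs (dataset / path names are made up):

```python
from clearml import Dataset

# Create a new dataset version and add files to it, independently of
# any experiment that already finished
ds = Dataset.create(dataset_name='my_dataset', dataset_project='datasets')
ds.add_files(path='/path/to/new_files')
ds.upload()    # upload the files to the configured storage
ds.finalize()  # close this version; later additions go into a new child version
```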
Hmm I think the easiest is using the helm chart:
https://github.com/allegroai/clearml-server-helm-cloud-ready
I know there is work on a Terraform template, not sure about Istio.
Is Helm okay for you?
See Args section in the screenshot
"Args/counter"
FYI all the git pulls are cached even in docker mode so there is no "tax" to pay for pulling the sub-modules (only the first time of course)
OddShrimp85
the Task ID is a UUID that is generated by the backend server, there is no real way to force it to have a specific value 🙂
MelancholyElk85 that looks great, let me see how quickly we can push it (I think 1.1.5 needs to be pushed very soon, I'll check if we can have it before 🙂)
SmallDeer34
I think this is somehow related to the JIT compiler torch is using.
My suspicion is that JIT cannot be initialized after something happened (like a subprocess, or a thread).
I think we managed to get around it with 1.0.3rc1.
Can you verify?
Hi @<1661904968040321024:profile|SpotlessOwl43>
My problem is that when the AWS virtual machine is killed, my Pipelines and Scheduling stop working because of the killed ClearML agent,
are you using the ClearML AWS autoscaler to spin up that machine? or are you spinning it up manually?
SmarmyDolphin68 okay, what's happening is the process exits before the actual data is sent (report_matplotlib_figure is an async call, and the data is sent in the background)
Basically you should just wait for all the events to be flushed: `task.flush(wait_for_uploads=True)`
That said, quickly testing it it seems it does not wait properly (again I think this is due to the fact we do not have a main Task here, I'll continue debugging)
In the meantime you can just do `sleep(3.0)`
And it wil...
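Putting both suggestions together, a minimal sketch (assuming a regular script with its own Task; project / task names are made up):

```python
import time
from clearml import Task

task = Task.init(project_name='examples', task_name='matplotlib report')
# ... report figures here; report_matplotlib_figure() is async,
# the data is sent in the background

# Wait for all queued events / uploads before the process exits
task.flush(wait_for_uploads=True)
time.sleep(3.0)  # temporary workaround while flush does not wait properly here
```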
Wait, how do I reproduce it on the community server? Maybe it has something to do with the number of columns? Or whether it is already wider than the screen? What's your browser / OS?
Hi @<1655744373268156416:profile|StickyShrimp60>
My hydra OmegaConf configuration object is not always being picked up, and I am unable to consistently reproduce it.
... I am using clearml v1.14.4,
Hmm, how can we reproduce it? What are you seeing when it does "miss" the hydra, i.e. are you seeing any Hydra section at all? How are you running the code (manually, agent)?
Hmmm could you attach the entire log?
Remove any info that you feel is too sensitive :)
@<1571308003204796416:profile|HollowPeacock58> seems like an internal issue copying this object (config.model).
This is a complex object, and it seems that for some reason it cannot be copied.
As a workaround, just do not connect this object; it seems you cannot pickle / copy it (see the GH issue).
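A minimal sketch of the workaround — connect only the picklable parts (the config contents here are stand-ins for the user's real configuration):

```python
from clearml import Task

# Stand-in for the user's configuration; in the real code 'model' is the
# complex object that cannot be pickled / copied by task.connect()
config = {'model': object(), 'lr': 0.001, 'batch_size': 32}

task = Task.init(project_name='examples', task_name='config workaround')

# Workaround: connect everything except the problematic object
safe_config = {k: v for k, v in config.items() if k != 'model'}
task.connect(safe_config, name='config')
```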
PompousParrot44
you can always manually store/load models, example: https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/examples/reporting/model_config.py#L35
Sure, you can patch any framework with something similar to what we do in xgboost, any such PR will be greatly appreciated! https://github.com/allegroai/trains/blob/master/trains/binding/frameworks/xgboost_bind.py
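For illustration, a minimal manual store/load sketch in the spirit of the linked example (file name and model ID are placeholders):

```python
from clearml import Task, OutputModel, InputModel

task = Task.init(project_name='examples', task_name='manual model store')

# Manually register a locally saved weights file as this task's output model
output_model = OutputModel(task=task)
output_model.update_weights(weights_filename='model_weights.pt')  # placeholder file

# Later / in another script: manually load a previously stored model by its ID
input_model = InputModel(model_id='<model-id>')  # placeholder ID
local_weights_path = input_model.get_weights()   # downloads and returns a local copy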
Hi RipeGoose2
What exactly is being uploaded ? Are those the actual model weights or intermediate files ?
It seems stuck somewhere in the python path... Can you check at runtime what's in `os.environ['PYTHONPATH']`?
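E.g. a one-liner to drop at the top of the script (a sketch, nothing ClearML-specific):

```python
import os

# Print the effective python path the process actually sees at runtime
print(os.environ.get('PYTHONPATH', '<PYTHONPATH not set>'))
```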
Or you want to generate it from a previously executed run?
How can I ensure that additional tasks aren't created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them?
(Also, how come the git diff warning is there with the latest clearml? I think there was some fix related to that)
Hi, I changed it to 1.13.0, but it still threw the same error.
This is odd, just so we can make the agent better, any chance you can send the Task log ?
However, the pipeline experiment is not visible in the project experiment list.
I mean press on the "full details" in the pipeline page
```python
from clearml import Task

task = Task.init(...)
# assume a model checkpoint exists
if task.models['output']:
    # get the latest checkpoint
    model_file_or_path = task.models['output'][-1].get_local_copy()
    # load the model checkpoint
# run training code
```
RoughTiger69 Would the above work for you?
but this will be invoked before fil-profiler starts generating them
I thought it would flush in the background 🙂
You can however configure the profiler to a specific folder, then mount the folder to the host machine:
In the "base docker args" section add: `-v /host/folder/for/profiler:/inside/container/profile`
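The same can be done from code instead of the UI field — a sketch assuming docker mode, with a placeholder image name:

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='profiler run')

# Equivalent to filling the "base docker args" section in the UI:
# mount a host folder so the profiler output survives the container
task.set_base_docker(
    docker_image='python:3.9',  # placeholder image
    docker_arguments='-v /host/folder/for/profiler:/inside/container/profile',
)
```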
YEYYYYYYyyyyyyyyyyyyyyyyyy
Are you referring to Poetry?
```python
from clearml.automation.parameters import LogUniformParameterRange

sampler = LogUniformParameterRange(name='test', min_value=-3.0, max_value=1.0, step_size=0.5)
sampler.to_list()
```
Out[2]:
```
[{'test': 1.0},
 {'test': 3.1622776601683795},
 {'test': 10.0},
 {'test': 31.622776601683793},
 {'test': 100.0},
 {'test': 316.22776601683796},
 {'test': 1000.0},
 {'test': 3162.2776601683795}]
```
then when we triggered an inference deploy, it failed
How would you control it? Is it based on a Task? Like a property "match python version"?