Right, but that's as defined in the services agent, which is not immediately transparent
Sorry, I misspoke, yes of course, the agents config file, not the queues
Would be good if that's mentioned explicitly in the docs 🙂 Thanks!
CostlyOstrich36 so internal references are not resolved somehow? Or, how should one achieve:
```
def my_step():
    from ..utils import foo
    foo("bar")
```
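For context, this is the self-contained variant I could fall back to if relative imports can't be resolved when the step runs standalone (`foo` here is just a hypothetical stand-in for our helper, not ClearML API):

```python
def my_step():
    # Workaround sketch: inline the helper so the step carries no
    # relative-import dependency when executed on its own.
    def foo(msg: str) -> str:
        # hypothetical helper; ours lives in ..utils
        return f"processed: {msg}"
    return foo("bar")
```

It works, but it defeats the point of having shared utils, hence the question.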
The screenshot is small since the data is private anyway, but it's enough to see:
"Metric: untitled 00" and "plot image" as the image title. The attached histogram has a title ("histogram of ...").
It failed on some missing files in my remote_execution, but otherwise seems fine now
Thanks for your help SuccessfulKoala55 ! Appreciate the patience 🙂
I guess following the example https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py , it's not clear to me how the server has access to the data loaders location when it hits execute_remotely
The idea is that the features would be copied/accessed by the server, so we can transition slowly and not use the available storage manager for data monitoring
A follow-up question (instead of opening a new thread): is there a way I could signal some files/directories to be copied to the execute_remotely task?
From the log you shared, the task is picked up by the worker_d1bd92a3b039400cbafc60a7a5b1e52b_4e831c4cbaf64e02925b918e9a3a1cf6_<hostname>:gpu0,1 worker
I can try and target the default one if it helps..?
I was thinking of using the --volume settings in clearml.conf to mount the relevant directories for each user (so it's somewhat customizable). Would that work?
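For reference, a minimal sketch of the kind of thing I mean in clearml.conf (the host/container paths are hypothetical, and I'm assuming extra docker flags go under agent.default_docker.arguments):

```
agent {
  default_docker {
    # extra flags passed to `docker run`; one mount per user, e.g.:
    arguments: ["-v", "/home/alice/data:/data"]
  }
}
```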
It would be amazing if one can specify specific local dependencies for remote execution, and those would be uploaded to the file server and downloaded before the code starts executing
This could be relevant SuccessfulKoala55 ; might entail some serious bug in ClearML multiprocessing too - https://stackoverflow.com/questions/45665991/multiprocessing-returns-too-many-open-files-but-using-with-as-fixes-it-wh
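To see how close a process is to hitting that error, this is the check I ran (stdlib only; on Linux the limit is the RLIMIT_NOFILE the linked question is about):

```python
import resource

# Inspect the per-process open-file limit; "too many open files" is
# raised once the soft limit is exhausted.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
```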
This happened again 🤔
How many files does ClearML touch? 🤯
Let me know if there's any additional information that can help SuccessfulKoala55 !
Because setting env vars and ensuring they exist on the remote machine during execution etc. is more complicated 🙂
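Concretely, what I'd rather avoid is sprinkling fallbacks like this everywhere (DATA_ROOT is a hypothetical variable name, not something ClearML defines):

```python
import os

# Read a path from the environment with a fallback, since the variable
# may simply not be set on the remote machine.
data_root = os.environ.get("DATA_ROOT", "/mnt/data")
print(data_root)
```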
There are always ways around, I was just wondering what the expected flow is 🙂
Of course. We'd like to use S3 backends anyway, I couldn't spot exactly where to configure this in the chart (so it's defined in the individual agent's configuration)
Okay, I'll test it out by trying to downgrade to 4.0.0 and then upgrade to 4.1.2
Just to make sure, the chart_ref is allegroai/clearml right? (for some reason we had clearml/clearml and it seems like it previously worked?)
Full log:
```
command: /usr/sbin/helm --version=4.1.2 upgrade -i --reset-values --wait -f=/tmp/tmp77d9ecye.yml clearml clearml/clearml
msg: |-
  Failure when executing Helm command. Exited 1.
  stdout:
  stderr: W0728 09:23:47.076465    2345 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0728 09:23:47.126364    2345 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unava...
```
Removing the PVC is just setting the state to absent AFAIK
For now this is okay - no data lost, really - but I'd like to make sure we're not missing any steps in the next upgrade
Hm, I'm not sure I follow 🤔 How does the API server config relate to the file server?
We have a mini default config (if you remember from a previous discussion we had) that actually uses the second form you suggested.
I wrote a small "fixup" script that combines this default with the one generated by clearml-init, and it simply does:

```
def_config = ConfigFactory.parse_file(DEF_CLEARML_CONF, resolve=False)
new_config = ConfigFactory.parse_file(new_config_file, resolve=False)
updated_new_config = ConfigTree.merge_configs(new_config, def_config)
```
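If I understand merge_configs correctly, the second argument's values take precedence; behaviour-wise it's roughly this recursive dict merge (plain dicts as a stand-in for pyhocon's ConfigTree, just to illustrate the precedence I'm relying on):

```python
def deep_merge(base: dict, override: dict) -> dict:
    # Recursively merge `override` into `base`: scalar values from
    # `override` win, nested dicts are merged key by key.
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```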
I can scroll sideways but if I open any of the comparison items, I pretty much can only see one experiment's values
AgitatedDove14 The keys are there, and there is no specifically defined user in .gitmodules:

```
[submodule "xxx"]
    path = xxx
    url =
```
I believe this has to do with how ClearML sets up the git credentials perhaps?
Indeed. I'll open an issue, sure!
Yes, exactly. I have not yet had a chance to try this out -- should it work?
We have a read-only user with personal access token for these things, works seamlessly throughout and in our current on premise servers... So perhaps something missing in the autoscaler definitions?
Sounds like a nice idea 🙂
Follow-up: any ideas how to avoid PEP 517 builds with the autoscaler? 🤔 It takes a long time to build the wheels