
AgitatedDove14, if we look at the host machine, we can see a single Python process that is actually busy.
Ohh, actually I think I remember: when you connect a dictionary, the local dtype is used to cast the remote value of the matching key (it's probably more nuanced than that).
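A simplified sketch of that casting idea, assuming the rule is "cast each remote value with the type of the matching local default". This is not ClearML's actual implementation, and `cast_like_local` is a hypothetical helper:

```python
def cast_like_local(local: dict, remote: dict) -> dict:
    """Hypothetical helper: cast remote (e.g. UI-edited) values to the type of
    the matching local default. Keys missing locally are kept as-is."""
    result = dict(local)
    for key, remote_value in remote.items():
        if key not in local or local[key] is None:
            # no local type to cast with, keep the remote value untouched
            result[key] = remote_value
            continue
        local_type = type(local[key])
        if local_type is bool and isinstance(remote_value, str):
            # bool("False") is True, so string booleans need explicit parsing
            result[key] = remote_value.strip().lower() in ("1", "true", "yes")
        else:
            result[key] = local_type(remote_value)
    return result


local = {"lr": 0.1, "epochs": 10, "use_amp": False, "name": "run"}
remote = {"lr": "0.01", "epochs": "20", "use_amp": "True", "extra": "x"}
print(cast_like_local(local, remote))
# {'lr': 0.01, 'epochs': 20, 'use_amp': True, 'name': 'run', 'extra': 'x'}
```

The bool special case is there because a plain `bool(...)` cast would turn any non-empty string into `True`.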
FrothyDog40 Hi, there is a sort of UI bug related to the above.
If you select a single experiment and then use a filter or sorting such that it is no longer displayed, you can't deselect it.
There is a - sign in the top selection box (select all / deselect all), but it doesn't do anything.
The workaround is to select an additional experiment and, when the multiple-experiment bar pops up, choose "show experiments selected", deselect them, and go back.
It's inconvenient, both in th...
Is there an available reference for how I can make these API calls with the Python API? It is not clear to me from what you shared.
CostlyOstrich36, I don't think we are talking about the same thing.
SuccessfulKoala55, so no ideas on how to proceed?
Hi AgitatedDove14, if you don't mind having a look too, I think it's probably just a small misunderstanding.
According to the above, I was expecting the config to be auto-magically updated with the new YAML config I edited in the UI; however, it seems like an additional step is required. Probably connect_dict? Or am I missing something?
TimelyPenguin76, thanks for the answer. So, for example (to make sure I understand), with the example you gave above, when I print the config I'll see the new edited parameters?
What about the second part of the question: would it be parsed according to the type hinting?
AgitatedDove14, sounds great, I'm going to give it a go.
And also, in terms of outcome, the scalars follow the correct epoch count, but the debug samples and the monitored performance metric show a different count.
From the example I shared above:
config_files/cfg.py
```python
from dataclasses import dataclass

from hydra.core.config_store import ConfigStore


@dataclass
class MasterConfig:
    test: str = 'test'


# register the dataclass schema with Hydra under the name "config"
cs = ConfigStore.instance()
cs.store(name="config", node=MasterConfig)
```
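On the type-hinting question: a hedged sketch of what "parsed according to the type hints" could mean, casting UI-edited values (often strings) through each dataclass field's hint. The extra `epochs` field and `from_edited_dict` are hypothetical additions for illustration; this is not how ClearML or Hydra necessarily do it:

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class MasterConfig:
    test: str = 'test'
    epochs: int = 10  # hypothetical extra field, for illustration only


def from_edited_dict(cls, edited: dict):
    """Hypothetical helper: build a dataclass from edited values, casting each
    value with the field's type hint."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name in edited:
            # works for simple hints such as str/int/float; bool, Optional[...]
            # and nested dataclasses would need extra handling
            kwargs[f.name] = hints[f.name](edited[f.name])
    return cls(**kwargs)


cfg = from_edited_dict(MasterConfig, {"test": "edited", "epochs": "20"})
print(cfg)  # MasterConfig(test='edited', epochs=20)
```

Fields missing from the edited dict simply fall back to their dataclass defaults.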
Well, kind of. I linked the other topic, but it was completely unrelated.
This topic is about the issue with reporting a configuration that contains a tuple with a backslash inside one of its strings.
Hi AgitatedDove14, I am not uploading anything explicitly, and when I look at the Models tab in the UI I can only see the regular "{Project Name} - epoch={#}" and, in addition, "{Project Name} - {project_id}", so I am not sure what is really uploaded. From the name, it sounds like model weights and buffers (non-trainable).
It's the module where the cfg is defined, like in the example you shared: config_files/cfg.py
The offline error is different.
After poking at the setup in multiple ways, we came to the conclusion that the API server is being clogged by calls from multiple HPOptimizers, and since it utilizes a single core, it seems like we are not able to scale it up properly... any ideas?
The solution you suggested works for the single-machine case. The missing part is being able to access and "claim" spawned trials (samples in the HP plane) from multiple machines.
And the path it shows is correct.
AgitatedDove14, I missed your message: Python.
I want a manual way to access a global optimizer from multiple machines. It can be an agent; however, the critical part is that each machine must be able to pull and report multiple trials without restarting.
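The pattern being asked for, sketched in-process under loud assumptions: threads stand in for machines, and a `queue.Queue` stands in for the shared trial source (across real machines this would have to be a shared service, e.g. a dedicated ClearML queue, which is not shown here). Each worker pays the heavy initialization once, then keeps claiming and reporting trials without restarting. All function names are hypothetical:

```python
import queue
import threading


def expensive_common_init():
    # hypothetical stand-in for the heavy one-time setup (data loading, etc.)
    return {"data": list(range(10))}


def evaluate(state, params):
    # hypothetical stand-in for training/evaluating a single trial
    return sum(state["data"]) * params["scale"]


def worker(trials, results, lock):
    state = expensive_common_init()  # paid once per worker, not once per trial
    while True:
        try:
            trial_id, params = trials.get_nowait()  # claim the next pending trial
        except queue.Empty:
            return  # nothing left to claim
        score = evaluate(state, params)
        with lock:
            results[trial_id] = score  # report back to the shared store


def run_trials(param_sets, n_workers=3):
    trials = queue.Queue()
    for trial_id, params in enumerate(param_sets):
        trials.put((trial_id, params))
    results, lock = {}, threading.Lock()
    workers = [threading.Thread(target=worker, args=(trials, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


print(run_trials([{"scale": 1}, {"scale": 2}, {"scale": 3}, {"scale": 4}]))
```

Because `get_nowait()` hands each queued item to exactly one caller, no trial is evaluated twice even though workers race for them.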
AgitatedDove14, by the way, can you take a look at https://clearml.slack.com/archives/CTK20V944/p1625558368001600?
Maybe you'll have other ideas? At the moment it seems like a dead end.
One more question: is there a way to assign a job to a specific worker, or does it only work at the queue level?
Hi AgitatedDove14, the task initialization happens once, before the multiple trainings:
```python
Task.init           # called once
trainer.fit(model)  # first training
something           # other work in between
trainer.fit(model)  # second training
...
```
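If each `trainer.fit(model)` restarts its epoch counter at 0, anything reported with that local counter will collide across runs, which might explain the mismatched epoch counts mentioned above (an assumption, not a confirmed diagnosis). A minimal sketch of one workaround, keeping a running offset and reporting `offset + epoch` as the iteration; `IterationOffset` is hypothetical and not part of ClearML or PyTorch Lightning:

```python
class IterationOffset:
    """Hypothetical helper: keep reported iterations monotonic when
    trainer.fit() is called several times under one Task.init()."""

    def __init__(self):
        self.offset = 0

    def global_step(self, local_epoch: int) -> int:
        # map the epoch counter of the current fit() call to a global one
        return self.offset + local_epoch

    def end_fit(self, epochs_in_this_fit: int) -> None:
        # call after each fit() so the next run continues where this one stopped
        self.offset += epochs_in_this_fit


steps = IterationOffset()
reported = []
for n_epochs in (3, 2):  # two consecutive "fit" calls
    for epoch in range(n_epochs):
        reported.append(steps.global_step(epoch))
    steps.end_fit(n_epochs)
print(reported)  # [0, 1, 2, 3, 4]
```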
AgitatedDove14, maybe it's worth updating the main Readme.md on GitHub: if someone tries to follow the instructions there, it breaks.
"Why do you need to have the configuration added manually? Isn't the clearml.conf easier? If not, I think OS environment variables are easier, no?" - Well, at this point I'm not sure it is still essential. We have 3 run modes (offline, local-server, cloud-server), and this option made it work for all of them. It may be that it is not required anymore and it's just legacy.
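For the environment-variable route, a hedged sketch of selecting one of the three run modes per process. It assumes the `CLEARML_*` variables documented by ClearML (`CLEARML_API_HOST`, `CLEARML_WEB_HOST`, `CLEARML_FILES_HOST`, `CLEARML_API_ACCESS_KEY`, `CLEARML_API_SECRET_KEY`, `CLEARML_OFFLINE_MODE`) and the usual default hosts/ports; please double-check both against the ClearML docs for your version. `clearml_env` is a hypothetical helper:

```python
import os

# Assumed defaults: a local docker-compose server and the hosted service.
# Adjust to your own deployment.
HOSTS = {
    "local-server": {
        "CLEARML_API_HOST": "http://localhost:8008",
        "CLEARML_WEB_HOST": "http://localhost:8080",
        "CLEARML_FILES_HOST": "http://localhost:8081",
    },
    "cloud-server": {
        "CLEARML_API_HOST": "https://api.clear.ml",
        "CLEARML_WEB_HOST": "https://app.clear.ml",
        "CLEARML_FILES_HOST": "https://files.clear.ml",
    },
}


def clearml_env(mode, access_key=None, secret_key=None):
    """Hypothetical helper: environment variables for one run mode."""
    if mode == "offline":
        return {"CLEARML_OFFLINE_MODE": "1"}
    if mode not in HOSTS:
        raise ValueError(f"unknown run mode: {mode!r}")
    env = dict(HOSTS[mode])
    if access_key and secret_key:
        # credentials are passed in, never hard-coded
        env["CLEARML_API_ACCESS_KEY"] = access_key
        env["CLEARML_API_SECRET_KEY"] = secret_key
    return env


# apply before importing clearml / calling Task.init in this process
os.environ.update(clearml_env("local-server"))
```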
"I ran the above code, everything worked with no exception/warning..." - This is strange. Did you run it with the dataclass config I added?
What is ...
From the docs: `:param items: List of metric keys and requested statistics`
I am actually not sure about `\b` specifically myself, but even when replacing it with `\.` I get `\\.` (a double backslash) instead of a single backslash (for the tuple case). For a regexp, that changes the meaning of the expression. The expected behavior would be registering it as a single backslash.
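A likely explanation, offered as an assumption since it isn't confirmed how the configuration is serialized: converting a tuple to text uses `repr()` on each element, and `repr()` escapes backslashes. The text therefore shows `\\.` even though the string holds one backslash; the value only really changes if that text is stored and later used as-is instead of being parsed back. A self-contained check in plain Python:

```python
import ast
import re

# '\b' in a normal string literal is a backspace character, not a regex word
# boundary; raw strings (r"...") keep the backslash as written
print("\b" == "\x08", r"\b" == "\\b")  # True True

pattern = r"\."           # regex for a literal dot; holds ONE backslash
print(len(pattern))       # 2

text = str((pattern, "other"))  # str() of a tuple uses repr() of each element
print(text)               # ('\\.', 'other')  <- backslash is shown doubled

restored = ast.literal_eval(text)  # parsing the text back undoes the escaping
print(restored[0] == pattern)      # True

print(bool(re.fullmatch(pattern, ".")))  # True
print(bool(re.fullmatch(r"\\.", ".")))   # False: a real double backslash changes the regex
```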
AgitatedDove14, I am actually curious now: why is the default like this? Maybe more people are facing similar bottlenecks?
Another option is to pull Tasks from a dedicated queue and use the LocalClearMLJob to spawn them.
This sounds like it can work. We are talking about something like:
` # <Machine 1>
# Init Optimizer with some dedicated queue

# <Machine 2>
# heavy one-time common initialization
while True:
    # sample queue
    # enqueue with LocalClearMLJob
    # Execute Something
    # report results

# <Machine i>
# heavy one-time common initialization
while True:
    # sample same queue
    # enqueue wi...