something like in the example I shared

```
<Machine 1>
# Init Optimizer

<Machine 2>
**heavy one time Common Initialization**
while True:
    # sample Optimizer
    # init task
    # Execute Something
    # report results

<Machine i>
**heavy one time Common Initialization**
while True:
    # sample **same** Optimizer
    # init task
    # Execute Something
    # report results
```
this one is with the brave browser but I get the same with chrome
very strange that you don't see the same in the community server
let me try to explain myself again
Another option is to pull Tasks from a dedicated queue and use the LocalClearMLJob to spawn them
This sounds like it can work. We are talking about something like:

```
<Machine 1>
# Init Optimizer with some dedicated queue

<Machine 2>
heavy one time Common Initialization
while True:
    # sample queue
    # enqueue with LocalClearMLJob
    # Execute Something
    # report results

<Machine i>
heavy one time Common Initialization
while True:
    # sample same queue
    # enqueue wi...
```
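The worker loop sketched above can be simulated in plain Python, with the stdlib `queue` module standing in for the dedicated ClearML queue and plain dicts standing in for tasks (every name here is illustrative, not ClearML API):

```python
import queue

def worker(task_queue, results):
    # heavy one-time common initialization (stand-in)
    shared_state = {"initialized": True}
    while True:
        params = task_queue.get()  # sample queue
        if params is None:         # sentinel: no more tasks
            break
        # "Execute Something": stand-in computation
        score = sum(params.values())
        # report results back
        results.append({"params": params, "score": score})

task_queue = queue.Queue()
results = []
for lr in (0.1, 0.01):
    task_queue.put({"lr": lr})
task_queue.put(None)  # tell the worker to stop
worker(task_queue, results)
print(results)
```

In the real setup each `<Machine i>` would run this loop as its own process, paying the heavy initialization once and then pulling tasks until the queue is drained.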
What do you mean by "pull and report multiple trials" ? Spawn multiple processes with different parameters ?
Let's say you are doing Bayesian sampling of some parameter with your optimizer; that means the next sample will be a function of the previous samples, and all of this is contained in the optimizer state (in the Optuna optimizer case, in the study object). So to have an option to run the optimization the way it is described in the example, the communication with the optimizer task should hav...
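The point about state can be illustrated with a toy sequential optimizer (a stand-in for a Bayesian study object, not Optuna itself): the next suggestion is a function of all previously reported (sample, score) pairs, so whatever process holds `trials` is the one every worker must talk to.

```python
import random

class SequentialOptimizer:
    """Toy stand-in for a Bayesian study: the next sample depends
    on all previous (sample, score) pairs kept in self.trials."""
    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self.trials = []            # the optimizer "state"
        self.rng = random.Random(seed)

    def suggest(self):
        if not self.trials:
            return (self.low + self.high) / 2
        # next sample is a function of previous samples:
        # jitter around the best point seen so far, shrinking over time
        best_x, _ = max(self.trials, key=lambda t: t[1])
        width = (self.high - self.low) / (1 + len(self.trials))
        return best_x + self.rng.uniform(-width, width)

    def report(self, x, score):
        self.trials.append((x, score))

opt = SequentialOptimizer(0.0, 10.0)
for _ in range(5):
    x = opt.suggest()
    opt.report(x, -(x - 3.0) ** 2)  # toy objective, peaks at x = 3
print(len(opt.trials))
```

Because `suggest()` reads `self.trials`, the samples cannot be computed independently on each machine; workers have to report back to (and sample from) the single task that owns the state.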
am I supposed to change the WeightsFileHandler inplace?
sounds like an ok option, any recommended reference for how to start?
It does, I am familiar with it I used it many times
AgitatedDove14 , seem to work significantly better! thanks!
cool thanks! local is quite confusing in this context.. but works 🙂
AgitatedDove14 , maybe worth updating the main Readme.md in the GitHub repo.. if someone tries to follow the instructions there it breaks
AgitatedDove14 , well.. having the demo server by default lowers the effort threshold for trying ClearML and getting convinced it can deliver what it promises, and maybe for testing some simple custom use cases. I don't know what the behind-the-scenes cost considerations of keeping the demo server running are, but even a leaner version where experiment records are deleted after a week or a few days sounds useful to me
AgitatedDove14 , mostly out of curiosity, what is the motivation behind introducing this as an environment variable knob rather than a flag with some default in Task.init?
Yes I already learned about it from this thread 🙂
we see this:
```
$ ps ax | grep python
10589 ?        S      0:05 python3 fileserver.py
10808 ?        Sl    18:07 python3 -m apiserver.server
30047 pts/0    S+     0:00 grep --color=auto python
```
AgitatedDove14 I am actually curious now, why is the default like this? maybe more people are facing similar bottlenecks?
Thanks Martin! I'll test it in the following days, I'll keep you updated!
Hi AgitatedDove14 , I am not uploading anything explicitly, and when I look at the UI Models tab I can only see the regular "{Project Name} - epoch={#}" and in addition "{Project Name} - {project_id}" so I am not sure what is really uploaded.. from the name of it it sounds like model weights and buffers (non-trainable)
neither Task nor task seem to have this attribute 🤔
for keys that are present in both the remote and local configuration, the expected behavior is that the remote overrides the local; that's what happens in my agent runs
SuccessfulKoala55 .. so no ideas how to proceed?
CostlyOstrich36 , I am trying to get the config values as in the example: `task.connect(test_config)`
I expect that the returned connected dict will override existing keys in the local dict with matching keys from the remote task's dict
the config I'm talking about is the General section in the Hyper-Parameters under the configuration tab
CostlyOstrich36 , I don't think we are talking about the same things..
I am trying to mimic an agent pulling a task, and while running it syncing some custom configuration dict I have according to the task configuration (overriding the defaults)
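The expected override behavior can be sketched in plain Python; `connect_override` is a hypothetical stand-in for the merge described (it is not the actual `task.connect` implementation): keys present in both dicts take the remote value, keys only in the local dict keep their local defaults.

```python
def connect_override(local, remote):
    """Hypothetical stand-in for the described behavior: for keys
    present in both dicts, the remote value overrides the local
    default; local-only keys are kept as-is."""
    merged = dict(local)
    for key, value in remote.items():
        if key in merged:
            merged[key] = value
    return merged

# local defaults, as a script would define them
local_cfg = {"lr": 0.1, "batch_size": 32, "debug": False}
# values edited in the cloned task's configuration in the UI
remote_cfg = {"lr": 0.01, "batch_size": 64}

print(connect_override(local_cfg, remote_cfg))
# → {'lr': 0.01, 'batch_size': 64, 'debug': False}
```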
AgitatedDove14 should be, I'll try to create a small example later today or tomorrow
we have 8 cores and 16 GB RAM; the API server uses 1 core at 100% and everything else seems to be at low utilization. It is a standard installation. How can we change the number of internal API server handler processes?
TimelyPenguin76 , I generate it manually , clone some task -> adjust config -> enqueue
then when the agent pulls it I get the following behavior: `remote_dict = task.connect(local_dict)  # matching keys are overridden from remote config`
which I can't reproduce as I described above