
AgitatedDove14, a single experiment that is being paused and resumed.
Inconsistency in the reporting: when resuming at the 10th epoch, for example, and running one extra epoch, the ClearML iteration count is wrong for debug images and monitored metrics, though somehow not for the scalar reporting.
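One hypothesis worth ruling out (not confirmed here): when a task is resumed, ClearML can apply an initial iteration offset (see `Task.set_initial_iteration()`), and if the training loop also reports absolute epoch numbers, the two add up. The arithmetic below uses a hypothetical `reported_iteration` helper to show the double counting; it does not call ClearML.

```python
def reported_iteration(user_iteration: int, initial_offset: int) -> int:
    """Hypothetical model of offset handling: stored = offset + reported."""
    return initial_offset + user_iteration


completed_epochs = 10               # epochs finished before the pause
next_epoch = completed_epochs + 1   # the extra epoch run after resuming (1-based)

# Offset applied on resume AND the loop reports the absolute epoch number.
double_counted = reported_iteration(next_epoch, initial_offset=completed_epochs)

# Offset reset to 0 while absolute epochs are reported: values line up.
aligned = reported_iteration(next_epoch, initial_offset=0)

print(double_counted, aligned)  # 21 11
```

If the observed iteration numbers match the first case, calling `set_initial_iteration(0)` on the resumed task is one thing to try.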
I'll try going with this option; I think it's actually perfect for my needs
much appreciated, thanks!
I'm doing this instead
TypeError: 'bool' object is not callable
Hi AgitatedDove14, path to the config file for trains manual execution
If I don't have an internet connection on the other machine, can I just copy the artifacts and transfer them to my local machine?
I have the latest clearml version fresh from PyPI
yes, that's what I meant. This is good, thanks
I haven't tested it on the cloud yet and am trying to make final adjustments
so it sounds like there is no known issue related to this
"does not support running with no server connection." this is what I was afraid of..I'll need to figure out if I can use trains at all 😞
AgitatedDove14 the option you mentioned just before sounds much better for me, I must admit I find the name of the method confusing. I came across it before but thought its only relevant for credentials
AgitatedDove14 The use case is conditional choice of a server config, when ran locally or on the cloud..
I was trying to do exactly as you mentioned setting the environment variable before any trains import but it didn't work (and also its a mess in terms of my code).. I was hoping there is another way to go about it.. if not I'll try to create a minimal reproducible example..
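A minimal sketch of choosing the config file before the SDK is imported, assuming the `TRAINS_CONFIG_FILE` environment variable (`CLEARML_CONFIG_FILE` in ClearML) is what the SDK reads when it is first imported. The paths, the `select_config_file` helper, and the `RUNNING_IN_CLOUD` flag are hypothetical placeholders. The ordering constraint is the important part: the variable must be set before the first `trains` import anywhere in the process, including imports pulled in indirectly by other modules.

```python
import os
import sys


def select_config_file(running_in_cloud: bool,
                       local_conf: str = os.path.expanduser("~/trains.conf"),
                       cloud_conf: str = "/opt/configs/trains_cloud.conf") -> str:
    """Return the config path for this environment (both defaults are hypothetical)."""
    return cloud_conf if running_in_cloud else local_conf


# Hypothetical switch; use whatever signal tells your code where it runs.
running_in_cloud = os.environ.get("RUNNING_IN_CLOUD") == "1"

# If trains is already imported, setting the variable now is too late.
if "trains" in sys.modules:
    raise RuntimeError("trains was imported before TRAINS_CONFIG_FILE was set")

# Assumption: the SDK reads this variable when it is first imported.
os.environ["TRAINS_CONFIG_FILE"] = select_config_file(running_in_cloud)

# Only now import the SDK, e.g.: from trains import Task
```

The `sys.modules` check is a cheap way to find out whether some other module pulled in `trains` earlier than expected, which is one possible reason this approach silently fails.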
edit: tweaked it a little for my use case:
import requests
from trains.backend_api.session.session import Session
is_demo_server = 'http://demoapi.trains.allegro.ai' in Session.get_api_server_host()
is_server_available = requests.get(Session.get_api_server_host() + "/debug.ping", timeout=5).status_code == 200
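With `requests.get`, an unreachable server raises an exception instead of returning a non-200 status, so a ping check like the one above fails loudly rather than evaluating to `False` on an offline machine. A minimal standard-library sketch that treats network errors as "unavailable"; `host` stands in for `Session.get_api_server_host()`, and the `server_available` function and its injectable `fetch_status` parameter are hypothetical, added so the check can be exercised without a server.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def _http_status(url: str, timeout: float) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their code instead.
        return err.code


def server_available(host: str, timeout: float = 5.0,
                     fetch_status: Optional[Callable[[str, float], int]] = None) -> bool:
    """True if `<host>/debug.ping` answers 200; False on any network error."""
    fetch = fetch_status or _http_status
    try:
        return fetch(host.rstrip("/") + "/debug.ping", timeout) == 200
    except OSError:
        # URLError, socket timeouts, and refused connections are OSError subclasses.
        return False
```

Used in place of the original line, this would read `is_server_available = server_available(Session.get_api_server_host())`.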
AgitatedDove14, it is happening on an offline network, so it would be tricky to set up, but we will try. So far the errors we observed were either:
Calling upload callback when starting upload: maximum recursion depth exceeded
Or
something like "pending for upload" (possibly because we archived a run while it was still uploading)
Is there a built-in programmatic way to adjust development.default_output_uri?
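Besides the config file, `Task.init()` accepts an `output_uri` argument that overrides `development.default_output_uri` for that task. The sketch below only resolves which URI to pass; the `resolve_output_uri` helper, the `MY_OUTPUT_URI` environment variable, and the fallback folder are hypothetical, and it assumes a local folder path is accepted as an output destination.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_output_uri(explicit: Optional[str] = None,
                       fallback_dir: str = "./clearml_artifacts") -> str:
    """Pick an output URI: explicit value, then env var, then a local folder.

    Falling back to a folder (instead of leaving the URI unset) is meant to
    make checkpoints get stored somewhere rather than only registered.
    """
    if explicit:
        return explicit
    from_env = os.environ.get("MY_OUTPUT_URI")  # hypothetical variable name
    if from_env:
        return from_env
    return str(Path(fallback_dir).resolve())


# The result would then be passed on, e.g.:
# Task.init(project_name=..., task_name=..., output_uri=resolve_output_uri())
```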
Thanks SuccessfulKoala55, the question arose after trying to follow the instructions you attached. It seems that installing Docker on Windows 10 Home is somewhat problematic.
and I will also be happy to see if I can contribute to this specific feature, or maybe others
the ok() call seems to crash
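If the `ok()` in question is `requests.Response.ok`, that would also explain the `TypeError: 'bool' object is not callable` reported earlier: `ok` is a property, so `response.ok` already evaluates to a bool, and the trailing `()` tries to call that bool. The `FakeResponse` class below is a hypothetical stand-in (not part of requests) that reproduces the behavior without a network request.

```python
class FakeResponse:
    """Hypothetical stand-in mimicking how requests.Response exposes `ok`."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code

    @property
    def ok(self) -> bool:
        # requests.Response.ok is True for status codes below 400.
        return self.status_code < 400


response = FakeResponse(200)
print(response.ok)  # True: accessed as an attribute, no parentheses

try:
    response.ok()  # calls the bool returned by the property
except TypeError as err:
    print(err)  # 'bool' object is not callable
```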
yes, I have limited access to the machine that is running the experiment. I can't set up a server there, but I want to collect the results and view them later.
by WebApp, do you mean the public online one? I might be confusing things
to put it a bit differently, I am looking for a way to manually sample from the optimizer and report results back to it
yes, I will be happy to; it's going to be my first time
AgitatedDove14, I want multiple machines to access the synced state of the optimizer, which is part of the optimizer's internals, and then report the results back to the optimizer so that its study object keeps track of the results and the next sample takes all previous results into account.
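The pattern described here is usually called an ask-and-tell interface: each worker asks the optimizer for a sample, runs it, and tells it the result, while the history lives in shared storage. Optuna, for example, offers `study.ask()` / `study.tell()` and can share a study between machines through a database `storage`. The sketch below swaps in plain random search and a JSON string as the shared state, so it illustrates the control flow only; `AskTellSearch` and its methods are hypothetical.

```python
import json
import random
from typing import Dict, List, Tuple


class AskTellSearch:
    """Hypothetical random-search optimizer with an ask/tell interface."""

    def __init__(self, space: Dict[str, Tuple[float, float]], seed: int = 0) -> None:
        self.space = space                 # parameter name -> (low, high)
        self.history: List[dict] = []      # every reported (params, value) pair
        self._rng = random.Random(seed)

    def ask(self) -> Dict[str, float]:
        # Random search ignores history when sampling; a model-based optimizer
        # (e.g. TPE) would condition this draw on self.history.
        return {name: self._rng.uniform(lo, hi) for name, (lo, hi) in self.space.items()}

    def tell(self, params: Dict[str, float], value: float) -> None:
        self.history.append({"params": params, "value": value})

    def best(self) -> dict:
        return min(self.history, key=lambda record: record["value"])

    def to_json(self) -> str:
        # State another machine could load; the RNG state is not shared.
        return json.dumps({"space": self.space, "history": self.history})

    @classmethod
    def from_json(cls, payload: str, seed: int = 0) -> "AskTellSearch":
        state = json.loads(payload)
        opt = cls({k: tuple(v) for k, v in state["space"].items()}, seed=seed)
        opt.history = state["history"]
        return opt


# Machine A samples and reports; machine B resumes from the shared state.
opt = AskTellSearch({"lr": (1e-4, 1e-1)})
params = opt.ask()
opt.tell(params, value=(params["lr"] - 0.01) ** 2)
shared = opt.to_json()

other = AskTellSearch.from_json(shared, seed=1)
p2 = other.ask()
other.tell(p2, value=(p2["lr"] - 0.01) ** 2)
print(len(other.history))  # 2
```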
AgitatedDove14, it does, and it did, but for some reason I couldn't make it work this way.
I require some additional imports beforehand to infer the config path dynamically, but even when I stripped down the code and made sure there were no other trains imports anywhere, it still didn't work.
Thanks AgitatedDove14. Well, if a machine doesn't set default_output_uri, the default behavior for model checkpoints, for example, is to just register them without uploading. So if default_output_uri is not defined, the offline task folder will not contain the artifacts for uploading (they are not included in the zip file created by the offline package). Or am I missing something?
Hi AgitatedDove14, when I'm using the set_credentials approach, does it mean trains.conf is redundant? If the file doesn't exist on the machine, will it be an issue? If not, what defaults should I assume for the rest of the values?