
I am using the code inside on_train_epoch_end, inside a metric. So the important part is:
` import matplotlib.pyplot as plt

fig = plt.figure()
# ... my plotting code ...
logger.experiment.add_figure("fig", fig)
plt.close(fig) `
Works like a charm 👌 thanks!
No, I have all the packages with a version. I just want to know if there is a way to override the requirement versions detected by Pigar when using detect_with_pip_freeze: false. Locally I have cloudpickle==1.4.1, but when running the code and sending the task to the node the environment uses cloudpickle==1.6.0, so I have to manually change the version in the UI. Is there a way to force this single package to a specific version? Maybe in the requirements.txt or something similar?
Using detect_with_pip_freeze: true runs into "package version not found" errors for some of the packages I have locally.
That's really cool! But I would still prefer to avoid using pip_freeze, is there a way?
Thanks TimelyPenguin76 , the example works fine! I’ll debug further on my side!
I’ll show you what I have through PM!
Yes, everything is that way (work dir and args are ok) except the script path. It shows -m module arg1 arg2.
Is this caused by running the script with the arguments?
Just to make sure I get everything right AgitatedDove14:
We have to define the Task inside the function decorated with @hydra.main (rough sketch below)
We can modify the parameters that are overridden in the UI under: configuration tab -> Args -> overrides -> modify the list
Additional question: will the sweep functionality work?
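For the first point, this is roughly what I have in mind (just a minimal sketch on my side; the project/task names and config path are placeholders):
` import hydra
from omegaconf import DictConfig

from clearml import Task

@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig):
    # create the Task inside the hydra-decorated function
    task = Task.init(project_name="examples", task_name="hydra_run")
    # ... training code using cfg ...

if __name__ == "__main__":
    main() `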
AgitatedDove14 from this thread I understand hydra is not supported and therefore overriding the parameters from the UI won't work, but is there still a way to track and add the parameters to the experiment? Will task.connect_configuration work with the yaml files?
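Something like this is what I'm hoping works (just a sketch; the yaml path and names are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="hydra_run")
# attach the hydra yaml as a configuration object on the task
task.connect_configuration(configuration="conf/config.yaml", name="hydra_config") `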
I configured a firewall rule that opened the ports for the instance (not 100% sure if this is the right way) using network tags. Yes, the whole screen is black and no trains logo shows up: Safari can’t open the page because the server where this page is located isn’t responding.
SuccessfulKoala55 on both 8080 and 8008 I get: Safari can’t open the page http://<External IP>:80XX because Safari can’t establish a secure connection to the server http://<External IP>:80XX.
Also, should I allow 8080, 8008, and 8081 on ingress and egress on GCP, or is only egress enough?
AgitatedDove14 I am not sure why the packages get different versions; maybe since the package is not directly imported in my code it is possible to get a different version from what I have locally (?). Should all the library versions match exactly between local and the code that runs in the agent? The Task.add_requirements(package_name, package_version=None) workaround works perfectly! I just add the previous version that doesn’t break the code. Yes, definitely a force flag could help ...
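For reference, this is roughly what the workaround looks like on my side (a minimal sketch; the project/task names are placeholders and the pin is just my local version):
` from clearml import Task

# pin the problematic package before Task.init so the detected
# requirements use this version instead of the auto-detected one
Task.add_requirements("cloudpickle", "1.4.1")

task = Task.init(project_name="examples", task_name="training") `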
I am still getting the error even with the v0.16.3 agent, is there something else we have to do other than updating it?
Hi AgitatedDove14 thanks for your reply, with the dashboard I meant the Web-App (UI). I am trying to access http://<External IP>:8080 but unfortunately nothing shows up.
` File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/trains/backend_api/session/token_manager.py", line 72, in _get_token_exp
return jwt.decode(token, verify=False).get('exp', sys.maxsize)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 113, in decode
decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 80, in decode_c...
AgitatedDove14 Downloading a dataset would not be possible using this, right? I want to be able to access the data, just avoid reporting the experiment results.
Best thing ever, thanks AgitatedDove14 !
TimelyPenguin76 I found out it’s just one package that is causing the error (cloudpickle breaks everything). Is there a way to use Pigar but force a single package to have a version?
SuccessfulKoala55 just to let you know: since I opened the link straight from the GCP console it was using https in the address instead of http, hence the error. Thanks a lot for your help!
Hey CostlyOstrich36 sorry to ping you! Let's say I enqueue multiple experiments on a couple of agents and one of them fails. Is it possible to restart the experiment from the UI using the latest checkpoint? What if the experiment gets assigned to the other agent? I am not sure how the continue_last_task flag would help in this case.
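My current understanding of the flag, just so we're on the same page (a sketch assuming it is passed to Task.init; names are placeholders):
` from clearml import Task

# as I understand it, this reuses the previous task instead of creating a new one,
# so the run continues on top of what was already reported
task = Task.init(project_name="examples", task_name="training", continue_last_task=True) `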
AgitatedDove14 task.set_archived(True) + the cleanup service should do it 👌 If we run in debug mode the experiment goes directly to the archive and gets cleaned, and we don’t pollute the main experiments page.
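Roughly what I mean on our side (a small sketch; the debug flag and names are just placeholders from our own code):
` from clearml import Task

task = Task.init(project_name="examples", task_name="debug_run")
if debug_mode:
    # send the run straight to the archive so the cleanup service removes it later
    task.set_archived(True) `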