Let me work on it 👌
The plot is generated and added to TensorBoard, but it seems ClearML is not catching it.
I am using the code inside `on_train_epoch_end`, inside a metric. So the important part is:
` fig = plt.figure()
No, I have all the packages with a version. I just want to know if there is a way to override the requirement versions detected by Pigar when using `detect_with_pip_freeze: false`. Locally I have `cloudpickle==1.4.1`, but when I run the code and send the task to the node, the environment uses `cloudpickle==1.6.0`, and I have to change the version manually in the UI. Is there a way to force this single package to a specific version? Maybe in the `requirements.txt` or something similar?
`detect_with_pip_freeze: true` runs into "package version not found" for some of the packages I have locally.
That’s really cool! But I would still prefer to avoid using pip_freeze, is there a way?
Thanks TimelyPenguin76 , the example works fine! I’ll debug further on my side!
I’ll show you what I have through PM!
Yes, everything is that way (work dir and args are ok) except the script path. It shows `-m module arg1 arg2`.
Is this caused by running the script with the arguments?
Just to make sure I get everything right, AgitatedDove14:
- We have to define the Task inside the function decorated with `@hydra.main`.
- We can modify the parameters that are overridden from the UI under: Configuration tab -> Args -> overrides -> modify the list.
Additional question: will the sweep functionality work?
AgitatedDove14 from this thread I understand Hydra is not supported and therefore overriding the parameters from the UI won’t work, but is there still a way to track and add the parameters to the experiment? Will `task.connect_configuration` work with the YAML files?
I configured a firewall rule that opened the ports for the instance (not 100% sure if this is the right way) using network tags. Yes, the whole screen is black and no trains logo shows up: "Safari can’t open the page because the server where this page is located isn’t responding."
SuccessfulKoala55 on both
8008 I get: Safari can’t open the page
http://<External IP>:80XX because Safari can’t establish a secure connection to the server
http://<External IP>:80XX .
Thanks SuccessfulKoala55 I’ll give it a try!
I enabled both
Also, should I allow
8008 , and
8081 on ingress and egress on GCP or is only egress enough?
AgitatedDove14 I am not sure why the packages get different versions. Maybe since the package is not directly imported in my code, it can end up with a different version from the one I have locally (?). Should all the library versions match exactly between my local environment and the code that runs in the agent? The `Task.add_requirements(package_name, package_version=None)` workaround works perfectly! I just add the previous version that doesn’t break the code. Yes, definitely a force flag could help ...
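For reference, the `Task.add_requirements` workaround could look roughly like the sketch below. It assumes the `clearml` package is installed and that the pins are registered before `Task.init` is called. The helper name `pin_packages`, its `register` parameter, and the project and task names are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple


def pin_packages(
    pins: Dict[str, str],
    register: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Register each (package, version) pin and return what was registered.

    `register` is expected to behave like clearml's Task.add_requirements.
    It is passed in so the helper can be tried without a ClearML server.
    """
    applied = []
    for name, version in sorted(pins.items()):
        register(name, version)
        applied.append((name, version))
    return applied


def main() -> None:
    # Assumption: clearml is installed and configured on this machine.
    from clearml import Task

    # Register the pins first, then create the task, so the forced
    # version ends up in the task's requirements.
    pin_packages({"cloudpickle": "1.4.1"}, Task.add_requirements)
    Task.init(project_name="examples", task_name="pinned requirements")
```

Calling `main()` at the top of the training script should keep automatic package detection for everything else while forcing only the listed packages.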
I am still getting the error even with the v0.16.3 agent, is there something else we have to do other than updating it?
Hi AgitatedDove14 thanks for your reply, with the dashboard I meant the Web-App (UI). I am trying to access `http://<External IP>:8080` but unfortunately nothing shows up.
` File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/trains/backend_api/session/token_manager.py", line 72, in _get_token_exp
return jwt.decode(token, verify=False).get('exp', sys.maxsize)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 113, in decode
decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 80, in decode_c...
AgitatedDove14 Downloading a dataset would not be possible using this, right? I want to be able to access the data and just avoid reporting the experiment results.
Best thing ever, thanks AgitatedDove14 !
TimelyPenguin76 I found out it’s just one package that is causing the error (`cloudpickle` breaks everything). Is there a way to use Pigar but force a single package to have a specific version?
SuccessfulKoala55 just to let you know: since I opened the link straight from the GCP console, it was using `https` in the address instead of `http`, hence the error. Thanks a lot for your help!
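A quick way to rule out this kind of problem is to force the `http` scheme and check that the server ports accept TCP connections. The sketch below uses only the Python standard library. The function names are made up for illustration, and it only tests reachability, not whether the server behind the port is healthy.

```python
import socket
from urllib.parse import urlsplit


def force_http(url: str) -> str:
    """Return the same URL with its scheme replaced by plain http."""
    return urlsplit(url)._replace(scheme="http").geturl()


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or blocked by a firewall.
        return False
```

For example, `force_http("https://<External IP>:8080")` gives back the `http://` address, and calling `port_is_open(host, port)` for each of 8080, 8008 and 8081 (with `host` set to the instance's external IP) shows whether the firewall rule lets traffic through.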
Hey CostlyOstrich36 sorry to ping you! Let's say I enqueue multiple experiments on a couple of agents and one of them fails. Is it possible to restart the experiment from the UI using the latest checkpoint? What if the experiment gets assigned to the other agent? I am not sure how the `continue_last_task` flag would help in this case.
`task.set_archived(True)` + the cleanup service should do it 👌 If we run in debug mode, the experiment goes directly to the archive and gets cleaned up, so we don’t pollute the main experiment page.
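A minimal sketch of that debug-mode pattern, assuming `clearml` is installed and a cleanup service is already removing archived tasks. The helper name `archive_if_debug`, the `DEBUG_RUN` environment variable, and the project and task names are hypothetical.

```python
import os
from typing import Any


def archive_if_debug(task: Any, debug: bool) -> bool:
    """Send the task to the archive when running in debug mode.

    `task` only needs a `set_archived(bool)` method, as clearml's Task has.
    Returns the debug flag so callers can branch on it.
    """
    if debug:
        task.set_archived(True)
    return debug


def main() -> None:
    # Assumption: clearml is installed and configured on this machine.
    from clearml import Task

    # Hypothetical switch: treat DEBUG_RUN=1 as a debug run.
    debug = os.environ.get("DEBUG_RUN", "0") == "1"
    task = Task.init(project_name="examples", task_name="debug run")
    archive_if_debug(task, debug)
```

The archive call is kept in its own small function so the behaviour can be checked with a stand-in object before wiring it into a real training script.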