Hey AgitatedDove14, after playing around, it seems that if the callback filepath points to an hdf5 file, it is not uploaded.
Pigar is capturing different versions than the ones I have installed on my local machine (not a problem except for one). I just want to force the version of that package so that I don't have to change it manually from the UI for every experiment.
Yes, exactly! Unfortunately I am not so familiar with the internals of the library but I could take a look and figure that out.
With pip I get the first error I showed. I tried conda and it starts running, but at some point it crashes with: clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'
Makes sense! Then where would I have to add output_uri to save the weights?
I configured a firewall rule that opened the ports for the instance using network tags (not 100% sure if this is the right way). Yes, the whole screen is black and no trains logo shows up: Safari can’t open the page because the server where this page is located isn’t responding.
AgitatedDove14 I filed an issue on the fire repo asking them to point us to the argument parsing method: https://github.com/google/python-fire/issues/291
AgitatedDove14 Well, I have a loss function which is something like:
class MyLoss(...):
    def forward(...):
        weights = self.compute_weights(...)
        return (weights * (target - preds)).mean()
There seems to be a problem on certain batches when computing the weights. What would be the best way to log the batch that causes the problem, along with the weights being computed?
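One possible way to capture the offending batch, sketched under assumptions: the "problem" is taken to be NaN or infinite weights (the message above does not say), and `dump_if_non_finite`, its parameters, and the `bad_batches` directory are hypothetical names, not part of any library. The helper uses only the standard library and expects plain Python numbers and a JSON-serializable batch; with PyTorch tensors, something like `weights.flatten().tolist()` could be passed instead.

```python
import json
import math
from pathlib import Path


def dump_if_non_finite(step, weights, batch, out_dir="bad_batches"):
    """Write the batch and weights to a JSON file when any weight is NaN or inf.

    Hypothetical helper: returns the path of the written file, or None when
    every weight is finite. `batch` must be JSON-serializable.
    """
    bad = [i for i, w in enumerate(weights) if not math.isfinite(w)]
    if not bad:
        return None

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"step_{step:06d}.json"

    record = {
        "step": step,
        "bad_indices": bad,
        # repr() keeps NaN/inf readable; strict JSON has no literal for them.
        "weights": [repr(w) for w in weights],
        "batch": batch,
    }
    path.write_text(json.dumps(record, indent=2))
    return path
```

Calling this inside `forward` right after `compute_weights` would leave one file per failing step, which could then be inspected offline or uploaded as an artifact.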
It is failing exactly when the download finishes. Not sure if it is relevant, but in ~/.clearml/pip-download-cache only an empty cu120 folder appears. Should the torch wheel be saved there?
Thanks AgitatedDove14!
Is this caused by running the script with the arguments?
I'll give that a try! Thanks CostlyOstrich36
Also, should I allow 8080, 8008, and 8081 on ingress and egress on GCP, or is egress alone enough?
` File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/trains/backend_api/session/token_manager.py", line 72, in _get_token_exp
return jwt.decode(token, verify=False).get('exp', sys.maxsize)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 113, in decode
decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 80, in decode_c...
Works like a charm 👌 thanks!
SuccessfulKoala55 just to let you know: since I opened the link straight from the GCP console, it was using https in the address instead of http, hence the error. Thanks a lot for your help!
I set it to 200000! But the problem occurs when the first plot is the ClearML CPU and GPU monitoring. Were you able to reproduce it? Even with the number set fairly large, the message appeared when the monitoring plot was reported.