It is failing exactly when the download finishes. Not sure if it is relevant, but in ~/.clearml/pip-download-cache only an empty cu120 folder appears. Should the torch wheel be saved there?
Is this caused by running the script with the arguments?
I'll give that a try! Thanks CostlyOstrich36
Also, should I allow ports 8080, 8008, and 8081 for both ingress and egress on GCP, or is egress alone enough?
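For anyone checking the firewall side of this: a minimal sketch, using only the standard library, that tests from the agent machine whether the three ports mentioned above accept TCP connections. The hostname in the usage comment is a placeholder and the helper names are hypothetical. It only reports reachability and says nothing about which ingress or egress rule is responsible.

```python
import socket

# Ports named in the question above (ClearML's default web, API and file server ports).
CLEARML_PORTS = (8080, 8008, 8081)


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_ports(host: str, ports=CLEARML_PORTS) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}


# Usage (hypothetical hostname, replace with the server address):
# for port, ok in check_ports("my-clearml-server.example").items():
#     print(port, "reachable" if ok else "blocked or closed")
```

Running it once from the agent VM and once from a local machine should make it clearer which direction is being blocked.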
` File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/trains/backend_api/session/token_manager.py", line 72, in _get_token_exp
return jwt.decode(token, verify=False).get('exp', sys.maxsize)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 113, in decode
decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 80, in decode_c...
Works like a charm 👌 thanks!
SuccessfulKoala55 just to let you know: since I opened the link straight from the GCP console, the address was using https instead of http, hence the error. Thanks a lot for your help!
I set it to 200000! But the problem occurs when the first plot is the ClearML CPU and GPU monitoring. Were you able to reproduce it? Even when I set the number fairly large, the message appeared as soon as the monitoring plot was reported.
AgitatedDove14 Downloading a dataset would not be possible using this, right? I want to be able to access the data and just avoid reporting the experiment results.
It works perfectly! AgitatedDove14 There is something weird on my side 😢
` [package_manager.force_repo_requirements_txt=true] Skipping requirements, using repository "requirements.txt"
Using base prefix '/opt/conda'
New python executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python3.7
Also creating executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python
Installing setuptools, pip, wheel...
2021-06-10 09:57:56
done.
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: p...
Sure! I enqueue the experiment from my local machine: python -m src.train model=my_model loss=my_loss dataset=my_dataset
Then I go to the server, run the experiment, and create a copy to run with a new model. On the copy, I go to the script path and modify it to be: -m src.train model=my_other_model loss=my_loss dataset=my_dataset
The new experiment, even though its script path now specifies my_other_model, starts training using my_model.
I can also see ...
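To make the difference between the two command lines concrete, here is a small sketch that extracts the key=value overrides from each and shows which ones changed. `parse_overrides` is a hypothetical helper written for illustration; it is not part of ClearML or of any config library, and it does not explain why the agent picks up the old value.

```python
import shlex


def parse_overrides(command: str) -> dict:
    """Collect key=value tokens from a command line into a dict.

    Hypothetical helper for illustration only.
    """
    overrides = {}
    for token in shlex.split(command):
        key, sep, value = token.partition("=")
        # Keep only plain key=value tokens, skipping flags such as -m.
        if sep and key and not key.startswith("-"):
            overrides[key] = value
    return overrides


original = parse_overrides(
    "python -m src.train model=my_model loss=my_loss dataset=my_dataset"
)
edited = parse_overrides(
    "-m src.train model=my_other_model loss=my_loss dataset=my_dataset"
)

# Keys whose values differ between the two command lines.
changed = {
    key: (original.get(key), edited.get(key))
    for key in original.keys() | edited.keys()
    if original.get(key) != edited.get(key)
}
print(changed)  # {'model': ('my_model', 'my_other_model')}
```

Only the model override differs, so whatever the cloned experiment reads its arguments from is presumably still holding the original value.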
My bad :man-facepalming: The fix was just specifying weights_path=dirpath explicitly, since the first argument is weights_filename.
For option 2, do I have to configure it on all agents or on the server?
Sure! For torch I have:
torch==2.0.1
# via
# monai
# pytorch-lightning
# torchio
# torchmetrics
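The annotated pin above looks like pip-compile style output (that is an assumption based on the "# via" comments). A minimal sketch for turning such lines into a mapping from each pinned requirement to the packages that pull it in; `parse_via` is a hypothetical helper, not part of pip or pip-tools.

```python
def parse_via(lines):
    """Map each pinned requirement to the packages listed in its '# via' comments."""
    result = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("#"):
            # A requirement line such as "torch==2.0.1" starts a new entry.
            current = line
            result[current] = []
        elif current is not None:
            parent = line.lstrip("#").strip()
            if parent.startswith("via "):
                # Single-line form: "# via some-package".
                parent = parent[4:].strip()
            if parent and parent != "via":
                result[current].append(parent)
    return result


sample = [
    "torch==2.0.1",
    "# via",
    "# monai",
    "# pytorch-lightning",
    "# torchio",
    "# torchmetrics",
]
print(parse_via(sample))
```

This makes it easy to see at a glance which dependencies are responsible for a given pin.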
Yes Martin! I have a package installed from GitHub, but it's using the PyPI version.
Sure! Could you point me to how it's done?