Reputation
Badges 1
119 × Eureka!TimelyPenguin76 I found out its just one package that is causing the error ( cloudpickle
breaks everything). Is there a way to use Pigar but force a single package to have a version?
No, I have all the packages with a version. I just want to know if there is a way to override the requirements versions detected by Pigar when using detect_with_pip_freeze: false
. I have locally cloudpickle==1.4.1
but when running the code and sending the task to the node the environment uses cloudpickle==1.6.0
. I have to manually change the version on the UI. Is there a way to force this single package to have a version? Maybe on the requirments.txt or something similar
Pigar is capturing different versions that the ones I have installed on my local machine (not a problem except for one). I just want to force the version of that package in a way that I don’t have to manually change it from the UI for every experiment.
Works like a charm 👌 thanks!
Thats really cool! But I would still prefer avoid using pip_freeze, is there a way?
Using detect_with_pip_freeze: true
runs into package version not found for some of the ones I have locally.
AgitatedDove14 I am not sure why the packages get different versions, maybe since the package is not directly imported in my code it is possible to get a different version to what I have locally (?). Should all the libraries versions match exactly between local and the code that runs in the agent? The Task.add_requirements(package_name, package_version=None)
workaround works perfectly! I just add the previous version that doesn’t break the code. Yes, definitely a force flag could help ...
Yes AgitatedDove14 ! I’ll PM you
@<1523701070390366208:profile|CostlyOstrich36> Thanks for the help! It ended being a mistake on my side. Misconfigured the VM's memory and it had only 3.75 G. Failed when installing torch.
Hey CostlyOstrich36 sorry to ping you! Let's say I enqueue multiple experiments on a couple of agents and one of them fails. Is it possible to restart the experiment from the UI using the latest checkpoint? What if the experiment gets assigned to the other agent? I am not sure how the continue_last_task
flag would help in this case.
Sure, I’ll share It through a private message!
I am about to try everything AgitatedDove14 but ran into a gitlab error from the agent, I added the username and password to the configuration file but still get a Host key verification failed
. Is it common that the cloning message shows the SSH
link instead of the HTTPS
when username and password are provided?
Yes, it’s similar; somewhat more automatic since it detects the classes of functions arguments and generates the CLI. What do you mean by that AgitatedDove14 get all the parameters and use task.connect
?
Sure! I enqueue the experiment from my local machine:python -m src.train model=my_model loss=my_loss dataset=my_dataset
Then I go to the server and run the experiment and create a copy to run with a new model. On the copy, I go to the script path
and modify it to be:-m src.train model=my_other_model loss=my_loss dataset=my_dataset
The new experiment, even though the script path
has my_new_model
default, starts training using my_model
.
I can also see ...
` [package_manager.force_repo_requirements_txt=true] Skipping requirements, using repository "requirements.txt"
Using base prefix '/opt/conda'
New python executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python3.7
Also creating executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python
Installing setuptools, pip, wheel...
2021-06-10 09:57:56
done.
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: p...
Yes AgitatedDove14 , I added git user name and password on the trains.conf file. On the results tab of the UI the logs clone command shows the SSH
command instead of the HTTPS
:Repository cloning failed: Command ['clone',
mailto:'git@gitlab.com : ...
AgitatedDove14 I filed an issue of fire for them to point us to the argument parsing method https://github.com/google/python-fire/issues/291
Yes, exactly! Unfortunately I am not so familiar with the internals of the library but I could take a look and figure that out.
I need to fetch a dataset for some simple tests but since it doesn’t have credentials to the self-hosted server it wont find the dataset
I configured a firewall rule that opened the ports for the instance (not 100% sure if this is the right way) using network tags. Yes, the whole screen is black and no trains logo show up: Safari can’t open the page because the server where this page is located isn’t responding.
Hi AgitatedDove14 thanks for your reply, with the dashboard I meant the Web-App (UI) . I am trying to access http://<External IP>:8080
but unfortunately nothing shows up.
Also, should I allow 8080
, 8008
, and 8081
on ingress and egress on GCP or is only egress enough?
SuccessfulKoala55 just to let you know: since I opened the link straight from the GCP console it was using https
on the address instead of http
hence the error. Thanks a lot for your help!
SuccessfulKoala55 on both 8080
and 8008
I get: Safari can’t open the page http://<External IP>:80XX
because Safari can’t establish a secure connection to the server http://<External IP>:80XX
.