I'll give that a try! Thanks CostlyOstrich36
It is the latest RC, I get the following:
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...
Not yet AgitatedDove14. Does the agent use by default the Python version the command is run with? I installed conda and tried using package_manager.type=conda, but then I get an error: clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'
I have the agent configured to force install requirements.txt
With pip I get the first error I showed. I tried conda and it starts running, but at some point it crashes with: clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'
I’ll open the PR!
Sure! I enqueue the experiment from my local machine: python -m src.train model=my_model loss=my_loss dataset=my_dataset
Then I go to the server, run the experiment, and create a copy to run with a new model. On the copy, I go to the script path and modify it to be: -m src.train model=my_other_model loss=my_loss dataset=my_dataset
The new experiment, even though the script path has my_new_model as the default, starts training using my_model.
I can also see ...
Side note: when running src.train as a module, the server gets the command as src and it has to be modified to be src.train
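A minimal sketch of the edit I do by hand on the script path, so the override swap is explicit. The helper name replace_override is hypothetical and this only rewrites the argument string; it does not talk to the server or explain why the copy still trains my_model.

```python
import shlex


def replace_override(script_args: str, key: str, value: str) -> str:
    """Return script_args with the `key=...` override replaced, or appended if absent."""
    tokens = shlex.split(script_args)
    prefix = key + "="
    replaced = False
    for i, token in enumerate(tokens):
        if token.startswith(prefix):
            tokens[i] = prefix + value
            replaced = True
    if not replaced:
        tokens.append(prefix + value)
    # Re-quote each token so values containing spaces survive the round trip.
    return " ".join(shlex.quote(token) for token in tokens)


original = "-m src.train model=my_model loss=my_loss dataset=my_dataset"
updated = replace_override(original, "model", "my_other_model")
print(updated)
```

Comparing the printed string with what the copied experiment actually shows in its script path is a quick way to confirm the override was saved before enqueueing.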
That's really cool! But I would still prefer to avoid using pip_freeze, is there a way?
Pigar is capturing different versions than the ones I have installed on my local machine (not a problem except for one). I just want to force the version of that package in a way that I don’t have to manually change it from the UI for every experiment.
Yes, it’s similar; somewhat more automatic, since it detects the classes of the function's arguments and generates the CLI. AgitatedDove14, what do you mean by "get all the parameters and use task.connect"?
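My reading of "get all the parameters and use task.connect", as a rough sketch: collect the function's arguments into a dict and hand that dict to task.connect. The helper collect_parameters and the train function are hypothetical, and the task.connect call is left as a comment so the snippet runs without ClearML.

```python
import inspect


def collect_parameters(fn, overrides=None):
    """Build a {name: value} dict from fn's default arguments plus CLI overrides."""
    params = {}
    for name, parameter in inspect.signature(fn).parameters.items():
        # Arguments without a default are skipped; they must come from overrides.
        if parameter.default is not inspect.Parameter.empty:
            params[name] = parameter.default
    params.update(overrides or {})
    return params


def train(model="my_model", loss="my_loss", epochs=10):
    """Hypothetical entry point that a CLI generator would expose."""
    return model, loss, epochs


params = collect_parameters(train, {"model": "my_other_model"})
# With ClearML: task.connect(params) would register the dict on the task,
# so a cloned experiment could change the values from the UI.
print(train(**params))
```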
Yes, exactly! Unfortunately I am not so familiar with the internals of the library but I could take a look and figure that out.
Sure, I’ll share it through a private message!
AgitatedDove14 task.set_archived(True) + the cleanup service should do it 👌 If we run in debug mode, the experiment goes directly to the archive and gets cleaned, and we don’t pollute the main experiment page.
I feel it’s easier not to report than to clean up afterwards, but please correct me if I am overthinking it. I’ll check if I could wrap the code in something that calls Task.delete if debugging
Yes! I think that's what I will do 👌 Let me know if there is a way to contribute a mode to keep logging off. We just don’t want to pollute the server when debugging.
Oh I think I am wrong! Then it must be the clearml monitoring. Still it fails way before the timer ends.
Managed to get:
clearml_agent: ERROR: Command '['/home/ramon/.clearml/venvs-builds/3.9/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/var/tmp/requirements_tb0x2i3j.txt', '--extra-index-url', '
died with <Signals.SIGKILL: 9>.
while building the task with the id on the agent
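For context on that message: "died with <Signals.SIGKILL: 9>" is how Python's subprocess module reports a child that was killed by signal 9, so the pip process was killed from outside rather than failing on its own. One common cause is the kernel's out-of-memory killer, but the log alone does not confirm that. A small POSIX-only sketch that reproduces the same kind of error (run_and_report is a hypothetical helper):

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; return (returncode, error text) instead of raising."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        return exc.returncode, str(exc)
    return 0, ""


# Child process that kills itself with SIGKILL, standing in for the pip install.
child = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]
returncode, message = run_and_report(child)
print(returncode)  # negative signal number for signal-terminated children
print(message)
```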
AgitatedDove14 I filed an issue on the fire repo asking them to point us to the argument parsing method: https://github.com/google/python-fire/issues/291
I am about to try everything AgitatedDove14, but I ran into a GitLab error from the agent. I added the username and password to the configuration file but still get "Host key verification failed". Is it common that the cloning message shows the SSH link instead of the HTTPS one when username and password are provided?
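"Host key verification failed" is an SSH error, so the clone seems to be going over SSH even with the credentials set. To make clear what I expected to happen, here is a rough sketch of an SSH-to-HTTPS rewrite. This is only an illustration with a hypothetical helper name and example URL, not the agent's actual implementation, and it does not handle custom SSH ports.

```python
import re

# Matches scp-style (git@host:path) and ssh://git@host/path URLs.
_SSH_URL = re.compile(r"^(?:ssh://)?git@(?P<host>[^:/]+)[:/](?P<path>.+)$")


def ssh_to_https(url: str) -> str:
    """Rewrite a git SSH URL to its HTTPS form; leave other URLs untouched."""
    match = _SSH_URL.match(url)
    if not match:
        return url
    return "https://{}/{}".format(match.group("host"), match.group("path"))


print(ssh_to_https("git@gitlab.com:group/my-rep.git"))
```

If the cloning message still shows the git@ form after the credentials are configured, that would suggest the rewrite is not being applied for this repository.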
Yes Martin! I have a package installed from GitHub but it's using the PyPI version
If you try: ModelCheckpoint('best_model.hdf5', save_best_only=True) does it work too?