
Hey AgitatedDove14, after playing around, it seems that if the callback filepath points to an hdf5 file, it is not uploaded.
I am about to try everything, AgitatedDove14, but ran into a GitLab error from the agent. I added the username and password to the configuration file but still get `Host key verification failed`. Is it common for the cloning message to show the SSH link instead of the HTTPS one when a username and password are provided?
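`Host key verification failed` is an SSH error, so the agent appears to be cloning through the SSH URL. For comparison, the HTTPS form the question expects can be derived from an SCP-style SSH remote with a hypothetical helper like `ssh_to_https` below. It is an illustration only, not how clearml-agent rewrites URLs, and it assumes remotes of the form `git@host:group/repo.git`.

```python
import re

# Matches SCP-style SSH remotes such as "git@gitlab.com:group/repo.git".
# Hypothetical helper for illustration; not clearml-agent's own logic.
_SCP_SSH = re.compile(r"^(?P<user>[\w.-]+)@(?P<host>[\w.-]+):(?P<path>.+)$")


def ssh_to_https(url: str) -> str:
    """Rewrite an SCP-style SSH git URL to its HTTPS form.

    URLs that do not match the SCP-style pattern (e.g. ones that are
    already HTTPS) are returned unchanged.
    """
    match = _SCP_SSH.match(url)
    if not match:
        return url
    return f"https://{match.group('host')}/{match.group('path').lstrip('/')}"


print(ssh_to_https("git@gitlab.com:group/my-rep.git"))
# https://gitlab.com/group/my-rep.git
```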
On the server through the command line?
It is the latest RC; I get the following:

```
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...
```
Thanks, TimelyPenguin76, the example works fine! I'll debug further on my side!
Makes sense! Then where would I have to add `output_uri` to save the weights?
I have the agent configured to force install requirements.txt
Yes, Martin! I have a package installed from GitHub, but it's using the PyPI version.
So I would have to disconnect PyTorch and then upload the model at the end?
Also, should I allow ports 8080, 8008, and 8081 on both ingress and egress on GCP, or is egress alone enough?
It is failing exactly when the download finishes. Not sure if it is related, but in `~/.clearml/pip-download-cache` only an empty `cu120` folder appears. Should the torch wheel be saved there?
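To check what actually lands in that cache, a small stdlib sketch can list any wheel files under the directory mentioned above. The `find_wheels` name is hypothetical, and whether the agent is supposed to keep torch wheels there is exactly the open question, so this only reports what is present.

```python
from pathlib import Path
from typing import List


def find_wheels(cache_dir: str = "~/.clearml/pip-download-cache") -> List[Path]:
    """Return all .whl files under cache_dir (searched recursively), sorted.

    Returns an empty list if the directory does not exist.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.whl"))


if __name__ == "__main__":
    # Print each cached wheel with its size; an empty cu120 folder prints nothing.
    for wheel in find_wheels():
        print(wheel, wheel.stat().st_size, "bytes")
```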
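AgitatedDove14, update here! Something like this should work:

```python
from trains import StorageManager
from trains.storage.helper import StorageHelper

bucket = 'gs://bucket'
# Get a storage helper for the bucket and list the objects under the 'folder' prefix
helper = StorageHelper.get(bucket)
remote_files = helper.list('folder')
# Download (and cache) a local copy of every listed object
for f in remote_files:
    StorageManager.get_local_copy(bucket + "/" + f)
```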
Using `*` gives `[]` as the result, since the `list` method uses `startswith`, which treats it as a literal string and not as a wildcard.
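The difference can be reproduced locally with plain strings: a prefix filter built on `str.startswith` matches `*` literally, whereas `fnmatch` treats it as a wildcard. A possible workaround, sketched below under the assumption that the listing returns full object names, is to list with a plain prefix and then filter client-side. `prefix_filter`, `glob_filter`, and the sample object names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def prefix_filter(names: Iterable[str], prefix: str) -> List[str]:
    # Mimics a startswith-based listing: '*' is just a literal character.
    return [n for n in names if n.startswith(prefix)]


def glob_filter(names: Iterable[str], pattern: str) -> List[str]:
    # fnmatchcase interprets '*' and '?' as shell-style wildcards (case-sensitive).
    return [n for n in names if fnmatchcase(n, pattern)]


objects = ["folder/a.csv", "folder/b.csv", "folder/notes.txt", "other/c.csv"]

print(prefix_filter(objects, "folder/*"))  # []
print(glob_filter(prefix_filter(objects, "folder/"), "folder/*.csv"))
# ['folder/a.csv', 'folder/b.csv']
```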
AgitatedDove14, `task.set_archived(True)` plus the cleanup service should do it. If we run in debug mode, the experiment goes directly to the archive and gets cleaned up, so we don't pollute the main experiment page.
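The debug-mode flow described in that message can be sketched as a small wrapper. `finalize_run`, `is_debug_run`, and the `DEBUG_RUN` environment variable are hypothetical; the only call taken from the message is `task.set_archived(True)`. The wrapper accepts any object exposing that method, so it can be exercised without a server.

```python
import os
from typing import Optional, Protocol


class Archivable(Protocol):
    """Anything with a set_archived(bool) method, e.g. the task object from the message."""

    def set_archived(self, archived: bool) -> None: ...


def is_debug_run(env_var: str = "DEBUG_RUN") -> bool:
    # Hypothetical convention: DEBUG_RUN=1/true/yes marks a debug run.
    return os.environ.get(env_var, "").strip().lower() in {"1", "true", "yes"}


def finalize_run(task: Archivable, debug: Optional[bool] = None) -> bool:
    """Archive the task when running in debug mode.

    Returns True if the task was archived; the cleanup service is then
    expected to remove archived experiments.
    """
    if debug is None:
        debug = is_debug_run()
    if debug:
        task.set_archived(True)
    return debug
```

Keeping the debug decision outside the training code means a normal run is untouched; only runs flagged as debug get archived.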
I get the URL to the checkpoint/weights. Can I use it to download the weights?
Not yet, AgitatedDove14. Does the agent use the Python version the command is run with by default? I installed conda and tried using `package_manager.type=conda`, but then I get an error: `clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'`
AgitatedDove14, I filed an issue with Fire asking them to point us to the argument-parsing method: https://github.com/google/python-fire/issues/291
Sure! Could you point me to how it's done?
Yes! I think that's what I will do. Let me know if there is a way to contribute a mode that keeps logging off; we just don't want to pollute the server when debugging.
If you try `ModelCheckpoint('best_model.hdf5', save_best_only=True)`, does it work too?
Thanks, AgitatedDove14! It seems to be the combination of a subclassed model and the file extension.
AgitatedDove14, downloading a dataset would not be possible using this, right? I want to be able to access the data and just avoid reporting the experiment results.
It works perfectly, AgitatedDove14! There is something weird on my side.