
I’ll open the PR!
Works like a charm 👌 thanks!
Just to make sure I get everything right, AgitatedDove14:
- We have to define the Task inside the function decorated with `@hydra.main`.
- We can modify the parameters that are overridden from the UI under: Configuration tab -> Args -> overrides -> modify the list.

Additional question:
Will the sweep functionality work?
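On the overrides list mentioned above: that entry presumably holds Hydra command-line override strings such as `optimizer.lr=0.01`. To make concrete what editing that list means, here is a minimal stdlib sketch. The `apply_overrides` helper is hypothetical (not part of Hydra or ClearML) and handles only plain dotted `key=value` overrides; Hydra's fuller grammar (`+key=...`, `~key`, multirun sweeps like `lr=0.1,0.01`) is deliberately not covered.

```python
import copy
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply plain Hydra-style ``key=value`` overrides to a nested dict.

    Hypothetical helper for illustration only: dotted keys such as
    ``optimizer.lr=0.01`` walk into nested dicts, values stay strings,
    and the input config is never mutated.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


base = {"optimizer": {"lr": "0.1"}, "epochs": "10"}
print(apply_overrides(base, ["optimizer.lr=0.01", "epochs=20"]))
# {'optimizer': {'lr': '0.01'}, 'epochs': '20'}
```

Keeping values as strings mirrors the fact that overrides arrive as text; real Hydra parses them into typed values.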
Hey AgitatedDove14, after playing around it seems that if the callback `filepath` points to an HDF5 file, it is not uploaded.
This works:
```python
filepath = self.log_dir + os.sep + "checkpoint"
self.callbacks.append(
    ModelCheckpoint(
        filepath,
        monitor="val_loss",
        mode="min",
        save_best_only=True,
        save_weights_only=True,
    )
)
```
And this doesn’t:
```python
filepath = self.log_dir + os.sep + "checkpoint.hdf5"
self.callbacks.append(
    ModelCheckpoint(
        filepath,
        ...
```

```
[package_manager.force_repo_requirements_txt=true] Skipping requirements, using repository "requirements.txt"
Using base prefix '/opt/conda'
New python executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python3.7
Also creating executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python
Installing setuptools, pip, wheel...
2021-06-10 09:57:56
done.
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: p...
```
If you try `ModelCheckpoint('best_model.hdf5', save_best_only=True)`, does it work too?
It is the latest RC. I get the following:
```
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...
```
Thanks TimelyPenguin76, the example works fine! I’ll debug further on my side.
I have the agent configured to force-install `requirements.txt`.
It is failing exactly when the download finishes. Not sure if it is related, but in `~/.clearml/pip-download-cache` only an empty `cu120` folder appears. Should the torch wheel be saved there?
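To check what actually lands in that cache, a small stdlib sketch that lists any `.whl` files under `~/.clearml/pip-download-cache` (the path from the message). Whether the agent is supposed to keep the torch wheel there is exactly the open question, so an empty result only confirms what the folder already shows; `find_wheels` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_wheels(cache_dir: str) -> List[Path]:
    """Return every .whl file under cache_dir (searched recursively), sorted.

    A missing directory yields an empty list instead of an error.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.whl"))


for wheel in find_wheels("~/.clearml/pip-download-cache"):
    # Print each cached wheel with its size, e.g. to spot a partial download.
    print(wheel, wheel.stat().st_size, "bytes")
```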
Basically, one points to an HDF5 file and the other one has no extension.
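One plausible reason the two paths behave differently is that Keras picks the on-disk format from the file name. In TensorFlow 2.x Keras (as I understand that version's behavior; Keras 3 uses different rules), a weights path ending in `.h5` or `.hdf5` is written as a single HDF5 file, while a path without such a suffix uses the TensorFlow checkpoint format, which writes several files (an `.index` file plus `.data-*` shards). The hypothetical `keras_weights_format` helper below only mirrors that suffix rule on plain strings; it is not a Keras API.

```python
# Suffixes assumed (following TF 2.x Keras) to select single-file HDF5 output.
HDF5_SUFFIXES = (".h5", ".hdf5")


def keras_weights_format(filepath: str) -> str:
    """Return 'h5' or 'tf' for a weights path, mimicking the suffix rule.

    Hypothetical helper for illustration; it does not call Keras.
    """
    return "h5" if filepath.endswith(HDF5_SUFFIXES) else "tf"


for path in ("logs/checkpoint", "logs/checkpoint.hdf5"):
    print(path, "->", keras_weights_format(path))
# logs/checkpoint -> tf
# logs/checkpoint.hdf5 -> h5
```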
Yes AgitatedDove14, I am not sure what they use by default. Here is a simple working example:
```python
from typing import Optional

import torch
from clearml import Task
from pytorch_lightning import LightningDataModule, LightningModule
from pytorch_lightning.utilities.cli import LightningCLI
from torch.utils.data import DataLoader, Dataset, Subset


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def ...
```
AgitatedDove14, from this thread I understand Hydra is not supported, and therefore overriding the parameters from the UI won't work. Is there still a way to track and add the parameters to the experiment? Will `task.connect_configuration` work with the YAML files?
AgitatedDove14 update here! Something like this should work:
```python
from trains import StorageManager
from trains.storage.helper import StorageHelper

bucket = 'gs://bucket'
helper = StorageHelper.get(bucket)
remote_files = helper.list('folder')
for f in remote_files:
    StorageManager.get_local_copy(bucket + "/" + f)
```
Passing `*` gives `[]` as a result, since the `list` method uses `startswith`, which treats it as a literal string rather than a wildcard.
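Since the listing only does prefix matching, one workaround is to list by prefix and apply the wildcard client-side. A minimal stdlib sketch with `fnmatch`, assuming the listing (for example from the `helper.list('folder')` call in the snippet above) returns plain string names; `filter_listing` is a hypothetical helper, and the pattern must match whatever name form the listing actually returns.

```python
import fnmatch
from typing import Iterable, List


def filter_listing(names: Iterable[str], pattern: str) -> List[str]:
    """Keep only names matching a shell-style wildcard such as 'folder/*.pt'.

    Note: fnmatch's '*' also matches '/', so it is not limited to one
    directory level. fnmatchcase keeps matching case-sensitive on every OS.
    """
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


listing = ["folder/a.pt", "folder/b.pt", "folder/notes.txt"]
print(filter_listing(listing, "folder/*.pt"))
# ['folder/a.pt', 'folder/b.pt']
```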
I just want to retrieve the weights in a script that tests models I have trained in the past.
Best thing ever, thanks AgitatedDove14!
I get the URL to the checkpoint/weights. Can I use it to download the weights?
Using `get_weights(True)`, I get `ValueError: Could not retrieve a local copy of model weights <ID>, failed downloading <URL>`.
Managed to get the following while building the task with the ID on the agent:
```
clearml_agent: ERROR: Command '['/home/ramon/.clearml/venvs-builds/3.9/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/var/tmp/requirements_tb0x2i3j.txt', '--extra-index-url', '
died with <Signals.SIGKILL: 9>.
```
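The "died with <Signals.SIGKILL: 9>" wording is how Python's `subprocess.CalledProcessError` reports a child process that was terminated by a signal: on POSIX, a negative return code `-N` means the child received signal `N`. A SIGKILL during a large pip install is often the kernel's out-of-memory killer, but that is an assumption to verify on the host (e.g. in the kernel log), not something this log proves. The hypothetical `describe_returncode` helper below decodes such codes.

```python
import signal
import subprocess


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description.

    On POSIX, a negative code -N means the child was terminated by signal N.
    """
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


print(describe_returncode(-9))  # killed by SIGKILL
print(describe_returncode(1))   # exited with status 1
# CalledProcessError formats the same -9 code the way the agent log shows it.
print(subprocess.CalledProcessError(-9, ["pip", "install"]))
```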
SuccessfulKoala55, on both 8080 and 8008 I get: "Safari can’t open the page http://<External IP>:80XX because Safari can’t establish a secure connection to the server http://<External IP>:80XX."
Makes sense! Then where would I have to add `output_uri` to save the weights?