` [package_manager.force_repo_requirements_txt=true] Skipping requirements, using repository "requirements.txt"
Using base prefix '/opt/conda'
New python executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python3.7
Also creating executable in /home/ramon/.clearml/venvs-builds/3.7/bin/python
Installing setuptools, pip, wheel...
2021-06-10 09:57:56
done.
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: p...
Not yet AgitatedDove14 , does the agent use by default the python version the command is run with? I installed conda and tried using package_manager.type=conda
but then get an error:clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'
With pip
I get the first error I showed, I tried conda
and it starts running but at some point crashes with:clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'
Yes AgitatedDove14 , I am not sure what they use by default. Here is a simple working example:
` from typing import Optional
import torch
from clearml import Task
from pytorch_lightning import LightningDataModule, LightningModule
from pytorch_lightning.utilities.cli import LightningCLI
from torch.utils.data import DataLoader, Dataset, Subset
class RandomDataset(Dataset):
def init(self, size, length):
self.len = length
self.data = torch.randn(length, size)
def ...
Works like a charm 👌 thanks!
Sure! I enqueue the experiment from my local machine:python -m src.train model=my_model loss=my_loss dataset=my_dataset
Then I go to the server and run the experiment and create a copy to run with a new model. On the copy, I go to the script path
and modify it to be:-m src.train model=my_other_model loss=my_loss dataset=my_dataset
The new experiment, even though the script path
has my_new_model
default, starts training using my_model
.
I can also see ...
AgitatedDove14 Thanks! Im trying to figure out how to create a minimum working example! I am also working with Hydra so that may be a thing. The extension is whats causing it to fail (haven’t figured out why).
AgitatedDove14 from this thread I understand hydra is not supported and therefore overriding the parameters from the UI wont work, but is there still a way to track and add the parameters to the experiment? Will task.connect_configuration
work with the yaml files?
Hey AgitatedDove14 does this work for you?
` from argparse import ArgumentParser
from tensorflow.keras import utils as np_utils
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
import tensorflow as tf
from clearml import Task
class Linear(tf.keras.Model):
def init(self, in_shape=(784,), num_classes=10):
super().init()
self.l...
It is the latest RC, I get the following:
` Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...
AgitatedDove14 update here! Something like this should work:from trains import StorageManager from trains.storage.helper import StorageHelper bucket = 'gs://bucket' helper = StorageHelper.get(bucket) remote_files = helper.list('folder') for f in remote_files: StorageManager.get_local_copy(bucket + "/" + f)
the *
gives []
results since one the list
method startswith
is used which uses it as a string and not as a wildcard
I am still getting the error even with the v0.16.3 agent, is there something else we have to do other than updating it?
I am about to try everything AgitatedDove14 but ran into a gitlab error from the agent, I added the username and password to the configuration file but still get a Host key verification failed
. Is it common that the cloning message shows the SSH
link instead of the HTTPS
when username and password are provided?
Best thing ever, thanks AgitatedDove14 !
Yes, it’s similar; somewhat more automatic since it detects the classes of functions arguments and generates the CLI. What do you mean by that AgitatedDove14 get all the parameters and use task.connect
?
Side note: When running src.train
as a module the server gets the command as src
and has to be modified to be src.train
Yes AgitatedDove14 , I added git user name and password on the trains.conf file. On the results tab of the UI the logs clone command shows the SSH
command instead of the HTTPS
:Repository cloning failed: Command ['clone',
mailto:'git@gitlab.com : ...
Just to make sure get everything right AgitatedDove14 :
We have to define the Task inside the function decorated with the @hydra.main We can modify the parameters that are overridden on UI on : configuration tab -> Args -> overrides -> modify the listAdditional question:
Will the sweep functionality work?
If you try:ModelCheckpoint('best_model.hdf5', save_best_only=True)
does it work too?
SuccessfulKoala55 on both 8080
and 8008
I get: Safari can’t open the page http://<External IP>:80XX
because Safari can’t establish a secure connection to the server http://<External IP>:80XX
.
Awesome AgitatedDove14 Thanks a lot 🙌
Using detect_with_pip_freeze: true
runs into package version not found for some of the ones I have locally.