So should I set them all with a default value? The working dir is the project one, the one that contains the module package.
Awesome AgitatedDove14, thanks a lot!
Side note: when running src.train as a module, the server records the command as src, and it has to be changed to src.train.
So I would have to disconnect PyTorch and then upload the model at the end?
It is the latest RC, I get the following:
` Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...
Not yet, AgitatedDove14. Does the agent use by default the Python version the command is run with? I installed conda and tried using package_manager.type=conda, but then I get an error: `clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'`
CostlyOstrich36 Thanks for the help! It ended up being a mistake on my side. I misconfigured the VM's memory and it had only 3.75 G. It failed when installing torch.
Hi CostlyOstrich36! The message is the following: `clearml.model - INFO - Selected model id: 27c1a1700b0b4e25a4344dc4ef9868fa` They are not models, those are intermediate tensors I am caching to make training faster. I don't need to log them.
I'll show you what I have through PM!
I just want to retrieve the weights on a script that tests models I have trained in the past
On the server through the command line?
Oh I think I am wrong! Then it must be the clearml monitoring. Still it fails way before the timer ends.
No, I have all the packages with a version. I just want to know if there is a way to override the requirement versions detected by Pigar when using detect_with_pip_freeze: false. Locally I have cloudpickle==1.4.1, but when I run the code and send the task to the node, the environment uses cloudpickle==1.6.0. I have to manually change the version in the UI. Is there a way to force this single package to a specific version? Maybe in the requirements.txt or something similar?
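For reference, a minimal sketch of one way a single package could be pinned from code, assuming the installed ClearML SDK exposes `Task.add_requirements(package_name, package_version)` and that it is called before `Task.init`. The `PINNED_VERSIONS` table and the `apply_pins` helper are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical pin table: package name -> exact version to force.
PINNED_VERSIONS: Dict[str, str] = {"cloudpickle": "1.4.1"}


def apply_pins(
    pins: Dict[str, str],
    add_requirement: Optional[Callable[[str, str], None]] = None,
) -> None:
    """Register every pinned package; meant to run before the task is created."""
    if add_requirement is None:
        # Imported lazily so this sketch can be loaded without clearml installed.
        # Assumption: Task.add_requirements exists in the installed SDK version.
        from clearml import Task

        add_requirement = Task.add_requirements
    for package, version in pins.items():
        add_requirement(package, version)
```

In a training script this would be called as `apply_pins(PINNED_VERSIONS)` on the line before `Task.init(...)`, so the pinned version is recorded instead of the detected one.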
Yes, everything is that way (work dir and args are ok) except the script path. It shows -m module arg1 arg2.
Using the get_weights(True) I get ValueError: Could not retrieve a local copy of model weights <ID>, failed downloading <URL>
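I am using the code inside on_train_epoch_end, inside a metric. So the important part is:
` import matplotlib.pyplot as plt

fig = plt.figure()
# ... my plot ...
logger.experiment.add_figure("fig", fig)
plt.close(fig)  # close this specific figure so figures do not pile up across epochs `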
AgitatedDove14, from this thread I understand Hydra is not supported and therefore overriding the parameters from the UI won't work, but is there still a way to track and add the parameters to the experiment? Will task.connect_configuration work with the YAML files?
I am using pytorch_lightning, I'll try to create a snippet I can share! Thanks!
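A rough sketch of what tracking a YAML file could look like, assuming `task.connect_configuration` accepts a path to a configuration file given as a string and stores its content with the experiment. The helper name `track_config_file` and the section name `hydra_config` are hypothetical, and this only records the file; it does not make Hydra overrides editable from the UI.

```python
from pathlib import Path
from typing import Any


def track_config_file(task: Any, config_path: str, name: str = "hydra_config") -> Any:
    """Attach a YAML config file to a task so its content is stored with the experiment."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    # Assumption: connect_configuration takes a file path (as str) plus a section name.
    return task.connect_configuration(configuration=str(path), name=name)
```

With a real task this would be called as `track_config_file(Task.init(...), "conf/config.yaml")`, where the path is only an example.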
Yes! I will take a look at it!
TimelyPenguin76, I found out it's just one package that is causing the error (cloudpickle breaks everything). Is there a way to use Pigar but force a single package to have a version?
Thanks SuccessfulKoala55!
Yes AgitatedDove14! I'll PM you
Thanks TimelyPenguin76, the example works fine! I'll debug further on my side!
I get the URL to the checkpoint/weights. Can I use this to download the weights?
Thanks AgitatedDove14! It seems to be a subclassed model + extension.
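A minimal sketch of downloading from such a URL, assuming the ClearML SDK's `StorageManager.get_local_copy(remote_url=...)` is available and that the machine has credentials and network access to the storage behind the URL. The `fetch_weights` wrapper and its `downloader` parameter are hypothetical, added so the logic can be exercised without a server.

```python
from typing import Callable, Optional


def fetch_weights(
    url: str,
    downloader: Optional[Callable[..., Optional[str]]] = None,
) -> str:
    """Download weights from a URL and return the local file path."""
    if downloader is None:
        # Imported lazily so this sketch can be loaded without clearml installed.
        from clearml import StorageManager

        downloader = StorageManager.get_local_copy
    local_path = downloader(remote_url=url)
    if not local_path:
        # An empty result is treated here as a failed download.
        raise RuntimeError(f"Could not download weights from {url}")
    return local_path
```

The returned path could then be passed to the framework's own loading function in the test script.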