So should I set a default value for all of them? The working dir is the project one, the one that contains the module package
Awesome AgitatedDove14, thanks a lot!
Side note: when running src.train as a module, the server records the command as src and it has to be modified to src.train
So I would have to disconnect PyTorch, and then upload the model manually at the end?
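Roughly what I have in mind, as a sketch (assuming ClearML's auto_connect_frameworks flag and OutputModel.update_weights are the right pieces; project, task and checkpoint names are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(
    project_name="my-project",                   # placeholder
    task_name="train",                           # placeholder
    auto_connect_frameworks={"pytorch": False},  # don't auto-log torch.save/torch.load
)

# ... training loop ...

# upload the final weights explicitly once training is done
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="checkpoints/final.pt")  # placeholder path
```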
It is the latest RC, I get the following:
```
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...
```
Not yet AgitatedDove14, does the agent by default use the Python version the command is run with? I installed conda and tried using package_manager.type=conda, but then I get an error: clearml_agent: ERROR: 'NoneType' object has no attribute 'lower'
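For reference, the relevant part of my agent's clearml.conf looks roughly like this (a sketch; the python_binary path is a guess at what might help the agent resolve an interpreter, not something I've confirmed fixes the error):
```
agent {
    package_manager {
        # switch the agent from pip to conda
        type: conda
    }
    # point at an explicit interpreter, in case the agent can't resolve
    # a Python version on its own (placeholder path)
    python_binary: "/opt/conda/bin/python3.8"
}
```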
CostlyOstrich36 Thanks for the help! It ended up being a mistake on my side: I misconfigured the VM's memory and it had only 3.75 GB, so it failed when installing torch.
I'll show you what I have through PM!
I just want to retrieve the weights in a script that tests models I have trained in the past
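Something along these lines is what I'm after, as a sketch (assuming Task.get_task and the task's output models expose get_local_copy; the task ID is a placeholder):
```python
from clearml import Task

train_task = Task.get_task(task_id="abc123")      # placeholder ID of a past training task
output_model = train_task.models["output"][-1]    # last model the task registered
weights_path = output_model.get_local_copy()      # downloads (and caches) the weights file

print("Loaded weights from:", weights_path)
```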
On the server through the command line?
Oh I think I am wrong! Then it must be the clearml monitoring. Still it fails way before the timer ends.
No, I have all the packages with a version pinned. I just want to know if there is a way to override the requirement versions detected by pigar when using detect_with_pip_freeze: false. Locally I have cloudpickle==1.4.1, but when the task runs on the node the environment installs cloudpickle==1.6.0, and I have to manually change the version in the UI. Is there a way to force this single package to a specific version, maybe in the requirements.txt or something similar?
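In code, what I'd hope for is something like this sketch (assuming Task.add_requirements called before Task.init can pin a single package; project/task names are placeholders):
```python
from clearml import Task

# pin just this one package, overriding whatever the automatic analysis picks up
Task.add_requirements("cloudpickle", "1.4.1")

task = Task.init(project_name="my-project", task_name="train")  # placeholder names
```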
Yes, everything is that way (work dir and args are ok) except the script path. It shows -m module arg1 arg2.
Using get_weights(True) I get ValueError: Could not retrieve a local copy of model weights <ID>, failed downloading <URL>
I am using the code inside on_train_epoch_end, inside a metric. So the important part is:
```python
import matplotlib.pyplot as plt

fig = plt.figure()
# ... my plot ...
logger.experiment.add_figure("fig", fig)
plt.close()
```
I am using pytorch_lightning, I'll try to create a snippet I can share! Thanks!
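In the meantime, a stripped-down sketch of roughly what the setup looks like (placeholder model and plot, and the figure is logged from the LightningModule hook rather than from inside the metric like in my real code; hook signatures may differ across Lightning versions):
```python
import matplotlib.pyplot as plt
import pytorch_lightning as pl
import torch


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)

    def on_train_epoch_end(self):
        # build a figure and hand it to the underlying TensorBoard SummaryWriter
        fig = plt.figure()
        plt.plot([0, 1], [0, 1])  # placeholder plot
        self.logger.experiment.add_figure("fig", fig, global_step=self.current_epoch)
        plt.close(fig)
```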
Yes! I will take a look at it!
Thanks SuccessfulKoala55!
Yes AgitatedDove14! I'll PM you
Thanks TimelyPenguin76, the example works fine! I'll debug further on my side!
I get the URL to the checkpoint/weights; can I use this to download the weights?
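Concretely, something like this sketch is what I mean (assuming StorageManager.get_local_copy accepts the checkpoint URL; the URL is a placeholder):
```python
from clearml import StorageManager

url = "s3://my-bucket/checkpoints/epoch9.ckpt"     # placeholder checkpoint URL
local_path = StorageManager.get_local_copy(remote_url=url)
print("Checkpoint downloaded to:", local_path)
```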
Thanks AgitatedDove14! Seems to be subclassed model + extension
Hey AgitatedDove14, do you have an implementation for gcloud? This is awesome!
CostlyOstrich36 That seemed to do the job! No message after the first epoch, with the caveat of losing resource monitoring. Any idea what could be causing this? If the resource monitor is the first plot, will the iteration detection fail? Are there any hacks to keep the resource monitoring? Thanks a lot!
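For context, this is roughly how I turned it off, as a sketch (assuming auto_resource_monitoring on Task.init is the switch that controls it; names are placeholders):
```python
from clearml import Task

task = Task.init(
    project_name="my-project",        # placeholder
    task_name="train",                # placeholder
    auto_resource_monitoring=False,   # drops the CPU/GPU/memory plots entirely
)
```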