
but if you do that and the package is already installed it will not install from the git repo; this is an issue with pip
Exactly, that's my problem: I want to remove it to make sure it is reinstalled (because the version can change)
I think that since the agent installs everything from scratch it should work for you. Wdyt?
With env caching enabled, it won't reinstall this private dependency, right?
Yes I agree, but I get a strange error when using dataloaders: `RuntimeError: [enforce fail at context_gpu.cu:323] error == cudaSuccess. 3 vs 0. Error at: /pytorch/caffe2/core/context_gpu.cu:323: initialization error`
only when I use `num_workers > 0`
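For reference, a minimal sketch of the usual workaround, assuming the failure comes from forked DataLoader workers inheriting an already-initialized CUDA context (the dataset here is a hypothetical placeholder, and `multiprocessing_context` requires a reasonably recent PyTorch):
```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    # Hypothetical dataset: returns CPU tensors only, so CUDA is never
    # touched inside the worker processes.
    def __len__(self):
        return 128

    def __getitem__(self, idx):
        return torch.randn(3, 224, 224), 0


def main():
    # "spawn" prevents the workers from inheriting the parent's CUDA context,
    # which is a common cause of this initialization error with num_workers > 0.
    loader = DataLoader(
        RandomDataset(),
        batch_size=8,
        num_workers=4,
        multiprocessing_context="spawn",
    )
    for images, labels in loader:
        images = images.cuda(non_blocking=True)  # move to GPU in the main process only


if __name__ == "__main__":  # required when using the "spawn" start method
    main()
```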
I actually need to be able to overwrite files, so in my case it makes sense to grant the DeleteObject permission in S3. But for other cases, why not simply catch this error, display a warning to the user, and store internally that delete is not possible?
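As an illustration of the "catch and warn" idea, a minimal boto3 sketch (the `safe_delete` helper and the module-level flag are hypothetical, not part of any existing SDK):
```python
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
s3 = boto3.client("s3")

# Remember whether deletes are allowed so the warning is only shown once.
_delete_supported = True


def safe_delete(bucket: str, key: str) -> bool:
    global _delete_supported
    if not _delete_supported:
        return False
    try:
        s3.delete_object(Bucket=bucket, Key=key)
        return True
    except ClientError as error:
        if error.response["Error"]["Code"] == "AccessDenied":
            logger.warning(
                "Missing s3:DeleteObject permission on bucket %s; "
                "disabling deletes for this session", bucket,
            )
            _delete_supported = False
            return False
        raise
```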
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
Maybe there is a setting in Docker to move the space used to a different location? I can simply increase the storage of the first disk, no problem with that
```
agent.package_manager.type = pip
...
Using base prefix '/home/machine1/miniconda3/envs/py36'
New python executable in /home/machine1/.trains/venvs-builds/3.6/bin/python3.6
Also creating executable in /home/machine1/.trains/venvs-builds/3.6/bin/python
Installing setuptools, pip, wheel...
```
CostlyOstrich36 super, thanks for confirming! I then have a follow-up question: are the artifacts duplicated (copied), or just referenced?
I understand, but then why is docker mode an option of the CLI if we always have to use it for things to work?
Sorry, it's actually `task.update_requirements(["."])`
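For context, roughly how that call sits in a script; a minimal sketch assuming the task object comes from Task.init (project/task names are placeholders, and on older installs the import would be `from trains import Task`):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="local-package-run")

# Replace the auto-detected requirements with "." so the agent installs the
# local package (and whatever its setup.py lists in install_requires) when
# it reproduces this run.
task.update_requirements(["."])
```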
yes, because it won't install the local package whose setup.py has the problem in its install_requires described in my previous message
That's why I suspected trains was installing a different version than the one I expected
the Deep Learning AMI from NVIDIA (Ubuntu 18.04)
Usually one or two tags. Indeed, task IDs are not so convenient, but only because they are not displayed on the page, so I have to go back to another page to check the ID of each experiment. Maybe just showing the ID of each experiment on the SCALARS page would already be great, wdyt?
I will try with that and keep you updated
I think waiting for the apt locks to be released with something like this would work:
```
startup_bash_script = [
    "#!/bin/bash",
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done",
    "sudo apt-get update",
    ...
```
Weirdly this throws an error in the autoscaler:
```
Spinning new instance type=v100_spot
Error: Failed to start new instance, unexpected '{' in field...
```
Sorry, I didn't get that
yes, that's also what I thought
Hi CostlyOstrich36 , most of the time I want to compare two experiments in the DEBUG SAMPLES section, so if I click on one sample to enlarge it I cannot see the others. Also, once I close the panel, the iteration number is not updated
From the answers I saw on the internet, it is most likely related to a mismatch of the CUDA/cuDNN versions
Ooh, that's cool! I could place torch==1.3.1 there
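If "there" refers to the task's installed-packages section, one way to pin the version from code is `Task.add_requirements`; a minimal sketch, assuming it is called before `Task.init` (project/task names are placeholders):
```python
from clearml import Task

# Pin torch so the agent installs exactly this version when rebuilding the env.
Task.add_requirements("torch", "1.3.1")

task = Task.init(project_name="examples", task_name="pinned-torch-run")
```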
Does the agent install the nvidia-container-toolkit, so that the GPUs of the instance can be accessed from inside the docker running JupyterLab?
I am sorry to give info that is not very precise, but it's the best I can do. Is this bug happening only to me?
What happens is a different error, but it was so weird that I thought it was related to the installed version
No, they have different names - I will try to update both agents to the latest versions