Inside the containers that are spinning on the host machine
Still giving me the same error
If you can let me know @<1576381444509405184:profile|ManiacalLizard2> @<1523701087100473344:profile|SuccessfulKoala55> how to resolve this, that would be very helpful
Just a follow-up on this issue, @<1523701087100473344:profile|SuccessfulKoala55> @<1523701205467926528:profile|AgitatedDove14> I would very much appreciate it if you could help me with this.
Hey, so I am able to spin up the GCP instance using the autoscaler. I wanted to confirm one thing: does the autoscaler spin up the agent automatically in the VM, or do I need to add a command for that to the bash startup script?
I did provide the credentials, and I am also running the autoscaler for the first time, so no, it hasn't worked before
Also @<1523701087100473344:profile|SuccessfulKoala55>, when the autoscaler spins up my GCP instance and I look inside it, I am not able to find the clearml.conf file. Does it not install clearml automatically when it spins up the VM?
Also, I was facing another issue: the task is not able to clone the GitHub repo. It's showing an authentication error even though I have passed my git credentials
And one more thing: is there a way to make changes to the .bashrc that is present inside the docker container?
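For reference, recent clearml versions let the task request shell commands that run inside the container before the task itself starts, which is one way to avoid editing .bashrc by hand. A minimal sketch, assuming your clearml version's Task.set_base_docker() accepts the docker_setup_bash_script argument (the project/task names, image, variable name and commands are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="yolo-train")  # placeholder names

# Ask the agent to run these shell lines inside the container before the task starts.
# Appending to ~/.bashrc is just an illustration; any setup command works here.
task.set_base_docker(
    docker_image="nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04",
    docker_setup_bash_script=[
        "echo 'export MY_VAR=my_value' >> ~/.bashrc",
        "apt-get update && apt-get install -y git",
    ],
)
```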
Let me know if this is enough information or not
Well, the VM is running the default docker image nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04, but it's not spinning up the agent when the VM is initialized
I don't think it has issues with this
Can you explain how running two agents would help me run the whole pipeline remotely? Sorry if it's a very basic question
The issue I am facing is that when I do get_local_copy(), the dataset (used for training yolov8) is downloaded into the clearml cache (my image dataset contains images, labels, .txt files with paths to the images, and a .yaml file). The downloaded .txt files say the image files are inside the git repo cloned under the clearml venvs, but that path doesn't actually exist, so I am getting an error
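For reference, a minimal sketch of one way around the stale paths: pull the dataset into a fixed folder with get_mutable_local_copy() and point the data .yaml at it. It assumes an ultralytics-style data.yaml with a top-level path: key; the dataset ID, target folder and file name are placeholders:

```python
from pathlib import Path
import yaml  # PyYAML
from clearml import Dataset

ds = Dataset.get(dataset_id="<your_dataset_id>")  # placeholder ID

# Copy the dataset to a fixed, writable location instead of relying on repo-relative paths.
root = Path(ds.get_mutable_local_copy("/data/yolo_dataset", overwrite=True))

# Point the data .yaml at the downloaded root (assumes a top-level `path:` key).
data_yaml = root / "data.yaml"
cfg = yaml.safe_load(data_yaml.read_text())
cfg["path"] = str(root)
data_yaml.write_text(yaml.safe_dump(cfg))
```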
Ok, it's cloning but it's asking for my GitHub credentials
So you mean when I ssh into my VM I need to do a git clone and then spin up the agent, right?
My git repo only contains the hash-ids, which are used to download the dataset onto my local machine
So I should clone the pipeline, run the agent and then enqueue the cloned pipeline?
Is there a way to clone the whole pipeline, just like we clone tasks?
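For what it's worth, the pipeline controller is itself a Task, so it can be cloned and enqueued programmatically just like any other task. A minimal sketch, with the controller task ID and queue name as placeholders:

```python
from clearml import Task

controller = Task.get_task(task_id="<controller_task_id>")  # the pipeline controller task
cloned = Task.clone(source_task=controller, name="cloned pipeline run")

# Enqueue the clone; an agent listening on this queue will run the controller,
# which in turn launches the pipeline steps.
Task.enqueue(cloned, queue_name="services")
```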
One more thing: in my git repo there is a dataset folder that contains hash-ids, and these hash-ids are used to download the dataset. When I am running the pipeline remotely, the files/images are downloaded into the cloned git repo inside .clearml/venvs, but when I check inside that venvs folder there are no images present.
I have a pipeline which I am able to run locally; the pipeline has a pipeline controller along with 4 tasks: download data, training, testing, and predict. How do I execute this whole pipeline remotely so that each task is executed sequentially?
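For reference, a minimal sketch of a controller that runs the four steps strictly one after another via the parents argument and sends everything to agents; the project, task names and queue names are placeholders:

```python
from clearml import PipelineController

pipe = PipelineController(name="yolo pipeline", project="examples", version="1.0.0")

# Steps are enqueued here; the controller itself runs on the queue passed to start().
pipe.set_default_execution_queue("default")

# Chaining each step on its predecessor with `parents` makes execution strictly sequential.
pipe.add_step(name="download_data", base_task_project="examples", base_task_name="download data")
pipe.add_step(name="training", parents=["download_data"], base_task_project="examples", base_task_name="training")
pipe.add_step(name="testing", parents=["training"], base_task_project="examples", base_task_name="testing")
pipe.add_step(name="predict", parents=["testing"], base_task_project="examples", base_task_name="predict")

# Launch the controller on the services queue; agents pick the steps up from "default".
pipe.start(queue="services")
```

This is also where two agents come in: one serving the queue the controller runs on ("services" here) and one serving the queue the steps are enqueued to ("default" here).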
yes same env for all the components
when I am running the pipeline remotely, I am getting the following error message
There appear to be 6 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Is there a way to change the paths inside the .txt files to the clearml cache, because my images are stored in the clearml cache only?
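For reference, a minimal sketch of rewriting the list files in place: replace the stale repo prefix baked into the .txt entries with the actual cache folder returned by get_local_copy(). The dataset ID and the old prefix are placeholders (copy the prefix from one line of your own .txt file), and the lists are assumed to sit at the top of the dataset folder:

```python
from pathlib import Path
from clearml import Dataset

# Local cache folder where get_local_copy() put the images.
cache_root = Path(Dataset.get(dataset_id="<your_dataset_id>").get_local_copy())

# Stale prefix baked into the .txt lists (the repo path that does not exist on the worker).
old_prefix = "/root/.clearml/venvs-builds/3.10/task_repository/<your_repo>/dataset"

# Rewrite every list file so its image paths point into the cache copy instead.
for list_file in cache_root.glob("*.txt"):
    text = list_file.read_text()
    list_file.write_text(text.replace(old_prefix, str(cache_root)))
```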
When the package installation is done in the task
so my model is not able to find the dataset
So I am running a pipeline on a GCP VM. My VM has 1 NVIDIA GPU, and my requirements.txt has:
torch==1.13.1+cu117
torchvision==0.14.1+cu117
When I am running the Yolo training step I am getting the above error.
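In case the step is failing while resolving the CUDA-specific wheels, one thing to try is pinning them from code before Task.init(); a minimal sketch, assuming the agent's package manager is configured with the PyTorch extra index URL (https://download.pytorch.org/whl/cu117), since the +cu117 builds are not on PyPI. The project and task names are placeholders:

```python
from clearml import Task

# Must be called before Task.init() so the agent installs exactly these builds.
Task.add_requirements("torch", "==1.13.1+cu117")
Task.add_requirements("torchvision", "==0.14.1+cu117")

task = Task.init(project_name="examples", task_name="yolo training step")  # placeholder names
```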