One more thing: in my git repo there is a dataset folder that contains hash-ids, and these hash-ids are used to download the dataset. When I run the pipeline remotely, the files/images are downloaded into the cloned git repo inside .clearml/venvs, but when I check inside that venvs folder there are no images present.
Is there a way to change the path inside the .txt file to the clearml cache? Because my images are stored only in the clearml cache.
I am uploading the dataset (for YOLOv8 training) as an artifact. When I download the artifact (.zip file) from the UI, the path to the images is something like /Users/adityachaudhry/.clearml/cache/......, but when I do .get_local_copy() I get, as the path, the local folder structure where I have my images on my system. For running the pipeline remotely I want the path to be like /Users/adityachaudhry/.clearml/cache/......
Is there a way to clone the whole pipeline, just like we clone tasks?
Hi @<1610083503607648256:profile|DiminutiveToad80>
You mean the pipeline logic? It should autodetect the imports of the logic function (like any Task.init call)
You can however call Task.force_requirements_env_freeze
and pass a local requirements.txt
Make sure to call it before creating the Pipeline object
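A minimal sketch of what that looks like (the pipeline/project names here are placeholders, not from this thread):
'''
from clearml import Task, PipelineController

# Must be called before the PipelineController is constructed,
# so the frozen requirements get attached to the pipeline task
Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")

pipe = PipelineController(
    name="my-pipeline",      # placeholder name
    project="my-project",    # placeholder project
    version="1.0.0",
)
'''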
So I should clone the pipeline, run the agent and then enqueue the cloned pipeline?
I think I'm missing the connection between the hash-ids and the txt file. In other words, why does the txt file contain a full path and not a relative path?
Correct. Notice you need two agents: one for the pipeline (logic) and one for the pipeline components.
That said, you can run two agents on the same machine 🙂
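For example (assuming the common setup where the pipeline logic goes to the services queue and the components to the default queue; adjust the queue names to your own):
'''
# one agent for the pipeline controller (logic)
clearml-agent daemon --queue services --detached

# one agent for the pipeline components (the actual steps)
clearml-agent daemon --queue default --detached
'''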
Yeah, you can ignore those. This is some Python GC stuff; it seems to be related to the OS and Python version
My git repo only contains the hash-ids, which are used to download the dataset onto my local machine
I have a pipeline which I am able to run locally. The pipeline has a pipeline controller along with 4 tasks: download data, training, testing, and predict. How do I execute this whole pipeline remotely so that each task is executed sequentially?
So for my project I have a dataset present on my local system. When I am running the pipeline remotely, is there a way the remote machine can access it?
Yes exactly like a Task (pipeline is a type of task)
'''
from clearml import Task

# pipeline_task_id is the UID of the pipeline task you want to clone
cloned_pipeline = Task.clone(source_task=pipeline_task_id)
# enqueue it on whichever queue your agent is listening to
Task.enqueue(cloned_pipeline, queue_name="services")
'''
When I am running the pipeline remotely, I am getting the following error message:
'''
There appear to be 6 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
'''
Run clearml-agent and enqueue the pipeline? What am I missing?
Can you explain how running two agents would help me run the whole pipeline remotely? Sorry if it's a very basic question
The issue I am facing is that when I do get_local_copy() the dataset (used for training YOLOv8) is downloaded inside the clearml cache (my image dataset contains images, labels, .txt files with paths to the images, and a .yaml file). The downloaded .txt files show the image files as downloaded into the git repo present inside the clearml venvs, but that path doesn't actually exist and it is giving me an error
So you are saying you only uploaded the "meta-data" i.e. a text file with links to the files, and this is why it is missing?
Is there a way to change the path inside the .txt file to clearml cache, because my images are stored in clearml cache only
I think a good solution would be to store the paths in the txt file as relative paths, i.e. instead of /Users/adityachaudhry/data/folder... use ./data/folder
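A minimal sketch of that rewrite (make_relative and the file names are hypothetical, and it assumes every listed image lives under the dataset root):
'''
from pathlib import Path

# Hypothetical helper: rewrite absolute image paths in a YOLO .txt
# file so they become relative to the dataset root in the repo.
def make_relative(txt_file: str, dataset_root: str) -> None:
    root = Path(dataset_root).resolve()
    lines = Path(txt_file).read_text().splitlines()
    relative = [
        "./" + str(Path(line).resolve().relative_to(root))
        for line in lines if line.strip()
    ]
    Path(txt_file).write_text("\n".join(relative) + "\n")

# e.g. turns /Users/adityachaudhry/data/folder/img.jpg into ./folder/img.jpg
make_relative("train.txt", "/Users/adityachaudhry/data")
'''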
I want to understand what's happening on the backend. I want to know how running the pipeline logic and the tasks on separate agents is going to sync everything up
For running the pipeline remotely I want the path to be like /Users/adityachaudhry/.clearml/cache/......
I'm not sure I follow. If you are getting a path with all your folders from get_local_copy, that's exactly what you are looking for, no?
When I am running the pipeline remotely, is there a way the remote machine can access it?
Well, for the dataset to be accessible you need to upload it with the Dataset class; then the remote machine can do Dataset.get(...).get_local_copy() to get the actual data on the remote machine
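Something along these lines (the dataset/project names and local path are placeholders):
'''
from clearml import Dataset

# On the local machine: create the dataset, add files, upload, finalize
ds = Dataset.create(dataset_name="yolov8-data", dataset_project="my-project")
ds.add_files(path="/path/to/local/images")
ds.upload()
ds.finalize()

# On the remote machine: fetch a cached local copy of the actual data
local_path = Dataset.get(
    dataset_name="yolov8-data", dataset_project="my-project"
).get_local_copy()
'''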