@<1523701435869433856:profile|SmugDolphin23> , I’ve updated both the ClearML server and client to the latest version, 1.14.0, as per our previous conversation. However, I’m still encountering the same issue as described earlier.
WebApp: 1.14.0-431
Server: 1.14.0-431
API: 2.28
I attempted to use the same queue for both the controller and the steps, and assigned two workers to this queue. Upon executing the code, the pipeline was initiated on the “queue_remote” queue, and the tasks of the pipeline were also initiated on another worker in the “queue_remote” queue. However, the dataset creation failed because it was unable to locate the Python modules from the current directory, as shown in the screenshot below.
Note: I stored the code and its dependencies in a GitHub repository when I executed the pipeline.
Please refer to the attached error screenshot and the code I used to run the pipeline for more details.
Hi!
It is possible to use the same queue for the controller and the steps, but there need to be at least 2 agents pulling tasks from that queue. Otherwise, if there is only 1 agent, that agent will be busy running the controller and won't be able to fetch the steps.
Regarding missing local packages: each step is run in a temporary directory that is different from the directory the script originally lives in. To solve this, you could add all the modules/files you are interested in to a git repository. If you do, the agent will clone that repository when running the steps, which will make the packages accessible.
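The failure mode above can be reproduced with nothing but the standard library: a script executed from a directory that does not contain its sibling modules cannot import them, just like a step running in the agent's temporary directory. The module name ev_local_util below is purely illustrative:

```python
import os
import subprocess
import sys
import tempfile

# Simulate what the agent does: the step script ends up in a fresh
# temporary directory, so modules that sat next to it in the original
# project (e.g. a local ev_local_util.py) are no longer importable.
with tempfile.TemporaryDirectory() as tmp:
    step = os.path.join(tmp, "step.py")
    with open(step, "w") as f:
        f.write("import ev_local_util\n")  # local module from the original project
    result = subprocess.run(
        [sys.executable, step], capture_output=True, text=True
    )
    # The import fails because ev_local_util.py was never copied
    # alongside step.py into the temporary directory.
    print("step failed:", result.returncode != 0)
```

Cloning the repository into the step's working directory (or installing the package) puts the module back on the import path, which is why pushing the code to git fixes it.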
@<1523701435869433856:profile|SmugDolphin23> I have tried the method you suggested and the pipeline still failed, as it couldn't find the "modules". Could you please help me here?
I would like to describe the process I was following again:
- I created a queue and assigned 2 workers to the queue.
- In the pipeline.py file, I used pipe.start(queue="queue_remote") to start the pipeline, and pipe.set_default_execution_queue('queue_remote') for the tasks.
- In the working_dir = ev_xxxx_xxtion/clearml, I executed the code using python3 pipeline.py
- The pipeline was initiated on queue "queue_remote" on worker 01, and the next tasks were initiated on queue "queue_remote" on worker 02, where they failed because worker 02 couldn't find the modules.
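For reference, the steps in the list above can be sketched roughly as follows (the pipeline name, project, and version are placeholders, and the ClearML import is done lazily so the sketch can be read without a server connection):

```python
def build_and_start_pipeline():
    # Lazy import: this only illustrates the calls described above and
    # needs a configured ClearML server/agents to actually run.
    from clearml import PipelineController

    pipe = PipelineController(
        name="ev-pipeline",              # placeholder name
        project="ev_xxxxxx_detection",   # placeholder project
        version="1.0.0",
    )
    # Steps and controller share one queue, so at least two agents
    # must be listening on "queue_remote".
    pipe.set_default_execution_queue("queue_remote")
    # ... add steps here, e.g. with pipe.add_function_step(...) ...
    pipe.start(queue="queue_remote")
```

With only one agent on "queue_remote", the controller occupies it and the steps never get picked up, which matches the two-worker setup described above.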
When I run it from the command line, everything returns to normal and the pipeline is visible now. Thank you very much for your help, time and feedback 🙂 @<1523701435869433856:profile|SmugDolphin23>
sure, I'll add those details & check. Thank you
@<1523701435869433856:profile|SmugDolphin23> I have attached two screenshots: one of the pipeline initialization and the other of a task of the pipeline.
The project's directory is as follows:
The pipeline.py file includes the code to run the pipeline and its tasks.
├── Makefile
├── README.md
├── ev_xxxxxx_detection
│   ├── __init__.py
│   ├── __pycache__
│   │   └── __init__.cpython-311.pyc
│   ├── clearml
│   │   ├── __pycache__
│   │   ├── clearml_wrapper.py
│   │   ├── constants.py
│   │   ├── data_loader.py
│   │   ├── ev_trainer.py
│   │   ├── pipeline.py
│   │   └── util.py
├── poetry.lock
├── pyproject.toml
@<1626028578648887296:profile|FreshFly37> can you please screenshot this section of the task? Also, what does your project's directory structure look like?