
when I set up two queues and two workers, set the default execution queue to one queue and use the other queue for pipe.start, it all works
but if I run exactly the same code from a Python script (which also calls start on the pipeline), the worker node tries to check out the script and runs that (or fails if you haven't checked it into git yet)
I created the pipeline on another machine via an interactive Python shell. The pipeline is picked up by ClearML, as I see it on the web UI.
in the case of the local Jupyter notebook, I create the pipeline and when I start it, it all works without needing to add the notebook to git
if I look at the code of the ClearML controller.py, I see that it expects additional code in a relative folder
but the behavior differs depending on whether you kick it off from a local Jupyter notebook or a Python script
the error occurs in the worker node when it tries to initialize the environment for the pipeline
I do not get more information than I just showed
however, I did notice another issue.
my worker node is not running in Docker, but on Linux in a conda environment
(the 'remo2' task is an existing experiment)
looks like it's missing some dependencies
Full console log of the worker:
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue b5fe1e72614247f7a77e5f6cdac35580
No tasks in Queues, sleeping for 5.0 seconds
task 30ad27a7a1244b6e8aa722d81cb6015c pulled from b5fe1e72614247f7a77e5f6cdac35580 by worker NLEIN-315GNH2:0
Running task '30ad27a7a1244b6e8aa722d81cb6015c'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.sppvun4p.txt', '/tmp/.clearml_agent_out.sppvun4p.txt'
Current configuration (clearml_agent v1.4.1, location:...
if I go to the folder mentioned in the error and then one level up, I see no other packages present
The notebook behavior is indeed how I expect it to work; the behavior via the script is strange
FYI: this is my pipeline script
from clearml import PipelineController
pipe = PipelineController(name="My Pipe", project="Gridsquare-Training", version="0.0.5")
pipe.add_step(name="pipe step 1", base_task_project="Gridsquare-Training", base_task_name="remo2")
pipe.add_step(name="pipe step 2", base_task_project="Gridsquare-Training", base_task_name="remo2", parents=["pipe step 1"])
pipe.set_default_execution_queue("myqueue")
pipe.start("service")
Initially, I had only one queue and one worker set up. If the pipeline's default execution queue is the same as the queue used in pipe.start('the queue'), it gets into a sort of deadlock and waits forever
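A minimal sketch of the two-queue setup described above that avoids the deadlock (the queue names "controller_queue" and "step_queue" are placeholders, not from the original script; one agent must be listening on each queue):

```python
from clearml import PipelineController

# Sketch only: queue names are placeholders; the steps clone an
# existing experiment, as in the original script.
pipe = PipelineController(
    name="My Pipe", project="Gridsquare-Training", version="0.0.5"
)
pipe.add_step(
    name="pipe step 1",
    base_task_project="Gridsquare-Training",
    base_task_name="remo2",
)

# Steps get enqueued here; a worker listening on "step_queue" runs them.
pipe.set_default_execution_queue("step_queue")

# The controller itself runs on a *different* queue, so a worker serving
# the controller never blocks the workers serving the steps.
pipe.start(queue="controller_queue")
```

With a single queue and a single worker, the worker is occupied by the controller task, which then waits forever for a worker to pick up its steps.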
Hi John, I've done more experiments and found that this only happens if you try to run the pipeline remotely directly from the Python interpreter
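If the goal is to make the script behave like the notebook case (controller runs in the current process, only the steps go to the workers), PipelineController.start_locally may help; a sketch, assuming "pipe" is the controller from the script above:

```python
# Sketch: run the controller logic in the current process instead of
# enqueueing it, so the agent never needs to check out this script.
# With run_pipeline_steps_locally=False, the steps are still sent to
# the default execution queue and run on the remote workers.
pipe.start_locally(run_pipeline_steps_locally=False)
```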