I created the pipeline on another machine via an interactive Python shell. The pipeline is picked up by ClearML, as I see it on the web UI.
Initially, I had only one queue and one worker set up. If the pipeline's default execution queue is the same as the queue used in pipe.start('the queue'), it gets into a sort of deadlock and waits forever
when I set up two queues and two workers, set the default execution queue to one queue and use the other queue for pipe.start, it all works
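The single-queue deadlock above can be reasoned about with a toy model (plain Python, not ClearML code): the controller task occupies the only worker slot while it blocks waiting for its step, so the step it enqueued on the same queue is never pulled. A second worker breaks the cycle:

```python
import queue
import threading

def run_with_workers(n_workers, timeout=2.0):
    """Toy model of a pipeline controller enqueued on the same queue as its step.

    With one worker the controller blocks the only slot while waiting for
    the step, which is never picked up (deadlock); with two workers the
    step runs on the second worker and the pipeline completes.
    """
    q = queue.Queue()
    step_done = threading.Event()
    finished = threading.Event()

    def step():
        step_done.set()

    def controller():
        q.put(step)        # controller enqueues its step on the SAME queue...
        step_done.wait()   # ...then blocks its worker until the step finishes
        finished.set()

    def worker():
        while not finished.is_set():
            try:
                task = q.get(timeout=0.1)
            except queue.Empty:
                continue
            task()

    q.put(controller)
    for _ in range(n_workers):
        threading.Thread(target=worker, daemon=True).start()
    finished.wait(timeout)
    return finished.is_set()
```

With `run_with_workers(1)` the pipeline never finishes; with `run_with_workers(2)` it does, which mirrors the two-queue/two-worker setup that resolved the hang.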
if I go to the folder mentioned in the error and then one level up, I see no other packages present
Full console log of the worker:
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue b5fe1e72614247f7a77e5f6cdac35580
No tasks in Queues, sleeping for 5.0 seconds
task 30ad27a7a1244b6e8aa722d81cb6015c pulled from b5fe1e72614247f7a77e5f6cdac35580 by worker NLEIN-315GNH2:0
Running task '30ad27a7a1244b6e8aa722d81cb6015c'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.sppvun4p.txt', '/tmp/.clearml_agent_out.sppvun4p.txt'
Current configuration (clearml_agent v1.4.1, location:...
The notebook behavior is indeed how I expect it to work; the behavior via the script is strange
however, I did notice another issue.
but the behavior is different if you kick it off from a Jupyter notebook (local) or a Python script
I do not get more information than I just showed
(the 'remo2' task is an existing experiment)
looks like it's missing some dependencies
the error occurs in the worker node when it tries to initialize the environment for the pipeline
but if I run exactly the same code from a Python script (which also calls start on the pipeline), the worker node tries to check out the script and runs that (or fails if you haven't checked it into git yet)
in case of the local jupyter notebook, I create the pipeline and when I start it, it all works without the necessity to add the jupyter notebook to git
my worker node is not a Docker container, but a Linux machine with a conda environment
Hi John, I've done more experiments and found that this only happens if you try to run the pipeline remotely directly from the Python interpreter
if I look at the code of ClearML's controller.py, I see that it expects additional code in a relative folder
FYI: this is my pipeline script
from clearml import PipelineController

pipe = PipelineController(name="My Pipe", project="Gridsquare-Training", version="0.0.5")
pipe.add_step(name="pipe step 1", base_task_project="Gridsquare-Training", base_task_name="remo2")
pipe.add_step(name="pipe step 2", base_task_project="Gridsquare-Training", base_task_name="remo2", parents=["pipe step 1"])
# steps run on "myqueue"; the controller itself runs on the separate "service" queue
pipe.set_default_execution_queue("myqueue")
pipe.start("service")