AgitatedDove14 Yes, the difference in installed packages is large - the training stage, which runs OK, has all the following:
Hi everyone. I have an issue with a simple pipeline: it runs two similar NN training steps (TF 2.3, Windows 10, Python 3.7) whose only difference is the batch size. I first run each step separately so that they appear on the ClearML project page. Then I run the pipeline controller, which clones each step and runs smoothly. If I run the pipeline from the command line again, it works OK. However, if I clone and enqueue the pipeline, it starts, creates the clone of the first step in the pending state, and then nothing happens: the first step remains pending and never starts. Can anyone help with the issue? Here's the pipeline controller code:
`
from clearml import Task
from clearml.automation.controller import PipelineController

# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name='Tom', task_name='test pipeline',
                 task_type=Task.TaskTypes.controller, reuse_last_task_id=False)

pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)
pipe.add_step(name='train_1st_nn_copy', base_task_project='Tom',
              base_task_name='train_1st_nn',
              parameter_override={'batch_size': 8})
pipe.add_step(name='train_2nd_nn_copy', parents=['train_1st_nn_copy', ],
              base_task_project='Tom', base_task_name='train_2nd_nn',
              parameter_override={'batch_size': 4})

# Starting the pipeline (in the background)
pipe.start()
# Wait until pipeline terminates
pipe.wait()
# cleanup everything
pipe.stop()

print('done')
`
If I abort the pipeline controller task, the pending "train_1st_nn" task executes OK.