Still the same problem 😕. I used pipe.set_default_execution_queue('agent')
and pipe.start_locally()
for my pipeline.py. When cloning, I enqueue the pipeline to another agent: Task.enqueue(task=clone_task.id, queue_name='agent(EC2)')
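Concretely, the relevant bits look roughly like this (pipeline_task_id is just a placeholder for however I look up the recorded pipeline task):

```python
from clearml import Task

# pipeline.py -- the controller logic runs on this machine,
# while each step is enqueued to the 'agent' queue
pipe.set_default_execution_queue('agent')
pipe.start_locally()

# separate script -- clone the recorded pipeline and send it to the EC2 agent
clone_task = Task.clone(source_task=pipeline_task_id, name='Clone pipeline')  # placeholder id
Task.enqueue(task=clone_task.id, queue_name='agent(EC2)')
```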
LittleShrimp86 can you post the full log of the pipeline? (something is odd here)
```python
from clearml import PipelineController

pipe = PipelineController(name="clearmlsample_pipeline",
                          project="clearmlsample",
                          version="1.0.0")

pipe.add_parameter('seed', 2222, description='random seed to standardize randomness')
pipe.add_parameter('n_trials', 10, description='trials to run during optimization')

pipe.add_step(
    name='get_data',  # can be named anything
    # connect the pipeline to the task (obtain data from Task.init in the python files)
    base_task_project='clearmlsample',  # project name
    base_task_name='get data & preprocess'  # task name
)
pipe.add_step(
    name='train_and_tune',  # name
    base_task_project='clearmlsample',  # project name
    base_task_name='training and tuning',  # task name
    # connect it to the previous task using its step name, not base_task_name;
    # a step cannot run unless its parents finish
    parents=['get_data'],
    # use 'General/parameter' to override a parameter from the py file;
    # give a value, or take one from pipe.add_parameter using '${pipeline.parameter}'
    #
    # no special command is needed to get data created with Dataset.create() in a previous task,
    # just use Dataset.get to fetch the latest version
    # (it may be easier to store everything except raw data in artifacts instead)
    parameter_override={'General/seed': '${pipeline.seed}',
                        'General/n_trials': '${pipeline.n_trials}'}
)
pipe.add_step(
    name='evaluate',  # name
    base_task_project='clearmlsample',  # project name
    base_task_name='evaluating',  # task name
    # connect it to the previous task using its step name, not base_task_name
    parents=['train_and_tune'],
    parameter_override={'General/seed': '${pipeline.seed}',
                        # get the task id of the previous step to fetch its models
                        'General/train_task_id': '${train_and_tune.id}'}
)

# select the default worker queue to run the steps
pipe.set_default_execution_queue('MonashPC')

# start the pipeline logic:
# - run this to run EVERYTHING locally:
pipe.start_locally(run_pipeline_steps_locally=True)
# - run this to run the logic locally but the steps remotely:
# pipe.start_locally()
# - run this to run the logic remotely:
# pipe.start(queue='queue')
# do NOT start the pipeline on the same queue as set_default_execution_queue,
# e.g. pipe.start(queue='MonashPC'); the steps would be queued forever
# because that queue is occupied by the pipeline logic itself

print('done')
```
I mean test with: pipe.start_locally(run_pipeline_steps_locally=False)
This actually creates the steps as Tasks and launches them on remote machines
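That is, something like this sketch, where 'agent' is just a placeholder for your actual worker queue:

```python
# controller logic runs on this machine; each step is created as a Task
# and enqueued so a remote clearml-agent executes it
pipe.set_default_execution_queue('agent')  # placeholder queue name
pipe.start_locally(run_pipeline_steps_locally=False)
```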
Let me test it out. I thought that if I ran it remotely with pipe.start_locally(run_pipeline_steps_locally=True)
, "local" would just mean the remote agent instead
All right, testing with: pipe.set_default_execution_queue('agent'), pipe.start_locally(), pipe.start(queue='agent')
Now I just need to wait for this to finish and clone it later 🤞
Seems to be working, it's in the first stage 😮
No worries, you should probably change it to pipe.start(queue='queue'),
not start_locally
Is it working when you are calling it with start_locally?
Hi, I found the problem using the example Martin gave. Apparently you cannot use pipe.start_locally()
at all when trying to clone the task and work completely remotely (I thought it would treat the agent as local when I sent it to a queue). It works with the combination of pipe.set_default_execution_queue('agent')
and pipe.start(queue='agent2(EC2)')
. However, must I really have two clearml-agents for complete automation? To the best of my knowledge, pointing both of the functions above at the same queue will just cause an infinite queue. Is there no way to use only one worker for everything, like start_locally(run_pipeline_steps_locally=True)
? For example, I initially thought that if I used Task.enqueue(task=clone_task.id, queue_name='agent2(EC2)')
(cloning the pipeline) and start_locally(run_pipeline_steps_locally=True)
(pipeline file), ClearML would treat agent2(EC2) as local instead.
Indeed, pipelines that were started with pipe.start_locally
cannot be cloned and re-run. We will change this behaviour ASAP so that you can use just one queue for your use case.
Hi LittleShrimp86 ! Looks like something is broken. We are looking into it
Yes, similar, but via GitHub Actions for automation. I just wanted to know if there is an easier way to connect to ClearML instead of creating a new workflow for every CI/CD purpose.
Aside from that, I tried cloning my task (pipeline) and enqueuing it to a clearml-agent:

```python
filter = {'status': ['published'], 'order_by': ['-last_update'], 'type': ['controller']}
pipeline_task = Task.get_tasks(project_name='clearmlsample/clearmlsample_pipeline/.pipeline',
                               task_filter=filter)
clone_task = Task.clone(source_task=pipeline_task[0], name="Clone pipeline")
Task.enqueue(task=clone_task.id, queue_name='agent')
```
I already ran the pipeline a few times locally, and it works fine. However, when I try cloning it, the stages seem to go missing and the pipeline ends instantly.
Hi LittleShrimp86
Just to log in to your ClearML app (demo or server) so I can run Python files related to ClearML.
I think this amounts to creating a Task and enqueueing it, am I understanding correctly ?
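If so, a minimal sketch of that flow from a CI job might look like this (the project, repo URL, script path, and queue name are all placeholders, not your actual values):

```python
from clearml import Task

# create a Task from a repo/script without executing it locally
task = Task.create(
    project_name='clearmlsample',                      # placeholder project
    task_name='ci-run',                                # placeholder name
    repo='https://github.com/your-org/your-repo.git',  # placeholder repo
    branch='main',
    script='train.py',                                 # placeholder script path
)

# hand it to a worker queue so a clearml-agent picks it up
Task.enqueue(task=task.id, queue_name='agent')         # placeholder queue
```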
LittleShrimp86 what do you have in the Configuration Tab of the Cloned Pipeline?
(I think that it has empty configuration -> which means empty DAG, so it does nothing and leaves)
LittleShrimp86 did you try to run the pipeline from the UI on remote machines (i.e. with the agents)? Did that work?
I also tried cloning an individual task; that, surprisingly, works. Not sure why my pipeline doesn't.
Sorry for the comments haha. Trying to note down as much of what I learn as I can
Yup, the pipeline stops instantly with "Launching the next 0 steps".
Yes the configuration and stages (DAG visualization) were there right until the agent finish cloning the environment. Then it goes missing.
I don't know why 😢. Updated to 1.7.2 and was staring at the configuration object; there was stuff in it until it reached "Starting Task Execution:" in the logs. It went missing after that.
The configuration tab -> configuration objects -> pipeline is empty
That's the reason it is doing nothing 😞
How come it is empty if you cloned the local one?
Wait, are you saying it is disappearing? Meaning when you cloned the Pipeline (i.e. in draft mode) the configuration was there, and then the configuration disappeared?
This is very odd...
LittleShrimp86 is this example working for you?
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_tasks.py
BTW: Can you also please test with the latest clearml version, 1.7.2