I was trying out the pipeline controller for the first time, and it felt like a burden that I had to launch an agent just for the sake of trying it.
Actually, I was thinking about models that weren't trained using ClearML, like pretrained models, etc.
I was here, but I can't find info for the questions I mentioned
the link to manual model registry doesn't work
Manual model registration?
why not use my user and group?
This is part of a bigger process which takes quite some time and resources. I hope I can try this soon, if it will help get to the bottom of this.
AgitatedDove14 I really don't know how is this possible... I tried upgrading the server, tried whatever I could
About small toy code to reproduce: I just don't have the time for that, but I will paste the callback I am using into this explanation. This is the overall logic, so you can replicate it and use my callback.
From the pipeline task, launch some subtasks, and put in their post_execute_callback the .collect_description_tables method from my callback class (attached below) Run t...
After you create the pipeline object itself, can you get Task.current_task()?
AgitatedDove14 no I can't... Just checked this. This is a huge problem for us, it used to work before and it just stopped working and I can't figure out why.
It's a problem for us because we made it a methodology to run some tasks under a pipeline task and save summary info to the pipeline task. Now that Task.current_task() doesn't work on the pipeline object, we have a serious problem.
The weirdest thing is that the execution is "completed" but it actually failed
It's kind of random, it works sometimes and sometimes it doesn't
AgitatedDove14 just so you know, this is a severe problem that occurs from time to time and we can't explain why it happens... Just to remind you, we are using a pipeline controller task, which at the end of the last execution gathers artifacts from all the child tasks and uploads a new artifact to the pipeline's task object. Then what happens is that Task.current_task() returns None for the pipeline's task...
AgitatedDove14 sorry for the late reply,
It's right after executing all the steps. So we have the following block which determines whether we run locally or remotely
    if not arguments.enqueue:
        pipe.start_locally(run_pipeline_steps_locally=True)
    else:
        pipe.start(queue=arguments.enqueue)
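The branch above can be wrapped in a small helper so the choice between local and remote execution lives in one place. This is only a sketch: `launch_pipeline`, `parse_args`, and the `--enqueue` flag are hypothetical names (the original shows only `arguments.enqueue`), and `pipe` is assumed to be any object exposing the `start_locally` and `start` methods used above.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `--enqueue` is a hypothetical flag name."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enqueue", default=None,
                        help="queue name; omit to run on this machine")
    return parser.parse_args(argv)


def launch_pipeline(pipe, enqueue=None):
    """Start `pipe` locally, or enqueue it when a queue name is given.

    Returns "local" or "remote" so the caller knows which path was taken.
    """
    if not enqueue:
        # No queue requested: run the controller and its steps on this machine.
        pipe.start_locally(run_pipeline_steps_locally=True)
        return "local"
    # A queue name was given: hand the controller over to that queue.
    pipe.start(queue=enqueue)
    return "remote"
```

Returning the chosen mode makes it easy to log which path ran, which may help when a failure only shows up in one of the two modes.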
And right after that we have a method that calls Task.current_task(), which returns None
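While the root cause is unknown, it may help to fail loudly at the exact point where the task goes missing instead of hitting an AttributeError on None later. The following is a minimal sketch, not a fix: `require_task` is a hypothetical helper that accepts any zero-argument callable, so in this setup one would presumably pass `Task.current_task`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def require_task(get_task: Callable[[], Optional[T]], context: str) -> T:
    """Return the object produced by `get_task`, or raise if it is None.

    `context` is a free-text hint (e.g. "after start_locally") that ends up
    in the error message, so logs show where the task was lost.
    """
    task = get_task()
    if task is None:
        raise RuntimeError(f"No current task available {context}")
    return task


# Assumed usage with ClearML (not executed here):
#   task = require_task(Task.current_task, "after pipe.start_locally()")
```

Calling this both before and after the start call would show on which side of it the reference disappears.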
I'll check the version tomorrow. About the current_task call: I tried before and after, same result
I'll check if this works tomorrow
Okay so at the first part of the code, we define some kind of callback that we add to our steps, so later we can collect them and attach the results to the pipeline task. It looks something like this
class MedianPredictionCollector:
    _tasks_to_collect = list()

    @classmethod
    def collect_description_tables(cls, pipeline: clearml.PipelineController, node: clearml.PipelineController.Node):
        # Collect tasks
        cls._tasks_to_collect.append(node.executed)

    @classmethod...
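To make the class-level collection pattern easier to reason about in isolation, here is a self-contained sketch that runs without ClearML. Everything in it is hypothetical: `FakeNode` stands in for a pipeline node (only its `executed` attribute is used, as in the callback above), and `ExecutedTaskCollector` is not the truncated class from this message, just an illustration of the same idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a pipeline node; only `executed` is used."""
    name: str
    executed: Optional[str]  # ID of the task that ran this step, or None


class ExecutedTaskCollector:
    """Hypothetical collector that keeps executed task IDs in class-level state."""
    _task_ids: List[str] = []

    @classmethod
    def on_step_done(cls, pipeline, node) -> None:
        # Same (pipeline, node) callback shape as above; skip steps with no task ID.
        if node.executed is not None:
            cls._task_ids.append(node.executed)

    @classmethod
    def collected(cls) -> List[str]:
        # Return a copy so callers cannot mutate the shared class-level list.
        return list(cls._task_ids)

    @classmethod
    def reset(cls) -> None:
        # Clear state between runs; class attributes persist for the process lifetime.
        cls._task_ids.clear()
```

One property worth keeping in mind: class-level state like this exists only in the Python process where the callbacks actually run. If the controller executes in a different process or on a different machine than the code that later reads the list, the reader would see an empty list.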
I suspect that it has something to do with remote vs. local execution of pipelines, because we switch between the two, so sometimes the pipeline task itself executes on the client, and sometimes on the host (where the agent also runs)
Okay so regarding the version - we are using 1.1.1
The thing with this error is that it happens sometimes, and when it happens it never goes away...
I don't know what causes it, but we have one host where it works okay, then someone else checks out the repo, tries it, and it fails with this error, while another guy can do the same and it will work for him
Yes, I'll prepare something and send
Maybe the case is that after start / start_locally the reference to the pipeline task disappears somehow? O_O
that will require restarting the agent again?
I assume we are talking about the IP I would find here right?
https://www.whatismyip.com/