AgitatedDove14 just a reminder if you missed this question 😄
SuccessfulKoala55 here it is
I'd prefer we debug on my machine (tell me what you want to check) rather than create a snippet
after you create the pipeline object itself, can you get Task.current_task()?
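For context, this is the kind of check being asked about - a minimal sketch, assuming the pipeline is built with PipelineController (the project/name/version below are placeholders, not from the original code):

```python
from clearml import Task
from clearml.automation import PipelineController

# Create the pipeline controller object (placeholder project/name/version)
pipe = PipelineController(name="debug-pipeline", project="debug", version="1.0.0")

# The question: right after creating the object, does this return the pipeline task?
print(Task.current_task())
```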
AgitatedDove14 no I can't... just checked this. This is a huge problem for us: it used to work, it just stopped working, and I can't figure out why.
It's a problem for us because we made it a methodology to run some tasks under a pipeline task and save summary info to the pipeline task - but now, since Task.current_task() doesn't work on the pipeline object, we have a serious problem
I'll check the version tomorrow. About the current_task call, I tried it before and after - same result
This is part of a bigger process which takes quite some time and resources, I hope I can try this soon if it will help get to the bottom of this
AgitatedDove14 sorry for the late reply,
It's right after executing all the steps. So we have the following block which determines whether we run locally or remotely
```python
if not arguments.enqueue:
    pipe.start_locally(run_pipeline_steps_locally=True)
else:
    pipe.start(queue=arguments.enqueue)
```
And right after we have a method that calls Task.current_task(), which returns None
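For reference, a self-contained sketch of the pattern described above (the argparse wiring and the project/pipeline names are placeholders, not the original code):

```python
import argparse
from clearml import Task
from clearml.automation import PipelineController

parser = argparse.ArgumentParser()
parser.add_argument("--enqueue", default=None)
arguments = parser.parse_args()

pipe = PipelineController(name="my-pipeline", project="my-project", version="1.0.0")
# ... steps would be added here, e.g. with pipe.add_function_step(...)

if not arguments.enqueue:
    pipe.start_locally(run_pipeline_steps_locally=True)
else:
    pipe.start(queue=arguments.enqueue)

# Right after the run, the code expects to get the pipeline task back:
summary_task = Task.current_task()
print(summary_task)  # reported behaviour: sometimes the pipeline task, sometimes None
```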
It's kind of random, it works sometimes and sometimes it doesn't
In the larger context I'd look at how other object stores treat similar problems; I'm not that advanced in these topics.
But adding a simple force_download flag to the get_local_copy method could solve many of the cases I can think of. For example, I'd set it to true in my case, as I don't mind the times it will re-download when not necessary since the file is quite small (currently I always delete the local file, but it looks pretty ugly)
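For illustration, this is roughly what the "always delete the local file" workaround looks like - a sketch assuming a task artifact; the task lookup and artifact name are hypothetical:

```python
import os
from clearml import Task

task = Task.get_task(project_name="my-project", task_name="my-task")  # hypothetical lookup
artifact = task.artifacts["summary"]  # hypothetical artifact name

local_path = artifact.get_local_copy()
# Delete the cached copy so the next call is forced to re-download
if local_path and os.path.exists(local_path):
    os.remove(local_path)
fresh_path = artifact.get_local_copy()
```

A force_download flag on get_local_copy would replace the delete step with a single argument.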
BTW, is the if not cached_file: return cached_file a legit check or a bug?
I don't think I can - this is private IP, and creating a dummy example of a pipeline and execution will take me more time than I can dedicate to this
Oh I get it, that also makes sense with the docs directing this at inference jobs and avoiding GPU - because of the 1-N thing
If you want we can do live zoom or something so you can see what happens
I suspect that it has something to do with remote execution / local execution of pipelines, because we play with this, so sometimes the pipeline task itself executes on the client, and sometimes on the host (where the agent is also running)
I was here, but I can't find info for the questions I mentioned
What if I want it to use SSH creds?
Maybe the case is that after start / start_locally the reference to the pipeline task disappears somehow? O_O
Manual model registration?
I only have like 40 tasks including the example ones
I'll check if this works tomorrow
Actually I was thinking about models that weren't trained using ClearML, like pretrained models etc.
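A minimal sketch of what manually registering such a pretrained model could look like with OutputModel (project, task, framework and file names below are placeholders):

```python
from clearml import Task, OutputModel

task = Task.init(project_name="model-registry", task_name="register-pretrained")
output_model = OutputModel(task=task, name="pretrained-backbone", framework="PyTorch")

# Either upload a local weights file ...
output_model.update_weights(weights_filename="resnet50.pth")
# ... or register weights that already live in remote storage:
# output_model.update_weights(register_uri="s3://bucket/models/resnet50.pth")
```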
Or should I change all three of them?
cluster.routing.allocation.disk.watermark.low:
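For context, these are the three related Elasticsearch settings; the values shown are the documented defaults (a sketch of an elasticsearch.yml snippet, not values taken from this thread):

```yaml
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
```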