Sounds good. Lmk if any changes are required.
@<1523701435869433856:profile|SmugDolphin23> Yeah, I just wanted to validate it was worth spending the time. Since there is already a parameter that takes a callable (i.e. schedule_function), it might make sense to reuse that parameter. If it returns a str, we validate that it's a task, and if it is, we run the task as if we had originally passed it as the task_id in .add_task(). This would only be a breaking change if the callable that was passed happened to return a task_id. Or do you think it would be better just to add a new parameter?
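For reference, the "reuse schedule_function" option could look roughly like the sketch below. It is only an illustration of the dispatch logic: run_schedule_function, is_existing_task and launch_task are hypothetical names, not part of ClearML, and the two callbacks stand in for whatever the scheduler would use to validate and launch a task.

```python
from typing import Any, Callable, Optional


def run_schedule_function(
    schedule_function: Callable[[], Any],
    is_existing_task: Callable[[str], bool],  # hypothetical validator
    launch_task: Callable[[str], Any],        # hypothetical launcher
) -> Optional[Any]:
    """Call the user's function; if it returned a valid task_id, launch that task."""
    result = schedule_function()
    # Only a str that resolves to a real task is treated as a task_id.
    if isinstance(result, str) and is_existing_task(result):
        return launch_task(result)
    # Any other return value keeps the current behaviour: nothing more happens.
    return None
```

The breaking-change risk mentioned above shows up in the isinstance branch: a callable that already happens to return a valid task_id would start launching that task.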
Thanks Eugen for the quick reply. If I can add a suggestion/comment from my perspective: Why is schedule_function included in the .add_task() method? As far as I can tell, if you use schedule_function it changes the very nature of the method: it's no longer adding a task but adding a function. It seems like it would make more sense if this was broken out into something like an .add_function() method. Also, if you use schedule_function, many of the other parameters in .add_task() don't make sense. What is task_overrides overriding if you use schedule_function? I think this would also make sense given the other places in ClearML where distinctions are made between running a task and running a function.
Ok, grandpa's rant is over. 👴
With that said, can I run another thing by you related to this? What do you think about a PR that adds the functionality I originally assumed schedule_function was for? By this I mean: adding a new parameter (this wouldn't change anything about schedule_function or how .add_task() currently behaves) that also takes a function, but the function is expected to return a task_id when called. This function is run at runtime (when the task scheduler would normally execute the scheduled task), and the scheduler uses the task_id returned by the function + the other parameters from .add_task() as the scheduled task.
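To make the proposal concrete, here is a rough sketch of the idea in plain Python. It does not touch ClearML: ScheduledJob, task_id_function, trigger and launch are all hypothetical names made up for illustration, and launch stands in for whatever the scheduler does today with a fixed task_id.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class ScheduledJob:
    """Toy stand-in for one scheduler entry (not ClearML's internal class)."""

    task_id: Optional[str] = None
    # Hypothetical new parameter: called at trigger time, must return a task_id.
    task_id_function: Optional[Callable[[], str]] = None
    queue: Optional[str] = None
    task_parameters: Dict[str, Any] = field(default_factory=dict)
    task_overrides: Dict[str, Any] = field(default_factory=dict)

    def resolve_task_id(self) -> str:
        """Return the task_id to launch, asking the callable if one was given."""
        if self.task_id_function is not None:
            resolved = self.task_id_function()
            if not isinstance(resolved, str) or not resolved:
                raise ValueError("task_id_function must return a non-empty task_id")
            return resolved
        if self.task_id is None:
            raise ValueError("either task_id or task_id_function is required")
        return self.task_id


def trigger(job: ScheduledJob, launch: Callable[..., Any]) -> Any:
    """Resolve the task_id at trigger time, then launch with the usual options."""
    return launch(
        task_id=job.resolve_task_id(),
        queue=job.queue,
        task_parameters=job.task_parameters,
        task_overrides=job.task_overrides,
    )
```

The point of the design is that queue, task_parameters and task_overrides keep their current meaning; only the choice of task is deferred until the job fires.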
Why is this useful: there's a host of reasons, but the biggest one is that it gives users much more control over the tasks that are run by the task scheduler. Currently, as far as I can tell, if I wanted to run the most recent task (at runtime) from a given project with a specific tag, it's not possible to do with the task scheduler. I can use the schedule_function parameter and create a function that finds and runs the task, but then I lose one of the core advantages of .add_task(): there is no way to specify queues, task_parameters and task_overrides. Naturally, I could wrap all of that into the function called by schedule_function, but then I'm basically just writing my own scheduler at that point.
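For comparison, the schedule_function workaround described above might look something like this. It is a sketch, not tested against a live server: run_latest_tagged_task and the task_cls argument are made-up names (task_cls only exists so the logic can be exercised without a server), and it assumes Task.get_tasks honours an order_by entry in task_filter so the newest task comes first.

```python
from typing import Any, Optional


def run_latest_tagged_task(
    project_name: str,
    tag: str,
    queue_name: str,
    task_cls: Optional[Any] = None,
) -> str:
    """Find the most recently updated task with `tag`, clone it, and enqueue the clone."""
    if task_cls is None:
        # Imported lazily so the sketch can be read and tested without clearml installed.
        from clearml import Task as task_cls

    candidates = task_cls.get_tasks(
        project_name=project_name,
        tags=[tag],
        # Assumption: newest first via the backend's order_by filter.
        task_filter={"order_by": ["-last_update"]},
    )
    if not candidates:
        raise LookupError(f"no task tagged {tag!r} found in project {project_name!r}")

    cloned = task_cls.clone(source_task=candidates[0])
    # The queue is hard-wired here; task_parameters and task_overrides would
    # also have to be applied by hand, which is the duplication complained about.
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

Something like `lambda: run_latest_tagged_task("my project", "prod", "default")` (placeholder values) could then be passed as schedule_function, at which point the function is doing the scheduler's job of cloning and enqueuing.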
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! That's correct. The job function will run in a separate thread on the machine you are running the scheduler from. That's it. You can create tasks from functions tho using backend_interface.task.populate.CreateFromFunction.create_task_from_function
I figured you'd say that so I went ahead with that PR. I got it working but I'm going to test it a bit further.
With that said, can I run another thing by you related to this? What do you think about a PR that adds the functionality I originally assumed schedule_function was for? By this I mean: adding a new parameter (this wouldn't change anything about schedule_function or how .add_task() currently behaves) that also takes a function, but the function is expected to return a task_id when called. This function is run at runtime (when the task scheduler would normally execute the scheduled task), and the scheduler uses the task_id returned by the function + the other parameters from .add_task() as the scheduled task.
That is a great idea actually. Are you going to write a PR for this?