AgitatedDove14 Is there no await/synchronize method to wait for task update?
I think doing all that work is not worth it right now. I am just trying to understand why clearml does not seem to be designed something like this:
```
task_name = args.task_name
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
task.requirements.add(...)
await task.synchronize()
task.execute_remotely(queue_name, exit=True)
```
Long story short, the Task requirements are async, so if one puts it after creating the object (at least in theory), it might be too late.
Make sense?
If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion
LOL no worries
Basically, the git & python analysis can take some time (it can take a minute on a large repository!)
And we wanted to make sure Task.init returns quickly (it already has to authenticate with the server, which slows it down, plus a few more things)
The easiest way is to have the code analysis run in the background since usually there is no interaction with this process (on the user end).
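For readers following along, here is a minimal sketch of the pattern described above: a constructor that returns right away while a slow analysis keeps running in a background thread. `ToyTask`, `_analyze` and `wait_for_analysis` are made-up names for illustration, and the sleep stands in for the git/python scan. This is an assumption-laden toy, not how clearml is actually implemented.

```python
import threading
import time


class ToyTask:
    """Toy stand-in for a task object; not the clearml implementation."""

    def __init__(self, analysis_seconds=0.2):
        self.requirements = None  # filled in later by the background thread
        self._analysis_seconds = analysis_seconds
        self._worker = threading.Thread(target=self._analyze, daemon=True)
        self._worker.start()  # returns immediately, the analysis keeps running

    def _analyze(self):
        time.sleep(self._analysis_seconds)  # pretend to scan the repository
        self.requirements = ["numpy==1.19.5", "torch==1.8.0"]

    def wait_for_analysis(self, timeout=None):
        """Block until the background analysis is done (hypothetical helper)."""
        self._worker.join(timeout)
        return not self._worker.is_alive()


task = ToyTask()
early = task.requirements            # read right after creation, before the scan ends
finished = task.wait_for_analysis()  # explicit synchronisation point
late = task.requirements             # populated once the thread is done
```

The trade-off is visible in the two reads: creation is fast, but anything that depends on the analysis result has to wait for it explicitly.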
Back to your pseudo-code suggestion, is this how a user would use it, or how you suggest clearml implement the feature? (I'm genuinely interested, we are always looking for ideas on improving the user interface)
Maybe related question: Will there be some documentation about clearml internals with the new documentation? ClearML seems to store stuff that's relevant to script execution outside of clearml.Task if I am not mistaken. I would like to learn a little bit about what the code structure / internal mechanism is.
Outside of the clearml.Task?
Ah, never mind. I thought wrong here.
I am still not getting why it is a problem to just update the requirements at any time...
Too late for what?
To update the task.requirements before it actually creates it (the requirements are created in a background thread)
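A toy model of the "too late" ordering, assuming the extra requirements are read exactly once, at the moment the background scan finishes. `ToyTask`, `add_requirement` and `_extra` are hypothetical names and this is not the clearml implementation. In the second case the code deliberately waits for the scan before registering the requirement, to make the outcome deterministic; in real code it would be a race.

```python
import threading


class ToyTask:
    """Toy model of the ordering problem; not the clearml implementation."""

    _extra = []  # requirements registered by the user, shared class-wide

    @classmethod
    def add_requirement(cls, line):
        cls._extra.append(line)

    def __init__(self):
        self.requirements = None
        self._done = threading.Event()
        threading.Thread(target=self._analyze, daemon=True).start()

    def _analyze(self):
        detected = ["numpy==1.19.5"]  # pretend result of the code scan
        # The extras are read exactly once, when the scan finishes.
        self.requirements = detected + list(ToyTask._extra)
        self._done.set()

    def wait(self):
        self._done.wait()


# Registered before creation: the background scan picks it up.
ToyTask.add_requirement("myreq")
early = ToyTask()
early.wait()

# Registered after the scan has finished: nobody reads the list again.
ToyTask._extra.clear()
late = ToyTask()
late.wait()
ToyTask.add_requirement("myreq")
```

Here `early.requirements` contains `"myreq"` and `late.requirements` does not, which is the situation the answer above calls "too late".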
Is there no await/synchronize method to wait for task update?
Yes, but then we will have to relaunch it (not unthinkable), but I'm still looking for the immediate value of doing all that work, wdyt?
Both, actually. So what I personally would find intuitive is something like this:
```
class Task:
    def load_statedict(self, state_dict):
        pass

    async def synchronize(self):
        ...

    async def task_execute_remotely(self):
        await self.synchronize()
        ...

    def add_requirement(self, requirement):
        ...

    @classmethod
    async def init(cls, task_name):
        task = Task()
        task.load_statedict(await Task.load_or_create(task_name))
        await task.synchronize()
        asyncio.create_task(run_code_analysis().then_synchronize())
        return task
```
I can either use the existing "easy" way or build my own init(). This moves customization from a long list of Task.init() arguments to something like a custom "builder" method, i.e. I can use clearml either as a library and build my own workflow, or just use the existing predefined workflow.
Why can't it be updated after creation?
You can, but then you have to rerun it. I mean technically this is obviously solvable, but the idea was to make it simple to use, and since we "assume" in most cases there is a single Task per execution, it made sense. wdyt?
I think I still don't get how clearml is supposed to work/be used. Why wouldn't the following work currently?
Example:
```
task = Task.init(...)
if not running_remotely:
    task_dict = task.export_task()
    requirements = task_dict["script"]["requirements"]["pip"].splitlines()
    requirement_torch = [r for r in requirements if r.startswith("torch==")]
    requirements.remove(requirement_torch[0])
    requirements.append("torch >= 1.8.1")
    task_dict["script"]["requirements"]["pip"] = "\n".join(requirements)
    task.update_task(task_dict)
task.execute_remotely(...)
```
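The list manipulation in the example above can be isolated into a small pure function, which makes it easy to check without a server. This is only a sketch: `replace_requirement` is a hypothetical helper, not part of clearml, and the sample requirement strings are made up. Unlike the inline version, it does not raise when no `torch==` line is present.

```python
def replace_requirement(pip_text, package, new_line):
    """Return pip_text with every pinned `package==...` line replaced by new_line.

    If the package is not listed, new_line is simply appended.
    """
    kept = [
        line for line in pip_text.splitlines()
        if not line.strip().startswith(package + "==")
    ]
    kept.append(new_line)
    return "\n".join(kept)


original = "numpy==1.19.5\ntorch==1.8.0\ntorchvision==0.9.0"
updated = replace_requirement(original, "torch", "torch >= 1.8.1")
```

Matching on `package + "=="` keeps `torchvision==0.9.0` untouched, since only the exact package name followed by the pin operator is replaced.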
Mhhm, then maybe it is not clear to me how clearml.Task is meant to be used. I thought of it as being a container for all the information regarding a single experiment that is reflected on the server-side and by this in the WebUI. Now I init() a Task and it will show in the WebUI. I thought after initialization I can still update the task to my liking, i.e. it being a documentation of my experiment.
Then I could also do this:
```
# My custom very special use case
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
await task.synchronize()
await run_code_analysis()
task.add_requirement("myreq")
await task.synchronize()
```
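To make the proposed flow concrete, here is a runnable in-memory sketch of the custom sequence above using asyncio. Everything in it is an assumption for illustration: `ToyTask`, `custom_init` and the `synced` attribute are invented, the sleeps stand in for the server round trip and the repository scan, and none of this is the clearml API.

```python
import asyncio


class ToyTask:
    """In-memory stand-in for the proposed design; not the clearml API."""

    def __init__(self, name):
        self.name = name
        self.requirements = []
        self.synced = None  # last state "pushed to the server"

    async def synchronize(self):
        await asyncio.sleep(0)  # placeholder for a network round trip
        self.synced = list(self.requirements)

    async def run_code_analysis(self):
        await asyncio.sleep(0.01)  # placeholder for the slow repository scan
        self.requirements.append("numpy==1.19.5")

    def add_requirement(self, requirement):
        self.requirements.append(requirement)


async def custom_init(name):
    """User-defined builder: wait for the analysis, then customise."""
    task = ToyTask(name)
    await task.run_code_analysis()
    task.add_requirement("myreq")
    await task.synchronize()
    return task


task = asyncio.run(custom_init("demo"))
```

Because the builder awaits the analysis before adding its own requirement, the order of the two writes is fixed, at the cost of paying the analysis time up front.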