
Strictly speaking, there is only one training task, but I want to keep the top-3 best checkpoints for it at all times
I think it's still an issue, not critical though, because we have another way to do it and it works
So, to summarize:
PipelineController works with the default image, but it incurs a 4-5 min overhead. It doesn't work with any other image.
I can open an issue on GitHub
But at the same time, it contains some keys that cannot be modified with task_overrides, for example project_name
try add_step(..., task_overrides={'project_name': 'my-awesome-project', ...})
I initialize tasks not as functions, but as scripts from different repositories, with different images
We digressed a bit from the original thread topic though 😆 About clone_base_task=False.
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True, and it works as expected - yay!)
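For reference, a minimal sketch of that setup, assuming the usual PipelineController API; the project, task names and override values here are hypothetical:

```python
from clearml import Task, PipelineController

# Hypothetical project/task names and override values: one base task,
# one step that clones it and applies all changes through task_overrides.
base_task = Task.get_task(project_name='my-project', task_name='base-task')

pipe = PipelineController(name='my-pipeline', project='my-project', version='1.0.0')
pipe.add_step(
    name='train',
    base_task_id=base_task.id,
    clone_base_task=True,              # the base task stays untouched, a clone is executed
    task_overrides={
        'script.branch': 'my-branch',  # dotted keys into the task structure
        'script.version_num': '',
    },
)
pipe.start(queue='services')
```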
So, the problem I described in the beginning can be reproduced only this way (sketched below):
- have a base task
- export_data - modify - import_data - have a second task
- pass the second task to add_step with `cl...`
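A rough sketch of that reproduction path, assuming the SDK's export/import calls (the thread refers to them as export_data/import_data; the method names I know are export_task/import_task). Names and the modified field are hypothetical:

```python
from clearml import Task, PipelineController

# Hypothetical names; rough sketch of the reproduction path described above.
base_task = Task.get_task(project_name='my-project', task_name='base-task')

task_data = base_task.export_task()          # full task configuration as a dict
task_data['script']['branch'] = 'my-branch'  # modify whatever needs to change
second_task = Task.import_task(task_data)    # creates the second task

pipe = PipelineController(name='repro-pipeline', project='my-project', version='1.0.0')
pipe.add_step(
    name='step_1',
    base_task_id=second_task.id,
    clone_base_task=False,                   # pass the imported task itself, not a clone
)
```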
@CostlyOstrich36 On the screenshot, the upper task has the lower task as its parent
It fails during the add_step stage for the very first step, because task_overrides contains invalid keys
@AgitatedDove14 yeah, I'll try calling task.reset() before add_step
No, IMO it's better to leave task_overrides arguments with "." - the same structure as in the dictionary we get from export_data - this is more intuitive
I see the task on the web UI, but get "Fetch experiment failed" when I click on it, as I described. It even fetches the correct ID by its name. I'm almost sure it will be present in MongoDB
this is so cursed, it's 10:30 pm
In order to work with SSH cloning, it looks like one has to manually install openssh-client into the Docker image
CostlyOstrich36 thank you for the answer! Maybe I can just delete old models along with the corresponding tasks, that seems easier
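Something along these lines might do it, assuming Task.get_tasks and Task.delete behave as I expect; the project name and filter are hypothetical:

```python
from clearml import Task

# Hypothetical project name and filter; delete old tasks together with the
# models/artifacts they produced.
old_tasks = Task.get_tasks(
    project_name='my-project',
    task_filter={'status': ['completed']},
)
for t in old_tasks:
    t.delete(delete_artifacts_and_models=True)
```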
@SmugDolphin23 about ignore_parent_datasets? I renamed it the same day you added that comment. Please let me know if there is anything else I need to pay attention to
You can try to spin up the "services" queue without docker support; if there is no need for containers, it will speed things up.
With pipe.start(queue='services'), it still tries to run some docker for some reason:
```
1633799714110 kirillfish-ROG-Strix-G512LW-G512LW info ClearML Task: created new task id=a4b0fbc6a1454947a06be4e48eda6740
ClearML results page:
1633799714974 kirillfish-ROG-Strix-G512LW-G512LW info ClearML new version available: upgrade to v1.1.2 is recommended!
...
```
add_files. There is no upload call, because add_files uploads files by itself, if I got it correctly
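For context, a minimal sketch of the Dataset flow under discussion, assuming the standard SDK calls (an explicit upload() is shown for completeness; names and paths are hypothetical):

```python
from clearml import Dataset

# Hypothetical names/paths; minimal sketch of the standard Dataset flow.
dataset = Dataset.create(dataset_name='my-dataset', dataset_project='my-project')
dataset.add_files(path='/data/images')  # registers the files with the dataset version
dataset.upload()                        # pushes the file contents to the storage backend
dataset.finalize()                      # closes the dataset version
```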
Thanks. What is the difference between import_model and load_model?
Why can I only call import_model with weights_url and not model name or ID? This means I need to call query_models first, if I got it right
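To illustrate, roughly what I mean (assuming Model.query_models and InputModel behave as I think; project and model names are hypothetical):

```python
from clearml import InputModel, Model

# Hypothetical names; look the model up first, then import it by its weights URL.
models = Model.query_models(project_name='my-project', model_name='best-checkpoint')
model = models[0]

imported = InputModel.import_model(weights_url=model.url, name=model.name)

# An already-registered model can also be referenced directly by its ID:
existing = InputModel(model_id=model.id)
```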
AgitatedDove14 not exactly:
input: just a checkpoint file
output: a ClearML model entity + stored weights on S3
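i.e. something like this sketch, assuming OutputModel.update_weights works the way I expect; paths, bucket and names are hypothetical:

```python
from clearml import Task, OutputModel

# Hypothetical names/paths; register a local checkpoint file as a ClearML model
# entity with the weights uploaded to S3.
task = Task.init(project_name='my-project', task_name='register-checkpoint')
output_model = OutputModel(task=task, name='my-checkpoint')
output_model.update_weights(
    weights_filename='/path/to/checkpoint.pt',
    upload_uri='s3://my-bucket/models',
)
```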
sorry, no GH issue, just a link to this thread (I saw other contributors did this and got their PR merged, hehe)
CostlyOstrich36 idk, I need to share it to see
how do I share it?
but at that point it hadn't actually added any steps. Maybe failed pipelines with zero steps count as completed
For example, export_data returns a task configuration that contains many keys that can be modified with task_overrides
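For instance, something like this (a sketch with hypothetical names and values; the nesting in the exported dict maps onto the dotted override keys):

```python
from clearml import Task

# Hypothetical names/values; the exported configuration is a nested dict...
base_task = Task.get_task(project_name='my-project', task_name='base-task')
task_data = base_task.export_task()
print(task_data['script']['branch'])   # e.g. 'main'

# ...and the same fields can be addressed with dotted keys in task_overrides:
# task_overrides={'script.branch': 'my-branch', 'script.version_num': ''}
```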