Looks like the first issue has been solved 🙂
I think the second one still persists; still checking.
Sounds good 🙂 I'll check soon whether this fixes our issue and update you.
We tried both subprocess.run and popen.
I tried to write a reproducible script, but then I get errors saying my ClearML task is already initialized (this also doesn't happen on 1.7.2).
Yes, it worked 🙂
I loaded my entire clearml.conf into the "extra conf" section of the autoscaler, and that worked.
Regarding what AgitatedDove14 suggested, I'll try it tomorrow and update you.
That makes more sense 🙂
Would this work as a workaround until the version is released?
I believe this is because of this code:
It initializes a task if clearml is installed, but since a task already exists (because of the pipeline), it replaces it.
Artifacts: nothing is reaching S3.
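A minimal sketch of a guard that reuses the running task instead of initializing a new one, assuming the problem is an unconditional Task.init call (the snippet referred to above is missing from the thread). get_or_create_task is a hypothetical helper; it expects the class passed in to provide ClearML's Task.current_task() and Task.init() classmethods, and the class is injected so the logic can be checked without a server.

```python
def get_or_create_task(task_cls, project_name, task_name):
    """Hypothetical helper: reuse the task that is already running, if any.

    task_cls is expected to behave like clearml.Task (current_task / init).
    """
    existing = task_cls.current_task()
    if existing is not None:
        # Assumption: inside a pipeline step the step's task already exists,
        # so keep it instead of creating a replacement.
        return existing
    # No running task (e.g. a plain local run): create one as usual.
    return task_cls.init(project_name=project_name, task_name=task_name)
```

In a step this would be called as get_or_create_task(Task, "my-project", "train") after `from clearml import Task`; the project and task names are placeholders.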
@<1523701070390366208:profile|CostlyOstrich36>
Sorry for the (very) late response.
We use the open-source version, which isn't part of the ClearML setup.
Anyway, we are using a standalone script, but we keep it under source control in git... ClearML picks this up and tries to clone the entire repo in the agent. I want to prevent this and just use the script.
I'm getting really weird behavior now: with the patch, the task seems to report correctly, but the step doesn't show "uploading" when it finishes. There is a "return" artifact, but it doesn't exist on S3 (our file server configuration).
Yes, and the old version only works without the patch.
I see the model on the artifacts tab, but it's not actually uploaded.
Nothing that I think is relevant; I'm using the latest from master. It might be a new bug on their side; I wasn't sure.
Tried your suggestion; it still went to the file server…
I'm currently on vacation; I'll ask my teammates. If not, I'll get to it next week.
@<1523701205467926528:profile|AgitatedDove14>
I only got some time to work on it now; I created a small reproducible example.
I also tried to use your suggestion with import accelerate, it also had issues.
Overall, it works OK with debug_pipeline, but neither method works without it. I think it has something to do with wrapping accelerate.
Problem with launching through the Python module (your suggestion): argparse breaks.
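If the breakage is argparse rejecting flags the training script doesn't define (an assumption; the traceback isn't in the thread), one common mitigation is parse_known_args, which returns the leftover arguments instead of exiting with "unrecognized arguments". The --lr and --epochs flags below are hypothetical placeholders.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the script's own flags and collect anything else separately."""
    parser = argparse.ArgumentParser(description="training step")
    # Hypothetical flags, for illustration only.
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=1)
    # Unknown flags (e.g. injected by a launcher) end up in `unknown`
    # rather than causing argparse to exit.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown
```

The trade-off is that typos in the script's own flags are no longer rejected, so it may be worth logging `unknown`.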
Problem with launching using a new process - rank0 proce...
How does this work in the context of a pipeline? One of the steps is a multi gpu training that requires accelerate.
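A minimal sketch of the "new process" approach from inside a pipeline step, assuming the multi-GPU training lives in a separate script. It runs Accelerate's launcher as `python -m accelerate.commands.launch` with the step's own interpreter; build_accelerate_cmd, run_accelerate and the script name are hypothetical, and this does not address the rank0 issue mentioned above.

```python
import subprocess
import sys


def build_accelerate_cmd(script, script_args, num_processes=2):
    """Build an `accelerate launch` command using the current interpreter.

    sys.executable keeps the child in the same environment as the step.
    """
    return [
        sys.executable, "-m", "accelerate.commands.launch",
        "--num_processes", str(num_processes),
        script, *script_args,
    ]


def run_accelerate(script, script_args, num_processes=2):
    cmd = build_accelerate_cmd(script, script_args, num_processes)
    # check=True raises subprocess.CalledProcessError if the launcher
    # exits with a non-zero status.
    return subprocess.run(cmd, check=True)
```

For example, run_accelerate("train.py", ["--lr", "0.01"], num_processes=4) would start four processes on the agent.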
You can follow updates on the issue I opened:
https://github.com/fastai/fastai/issues/3543
But I think the better solution would probably be to create a custom ClearML callback for fastai with whatever best practices you think are needed…
Or fix the TensorBoardCallback, because for now it prevents us from using multi-GPU 😪
I didn't; I prefer not to add temporary workarounds.
Hi, yes, it's running with the autoscaler, so it's definitely in Docker mode.
Are you saying it should have worked? I got an error that the 'docker' attribute doesn't exist. Maybe it's the version of the ClearML server?
SmugDolphin23 SuccessfulKoala55 ^
@<1523701435869433856:profile|SmugDolphin23>
Hey 🙂
Any update?
We are having more issues with transformers and ClearML in the new transformers version.
The step that has transformers 4.25.1 isn’t able to upload artifacts.
If we downgrade to transformers==4.21.3, it works.
When I did this with a normal task it worked wonderfully; with a pipeline it didn't.
This is the next step failing to find the output of the previous step:
ValueError: Could not retrieve a local copy of artifact return_object, failed downloading
@<1523701118159294464:profile|ExasperatedCrab78> Sorry, I only saw this now.
Thanks for checking it!
Glad to see you found the issue; I hope you find a way to fix the second one. For now, we will continue using the previous version.
I'd be glad if you could post when everything is fixed so we can upgrade our version.