
using api.files_server? not default_output?
tried your suggestion, it still goes to the file server…
but it makes sense, because the agent in that case is local
Yes, thanks for clarifying
I am currently on vacation, I'll ask my team mates. But if not I'll get to it next week
I'm getting really weird behavior now, the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact but it doesn't exist on S3 (our file server configuration)
BTW the code above is from the clearml github, so it's the latest
I'm working with the patch, and installing transformers from github
sounds good, I'll soon check if this fixes our issue and update you
that makes more sense
would this work now as a workaround until the version is released?
If nothing specific comes to mind i can try to create some reproducible demo code (after holiday vacation)
It's with decorators.
Interesting, i wasn't aware of this python module for executing accelerate. I'll try to use that.
We used subprocess for it, but for some reason only when invoked in the pipeline the process freezes and doesn't close the main accelerate process. Works fine outside of clearml, any Idea?
We tried both subprocess.run and popen
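For reference, a minimal sketch of the kind of subprocess wrapper we mean, using only the standard library. The command here is a stand-in (a plain Python one-liner), not our actual `accelerate launch` invocation, and `run_child` is a hypothetical helper name. It shows the documented `Popen.communicate(timeout=...)` pattern for making sure a child that hangs gets killed instead of blocking the step forever:

```python
import subprocess
import sys


def run_child(cmd, timeout_s):
    """Run cmd and return (returncode, output); kill the child if it exceeds timeout_s."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child did not exit in time: kill it, then drain the pipe
        proc.kill()
        output, _ = proc.communicate()
    return proc.returncode, output


# Stand-in for the real launcher command (assumption: replace with your own)
demo_cmd = [sys.executable, "-c", "print('training done')"]
returncode, output = run_child(demo_cmd, timeout_s=30)
```

Note that `proc.kill()` only targets the direct child. A launcher that spawns its own worker processes may leave those running, so this probably isn't a full fix for the freeze, just a way to stop the step from hanging indefinitely.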
How does this work in the context of a pipeline? One of the steps is a multi gpu training that requires accelerate.
I tried to work on a reproducible script but then I get errors that my clearml task is already initialized (this also doesn't happen on 1.7.2)
Nothing that i think is relevant, I'm using latest from master. It might be a new bug on their side, wasn't sure.
@<1523701070390366208:profile|CostlyOstrich36>
Sorry for the (very) late response.
We use the open source version which isn't part of the ClearML setup.
Anyway, we are using a standalone script but we have it source controlled in git... clearml picks this up and tries to clone the entire repo in the agent. I want to prevent this and just use the script.
@<1523701205467926528:profile|AgitatedDove14>
Only got some time to work on it now, i created a small reproducible example.
I also tried to use your suggestion with import accelerate, it also had issues.
Overall, when using debug_pipeline it works ok, but both methods don't work without it. I think it has something to do with wrapping accelerate.
Problem with launching through python module (your suggestion), the argparse breaks.
Problem with launching using a new process - rank0 proce...
To make it very reproducible, I created a docker file for it, so make sure to run build_docker.sh and then run.sh
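On the argparse problem: I haven't confirmed the cause, but one common reason a CLI entry point breaks when called in-process is that `parse_args()` reads the host process's `sys.argv`. A minimal sketch of working around that, assuming this is what's happening; `patched_argv` and `hypothetical_cli_main` are made-up names for illustration, not anything from accelerate or clearml:

```python
import argparse
import contextlib
import sys


@contextlib.contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv, restoring the original afterwards."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = saved


def hypothetical_cli_main():
    # Hypothetical stand-in for a launcher entry point that parses sys.argv
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_processes", type=int, default=1)
    parser.add_argument("script")
    return parser.parse_args()


# Inside the pipeline step, give the entry point the argv it expects
with patched_argv(["launcher", "--num_processes", "2", "train.py"]):
    args = hypothetical_cli_main()
```

If the entry point accepts an explicit argument list, passing that directly would be cleaner than patching `sys.argv`.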
Glad to hear you were able to reproduce it! Waiting for your reply
@<1523701435869433856:profile|SmugDolphin23> @<1523701205467926528:profile|AgitatedDove14>
Any updates?