I am currently on vacation, so I'll ask my teammates. If they can't, I'll get to it next week.
Hi @<1523701435869433856:profile|SmugDolphin23>
Confirming that rank0 process does not hang with the new version!
The accelerate CLI problem does still reproduce though (it's in my demo)
@<1523701205467926528:profile|AgitatedDove14>
I only got some time to work on it now. I created a small reproducible example.
I also tried to use your suggestion with import accelerate, it also had issues.
Overall, when using debug_pipeline it works ok, but both methods don't work without it. I think it has something to do with wrapping accelerate.
Problem with launching through python module (your suggestion), the argparse breaks.
Problem with launching using a new process - rank0 proce...
yeah, it gets to that error because the previous issue is saved… I'll try to work on a new example
Hi, yes it's running with autoscaler so it's for sure in docker mode
Are you saying that it should've worked? I got 'docker' attribute doesn't exist error. Maybe it's the version of the clearml server?
that does happen when you create a normal local task, that's why i was confused
that makes more sense
would this work now as a workaround until the version is released?
looks like it's working, tnx
We tried both subprocess.run and popen
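For anyone following along, here is a minimal sketch of the two launch styles mentioned above, using only the standard library. The child command is a hypothetical stand-in (a one-line Python script), not the real training launch, and the helper names are made up for illustration.

```python
import subprocess
import sys

# Hypothetical child command: a stand-in for the real launch command.
CHILD_CMD = [sys.executable, "-c", "print('child done')"]


def launch_with_run(cmd):
    """Blocking launch: subprocess.run returns only after the child exits."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout


def launch_with_popen(cmd):
    """Popen starts the child immediately; communicate() waits and drains the pipes."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    ) as proc:
        stdout, _ = proc.communicate()
    return proc.returncode, stdout
```

Both helpers return the exit code and captured stdout, so the two paths can be compared side by side when checking whether the child process finishes or hangs.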
BTW, I would expect this to happen automatically when running 'local' and 'debug'
Artifacts, nothing is reaching s3
CostlyOstrich36 This is for a step in the pipeline
this: from fastai.callbacks.tensorboard import LearnerTensorboardWriter
doesn't exist anymore in fastai2
@<1523701435869433856:profile|SmugDolphin23> @<1523701205467926528:profile|AgitatedDove14>
Any updates?
To make it very reproducible, I created a Dockerfile for it, so make sure to run build_docker.sh and then run.sh