SmugSnake6 yep, that's exactly it.
Hope the team is aware and will fix it
Artifacts, nothing is reaching s3
also, i don't need to change it during execution, i want it for a specific run
yeah, it gets to that error because the previous issue is saved… i'll try to work on a new example
I am currently on vacation, I'll ask my teammates. If they can't, I'll get to it next week
Yes it worked
I loaded my entire clearml.conf in the "extra conf" part of the auto scaler, that worked
that does happen when you create a normal local task, that's why i was confused
but it makes sense, because the agent in that case is local
How does this work in the context of a pipeline? One of the steps is a multi gpu training that requires accelerate.
don't have one ATM
The pipeline is a bit complex, but it did that with a very dumb example
Glad to hear you were able to reproduce it! Waiting for your reply
Yes, but it's more complex because i'm using a pipeline… where i don't explicitly call Task.init()
what i'm doing is getting
parent = Task.get_task(task.parent)
and then checking parent.data.user
but the user is some unknown id that doesn't exist in the all_users list
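For reference, a rough sketch of the lookup described above. The helper names resolve_user_name and parent_task_user are hypothetical, and it assumes the entries returned by APIClient().users.get_all() expose id and name attributes. It only reproduces the check, it does not explain why the parent's user id is missing from the list.

```python
from typing import Iterable, Mapping, Optional


def resolve_user_name(user_id: str, users: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the display name for user_id, or None if the id is not in the list."""
    by_id = {u["id"]: u.get("name") for u in users}
    return by_id.get(user_id)


def parent_task_user(task_id: str) -> Optional[str]:
    """Look up the user recorded on the parent of a task (needs a configured ClearML server)."""
    # Imported lazily so resolve_user_name can be used without clearml installed.
    from clearml import Task
    from clearml.backend_api.session.client import APIClient

    task = Task.get_task(task_id=task_id)
    if not task.parent:
        return None  # the task has no parent
    parent = Task.get_task(task_id=task.parent)
    user_id = parent.data.user
    # Assumption: each returned user entry has `id` and `name` attributes.
    users = [{"id": u.id, "name": u.name} for u in APIClient().users.get_all()]
    return resolve_user_name(user_id, users)
```

A None result from resolve_user_name corresponds to the case above, where the id on the parent task is not in the user list.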
but anyway, this will still not work because fastai's tensorboard doesn't work in multi gpu
@SmugDolphin23 @AgitatedDove14
Any updates?
you can get updates on the issue i opened
https://github.com/fastai/fastai/issues/3543
but i think probably the better solution would be to create a custom ClearML callback for fastai with the best practices you think are needed…
Or try to fix the TensorBoardCallback, because for now we can't use multi gpu because of it
AgitatedDove14 So it looks like it started to do something, but now it's missing parts of the configuration
Missing key and secret for S3 storage access (i'm using boto credential chain, which is off by default…)
why isn't the config being passed to the inner step properly?
We tried both subprocess.run and popen
tried your suggestion, it still went to the file server…
to make it very reproducible, i created a Dockerfile for it, so make sure to run build_docker.sh and then run.sh
looks like it's working, tnx
i'll try to work on something that works on 1.7.2
when i did this with a normal task it worked wonderfully, with pipeline it didn't