Reputation
Badges 1
106 × Eureka!i believe this is because of transformer’s integration:
Automatic ClearML logging enabled.
ClearML Task has been initialized.
when a task already exists
@<1523701435869433856:profile|SmugDolphin23>
Hey 🙂
Any update?
We are having more issues with transformers and clearml in their new version.
The step that has transformers 4.25.1
isn’t able to upload artifacts.
If we downgrade transformers==4.21.3
it works
hi, yes we tried with the same result
that makes more sense 🙂
would this work now as a workaround until the version is released?
i believe this is because of this code
None
Which initialized the task if clearml is installed… but a task already exists (because of the pipeline), it will replace it
@<1523701118159294464:profile|ExasperatedCrab78>
Hey again 🙂
I believe that the transformers patch wasn’t released yet right? we are getting into a problem where we need new features from transformers but can’t use because of this
@<1523701118159294464:profile|ExasperatedCrab78> Sorry only saw this now,
Thanks for checking it!
Glad to see you found the issue, hope you find a way to fix the second one. for now we will continue using the previous version.
Would be glad if you can post when everything is fixed so we can advance our version.
yeah, it gets to that error because the previous issue is saved…i’ll try to work on a new example
I tried to work on a reproducible script but then i get errors that my clearml task is already initialized (also doesn’t happen on 1.7.2)
sounds good 🙂 I’ll soon check if this fixes our issue and update you
I'm getting really weird behavior now, the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact but it doesn't exist on S3 (our file server configuration)
Artifacts, nothing is reaching s3
Saw it was merged 🙂 One down, one to go
@<1523701118159294464:profile|ExasperatedCrab78>
Hey 🙂
Any updates on this? We need to use a new version of transformers because of another bug they have in an old version. so we can’t use the old transformers version anymore.
Traceback (most recent call last):
File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
File "/usr/src/lib/clearml_test.py", line 69, in <module>
pipeline()
File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in intern...
BTW the code above is from clearml github so it’s the latest
Looks like the first issue has been solved 🙂
i think the second one still consists, still checking
for now we downgraded to 1.7.2, but of course prefer not to stay that way
SmugSnake6 yep, that’s exactly it.
Hope the team is aware and will fix it
We also wanted this, we preferred to create a docker image with all we need, and let the pipeline steps use that docker image
That way you don’t rely on clearml capturing the local env, and you can control what exists in the env
not sure about this, we really like being in control of reproducibility and not depend on the invoking machine… maybe that’s not what you intend
AgitatedDove14 . so if i understand correctly, what i can possibly do is copy paste the https://github.com/fastai/fastai/blob/master/fastai/launch.py code and add the Task.init there?
when u say use Task.current_task()
you for logging? which i’m guessing that the fastai binding should do right?
but anyway, this will still not work because fastai’s tensorboard doesn’t work in multi gpu 😞
thisfrom fastai.callbacks.tensorboard import LearnerTensorboardWriter
doesn’t exist anymore in fastai2
reduced to a small snippet
` from fastai.vision.all import *
from fastai.distributed import *
from clearml import Task
from fastai.callback.tensorboard import TensorBoardCallback
from wwf.vision.timm import timm_learner
task = Task.init(project_name='LIOR_TEST', auto_connect_arg_parser={'rank': False})
path = untar_data(URLs.PETS)
size = 460
batch_size = 32
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
get_items=get_image_files,
get_y=lambda x: ...