I'll try to work on something that works on 1.7.2
@<1523701435869433856:profile|SmugDolphin23>
Hey 🙂
Any update?
We are having more issues with transformers and clearml in their new versions.
The step that has transformers 4.25.1 isn't able to upload artifacts.
If we downgrade to transformers==4.21.3, it works.
Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but I get an IndexError: Invalid key: 5872 is out of bounds for size 0 error. That said, I get the same error without the code running in a pipeline; there seems to be no difference between simply running the code and running it in the pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?
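Something along these lines is the kind of snippet I mean (just a rough sketch on my end; the task ID and artifact name below are placeholders):
```python
from clearml import Task

# placeholder ID of the pipeline step task that produced the artifact
step_task = Task.get_task(task_id="<step_task_id>")

# grab a local copy of the "return_object" artifact so we can compare behavior
local_path = step_task.artifacts["return_object"].get_local_copy()
print(local_path)
```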
Hey 🙂 Thanks for the update!
What I'm missing is the point where you report to clearml between the cast and casting it back 🤔
Nothing that I think is relevant; I'm using the latest from master. It might be a new bug on their side, I wasn't sure.
Hey @<1523701949617147904:profile|PricklyRaven28> , about the S3 loading issue. The path to the model in the artifact tab, is it an S3 bucket or a local path?
SmugDolphin23 BTW, this is using clearml and huggingface's automatic logging… I didn't log anything manually
@<1523701435869433856:profile|SmugDolphin23> @<1523701087100473344:profile|SuccessfulKoala55> Yes, the second issue still persists and is currently breaking our pipeline
Yeah, it gets to that error because the previous issue is saved… I'll try to work on a new example
I'm getting really weird behavior now: the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact, but it doesn't actually exist on S3 (our file server configuration)
Hi @<1523701949617147904:profile|PricklyRaven28> just letting you know I still have this on my TODO, I'll update you as soon as I have something!
I believe this is because of the transformers integration:
Automatic ClearML logging enabled.
ClearML Task has been initialized.
when a task already exists
@<1523701949617147904:profile|PricklyRaven28> Please use this patch instead of the one previously shared. It excludes the dict hack :)
However, I do think I can already open the Hugging Face PR in the meantime. It actually has relatively little to do with the second bug.
I tried to work on a reproducible script, but then I get errors that my clearml task is already initialized (this also doesn't happen on 1.7.2)
It should, but please check first. This is some code I quickly made for myself. I did make tests for it, but it would be nice to hear from someone else that it works (as evidenced by the error above 😅)
Sounds good 🙂 I'll check soon whether this fixes our issue and update you
Hey @<1523701949617147904:profile|PricklyRaven28> I'm checking! Have you updated anything else, and which exact commit of transformers are you on now?
Hi PricklyRaven28! What dict do you connect? Do you have a small script we could use to reproduce?
This is the next step not being able to find the output of the previous step:
ValueError: Could not retrieve a local copy of artifact return_object, failed downloading
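Roughly, the pattern that breaks looks like this (a simplified sketch, not our actual code; the step names and project are made up):
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["return_object"])
def first_step():
    # in our real pipeline this trains a transformers model and returns it
    return {"dummy": "model"}

@PipelineDecorator.component()
def next_step(return_object):
    # this is where we hit the ValueError above when it tries to download the artifact
    print(return_object)

@PipelineDecorator.pipeline(name="repro", project="debug", version="0.0.1")
def run_pipeline():
    obj = first_step()
    next_step(obj)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    run_pipeline()
```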
No worries! Just so I understand fully though: you were already using the patch from my branch successfully. Now that it has been merged into the transformers main branch, you installed it from there, and that's when you started having issues with models not being saved? And installing transformers 4.21.3 fixes it (which should have the old clearml integration, even before the patch)?
Yes, and the old version only works without the patch.
I see the model on the artifacts tab, but it's not actually uploaded.
It's been accepted into master, but indeed hasn't been released yet!
As for the other issue, it seems like we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Huggingface + ClearML so people can do it themselves.
For now (using the patch), the only thing you need to be careful about is not to connect a dict or object with ints as keys. If you do need to (e.g. huggingface models usually need the id2label dict somewhere), just make sure to cast the keys to string before connecting it to ClearML and cast them back to int directly after, so that when ClearML changes the values, it's properly taken care of 🙂 My previous sample code is still valid!
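Roughly what I mean, as a minimal sketch (assuming a typical id2label mapping; the project/task names here are just for illustration):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="hf-id2label-demo")

# id2label normally has int keys, e.g. {0: "negative", 1: "positive"}
id2label = {0: "negative", 1: "positive"}

# cast the keys to str before connecting, so ClearML can handle the dict
id2label_str = task.connect({str(k): v for k, v in id2label.items()}, name="id2label")

# cast the keys back to int directly after, so the model config stays valid
id2label = {int(k): v for k, v in id2label_str.items()}
```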
Alright, a bit of searching later and I've found 2 things:
- You were right about the task! I've staged a fix here. It basically detects whether a task is already running (e.g. from the PipelineDecorator component) and, if so, uses that task instead (rough sketch of the idea below). We should probably do this for all of our integrations.
- But then I found another bug: the pipeline decorator task would mess up the internal nested dict of the label mapping inside the model config. You will probably hit the same issue if you run the pipeline with my fix above.
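For reference, the idea behind the first fix is roughly this (a sketch of the approach, not the exact code in the staged patch):
```python
from clearml import Task

# reuse the task that is already running (e.g. created by a PipelineDecorator
# component) instead of initializing a new one; only init if nothing is running
task = Task.current_task()
if task is None:
    task = Task.init(project_name="examples", task_name="transformers-training")
```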
So for now, we're looking into the 2nd bug, because it breaks with Hugging Face models in a pipeline. Until we sort that out, I'm going to hold off on opening a PR to HF with the first fix. Makes sense?
Thanks a lot for the example, it helped tons to be able to reproduce!