tnx, i just can’t use 1.7.1 because of the pipeline problem from before
@<1523701435869433856:profile|SmugDolphin23> @<1523701087100473344:profile|SuccessfulKoala55> Yes, the second issue still persists, currently breaking our pipeline
but anyway, this will still not work because fastai’s tensorboard doesn’t work in multi gpu 😞
@<1523701118159294464:profile|ExasperatedCrab78>
Ok. bummer to hear that it won't be included automatically in the package.
I am now experiencing a bug with the patch, not sure it's to blame... but i'm unable to save models in the pipeline... checking if it's related
i get it for one of the tasks, but then it fails because it seems that the fastai2 TensorBoardCallback isn’t fit for distributed training (i’m opening an issue with them now)
confirming that only downgrading to transformers==4.21.3 without the patch worked....
This is a time bomb that eventually we won't be able to ignore... we will need to use new transformers code
when u say use Task.current_task() , you mean for logging? which i’m guessing the fastai binding should do, right?
args.py #504:
for k, v in dictionary.items():
    # if key is not present in the task's parameters, assume we didn't get this far when running
    # in non-remote mode, and just add it to the task's parameters
    if k not in parameters:
        self._task.set_parameter((prefix or '') + k, v)
        continue
task.py #1266:
def set_parameter(self, name, value, description=None, value_type=None):
    # type: (str, str, Optional[str], O...
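in other words, the args.py loop just merges a dict into the task's parameters, adding only the keys the task doesn't already know. a self-contained sketch of that logic with a stub task (StubTask and merge_args are illustrative names, not the real clearml internals):

```python
class StubTask:
    """Minimal stand-in for a task's parameter store."""

    def __init__(self, parameters):
        self.parameters = dict(parameters)

    def set_parameter(self, name, value):
        self.parameters[name] = value


def merge_args(task, dictionary, prefix=""):
    # Mirrors the args.py loop: keys missing from the task's parameters
    # are assumed to come from a non-remote run and are simply added;
    # keys that already exist fall through to the remote-override path.
    for k, v in dictionary.items():
        if k not in task.parameters:
            task.set_parameter((prefix or "") + k, v)
            continue
        # ...remote-override handling of existing keys would go here...


task = StubTask({"epochs": 3})
merge_args(task, {"lr": 0.1, "epochs": 5})
print(task.parameters)  # {'epochs': 3, 'lr': 0.1}
```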
Glad to hear you were able to reproduce it! Waiting for your reply 🙏
and the agent is outputting sdk.development.default_output_uri =
although it’s different in both the original config and the agent’s extra config
CostlyOstrich36 This is for a step in the pipeline
Noting one difference on my side: i’m using TensorBoardCallback , because i believe the clearml docs use an outdated fastai 1 version…
I have but i believe i found the issue
that does happen when you create a normal local task, that’s why i was confused
i had a misconception that the conf comes from the machine triggering the pipeline
this: from fastai.callbacks.tensorboard import LearnerTensorboardWriter
doesn’t exist anymore in fastai2
If nothing specific comes to mind i can try to create some reproducible demo code (after holiday vacation)
@<1523701435869433856:profile|SmugDolphin23> @<1523701205467926528:profile|AgitatedDove14>
Any updates? 🙂
@<1523701118159294464:profile|ExasperatedCrab78>
Here is an example that reproduces the second error
from clearml.automation import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForSequenceClassification, TrainingArguments, Trainer
    from datasets import load_dataset
    import numpy as np
    import ...
to make it very reproducible, i created a Dockerfile for it, so make sure to run build_docker.sh and then run.sh
Traceback (most recent call last):
  File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
    kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/usr/src/lib/clearml_test.py", line 69, in <module>
    pipeline()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in intern...
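the KeyError in the first traceback is just a failed dict lookup: the generated step wrapper fetches the step's return value from the parent task's parameters under a composed 'section/name' key that was never written. a simplified, hypothetical sketch (parent_parameters here stands in for parent_task.get_parameters(cast=True)):

```python
# Hypothetical, simplified view of the failing lookup.
return_section = "return"
artifact_name = "return_object"

# Parameters the parent task actually recorded; the
# 'return/return_object' entry was never written, hence the KeyError.
parent_parameters = {"Args/epochs": 3}

key = return_section + "/" + artifact_name
try:
    value = parent_parameters[key]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'return/return_object'
```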
Yes tnx for clarifying 😁
that’s what i started with, doesn’t work in pipelines