Answered
I Am Looking For The Dataset Used In Sarcasm Detection Demo

I am looking for the dataset used in the SARCASM DETECTION demo: None

Can someone help me find the preprocessed data, not the raw data from Kaggle?

  
  
Posted one year ago

Answers 12


I still have the tasks I ran remotely, and they don't show any uncommitted changes. @<1540142651142049792:profile|BurlyHorse22> are you sure the remote machine is running transformers from the latest GitHub branch, rather than from the released package?

If it all looks fine, can you please install transformers from this repo (branch main) and rerun? It might be that not all of my fixes came through.
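One way to answer that question on the agent itself is to print the installed transformers version at the start of the task. A minimal sketch using only the standard library; it assumes, as a heuristic rather than a guarantee, that a source install from the main branch reports a development version ending in .devN, and the helper names are hypothetical:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "transformers") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def looks_like_dev_build(version: Optional[str]) -> bool:
    """Heuristic: source builds usually carry a PEP 440 '.devN' suffix, releases do not."""
    return version is not None and ".dev" in version


if __name__ == "__main__":
    version = installed_version()
    print(f"transformers {version}, dev build: {looks_like_dev_build(version)}")
```

Comparing this output between the local VS Code run and the agent run shows quickly whether both use the same build.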

  
  
Posted one year ago

Yes, the one referred to in the video. But @<1523701118159294464:profile|ExasperatedCrab78> mentioned (at 3:45 in the YouTube video) that he was using it after some preprocessing. The raw data from Kaggle does not load with the Hugging Face load_dataset() function. Please find screenshots of the errors from running train_sklearn.py and train_transformer.py. So I am assuming it will work if I get the preprocessed data.
[Screenshots: errors from train_sklearn.py and train_transformer.py]

  
  
Posted one year ago

Hi @<1523701070390366208:profile|CostlyOstrich36> ,

The same code runs fine when I run it directly from VS Code, but when I clone and enqueue the task on an agent, this error occurs. My agent runs inside an EC2 instance with a GPU, and I use the same Python virtual environment both when running from VS Code and when running the agent.
I am not aware of any way to debug my code when I clone and enqueue a task that runs on an agent.

  
  
Posted one year ago

Hi @<1523701118159294464:profile|ExasperatedCrab78> ,

I flagged some examples and created a new dataset. Then I cloned the DistilBert training task and enqueued it to run on an agent, but it failed with the error below:


{'eval_loss': 0.6758520603179932, 'eval_accuracy': 0.5912839158071777, 'eval_runtime': 232.0297, 'eval_samples_per_second': 871.246, 'eval_steps_per_second': 54.454, 'epoch': 1.0}
 50%|█████     | 63/126 [03:58<00:04, 14.88it/s]     Traceback (most recent call last):
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/task_repository/sarcasm_detector.git/train_transformer.py", line 141, in <module>
    sarcasm_trainer.train()
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/task_repository/sarcasm_detector.git/train_transformer.py", line 134, in train
    self.trainer.train()
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1631, in train
    return inner_training_loop(
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1990, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2236, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2293, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2771, in save_model
    self._save(output_dir)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2823, in _save
    self.model.save_pretrained(output_dir, state_dict=state_dict)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1708, in save_pretrained
    model_to_save.config.save_pretrained(save_directory)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 456, in save_pretrained
    self.to_json_file(output_config_file, use_diff=True)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 838, in to_json_file
    writer.write(self.to_json_string(use_diff=use_diff))
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 824, in to_json_string
    return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
  File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 353, in _iterencode_dict
    items = sorted(dct.items())
TypeError: '<' not supported between instances of 'str' and 'int'
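The last frames of the traceback show transformers serializing the model config with json.dumps(config_dict, indent=2, sort_keys=True), which sorts the items of every dict. A minimal, standard-library-only reproduction (the label names are made up for illustration) shows that this fails as soon as a single dict mixes int and str keys:

```python
import json


def dumps_sorted(config_dict: dict) -> str:
    """Serialize the same way as the last traceback frame: indent=2, sort_keys=True."""
    return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"


# Keys of a single type sort fine (int keys are written out as JSON strings).
print(dumps_sorted({"id2label": {0: "not_sarcastic", 1: "sarcastic"}}))

# Mixed key types in one dict: sorting has to compare int with str and raises TypeError.
mixed = {"id2label": {0: "not_sarcastic", "1": "sarcastic"}}
try:
    dumps_sorted(mixed)
except TypeError as err:
    print("TypeError:", err)
```

The exact wording of the message depends on which key the sort compares first, but the failure is the same TypeError as in the traceback.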
  
  
Posted one year ago

I've updated the repo 🙂

  
  
Posted one year ago

Great to hear! Then it comes down to waiting for the next Hugging Face transformers release!

  
  
Posted one year ago

Hi @<1540142651142049792:profile|BurlyHorse22> , it looks like the traceback is raised from your code. What is happening at the point where it fails?

  
  
Posted one year ago

Ah, I see 😄 I have submitted a ClearML patch to Hugging Face transformers: None

It is merged, but not in a release yet. Would you mind checking whether it works if you install transformers from GitHub (i.e. the latest master version)?

  
  
Posted one year ago

@<1523701118159294464:profile|ExasperatedCrab78> The dataset loading issue no longer comes up, since I have started using the data shared in the GitHub repo. Thanks a lot for the quick response.

But now I am facing a different issue: there is a conflict when creating the ClearML Task:

Current task already created and requested project name ' HuggingFace Transformers ' does not match current project name 'sarcasm_detector'. If you wish to create additional tasks use Task.create, or close the current task with task.close() before calling Task.init(...)

Note: I do not see the project name "HuggingFace Transformers" mentioned anywhere in the code either.

  
  
Posted one year ago

Hi @<1540142651142049792:profile|BurlyHorse22> I think I know what is happening. ClearML does not support dict keys of any type other than string. This is why I made these functions to cast the dict keys to strings and back after we connect them to ClearML.
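The casting functions themselves are not included in the thread. As a hypothetical sketch of the idea only (the names keys_to_str, keys_to_int and _maybe_int are made up, and the actual ClearML connect call is left out), casting keys to strings and back could look like this:

```python
from typing import Any, Dict


def keys_to_str(d: Dict[Any, Any]) -> Dict[str, Any]:
    """Recursively cast every dict key to str, since ClearML only accepts string keys."""
    return {str(k): keys_to_str(v) if isinstance(v, dict) else v for k, v in d.items()}


def _maybe_int(key: Any) -> Any:
    """Return int(key) if key is a str that parses as an integer, else key unchanged."""
    if isinstance(key, str):
        try:
            return int(key)
        except ValueError:
            return key
    return key


def keys_to_int(d: Dict[Any, Any]) -> Dict[Any, Any]:
    """Recursively cast integer-like string keys back to int (reverse of keys_to_str)."""
    return {_maybe_int(k): keys_to_int(v) if isinstance(v, dict) else v for k, v in d.items()}


if __name__ == "__main__":
    # Hypothetical label mapping, for illustration only.
    id2label = {0: "not_sarcastic", 1: "sarcastic"}
    as_str = keys_to_str(id2label)   # string keys: safe to connect to ClearML
    restored = keys_to_int(as_str)   # int keys again before building the model
    print(as_str, restored)
```

The round trip is lossy for keys like "007", which come back as 7; for integer label ids that is harmless.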

What I think happens is that id2label is a dict with int keys, and it is not cast to strings before being given to the model, which in turn is connected to ClearML by the built-in Hugging Face integration.

I'm now checking what I did about this in my branch; it seems not everything was pushed yet!

  
  
Posted one year ago

@<1540142651142049792:profile|BurlyHorse22> do you mean the one referred to in the video? (I think that is the raw data on Kaggle.)

  
  
Posted one year ago

Hi @<1523701118159294464:profile|ExasperatedCrab78> ,
It worked after installing the latest Hugging Face Transformers from the GitHub main branch. Thank you so much for your support.

  
  
Posted one year ago
989 Views