I Am Looking For The Dataset Used In Sarcasm Detection Demo

Hi @ExasperatedCrab78,
It worked after installing the latest Hugging Face Transformers from the GitHub main branch. Thank you so much for your support.

  				
Posted 2 years ago by BurlyHorse22

I still have the tasks I ran remotely and they don't show any uncommitted changes. @BurlyHorse22 are you sure the remote machine is running transformers from the latest GitHub branch, instead of from the package?

If it all looks fine, can you please install transformers from this repo (branch main) and rerun? It might be that not all my fixes came through
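
One way to check which transformers build an environment actually has is to inspect the installed package metadata. The sketch below is only an illustration and not something from the demo repo: `describe_install` is a hypothetical helper name, and it relies on pip writing a `direct_url.json` file (PEP 610) when a package is installed from a Git URL, which may not hold for other installers.

```python
import json
from importlib import metadata


def describe_install(package):
    """Return version and install origin for a package, or None if it is not installed."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # pip records direct (for example Git) installs in direct_url.json;
    # read_text returns None when the file is absent, as for a normal PyPI install.
    raw = dist.read_text("direct_url.json")
    origin = json.loads(raw) if raw else None
    return {"version": dist.version, "origin": origin}


print(describe_install("transformers"))
```

Running this inside the virtual environment the agent builds should show whether the package came from PyPI (`origin` is `None`) or from a repository URL.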

  				
Posted 2 years ago by ExasperatedCrab78

@BurlyHorse22 do you mean the one referred to in the video? (I think this is the raw data on Kaggle)

  				
Posted 2 years ago by AgitatedDove14

Hi @ExasperatedCrab78,

I flagged some examples and created a new dataset. Then I cloned the DistilBert training task and enqueued it to run on an agent, but it failed with the error below:


{'eval_loss': 0.6758520603179932, 'eval_accuracy': 0.5912839158071777, 'eval_runtime': 232.0297, 'eval_samples_per_second': 871.246, 'eval_steps_per_second': 54.454, 'epoch': 1.0}
 50%|█████     | 63/126 [03:58<00:04, 14.88it/s]     Traceback (most recent call last):
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/task_repository/sarcasm_detector.git/train_transformer.py", line 141, in <module>
    sarcasm_trainer.train()
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/task_repository/sarcasm_detector.git/train_transformer.py", line 134, in train
    self.trainer.train()
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1631, in train
    return inner_training_loop(
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1990, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2236, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2293, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2771, in save_model
    self._save(output_dir)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2823, in _save
    self.model.save_pretrained(output_dir, state_dict=state_dict)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1708, in save_pretrained
    model_to_save.config.save_pretrained(save_directory)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 456, in save_pretrained
    self.to_json_file(output_config_file, use_diff=True)
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 838, in to_json_file
    writer.write(self.to_json_string(use_diff=use_diff))
  File "/home/ubuntu/.clearml/venvs-builds.2/3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 824, in to_json_string
    return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
  File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 353, in _iterencode_dict
    items = sorted(dct.items())
TypeError: '<' not supported between instances of 'str' and 'int'
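
The last frames of the traceback can be reproduced without transformers: `json.dumps(..., sort_keys=True)` sorts the dictionary keys, and Python cannot order an `int` against a `str`. In the sketch below, the dictionary and its label names are made-up stand-ins for a config mapping that ended up with mixed key types. That is an assumption about the cause, not data taken from the failing run.

```python
import json

# Hypothetical stand-in for a config mapping with mixed key types (not from the real run)
mixed_keys = {0: "NOT_SARCASTIC", "1": "SARCASTIC"}


def try_dump(d):
    """Return the JSON string, or the TypeError raised while sorting the keys."""
    try:
        return json.dumps(d, indent=2, sort_keys=True)
    except TypeError as exc:
        return exc


result = try_dump(mixed_keys)
print(type(result).__name__, result)

# With uniform key types the same call succeeds.
uniform = try_dump({str(k): v for k, v in mixed_keys.items()})
print(uniform)
```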

  				
Posted 2 years ago by BurlyHorse22

Hi @CostlyOstrich36,

The same code runs fine when I run it directly from VS Code, but when I clone and enqueue the task on an agent, this error occurs. My agent is running inside an EC2 instance with a GPU. I am using the same Python virtual environment both when running from VS Code and when running the agent.
I am not aware of any way to debug my code when I clone and enqueue a task that runs on an agent.

  				
Posted 2 years ago by BurlyHorse22

I've updated the repo 🙂

  				
Posted 2 years ago by ExasperatedCrab78

Yes, the one referred to in the video. But @ExasperatedCrab78 mentioned (at the 3:45 mark of the YouTube video) that he was using it after some preprocessing. The raw data from Kaggle does not load with the Hugging Face load_dataset() function. Please find the screenshot of the error I get when running train_sklearn.py and train_transformer.py. So I am assuming it will work if I get the preprocessed data.

  				
Posted 2 years ago by BurlyHorse22

Hi @BurlyHorse22, it looks like an error in your code is causing the traceback. What is the code doing at the point where the traceback occurs?

  				
Posted 2 years ago by CostlyOstrich36

@ExasperatedCrab78 The dataset loading issue no longer comes up now that I am using the data shared in the GitHub repo. Thanks a lot for the quick response.

But now I am facing a different issue: there is a conflict when creating the ClearML Task.

Current task already created and requested project name ' HuggingFace Transformers ' does not match current project name 'sarcasm_detector'. If you wish to create additional tasks use `Task.create`, or close the current task with `task.close()` before calling `Task.init(...)`

Note: I do not see the project name "HuggingFace Transformers" mentioned anywhere in the code either.

  				
Posted 2 years ago by BurlyHorse22

Ah I see 😄 I have submitted a ClearML patch to Hugging Face transformers.

It is merged, but not in a release yet. Would you mind checking whether it works if you install transformers from GitHub (i.e. the latest master version)?

  				
Posted 2 years ago by ExasperatedCrab78

Great to hear! Then it comes down to waiting for the next Hugging Face release!

  				
Posted 2 years ago by ExasperatedCrab78

Hi @BurlyHorse22 I think I know what is happening. ClearML does not support dict keys of any type other than string. This is why I made these functions to cast the dict keys to string and back after we connect them to ClearML.

What I think happens is that id2label is a dict with ints as keys, and it is not cast to strings before being given to the model, which in turn is connected to ClearML by the internal Hugging Face integration.

I'm checking now what I did about it in my branch; it seems maybe not everything was pushed yet!
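
The casting idea described above can be sketched as follows. This is a minimal illustration, not the actual functions from the repo: the helper names `keys_to_str` and `keys_to_int` and the label values are hypothetical, and the ClearML connection step is only indicated by a comment.

```python
def keys_to_str(mapping):
    """Cast every key to str so the dict only has string keys."""
    return {str(k): v for k, v in mapping.items()}


def keys_to_int(mapping):
    """Cast string keys back to int once the dict has been connected."""
    return {int(k): v for k, v in mapping.items()}


# Hypothetical label mapping with int keys, as a model config might hold it
id2label = {0: "NOT_SARCASTIC", 1: "SARCASTIC"}

as_strings = keys_to_str(id2label)
# ... the string-keyed dict would be connected to ClearML at this point ...
restored = keys_to_int(as_strings)

print(as_strings)
print(restored)
```

The round trip only works when every key is an integer-like value, which is the case for an id2label mapping.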

  				
Posted 2 years ago by ExasperatedCrab78
