
Hello, there's a particular metric (perplexity) I'd like to track, but ClearML didn't seem to catch it. Specifically, it comes from the "Evaluation" section of run_mlm.py in the Transformers repo.
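For context, run_mlm.py derives perplexity from the evaluation loss after `trainer.evaluate()` and then hands it to `trainer.log_metrics` and `trainer.save_metrics`, which may be why it never shows up as a scalar. Below is a minimal sketch of computing the same value and reporting it by hand. It assumes the metrics dict has an `eval_loss` key and that `logger` is a ClearML logger (for example from `Logger.current_logger()`); the helper names and the title/series labels are made up for illustration.

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Perplexity is exp(loss); fall back to infinity if exp overflows."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        return float("inf")


def report_perplexity(metrics: dict, logger, iteration: int) -> float:
    """Compute perplexity from metrics["eval_loss"] and report it as a scalar.

    `logger` is assumed to expose ClearML's
    report_scalar(title, series, value, iteration) method.
    """
    perplexity = perplexity_from_loss(metrics["eval_loss"])
    logger.report_scalar(
        title="eval", series="perplexity", value=perplexity, iteration=iteration
    )
    return perplexity


# Hypothetical usage inside run_mlm.py, right after `metrics = trainer.evaluate()`:
#   from clearml import Logger
#   report_perplexity(metrics, Logger.current_logger(), trainer.state.global_step)
```

Passing the logger in as an argument keeps the helper independent of how the ClearML task was initialised.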

Reproduce the training:
# How to run

You need to pip install the requirements first. I think the following would do: transformers datasets clearml tokenizers torch

CLEAR_DATA has train.txt and validation.txt. The .txt files just need to have text data on separate lines. For debugging, anything should do.

For training you need the tokenizer files as well: vocab.json, merges.txt, and tokenizer.json.

You also need a config.json, and then it should work.

`export CLEAR_DATA="./data/dataset_for_modeling"
python3 run_mlm.py \
  --line_by_line \
  --seed 420 \
  --config_name "$CLEAR_DATA" \
  --tokenizer_name "$CLEAR_DATA" \
  --train_file "$CLEAR_DATA/train.txt" \
  --validation_file "$CLEAR_DATA/validation.txt" \
  --max_seq_length 512 \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --eval_steps 500 \
  --save_strategy epoch \
  --num_train_epochs 3 \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 8 \
  --output_dir ./output/mlm_training_output`

Let me get you a dataset.
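In the meantime, a placeholder dataset can be generated with a few lines of Python, since any text on separate lines should do for debugging. This is only a sketch: the helper name `write_debug_dataset`, the line counts, and the sentences are made up for illustration, and it does not create the tokenizer files or config.json.

```python
from pathlib import Path


def write_debug_dataset(data_dir: str, n_train: int = 100, n_val: int = 20) -> None:
    """Write train.txt and validation.txt with one short sentence per line."""
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    splits = {"train.txt": n_train, "validation.txt": n_val}
    for filename, n_lines in splits.items():
        # Placeholder text only; real experiments need real data.
        lines = [f"This is placeholder sentence number {i}." for i in range(n_lines)]
        (out / filename).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical usage, matching the CLEAR_DATA path used in the command:
#   write_debug_dataset("./data/dataset_for_modeling")
```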

  				
Posted 4 years ago by SmallDeer34