Hello, there's a particular metric (perplexity) I'd like to track, but ClearML didn't seem to catch it.
Specifically, this "evaluation" section of run_mlm.py in the transformers repo:
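The section itself is not reproduced here. As a rough sketch of what is being asked about, assuming perplexity is derived as the exponential of the evaluation loss (which is how I recall run_mlm.py computing it): the function names and the `report_scalar` callback below are hypothetical, chosen for illustration. With ClearML, `Logger.current_logger().report_scalar` could be passed as that callback, since it accepts `title`, `series`, `value` and `iteration`.

```python
import math


def perplexity_from_eval_loss(eval_loss):
    """Return exp(eval_loss), or infinity if the exponential overflows."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        return float("inf")


def report_perplexity(metrics, report_scalar, iteration):
    """Add a 'perplexity' entry to metrics and hand it to a reporting callback.

    `metrics` is assumed to be a dict containing 'eval_loss'.
    `report_scalar` is a hypothetical callback taking title, series,
    value and iteration as keyword arguments.
    """
    ppl = perplexity_from_eval_loss(metrics["eval_loss"])
    metrics["perplexity"] = ppl
    report_scalar(title="eval", series="perplexity", value=ppl, iteration=iteration)
    return metrics
```

Reporting the value explicitly like this is one possible workaround if the metric is only written to the console or to a JSON file and never reaches a logger that ClearML hooks into.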
Reproduce the training:
# How to run
`
You need to pip install the requirements first. I think the following would do: transformers datasets clearml tokenizers torch
CLEAR_DATA contains train.txt and validation.txt. The .txt files just need to have text data on separate lines. For debugging, anything should do.
For training you need the tokenizer files as well: vocab.json, merges.txt, and tokenizer.json.
you also need a config.json,
should work.
export CLEAR_DATA="./data/dataset_for_modeling"
python3 run_mlm.py \
  --line_by_line \
  --seed 420 \
  --config_name "$CLEAR_DATA" \
  --tokenizer_name "$CLEAR_DATA" \
  --train_file "$CLEAR_DATA/train.txt" \
  --validation_file "$CLEAR_DATA/validation.txt" \
  --max_seq_length 512 \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --eval_steps 500 \
  --save_strategy epoch \
  --num_train_epochs 3 \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 8 \
  --output_dir ./output/mlm_training_output
`
Let me get you a dataset
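For debugging, a throwaway dataset in the layout described above (train.txt and validation.txt, one text sample per line) could be generated with something like the following. This is only a sketch: the function name, the placeholder sentences and the sample counts are made up for illustration, and the tokenizer files and config.json still need to be supplied separately.

```python
from pathlib import Path


def write_debug_dataset(root, n_train=100, n_val=20):
    """Write train.txt and validation.txt under root, one sample per line."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    splits = {"train.txt": n_train, "validation.txt": n_val}
    for name, count in splits.items():
        # Placeholder text only; any line-separated text works for debugging.
        lines = [f"This is placeholder sentence number {i}." for i in range(count)]
        (root / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return root


if __name__ == "__main__":
    # Same directory as the CLEAR_DATA export in the command above.
    write_debug_dataset("./data/dataset_for_modeling")
```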