Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Unanswered
Hello, There'S A Particular Metric (Perplexity) I'D Like To Track, But Clearml Didn'T Seem To Catch It. Specifically, This "Evaluation" Section Of Run_Mlm.Py In The Transformers Repo:


Reproduce the training:
# How to run `

You need to pip install requirements first. I think the following would do: transformers datasets clearml tokenizers torch

CLEAR_DATA has train.txt and validation.txt, the .txt files just need to have text data on separate lines. For debugging, anything should do.

For training you need tokenizer files as well, vocab.json, merges.txt, and tokenizer.json.

you also need a config.json, should work.

export CLEAR_DATA="./data/dataset_for_modeling"
python3 run_mlm.py
--line_by_line
--seed 420
--config_name "$CLEAR_DATA"
--tokenizer_name "$CLEAR_DATA"
--train_file "$CLEAR_DATA/train.txt"
--validation_file "$CLEAR_DATA/validation.txt"
--max_seq_length 512
--do_train
--do_eval
--evaluation_strategy steps
--eval_steps 500
--save_strategy epoch
--num_train_epochs 3
--per_device_train_batch_size 8
--per_device_eval_batch_size 8
--output_dir ./output/mlm_training_output `Let me get you a dataset

  
  
Posted 3 years ago
153 Views
0 Answers
3 years ago
one year ago