oooh, that's awesome lol. Never thought to do it that way
Oh, here's an example, a screenshot I took of the files in my Colab instance:
Which is defined, it seems, here: https://github.com/huggingface/transformers/blob/040283170cd559b59b8eb37fe9fe8e99ff7edcbc/src/transformers/trainer_tf.py#L459
I guess I could try and edit that, somehow. Hmm
I've been trying to do things like "color these five experiments one color, color these other five a different color", but then once I maximize the thing the colors all change
It seems to create a folder and put things into it, but I was hoping to just observe the TensorBoard folder
This seems similar but not quite the thing I'm looking for: https://allegro.ai/clearml/docs/docs/tutorials/tutorial_explicit_reporting.html#step-1-setting-an-output-destination-for-model-checkpoints
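If I'm reading it right, that tutorial is about something like this (just a sketch, the bucket path is a placeholder), i.e. where checkpoints get uploaded, not watching an existing local folder:
` from clearml import Task

# Sketch of what the linked tutorial seems to cover: output_uri sets where
# artifacts/checkpoints get uploaded; it doesn't watch an arbitrary local folder.
# The bucket path here is a placeholder.
task = Task.init(
    project_name="examples",
    task_name="output destination sketch",
    output_uri="s3://my-bucket/checkpoints",
) `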
It's not a big deal because it happens after I'm done with everything, I can just reset the Colab runtime and start over
So, this is probably a really dumb idea, but can you add `RUN source /root/.bashrc`
or perhaps set the entrypoint?
Yup, I just wanted to mark it completed, honestly. But then when I run it, Colab crashes.
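(What I'm running is roughly this, a sketch, assuming mark_completed() is even the right call for that:)
` from clearml import Task

# Sketch, assuming this is the call for explicitly marking the current task completed.
# Calling it on the live task is what seems to crash/restart my Colab runtime.
Task.current_task().mark_completed() `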
This seems to work:
` from clearml import Logger

for test_metric in posttrain_metrics:
    print(test_metric, posttrain_metrics[test_metric])
    # report_scalar(title, series, value, iteration)
    Logger.current_logger().report_scalar("test", test_metric, posttrain_metrics[test_metric], 0) `
I might not be able to get to that but if you create an issue I'd be happy to link or post what I came up with, wdyt?
Well they do all have different names
As in, I edit Installed Packages, delete everything there, and put that particular list of packages.
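(The programmatic version would probably be something like this, a sketch, with made-up package names standing in for that list:)
` from clearml import Task

# Sketch: declare the pinned packages in code (before Task.init) instead of
# editing "Installed Packages" in the UI. The names/versions are placeholders.
Task.add_requirements("transformers", "4.9.1")
Task.add_requirements("datasets")
task = Task.init(project_name="examples", task_name="pinned packages sketch") `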
Here's the actual script I'm using
And the reason is that I have a bunch of "runs" with the same settings, and I want to compare broadly across several settings. So if I select "a bunch" with setting A, I can see a general pattern when compared with setting B.
I've got 7-10 runs per setting, and about 7 or 8 settings
Ah, makes sense! Have you considered adding a "this is the old website! Click here to get to the new one!" banner, kinda like on docs for python2 functions? https://docs.python.org/2.7/library/string.html
Gave it a try, it seems our GPU Queue doesn't have the S3 creds set up correctly. Making a separate thread about that
So for example:
` {'output_dir': 'shiba_ner_trainer', 'overwrite_output_dir': False, 'do_train': True, 'do_eval': True, 'do_predict': True, 'evaluation_strategy': 'epoch', 'prediction_loss_only': False, 'per_device_train_batch_size': 16, 'per_device_eval_batch_size': 16, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'learning_rate': 0.0004, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam...
OK, I guess
` training_args_dict = training_args.to_dict()
Task.current_task().set_parameters_as_dict(training_args_dict) `
works, but how to change the name from "General"?
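One thing I might try instead, a sketch assuming the clearml version I'm on has the `name` argument on `connect()`:
` training_args_dict = training_args.to_dict()

# Sketch: connect() with a section name, so the parameters would show up under
# "TrainingArgs" instead of "General" (assuming connect() accepts name= in this version).
Task.current_task().connect(training_args_dict, name="TrainingArgs") `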
AgitatedDove14 I'm making some progress on this. I've currently got the situation that my training run saved all of these files, and `Task.get_task(param['TaskA']).models['output'][-1]` gets me just one of them, `training_args.bin`. Then `-2` gets me another, `rng_state.pth`. If I just get `Task.get_task(param['TaskA']).models['output']`, I end up getting a huge list of, like, ` [<clearml.model.Model object at 0x7fec2841c880>, <clearml.model.Model object at 0x7fec2841...
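So I'm thinking of just filtering that list by name instead of indexing into it, something like this (a sketch; the name check is just an example):
` output_models = Task.get_task(param['TaskA']).models['output']

# Sketch: each entry is a clearml Model, so I can look at names instead of guessing indices.
for m in output_models:
    print(m.name, m.url)

# e.g. pull down just the one I care about (the substring is just an example)
wanted = [m for m in output_models if 'pytorch_model' in m.name]
if wanted:
    local_path = wanted[0].get_local_copy() `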
Local in the sense that my team member set it up, remote to me