Wll idk the scalars are not reported and I get this message ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
, i'll go open the pull request for that right away
No I was was pointing out the lack of one
Sounds like a great idea, could you open a github issue (if not already opened) ? just so we do not forget
set the pytorch lightning trainer argument
log_every_n_steps
to
1
(default
50
) to prevent the ClearML iteration logger from timing-out
Hmm that should not have an effect on the training time, all logs are send in the background, that said checkpoints might slow it a bit (i.e.; if you store a checkpoint every iteration and those are happening very quickly) wdyt?
No I was was pointing out the lack of one, but turns out on some model the iteration is so slow even on GPUs when training on a lots of time serie that you have to set the pytorch lightning trainer argument log_every_n_steps
to 1
(default 50
) to prevent the ClearML iteration logger from timing-out
a bit sad that there is no working integration with one of the leading time series framework...
You mean a series darts reports ? if it does report it, where does it do so? are you suggesting we have Darts integration (which sounds like a good idea) ?
@<1523701205467926528:profile|AgitatedDove14> Yup I tested to no avail, a bit sad that there is no working integration with one of the leading time series framework...
yes you are correct, I would expect the same.
Can you try manually importing pt, and maybe also moving the Task.init before darts?
The expected behavior is that the task would capture the iteration scalar of the PL trainer but nothing is recorded
import clearml
from darts.models import TFTModel
model = TFTModel(
input_chunk_length=28,
output_chunk_length=14,
n_epochs=300,
batch_size=4096,
add_relative_index=True,
num_attention_heads=4,
dropout=0.3,
full_attention=True,
save_checkpoints=True,
)
task = Task.init(
project_name='sales-prediction',
task_name='TFT Training 2',
task_type=Task.TaskTypes.training
)
trainer = pl.Trainer(
max_epochs=300,
enable_progress_bar=True,
callbacks=[EarlyStopping(monitor='val_loss', patience=5, verbose=True), ModelCheckpoint(monitor='val_loss', verbose=True)],
precision=64,
accelerator='gpu',
devices=[0]
)
model.fit(series=train_series, val_series=val_series, trainer=trainer)
task.flush()
task.close()
Sorry, I meant the scalar logging doesn't collect anything like it would do during a vanilla Pytorch Lightning training, here is the repo of the lib https://github.com/unit8co/darts
Hi @<1523702000586330112:profile|FierceHamster54>
I think I'm missing a few details on what is logged, and ref to the git repo?