Sorry, I meant the scalar logging doesn't collect anything like it would during a vanilla PyTorch Lightning training. Here is the repo of the lib: https://github.com/unit8co/darts
No, I was pointing out the lack of one
Sounds like a great idea, could you open a GitHub issue (if not already opened)? Just so we do not forget
Set the pytorch lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out
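Something like this (just a sketch, the other trainer arguments are whatever you already use):

import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=300,
    log_every_n_steps=1,  # default is 50; log every training step so ClearML picks up iterations quickly
)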
Hmm, that should not have an effect on the training time, all logs are sent in the background; that said, checkpoints might slow it a bit (i.e., if you store a checkpoint every iteration and those are happening very quickly), wdyt?
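If it is the checkpointing, one option is to lower the checkpoint frequency on the callback, e.g. (just a sketch, the monitor name is taken from your snippet):

from pytorch_lightning.callbacks import ModelCheckpoint

# save at most once per epoch instead of on every (fast) training step
checkpoint_cb = ModelCheckpoint(monitor='val_loss', every_n_epochs=1)
# per-step checkpointing would be every_n_train_steps=<n>, which gets expensive
# when iterations are very fast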
A bit sad that there is no working integration with one of the leading time series frameworks...
You mean a series Darts reports? If it does report it, where does it do so? Are you suggesting we have Darts integration (which sounds like a good idea)?
yes you are correct, I would expect the same.
Can you try manually importing pt, and maybe also moving the Task.init before darts?
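i.e. something along these lines (just a sketch; assuming pt means pytorch, and the project/task names are taken from your snippet):

import torch                   # import the framework explicitly first
import pytorch_lightning as pl

from clearml import Task

# initialize the task *before* importing darts, so the ClearML hooks are in place
task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2',
    task_type=Task.TaskTypes.training,
)

from darts.models import TFTModel  # import darts only after Task.init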
Well idk, the scalars are not reported and I get this message: ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
I'll go open the pull request for that right away
No, I was pointing out the lack of one, but it turns out on some models the iteration is so slow, even on GPUs, when training on a lot of time series, that you have to set the pytorch lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out
The expected behavior is that the task would capture the iteration scalars of the PL trainer, but nothing is recorded:
from clearml import Task
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from darts.models import TFTModel

# Darts TFT model; Darts also saves its own checkpoints (save_checkpoints=True)
model = TFTModel(
    input_chunk_length=28,
    output_chunk_length=14,
    n_epochs=300,
    batch_size=4096,
    add_relative_index=True,
    num_attention_heads=4,
    dropout=0.3,
    full_attention=True,
    save_checkpoints=True,
)

# ClearML task (note: initialized after the darts import and model creation)
task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2',
    task_type=Task.TaskTypes.training,
)

# custom PyTorch Lightning trainer passed to Darts
trainer = pl.Trainer(
    max_epochs=300,
    enable_progress_bar=True,
    callbacks=[
        EarlyStopping(monitor='val_loss', patience=5, verbose=True),
        ModelCheckpoint(monitor='val_loss', verbose=True),
    ],
    precision=64,
    accelerator='gpu',
    devices=[0],
)

# train_series / val_series are prepared elsewhere
model.fit(series=train_series, val_series=val_series, trainer=trainer)
task.flush()
task.close()
Hi @<1523702000586330112:profile|FierceHamster54>
I think I'm missing a few details on what is logged, and a ref to the git repo?
@<1523701205467926528:profile|AgitatedDove14> Yup, I tested to no avail; a bit sad that there is no working integration with one of the leading time series frameworks...