Answered

Hey has anyone managed to capture Darts logging with ClearML when using the temporal fusion transformers ? Even when overriding their trainer with a custom Pytorch Lightning Trainer it seems that ClearML cannot retrieve the iteration log...

  
  
Posted one year ago

Answers 12


Where is darts reporting scalars?

  
  
Posted one year ago

Sorry, I meant that the scalar logging doesn't collect anything the way it would during a vanilla PyTorch Lightning training. Here is the repo of the lib: https://github.com/unit8co/darts

  
  
Posted one year ago

Well, idk, the scalars are not reported and I get this message: ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start. I'll go open the pull request for that right away.
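
In the meantime, a possible workaround (a minimal sketch, not an official Darts integration; the callback name ClearMLScalarLogger is made up for this example) is to push the PL metrics to ClearML yourself from a Lightning callback:

from clearml import Logger
from pytorch_lightning.callbacks import Callback

class ClearMLScalarLogger(Callback):
    # Report every logged metric against the global step so ClearML
    # sees explicit iteration-based reporting instead of falling back
    # to seconds-from-start.
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        for name, value in trainer.callback_metrics.items():
            Logger.current_logger().report_scalar(
                title='train',
                series=name,
                value=float(value),
                iteration=trainer.global_step,
            )

Passing an instance of it in the trainer's callbacks list would then surface the scalars even when automatic detection fails.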

  
  
Posted one year ago

No, I was pointing out the lack of one, but it turns out that on some models the iterations are so slow, even on GPUs, when training on lots of time series, that you have to set the PyTorch Lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out (see the sketch below).
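
For reference, a minimal sketch of that trainer setting (standard PyTorch Lightning arguments; the rest of the configuration is assumed from the snippet later in the thread):

import pytorch_lightning as pl

# Log on every training step so ClearML sees frequent iteration reports,
# even when a single iteration is slow.
trainer = pl.Trainer(
    max_epochs=300,
    accelerator='gpu',
    devices=[0],
    log_every_n_steps=1,  # default is 50
)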

  
  
Posted one year ago

Where do you have your Task.init?

  
  
Posted one year ago

a bit sad that there is no working integration with one of the leading time series frameworks...

You mean a series darts reports? If it does report it, where does it do so? Are you suggesting we add a Darts integration (which sounds like a good idea)?

  
  
Posted one year ago

None

  
  
Posted one year ago

No I was pointing out the lack of one

Sounds like a great idea, could you open a GitHub issue (if not already opened)? Just so we do not forget.

set the pytorch lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out

Hmm, that should not have an effect on the training time, all logs are sent in the background; that said, checkpoints might slow it a bit (i.e., if you store a checkpoint every iteration and those are happening very quickly), wdyt?
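
If checkpointing does turn out to be the bottleneck, a minimal sketch of saving less often (standard PyTorch Lightning ModelCheckpoint arguments; the monitor value is taken from the snippet later in the thread):

from pytorch_lightning.callbacks import ModelCheckpoint

# Keep only the best checkpoint and write at most one per epoch,
# rather than checkpointing on every fast iteration.
checkpoint_cb = ModelCheckpoint(
    monitor='val_loss',
    save_top_k=1,
    every_n_epochs=1,
)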

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14> Yup, I tested it to no avail, a bit sad that there is no working integration with one of the leading time series frameworks...

  
  
Posted one year ago

Hi @<1523702000586330112:profile|FierceHamster54>
I think I'm missing a few details on what is logged, and a ref to the git repo?

  
  
Posted one year ago

Yes, you are correct, I would expect the same.
Can you try manually importing pt, and maybe also moving the Task.init before the darts import? (see the sketch below)
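
A minimal sketch of that suggestion, assuming pt refers to pytorch_lightning and reusing the Task.init arguments from the snippet below:

# Initialize ClearML before anything from darts is imported, so its
# automatic framework binding can hook PyTorch Lightning.
from clearml import Task

task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2',
    task_type=Task.TaskTypes.training,
)

import pytorch_lightning as pl  # manual import, before darts
from darts.models import TFTModel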

  
  
Posted one year ago

The expected behavior is that the task would capture the iteration scalars of the PL trainer, but nothing is recorded:

from clearml import Task
from darts.models import TFTModel
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

model = TFTModel(
    input_chunk_length=28,
    output_chunk_length=14,
    n_epochs=300,
    batch_size=4096,
    add_relative_index=True,
    num_attention_heads=4,
    dropout=0.3,
    full_attention=True,
    save_checkpoints=True,
)

task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2',
    task_type=Task.TaskTypes.training
)

trainer = pl.Trainer(
    max_epochs=300,
    enable_progress_bar=True,
    callbacks=[EarlyStopping(monitor='val_loss', patience=5, verbose=True), ModelCheckpoint(monitor='val_loss', verbose=True)],
    precision=64,
    accelerator='gpu',
    devices=[0]
)
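# Note: train_series / val_series are assumed to be prepared darts
# TimeSeries objects (their construction is not shown in this snippet).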

model.fit(series=train_series, val_series=val_series, trainer=trainer)

task.flush()
task.close()
  
  
Posted one year ago