Sorry, I meant the scalar logging doesn't collect anything like it would during a vanilla PyTorch Lightning training. Here is the repo of the lib: https://github.com/unit8co/darts
No, I was pointing out the lack of one
Sounds like a great idea, could you open a GitHub issue (if not already opened)? Just so we do not forget
Set the pytorch lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out
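Something like this (just a sketch, the other trainer arguments are whatever you already use):

import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=300,
    log_every_n_steps=1,  # default is 50; log every training step so ClearML picks up iterations quickly
)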
Hmm, that should not have an effect on the training time, all logs are sent in the background; that said, checkpoints might slow it a bit (i.e., if you store a checkpoint every iteration and those are happening very quickly), wdyt?
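If it is the checkpointing, one option is to lower the checkpoint frequency on the callback, e.g. (just a sketch, the monitor name is taken from your snippet):

from pytorch_lightning.callbacks import ModelCheckpoint

# save at most once per epoch instead of on every (fast) training step
checkpoint_cb = ModelCheckpoint(monitor='val_loss', every_n_epochs=1)
# per-step checkpointing would be every_n_train_steps=<n>, which gets expensive
# when iterations are very fast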
A bit sad that there is no working integration with one of the leading time series frameworks...
You mean a series Darts reports? If it does report it, where does it do so? Are you suggesting we have Darts integration (which sounds like a good idea)?
yes you are correct, I would expect the same.
Can you try manually importing pt, and maybe also moving the Task.init before darts?
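i.e. something along these lines (just a sketch; assuming pt means pytorch, and the project/task names are taken from your snippet):

import torch                   # import the framework explicitly first
import pytorch_lightning as pl

from clearml import Task

# initialize the task *before* importing darts, so the ClearML hooks are in place
task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2',
    task_type=Task.TaskTypes.training,
)

from darts.models import TFTModel  # import darts only after Task.init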
Well idk, the scalars are not reported and I get this message: ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
I'll go open the pull request for that right away
No, I was pointing out the lack of one, but it turns out on some models the iteration is so slow, even on GPUs, when training on a lot of time series, that you have to set the pytorch lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out
The expected behavior is that the task would capture the iteration scalars of the PL trainer, but nothing is recorded:
from clearml import Task
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from darts.models import TFTModel

# Darts TFT model; Darts also saves its own checkpoints (save_checkpoints=True)
model = TFTModel(
    input_chunk_length=28,
    output_chunk_length=14,
    n_epochs=300,
    batch_size=4096,
    add_relative_index=True,
    num_attention_heads=4,
    dropout=0.3,
    full_attention=True,
    save_checkpoints=True,
)

# ClearML task (note: initialized after the darts import and model creation)
task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2',
    task_type=Task.TaskTypes.training,
)

# custom PyTorch Lightning trainer passed to Darts
trainer = pl.Trainer(
    max_epochs=300,
    enable_progress_bar=True,
    callbacks=[
        EarlyStopping(monitor='val_loss', patience=5, verbose=True),
        ModelCheckpoint(monitor='val_loss', verbose=True),
    ],
    precision=64,
    accelerator='gpu',
    devices=[0],
)

# train_series / val_series are prepared elsewhere
model.fit(series=train_series, val_series=val_series, trainer=trainer)
task.flush()
task.close()
Hi @<1523702000586330112:profile|FierceHamster54>
I think I'm missing a few details on what is logged, and a ref to the git repo?
@<1523701205467926528:profile|AgitatedDove14> Yup, I tested to no avail; a bit sad that there is no working integration with one of the leading time series frameworks...