Hey Has Anyone Managed To Capture Darts Logging With Clearml When Using The Temporal Fusion Transformers ? Even When Overriding Their Trainer With A Custom Pytorch Lightning Trainer It Seems That Clearml Cannot Retrieve The Iteration Log...

Answered

Hey, has anyone managed to capture Darts logging with ClearML when using the Temporal Fusion Transformer (TFTModel)? Even when overriding its trainer with a custom PyTorch Lightning Trainer, it seems that ClearML cannot retrieve the iteration log...

  				
Posted 2 years ago by FierceHamster54


Answers 12

"No I was pointing out the lack of one"

Sounds like a great idea. Could you open a GitHub issue (if not already opened), just so we do not forget?

"set the pytorch lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing-out"

Hmm, that should not have an effect on the training time, since all logs are sent in the background. That said, checkpoints might slow it a bit (i.e., if you store a checkpoint every iteration and the iterations happen very quickly). wdyt?

  				
Posted 2 years ago by AgitatedDove14

Where is Darts reporting scalars?

  				
Posted 2 years ago by AgitatedDove14

"a bit sad that there is no working integration with one of the leading time series framework..."

Do you mean a series that Darts reports? If it does report it, where does it do so? Or are you suggesting we add a Darts integration (which sounds like a good idea)?

  				
Posted 2 years ago by AgitatedDove14

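The expected behavior is that the task captures the iteration scalars of the PL trainer, but nothing is recorded:

import pytorch_lightning as pl
from clearml import Task
from darts.models import TFTModel
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# train_series and val_series are Darts TimeSeries objects prepared elsewhere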

model = TFTModel(
    input_chunk_length=28,
    output_chunk_length=14,
    n_epochs=300,
    batch_size=4096,
    add_relative_index=True,
    num_attention_heads=4,
    dropout=0.3,
    full_attention=True,
    save_checkpoints=True,
)

task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2',
    task_type=Task.TaskTypes.training
)

trainer = pl.Trainer(
    max_epochs=300,
    enable_progress_bar=True,
    callbacks=[EarlyStopping(monitor='val_loss', patience=5, verbose=True), ModelCheckpoint(monitor='val_loss', verbose=True)],
    precision=64,
    accelerator='gpu',
    devices=[0]
)

model.fit(series=train_series, val_series=val_series, trainer=trainer)

task.flush()
task.close()

  				
Posted 2 years ago by FierceHamster54

Sorry, I meant that the scalar logging doesn't collect anything the way it would during a vanilla PyTorch Lightning training. Here is the repo of the lib: https://github.com/unit8co/darts

  				
Posted 2 years ago by FierceHamster54

No, I was pointing out the lack of one. But it turns out that on some models the iterations are so slow, even on GPUs, when training on a lot of time series that you have to set the PyTorch Lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out.

  				
Posted 2 years ago by FierceHamster54
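The log_every_n_steps workaround described in the post above can be sketched as follows. This is a minimal sketch, not a confirmed fix: build_trainer_kwargs and make_trainer are hypothetical helper names, and the remaining values are copied from the trainer posted earlier in the thread.

```python
def build_trainer_kwargs(log_every_n_steps=1, max_epochs=300):
    """Keyword arguments for pytorch_lightning.Trainer (hypothetical helper)."""
    return {
        "max_epochs": max_epochs,
        # Lightning's default is 50; with slow iterations the first logged
        # value can arrive late, so log on every step instead.
        "log_every_n_steps": log_every_n_steps,
        "accelerator": "gpu",
        "devices": [0],
    }


def make_trainer():
    """Build the custom trainer; the import is local so the helper above
    can be inspected without PyTorch Lightning installed."""
    import pytorch_lightning as pl

    return pl.Trainer(**build_trainer_kwargs())

# Usage, with `model` being the TFTModel from the thread:
#   model.fit(series=train_series, val_series=val_series, trainer=make_trainer())
```

Darts models also accept a pl_trainer_kwargs dictionary at construction time, so the same dictionary could be passed there instead of building the trainer by hand.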

Well, idk, the scalars are not reported and I get this message: "ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start". I'll go open the pull request for that right away.

  				
Posted 2 years ago by FierceHamster54
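When automatic capture does not pick up the scalars, one possible fallback is to report them explicitly. The sketch below is an assumption, not something proposed in the thread: forward_metrics and ClearMLScalarCallback are hypothetical names, while Logger.current_logger().report_scalar and the Lightning Callback hook are existing APIs.

```python
def forward_metrics(metrics, iteration, report_scalar):
    """Send every numeric metric to a ClearML-style report_scalar callable.

    Returns the names of the metrics that were reported.
    """
    sent = []
    for name, value in metrics.items():
        try:
            value = float(value)  # Python numbers and single-element tensors
        except (TypeError, ValueError):
            continue  # skip anything that is not a scalar
        report_scalar(title=name, series=name, value=value, iteration=iteration)
        sent.append(name)
    return sent


def make_callback():
    """Build a Lightning callback that reports metrics after each epoch.

    Imports are local so forward_metrics stays usable without the libraries.
    """
    import pytorch_lightning as pl
    from clearml import Logger

    class ClearMLScalarCallback(pl.Callback):  # hypothetical name
        def on_train_epoch_end(self, trainer, pl_module):
            forward_metrics(
                trainer.callback_metrics,
                trainer.global_step,
                Logger.current_logger().report_scalar,
            )

    return ClearMLScalarCallback()
```

The callback would be added to the callbacks list of the custom trainer, after Task.init has been called so that a current logger exists.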

Yes, you are correct, I would expect the same.
Can you try manually importing pt, and maybe also moving the Task.init before darts?

  				
Posted 2 years ago by AgitatedDove14
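The reordering suggested in the reply above could look like the sketch below. It assumes that "pt" refers to PyTorch, wraps everything in a hypothetical train function so the import order is explicit, and reuses the project and model values from the posted script. The thread reports that this did not resolve the issue, so it only illustrates the suggestion.

```python
def train(train_series, val_series):
    """Run TFT training with the ClearML task created before Darts is imported."""
    # Create the task first so ClearML is initialized before the training
    # libraries are loaded.
    from clearml import Task

    task = Task.init(
        project_name="sales-prediction",
        task_name="TFT Training 2",
        task_type=Task.TaskTypes.training,
    )

    import torch  # noqa: F401  (assumption: "pt" in the reply means PyTorch)
    import pytorch_lightning as pl
    from darts.models import TFTModel

    model = TFTModel(
        input_chunk_length=28,
        output_chunk_length=14,
        add_relative_index=True,
    )
    trainer = pl.Trainer(max_epochs=300, accelerator="gpu", devices=[0])
    model.fit(series=train_series, val_series=val_series, trainer=trainer)

    task.close()
```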

Hi @FierceHamster54
I think I'm missing a few details on what is logged. Could you also share a ref to the git repo?

  				
Posted 2 years ago by AgitatedDove14

None

  				
Posted 2 years ago by FierceHamster54

@AgitatedDove14 Yup, I tested it to no avail. A bit sad that there is no working integration with one of the leading time series frameworks...

  				
Posted 2 years ago by FierceHamster54

Where do you have your Task.init?

  				
Posted 2 years ago by AgitatedDove14


2K Views
