Hello all, I'm trying to figure out how to log outputs with PyTorch Lightning. I used TensorBoard because ClearML claims to auto-capture TensorBoard outputs, but it was a no-go.

Answered

Hello all, I'm trying to figure out how to log outputs with PyTorch Lightning. I used TensorBoard because ClearML claims to auto-capture TensorBoard outputs, but it was a no-go.

  				
Posted 2 years ago by ZanySeahorse66


Answers 9

This does not capture any logging info, just the system monitors.

  				
Posted 2 years ago by ZanySeahorse66

SweetBadger76 Figured it out. It turns out the issue was caused by code written for earlier PyTorch Lightning versions not working as intended with the current version. This was causing bad TensorBoard outputs, or no outputs at all.

  				
Posted 2 years ago by ZanySeahorse66

If you run into an issue, can you send me a snippet so that I can better understand what is happening? Thanks.

  				
Posted 2 years ago by SweetBadger76

AgitatedDove14 The workflow I'm trying to reach is developing on the development PC and enqueuing the training pipelines to the training server. That's why I set things up this way. If there is a better practice, or if what I was doing is not an intended use case, I'm open to suggestions.

  				
Posted 2 years ago by ZanySeahorse66

Hi,
ClearML does indeed have TensorBoard auto-reporting. I suggest you have a look here, where you can find links to some examples: https://clear.ml/docs/latest/docs/fundamentals/logger#automatic-reporting-examples

You could also have a look at the example of the PyTorch Lightning integration here:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch-lightning/pytorch_lightning_example.py
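
For a quick check that auto-reporting works at all, something like the following may help. It is only a minimal sketch, not the linked example: it assumes `clearml` is installed and configured and that `torch` with TensorBoard support is available. The task name, the `logs` directory, and the `dummy_loss_curve` helper are made up for illustration.

```python
def dummy_loss_curve(num_steps):
    """Hypothetical metric: a simple decaying curve, one (step, value) pair per step."""
    return [(step, 1.0 / (step + 1)) for step in range(num_steps)]


def main():
    # Imported here so the helper above can be used without clearml or torch installed.
    from clearml import Task
    from torch.utils.tensorboard import SummaryWriter

    # Task.init is what hooks ClearML into TensorBoard; it should run before any logging.
    task = Task.init(project_name="training", task_name="tensorboard smoke test")

    writer = SummaryWriter(log_dir="logs")
    for step, value in dummy_loss_curve(10):
        writer.add_scalar("train/loss", value, step)
    writer.close()

    task.close()
```

Calling `main()` should produce a `train/loss` scalar plot on the task's scalars page if the capture works. If it does not, the problem is probably in the environment rather than in the Lightning code.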

  				
Posted 2 years ago by SweetBadger76

That example shows literally nothing other than the Task.init line, which relies heavily on the user calling the init function to create the task and on ClearML being able to capture the TensorBoard data. However, I'm trying to create a task without running it on the local computer, so I found creating the task from another script with the Task.create function more convenient. Here is how I create the task from another Python file:

```python
from clearml import Task

task = Task.create(
    project_name="training",
    task_name="training",
    packages=["protobuf==3.20.0"],
    docker="mydockerimage",
    docker_args="-v /home/username/code:/workspace",
    add_task_init_call=True,
    script="train.py",
)
```
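
Task.create only registers the task as a draft, so it still has to be put on a queue before an agent picks it up. A rough sketch of that step, assuming a queue named `default` exists on the server (the queue name and the two helper functions are illustrative, not part of the original script):

```python
def build_task_kwargs(script="train.py", docker_image="mydockerimage"):
    """Collect the Task.create arguments from the post in one place."""
    return {
        "project_name": "training",
        "task_name": "training",
        "packages": ["protobuf==3.20.0"],
        "docker": docker_image,
        "docker_args": "-v /home/username/code:/workspace",
        "add_task_init_call": True,
        "script": script,
    }


def create_and_enqueue(queue_name="default"):
    """Create the draft task without running it locally, then queue it for an agent."""
    # Imported here so build_task_kwargs can be inspected without a clearml install.
    from clearml import Task

    task = Task.create(**build_task_kwargs())
    # "default" is an assumed queue name; use whichever queue the training server listens on.
    Task.enqueue(task, queue_name=queue_name)
    return task
```

Running `create_and_enqueue()` from the development PC should leave the task pending on the queue until the agent on the training server takes it.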

  				
Posted 2 years ago by ZanySeahorse66

Hi ZanySeahorse66

"I used tensorboard as clearml claims to auto-capture tensorboard outputs, but it was a no go."

The auto TB logging should work out of the box. Where is it failing?

Also, regarding task = Task.current_task(): why aren't you using Task.init in the original script?
The idea is that you run your code on your machine (where the environment works), and ClearML auto-detects the code, Python packages, args, etc.
Then you clone it in the UI and launch it on a remote machine.
What am I missing here?

EDIT:

"So, I found creating task from another script with Task.create function more convenient. Here is how I create the task from another python file:"

Understood. Are you saying the auto logging does not work when running on the agent? This seems odd to me. Could it be that TB was not installed? Any chance you can provide the log of the execution?
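
One way to answer the "was TB installed" question from inside the container is to print the installed versions at the top of the training script. This is a small sketch using only the standard library; the list of package names is an assumption about what this particular setup needs.

```python
from importlib import metadata


def installed_versions(packages=("clearml", "pytorch-lightning", "tensorboard", "torch")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

The output ends up in the task's console log, which also makes it easier to spot a PyTorch Lightning version that differs from the one the code was written for.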

  				
Posted 2 years ago by AgitatedDove14

Thanks for the info. I'll check that and come back to you.

  				
Posted 2 years ago by SweetBadger76

And here is the training script:

```python
import os
import sys

from torch import Tensor

# Make the code mounted into the container (via docker_args) importable.
sys.path.append("/workspace/")

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

from easyvision.zoo.edsr.model import EDSR
from model import Model
from dataset import TrainingDataset

from clearml import Task, Dataset

# Note: SRDataset and PLWrapper are used below but are not imported in the post as written.


def train():
    dataset_path = Dataset.get(
        dataset_id="bcd566344203462b839a7ba08dd9efa7"
    ).get_local_copy()

    # Relies on the Task.init call injected by add_task_init_call=True.
    task = Task.current_task()

    params = {
        "batch_size": 8,
        "gpus": 1,
        "auto_select_gpu": True,
    }

    # Connect the parameters so they can be overridden from the ClearML UI.
    params = task.connect(params)

    print(params)

    dataset = SRDataset(
        path=dataset_path
    )

    model = Model()

    detector_sr = PLWrapper({
        "model": model,
        "dataset_train": dataset,
        "dataset_val": dataset,
        "batch_size_train": params.get("batch_size", 4),
        "batch_size_val": params.get("batch_size", 4),
        # As posted, num_workers is read from the "batch_size" key.
        "num_workers": params.get("batch_size", 4),
    })

    trainer = Trainer(
        check_val_every_n_epoch=1,
        num_sanity_val_steps=0,
        gpus=int(params.get("gpus", 1)),
        benchmark=True,
        auto_select_gpus=bool(params.get("auto_select_gpu", True)),
        logger=TensorBoardLogger(save_dir="logs"),
    )
    trainer.fit(detector_sr)


if __name__ == "__main__":
    train()
```
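
Since the script depends on Task.current_task() returning a task, it may be worth guarding against the case where no task has been initialized (for example, when the script is run directly instead of through the agent). As far as I know, current_task() returns None in that situation. A possible sketch, where `get_or_init_task` is a hypothetical helper and the project and task names are just the ones used elsewhere in this thread:

```python
def get_or_init_task(task_cls, project_name="training", task_name="training"):
    """Return the already-running task, or start a new one if none exists yet."""
    task = task_cls.current_task()
    if task is None:
        # No Task.init has run in this process, so create the task here.
        task = task_cls.init(project_name=project_name, task_name=task_name)
    return task
```

In the training script this would be used as `task = get_or_init_task(Task)` after `from clearml import Task`. Passing the class in as an argument is only there to keep the helper easy to try out without a ClearML server.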

  				
Posted 2 years ago by ZanySeahorse66
