Answered
Hello all, I'm trying to figure out how I can log outputs with PyTorch Lightning. I used TensorBoard, as ClearML claims to auto-capture TensorBoard outputs, but it was a no go.

  
  
Posted 2 years ago

Answers 9


SweetBadger76 Figured it out. It turns out the issue was that code written for earlier PyTorch Lightning versions does not work as intended with the current version. This was causing bad TensorBoard outputs, or no outputs at all.
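
The post does not say which calls had changed. As one example of the kind of API change involved (an assumption, not the confirmed cause): the training script in this thread passes `gpus=` and `auto_select_gpus=` to `Trainer`, and both arguments were removed in PyTorch Lightning 2.0, where device selection goes through `accelerator=` and `devices=`. A minimal sketch of translating the thread's `params` dict into the newer keyword arguments; the helper name `modern_trainer_kwargs` is hypothetical:

```python
from typing import Any, Dict


def modern_trainer_kwargs(params: Dict[str, Any]) -> Dict[str, Any]:
    """Map a legacy `gpus` count to accelerator/devices keyword arguments.

    Hypothetical helper for illustration; it is not part of Lightning or ClearML.
    """
    gpus = int(params.get("gpus", 0))
    if gpus > 0:
        # Formerly written as Trainer(gpus=N)
        return {"accelerator": "gpu", "devices": gpus}
    # Formerly Trainer(gpus=0): run on a single CPU process
    return {"accelerator": "cpu", "devices": 1}


print(modern_trainer_kwargs({"batch_size": 8, "gpus": 1}))
# prints {'accelerator': 'gpu', 'devices': 1}
```

The result could then be unpacked into the constructor, e.g. `Trainer(**modern_trainer_kwargs(params), logger=TensorBoardLogger(save_dir="logs"))`.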

  
  
Posted 2 years ago

If you face an issue, can you send me a snippet so that I can better understand what is happening? Thanks.

  
  
Posted 2 years ago

Hi ZanyPig66

I used tensorboard as clearml claims to auto-capture tensorboard outputs, but it was a no go.

The auto TB logging should work out of the box. Where is it failing?

Also,
task = Task.current_task()
Why aren't you using Task.init in the original script?
The idea is that you run your code on your machine (where the environment works), ClearML auto detects code + python packages + args etc.
Then you clone it in the UI and launch it on a remote machine.
What am I missing here?

EDIT:

So, I found creating task from another script with Task.create function more convenient. Here is how I create the task from another python file:

Understood. Are you saying the auto logging does not work when running on the agent? This seems odd to me. Could it be that TB was not installed? Any chance you can provide the log of the execution?
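
One quick way to test the "TB was not installed" theory is to print which packages are importable from inside the environment the agent builds. A minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED` list are assumptions for illustration, not something from the thread:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of packages the training script depends on
REQUIRED = ["clearml", "tensorboard", "pytorch_lightning"]

missing = missing_modules(REQUIRED)
print("missing modules:", missing or "none")
```

Placed at the top of the training script, the printed line would show up in the task's console log, so it can be compared between the local run and the agent run.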

  
  
Posted 2 years ago

That example shows nothing beyond the Task.init line; it relies on the user calling Task.init to create the task and on ClearML being able to capture the TensorBoard data. However, I'm trying to create a task without running it on my local computer, so I found creating the task from another script with the Task.create function more convenient. Here is how I create the task from another Python file:

` from clearml import Task

task = Task.create(
    project_name="training",
    task_name="training",
    packages=["protobuf==3.20.0"],
    docker="mydockerimage",
    docker_args="-v /home/username/code:/workspace",
    add_task_init_call=True,
    script="train.py",
) `

  
  
Posted 2 years ago

This does not capture any logging info, just the system monitors.

  
  
Posted 2 years ago

And here is the training script:

` import os
import sys

from torch import Tensor
sys.path.append("/workspace/")

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

from easyvision.zoo.edsr.model import EDSR
from model import Model
from dataset import TrainingDataset

from clearml import Task, Dataset


def train():
    # Fetch a local copy of the dataset registered in ClearML
    dataset_path = Dataset.get(
        dataset_id="bcd566344203462b839a7ba08dd9efa7"
    ).get_local_copy()

    # Relies on a task already existing (see add_task_init_call=True above)
    task = Task.current_task()

    params = {
        "batch_size": 8,
        "gpus": 1,
        "auto_select_gpu": True
    }

    # Expose the parameters so they can be edited from the ClearML UI
    params = task.connect(params)

    print(params)

    # NOTE: SRDataset and PLWrapper are not imported in the snippet as posted
    dataset = SRDataset(
        path=dataset_path
    )

    model = Model()

    detector_sr = PLWrapper({
        "model": model,
        "dataset_train": dataset,
        "dataset_val": dataset,
        "batch_size_train": params.get("batch_size", 4),
        "batch_size_val": params.get("batch_size", 4),
        "num_workers": params.get("batch_size", 4)
    })

    trainer = Trainer(
        check_val_every_n_epoch=1,
        num_sanity_val_steps=0,
        gpus=int(params.get("gpus", 1)),
        benchmark=True,
        auto_select_gpus=bool(params.get("auto_select_gpu", True)),
        logger=TensorBoardLogger(save_dir="logs")
    )
    trainer.fit(detector_sr)


if __name__ == "__main__":
    train() `

  
  
Posted 2 years ago

AgitatedDove14 The workflow I'm trying to achieve is developing on the development PC and enqueuing the training pipelines to the training server. That's why I set things up this way. If there is a better practice, or if what I was doing is not an intended use case, I'm open to suggestions.
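
For this develop-locally, train-remotely workflow, one option that keeps `Task.init` in the training script is `Task.execute_remotely`. This is a rough sketch and not necessarily the recommended setup for this case: the queue name `"default"`, the function name `launch`, and `DEFAULT_PARAMS` are assumptions, and the project and task names are copied from the `Task.create` call in this thread.

```python
DEFAULT_PARAMS = {"batch_size": 8, "gpus": 1}


def launch(queue_name: str = "default") -> dict:
    """Register this script as a ClearML task, then hand it to an agent queue."""
    # Imported lazily so this module can be loaded without clearml installed
    from clearml import Task

    # Task.init records code, packages and arguments, and turns on the
    # automatic framework reporting (TensorBoard included)
    task = Task.init(project_name="training", task_name="training")
    params = task.connect(dict(DEFAULT_PARAMS))

    # Run locally: stops this process and enqueues the task on queue_name.
    # Run by an agent: the call is a no-op and execution continues below.
    task.execute_remotely(queue_name=queue_name, exit_process=True)
    return params
```

Calling `launch()` at the start of the training script, before the `Trainer` is built, would mean the actual training only ever runs on the machine serving the queue.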

  
  
Posted 2 years ago

Hi,
ClearML does indeed have TensorBoard auto-reporting. I suggest you have a look here, where you can find links to some examples: https://clear.ml/docs/latest/docs/fundamentals/logger#automatic-reporting-examples

You could also have a look at the PyTorch Lightning integration example here:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch-lightning/pytorch_lightning_example.py

  
  
Posted 2 years ago

Thanks for the info. I'll check that and get back to you.

  
  
Posted 2 years ago