it certainly does not use tensorboard python lib
Hmm, yes I assume this is why the automagic is not working 😞
Does it have a pythonic interface for the metrics?
callbacks.append(
    tensorflow.keras.callbacks.TensorBoard(
        log_dir=str(log_dir),
        update_freq=tensorboard_config.get("update_freq", "epoch"),
    )
)
Might be! what's the actual value you are passing there?
BoredHedgehog47 I tried changing the order of imports on the sample code I shared before, it worked in both cases ...
BoredHedgehog47 can you test this one? Is it close to your code ?
Thanks BoredHedgehog47 !
And yes, if the Task.init() call was only in main.py, then the TB inside the subprocess (train.py) would, as you perceived, not be captured.
Did you by any chance test calling Task.init() in both main.py and train.py?
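Something along these lines is what I'm asking about (file names, project/task names and the Popen call are assumptions on my side, adapt to your actual code):

# main.py - minimal sketch: Task.init() in the parent, train.py launched as a subprocess
import subprocess
from clearml import Task

task = Task.init(project_name="debug", task_name="main")
subprocess.Popen(["python", "train.py"]).wait()

# train.py - minimal sketch: a second Task.init() call inside the subprocess,
# plus a TensorBoard writer so there is something to capture
import tensorflow as tf
from clearml import Task

task = Task.init(project_name="debug", task_name="train")
writer = tf.summary.create_file_writer("logs/train")
with writer.as_default():
    tf.summary.scalar("loss", 0.1, step=0)
writer.flush()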
Maybe before everything else, can you share some background on the rationale for starting a new subprocess?
Yes it should
here is fastai example, just in case 🙂
https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard_example.py
NonchalantDeer14
I think the issue is the way it spins the subprocess is not with fork but with Popen, so clearml is not "loaded" into the subprocess hence no logging.
The easiest fix is to call Task.current_task() inside the actual code (somewhere when it starts), it should trigger clearml.
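Something like this is what I mean (using train.py as the subprocess entry point is an assumption, adapt to your actual entry point):

# train.py - minimal sketch of the suggested fix
from clearml import Task

def main():
    # Grab the task early in the subprocess code; per the suggestion above,
    # this should get ClearML going for this process as well
    task = Task.current_task()

    # ... the actual training / TensorBoard logging goes here ...

if __name__ == "__main__":
    main()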
Thanks NonchalantDeer14 !
BTW: how do you submit the multi-GPU job? Is it multi-GPU or multi-node?
Okay let me check the code and comeback with followup questions
clearml - WARNING - Could not retrieve remote configuration named 'hyperparams'
What's the clearml-server version you are working with ?
In both logs I see the following (so even in the single-GPU log it seems you "see" two GPUs, is that correct?):
GPU 0,1 Tesla V100-SXM2-32GB (arch=7.0)
Last question, this is using relatively old clearml version (0.17.5), can you test with the latest version (1.1.1)?
Just verifying the Pod does get allocated 2 gpus, correct ?
What do you have under the "script path" in the Task?
Well, if the "video" from TB is not in mp4/gif format then someone will have to encode it.
I was just pointing out that for the encoding part we might need an additional package.
Ohh then we can definitely support it, could you maybe post a toy example for testing? Or even better PR it to the examples/tensorboardX folder?
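Something like this is roughly the kind of toy example I have in mind (a sketch, assuming torch + tensorboardX; add_video usually needs moviepy for the encoding, and shapes/names are just illustrative):

# Toy tensorboardX video example
import torch
from tensorboardX import SummaryWriter
from clearml import Task

task = Task.init(project_name="debug", task_name="tensorboardx_video")
writer = SummaryWriter("runs/video_test")

# (N, T, C, H, W): 1 clip, 16 frames, 3 channels, 64x64 pixels, uint8 in [0, 255]
video = torch.randint(0, 255, (1, 16, 3, 64, 64), dtype=torch.uint8)
writer.add_video("random_video", video, global_step=0, fps=4)
writer.close()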
LethalCentipede31 I think seaborn is using matplotlib, it should just work:
https://github.com/allegroai/clearml/blob/6a91374c2dd177b7bdf4c43efca8e6fb0d432648/examples/frameworks/matplotlib/matplotlib_example.py#L48
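For example, something as small as this should be picked up by the matplotlib auto-logging (the dataset and plot are just placeholders):

# Minimal seaborn-through-matplotlib sketch
import matplotlib.pyplot as plt
import seaborn as sns
from clearml import Task

task = Task.init(project_name="debug", task_name="seaborn_example")

df = sns.load_dataset("tips")  # seaborn's example "tips" dataset (downloaded on first use)
sns.scatterplot(data=df, x="total_bill", y="tip", hue="day")
plt.title("seaborn scatter")
plt.show()  # the figure should be reported at this point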
Hi NonchalantDeer14
In multi-gpu, can you still see the logs on the local Tensorboard ?
Are you running manually or with an agent ?
ReassuredTiger98 do you know if tensorboard (not tensorboardX) also supports gif there ?
ReassuredTiger98 in theory it should work, do you know what is actually stored? (I mean, re-encoding means you have to have opencv / ffmpeg, which might be too much to ask)
Okay here is a standalone code that should be close enough? (if I missed anything let me know)
` import tempfile
from datetime import datetime
from pathlib import Path
import tensorflow as tf
import tensorflow_datasets as tfds
from clearml import Task

task = Task.init(project_name="debug", task_name="test")

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, labe...
I think the crux of the issue is the subprocess calls I removed.
That kind of makes sense, though if the subprocess function also had a Task.init() call it should have worked.
Would that be the setup to try to replicate?
Oh that makes sense:
` import os

# Create a child process
# using os.fork() method
pid = os.fork()

if pid > 0:
    # pid greater than 0 represents
    # the parent process
    print("I am parent process:")
    print("Process ID:", os.getpid())
    print("Child's process ID:", pid)
else:
    # pid equal to 0 represents
    # the created child process
    print("\nI am child process - this is still fully auto logged")
    print("Process ID:", os.getpid())
    print("Parent's process ID:", o...
I basically moved the Task.init() call below the imports
Okay that is odd, can you copy paste the before/after of the imports, so we can fix that?!
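Just to make sure we are talking about the same thing, this is the kind of before/after I mean (the framework import and names are placeholders):

# Variant A - Task.init() above the framework imports
from clearml import Task

task = Task.init(project_name="debug", task_name="import_order")

import tensorflow as tf  # noqa: E402

# Variant B - Task.init() below the framework imports
import tensorflow as tf
from clearml import Task

task = Task.init(project_name="debug", task_name="import_order")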
Hi ZanyPig66
I used tensorboard as clearml claims to auto-capture tensorboard outputs, but it was a no go.
The auto TB logging should work out of the box, where is it failing ?
Also, task = Task.current_task()
Why aren't you using Task.init in the original script?
The idea is that you run your code on your machine (where the environment works), ClearML auto detects code + python packages + args etc.
Then you clone it in the UI and launch it on a remote machine.
What am I missing ...
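In other words, something as small as this in the original training script should already get the TB scalars reported (project/task names and paths are placeholders):

# Minimal sketch: Task.init() at the top of the training script,
# TensorBoard scalars written as usual
import tensorflow as tf
from clearml import Task

task = Task.init(project_name="debug", task_name="tb_auto_logging")

writer = tf.summary.create_file_writer("logs/debug")
with writer.as_default():
    for step in range(10):
        tf.summary.scalar("train/loss", 1.0 / (step + 1), step=step)
writer.flush()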
I get the same "white" image in both TB & ClearML 😞