Answered
Hi! I Was Wondering Regarding This Issue:

Hi! I was wondering regarding this issue:
https://github.com/allegroai/clearml/issues/181
The solution seems to be to not log by default, and only use a context when you want to log... but is there a way to do the inverse? That is:

# log everything by default
with logger.capture(False):  # hypothetical: temporarily suspend capture
    model.fit(...)
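(To be clear about what I mean, here is a sketch of the inverse pattern, not a ClearML API; `capture_enabled` is a made-up attribute standing in for whatever on/off switch the logger actually exposes:)

` from contextlib import contextmanager

@contextmanager
def capture(logger, enabled: bool):
    # hypothetical attribute; substitute the logger's real on/off switch
    previous = logger.capture_enabled
    logger.capture_enabled = enabled
    try:
        yield logger
    finally:
        # restore the default (log everything)
        logger.capture_enabled = previous `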

  
  
Posted 3 years ago

Answers 30


The UI shows the log as is (and as pasted above). In the console I'm getting correct output (a single tqdm progress line):
[2021-09-17 13:29:51,860][pytorch_lightning.utilities.distributed][INFO] - GPU available: True, used: True
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - TPU available: False, using: 0 TPU cores
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - IPU available: False, using: 0 IPUs
[2021-09-17 13:29:51,866][pytorch_lightning.accelerators.gpu][INFO] - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
Epoch 34: 100%|███████████████████████████████████████████████████████████| 361/361 [02:05<00:00, 2.89it/s, loss=0.151, v_num=9-29]

  
  
Posted 3 years ago

Thanks WittyOwl57! Let me check

  
  
Posted 3 years ago

yes, in the console it's okay 😞

  
  
Posted 3 years ago

` # Development mode worker
worker {
    # Status report period in seconds
    report_period_sec: 2

    # ping to the server - check connectivity
    ping_period_sec: 30

    # Log all stdout & stderr
    log_stdout: true

    # Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
    # Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
    console_cr_flush_period: 600

    # compatibility feature, report memory usage for the entire machine
    # default (false), report only on the running process and its sub-processes
    report_global_mem_used: false
} ` (excerpt)
  
  
Posted 3 years ago

What's the clearml version? Is this with the latest from GitHub?

  
  
Posted 3 years ago

Interesting, I don't get newlines in any of my consoles:
ClearML Task: overwriting (reusing) task id=38cc10401fcc43cfa432b7ceed7df0cc
2021-10-08 14:57:53,704 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
ClearML new version available: upgrade to v1.1.2 is recommended!███████████████████████████████████████████████████████████▏ | 4/10 [00:00<00:00, 9.80it/s]
doneining: 11it [00:01, 9.87it/s]
Training: 11it [00:01, 9.87it/s]
2021-10-08 14:57:54,949 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2021-10-08 14:57:59,936 - clearml.Task - INFO - Finished repository detection and package analysis
or (after installing the new clearml to get rid of the log line):
ClearML Task: overwriting (reusing) task id=38cc10401fcc43cfa432b7ceed7df0cc
2021-10-08 14:59:03,762 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
doneining: 11it [00:01, 9.89it/s]
Training: 11it [00:01, 9.89it/s]
2021-10-08 14:59:05,003 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2021-10-08 14:59:09,722 - clearml.Task - INFO - Finished repository detection and package analysis
In any case, do you have any suggestion of how I could at least hack tqdm to make it behave? Thanks 🙏

  
  
Posted 3 years ago

Could it be you have two entries of "console_cr_flush_period"?

  
  
Posted 3 years ago

I mean just add the toy tqdm loop somewhere just before starting the lightning train function. I just want to verify that it works, or maybe there is something in the specific real-time setup that changes it.

  
  
Posted 3 years ago

I don't control tqdm (otherwise I would have already gone for Stef's suggestion); pytorch-lightning does in this particular script 😞.

  
  
Posted 3 years ago

Okay, some progress. So what is the difference?
Any chance the issue can be reproduced with a small toy code?
Can you run the tqdm loop inside the code that exhibits the CR issue? (Maybe some initialization thing is causing it to ignore the value?!)

  
  
Posted 3 years ago

In case anyone is interested, the minimum effort workaround I found is to edit pytorch_lightning/callbacks/progress.py and change all occurrences of dynamic_ncols=True to dynamic_ncols=False in the calls to tqdm. One could of course implement a custom callback inheriting from their ProgressBar class instead (sketched below).
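A hedged sketch of that custom-callback route, assuming the pytorch-lightning 1.x ProgressBar API (method names like init_train_tqdm may differ in your version); flipping the attribute after construction is the quick-and-dirty version, recreating the bar with dynamic_ncols=False would be cleaner:

` from pytorch_lightning.callbacks import ProgressBar

class FixedWidthProgressBar(ProgressBar):
    # Override the tqdm factories so the bars stop probing the terminal size.
    def init_train_tqdm(self):
        bar = super().init_train_tqdm()
        bar.dynamic_ncols = False
        return bar

    def init_validation_tqdm(self):
        bar = super().init_validation_tqdm()
        bar.dynamic_ncols = False
        return bar

# usage: Trainer(callbacks=[FixedWidthProgressBar()]) `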

  
  
Posted 3 years ago

Hi WittyOwl57, that is an awesome fix! What does "dynamic_ncols" change?
It seems like a tqdm parameter; not sure what it does...

  
  
Posted 3 years ago

WittyOwl57 this is what I'm getting on my console (notice the new lines! Not a single one is overwritten, as I would expect):
` Training: 10%|█ | 1/10 [00:00<?, ?it/s]
Training: 20%|██ | 2/10 [00:00<00:00, 9.93it/s]
Training: 30%|███ | 3/10 [00:00<00:00, 9.89it/s]
Training: 40%|████ | 4/10 [00:00<00:00, 9.87it/s]
Training: 50%|█████ | 5/10 [00:00<00:00, 9.87it/s]
Training: 60%|██████ | 6/10 [00:00<00:00, 9.88it/s]
Training: 70%|███████ | 7/10 [00:00<00:00, 9.88it/s]
Training: 80%|████████ | 8/10 [00:00<00:00, 9.88it/s]
Training: 90%|█████████ | 9/10 [00:00<00:00, 9.87it/s]
Training: 100%|██████████| 10/10 [00:00<00:00, 9.87it/s]
Training: 11it [00:01, 9.85it/s]
done

Process finished with exit code 0 `

  
  
Posted 3 years ago

This is adapted from one of the methods in their ProgressBar class:

` import sys
import time
from tqdm import tqdm

bar = tqdm(
    desc="Training",
    initial=1,
    position=1,
    disable=False,
    leave=False,
    dynamic_ncols=True,
    file=sys.stderr,
    smoothing=0)
with bar:
    for i in range(10):
        time.sleep(0.1)
        bar.update()
print('done') `
In the console this works as expected, but in a jupyter notebook it produces a scrolling log (because of the position=1 argument, which happens whenever the bar is not the main bar).
If I add the task before the call, the issue is fully reproduced. Finally!
2021-10-07 23:50:06 ClearML Task: created new task id=38cc10401fcc43cfa432b7ceed7df0cc
2021-10-07 23:50:06 2021-10-07 22:50:06,194 - clearml.Task - INFO - No repository found, storing script code instead
2021-10-07 23:50:06 ClearML results page:
2021-10-07 23:50:06 Training: 10%|█████████████████ | 1/10 [00:00<?, ?it/s] A
2021-10-07 23:50:06 ClearML new version available: upgrade to v1.1.2 is recommended!
2021-10-07 23:50:06 Training: 20%|████████████████████████████████▌ | 2/10 [00:00<00:00, 9.95it/s]A
Training: 30%|████████████████████████████████████████████████▉ | 3/10 [00:00<00:00, 9.92it/s]A
Training: 40%|█████████████████████████████████████████████████████████████████▏ | 4/10 [00:00<00:00, 9.90it/s]A
Training: 50%|█████████████████████████████████████████████████████████████████████████████████▌ | 5/10 [00:00<00:00, 9.90it/s]A
Training: 60%|█████████████████████████████████████████████████████████████████████████████████████████████████▊ | 6/10 [00:00<00:00, 9.91it/s]A
Training: 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 7/10 [00:00<00:00, 9.91it/s]A
Training: 80%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 8/10 [00:00<00:00, 9.91it/s]A
Training: 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 9/10 [00:00<00:00, 9.91it/s]A
Training: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 9.91it/s]A
2021-10-07 23:50:07 done
2021-10-07 23:50:07 2021-10-07 22:50:07,430 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2021-10-07 23:50:07 Training: 11it [00:01, 9.91it/s]
2021-10-07 23:50:12 2021-10-07 22:50:12,286 - clearml.Task - INFO - Finished repository detection and package analysis
This is the full script:

` import sys
import time
from clearml import Task
from tqdm import tqdm

task = Task.init(project_name='debug', task_name='test tqdm cr cl reproduced finally!')

bar = tqdm(
    total=10,
    desc="Training",
    initial=1,
    position=1,
    disable=False,
    leave=True,
    dynamic_ncols=True,
    file=sys.stderr,
    smoothing=0)
with bar:
    for i in range(10):
        time.sleep(0.1)
        bar.update()
print('done') `

  
  
Posted 3 years ago

I found out that the lightning trainer has a progress_bar_refresh_rate argument (default set to 1), which produces the log spam. If I set that to 10, I get 1/10th of the spam (but a janky progress bar in the console). I could set it to 0 to disable it, but that's not really a fix. What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right? 😊
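For reference, a hedged sketch of those two knobs (progress_bar_refresh_rate is a pytorch-lightning 1.x Trainer argument; it was later deprecated, so check your version):

` from pytorch_lightning import Trainer

# refresh every 10 batches: 1/10th of the log spam, but a janky console bar
trainer = Trainer(progress_bar_refresh_rate=10)

# or disable the built-in bar entirely
trainer = Trainer(progress_bar_refresh_rate=0) `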

  
  
Posted 3 years ago

Hmm let me check ...

  
  
Posted 3 years ago

yes

  
  
Posted 3 years ago

Hi Martin, it is a tqdm parameter (the default ProgressBar in pytorch-lightning unfortunately relies on tqdm). This is from the tqdm docs:
dynamic_ncols : bool, optional
    If set, constantly alters ncols and nrows to the environment (allowing for window resizes) [default: False].
nrows : int, optional
    The screen height. If specified, hides nested bars outside this bound. If unspecified, attempts to use environment height. The fallback is 20.
ncols : int, optional
    The width of the entire output message. If specified, dynamically resizes the progressbar to stay within this bound. If unspecified, attempts to use environment width. The fallback is a meter width of 10 and no limit for the counter and statistics. If 0, will not print any meter (only stats).
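To make that concrete, a minimal example of the two modes (plain tqdm, nothing ClearML-specific assumed):

` import time
from tqdm import tqdm

# dynamic_ncols=True: re-queries the terminal size on every refresh
for _ in tqdm(range(10), dynamic_ncols=True):
    time.sleep(0.1)

# dynamic_ncols=False with a fixed width: no resize probing at all
for _ in tqdm(range(10), dynamic_ncols=False, ncols=80):
    time.sleep(0.1) `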

  
  
Posted 3 years ago

radu on vm-aimbrain-01 in experiments/runners/all via 🐍 v3.8.5 via C volt
❯ grep flush ~/clearml.conf
    # Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
    # Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
    console_cr_flush_period: 600

  
  
Posted 3 years ago

Can you reproduce this behavior outside of lightning, or in a toy example? (Because I could not.)

  
  
Posted 3 years ago

What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right?

I think they send some "odd" character instead of CR, otherwise I cannot explain the difference.
Can you point to a toy example demonstrating the same issue ?

Also I just tried the pytorch-lightning RichProgressBar (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.

Yey!

  
  
Posted 3 years ago

But in your console it looks okay?

  
  
Posted 3 years ago

Also I just tried the pytorch-lightning RichProgressBar (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
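A hedged usage sketch (RichProgressBar landed in the pytorch-lightning 1.5 pre-release and requires the rich package; import path may differ in your version):

` from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import RichProgressBar

# rich renders its own display instead of tqdm's cursor-up escapes
trainer = Trainer(callbacks=[RichProgressBar()]) `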

  
  
Posted 3 years ago

` from time import sleep
from clearml import Task
import tqdm

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100)):
    sleep(1)
print('done') `
The above example code will output a line every 10 seconds (with the default console_cr_flush_period=10); can you verify it works for you?
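For completeness, that flush period lives in ~/clearml.conf; a sketch of the relevant section, assuming the sdk.development.worker path shown in the excerpt earlier in the thread:

` sdk {
    development {
        worker {
            # flush consecutive \r-updated lines every 10 seconds (the default)
            console_cr_flush_period: 10
        }
    }
} `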

  
  
Posted 3 years ago

I might have found it, tqdm is sending
{ 1b 5b 41 } (ESC [ A, a cursor-up / arrow-up escape?): https://github.com/horovod/horovod/issues/2367
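A quick way to confirm that byte sequence, a minimal sketch capturing a nested tqdm bar's raw output into a buffer (on POSIX; on Windows tqdm may emit nothing without colorama):

` import io
import tqdm

buf = io.StringIO()
# position=1 marks this as a nested (non-main) bar, like lightning's
for _ in tqdm.tqdm(range(3), position=1, file=buf):
    pass

# 1b 5b 41 == ESC [ A, the ANSI cursor-up sequence
print(b'\x1b[A' in buf.getvalue().encode())  # expected: True `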

  
  
Posted 3 years ago

That works fine:
1631895370729 vm-aimbrain-01 info ClearML Task: created new task id=cfed3ea8512d4d9f858d085bd79e62e8
2021-09-17 16:16:10,744 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
1631895370892 vm-aimbrain-01 info start
1631895370896 vm-aimbrain-01 error 0%| | 0/100 [00:00<?, ?it/s]
1631895471026 vm-aimbrain-01 error 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:40<00:00, 1.00s/it]
1631895471027 vm-aimbrain-01 info done

  
  
Posted 3 years ago

` from time import sleep
from clearml import Task
import tqdm

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100), dynamic_ncols=True):
    sleep(1)
print('done') `
This code snippet works as expected (the console shows the progress at the flush interval, without the values in between). What's the difference?!

  
  
Posted 3 years ago

I've been overly optimistic, it seems (based on the hints from this issue: https://stackoverflow.com/questions/59455268/how-to-disable-progress-bar-in-pytorch-lightning ). It really looked like it worked, but I was misled by the fact that the issue has a peculiar pattern to it: it doesn't appear at the beginning of each epoch. There seem to be a couple of extra CR/flushes at the beginning, and then a (puzzling) extra "Validation" log line seems to trigger the continuous scrolling. Anyway, my bad, that doesn't fix things 😢.

  
  
Posted 3 years ago

In any case, do you have any suggestion of how I could at least hack tqdm to make it behave? Thanks

I think I know what the issue is: it seems tqdm is using a unicode escape sequence instead of CR; this is the 1b 5b 41 sequence I see in the binary log.
Let me see if I can hack something for you to test 🙂

  
  
Posted 3 years ago

WittyOwl57 I can verify the issue reproduces! 🎉
And I know what happens: TQDM is sending an "up arrow" escape. If you are running inside bash, that behaves like a CR (the cursor moves back over the line, which gets overwritten), but when running inside other terminals (like PyCharm or the ClearML log) this "arrow key" is just a unicode character to print; it does nothing, and we end up with multiple lines.
Let me see if we can fix it 🙂
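If anyone wants to experiment before an official fix, here is a rough sketch of the kind of normalization a log viewer could apply, collapsing the line a cursor-up escape would have overwritten (an assumption-laden toy: it handles only ESC [ A, no other escapes, and assumes whole lines arrive intact):

` ANSI_CURSOR_UP = '\x1b[A'

def collapse_cursor_up(text: str) -> str:
    # Emulate what bash-like terminals do with ESC [ A: the next write
    # lands on the previous line, so that line is effectively replaced.
    lines = []
    for chunk in text.split('\n'):
        if ANSI_CURSOR_UP in chunk:
            chunk = chunk.replace(ANSI_CURSOR_UP, '')
            if lines:
                lines.pop()  # drop the line the cursor-up re-targets
        lines.append(chunk)
    return '\n'.join(lines) `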

  
  
Posted 3 years ago