Hi Martin, it is a tqdm parameter (the default ProgressBar in pytorch-lightning unfortunately relies on tqdm). This is from the tqdm docs:
dynamic_ncols : bool, optional
If set, constantly alters ncols and nrows to the environment (allowing for window resizes) [default: False].
nrows : int, optional
The screen height. If specified, hides nested bars outside this bound. If unspecified, attempts to use environment height. The fallback is 20.
ncols : int, optional
The width of the entire output message. If specified, dynamically resizes the progressbar to stay within this bound. If unspecified, attempts to use environment width. The fallback is a meter width of 10 and no limit for the counter and statistics. If 0, will not print any meter (only stats).
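To see the ncols bound in action, here is a tiny sketch that writes to a StringIO instead of a terminal, so the raw frames can be inspected:

```python
import io
from tqdm import tqdm

buf = io.StringIO()
for _ in tqdm(range(5), ncols=40, file=buf):
    pass

# Each \r-separated frame is resized to stay within the 40-column bound.
frames = [f.rstrip("\n") for f in buf.getvalue().split("\r")]
print(max(len(f) for f in frames))  # at most 40
```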
In case anyone is interested, the minimum-effort workaround I found is to edit pytorch_lightning/callbacks/progress.py and change all occurrences of dynamic_ncols=True to dynamic_ncols=False in the calls to tqdm. One could of course implement a custom callback inheriting from their ProgressBar class.
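If you'd rather not edit the installed package, the same effect can be had by funneling every bar through a factory that forces the flag off. This is only a sketch of the idea with plain tqdm (the ProgressBar-subclass route would override the methods that construct the bars; those method names vary across lightning versions, so I'm not showing them):

```python
import io
from functools import partial
from tqdm import tqdm

# Factory mirroring the progress.py edit: every bar it builds
# has dynamic_ncols forced to False.
fixed_tqdm = partial(tqdm, dynamic_ncols=False)

bar = fixed_tqdm(total=3, file=io.StringIO())
print(bar.dynamic_ncols)  # → False
bar.close()
```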
` task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100), dynamic_ncols=True):
    sleep(1)
print('done') `
This code snippet works as expected (the console shows the progress at the flush interval, without the values in between). What's the difference?!
Hi WittyOwl57, that is an awesome fix! What does "dynamic_ncols" change?
it seems like a tqdm parameter, not sure what it does ...
That works fine:
1631895370729 vm-aimbrain-01 info ClearML Task: created new task id=cfed3ea8512d4d9f858d085bd79e62e8
2021-09-17 16:16:10,744 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
1631895370892 vm-aimbrain-01 info start
1631895370896 vm-aimbrain-01 error 0%| | 0/100 [00:00<?, ?it/s]
1631895471026 vm-aimbrain-01 error 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:40<00:00, 1.00s/it]
1631895471027 vm-aimbrain-01 info done
` from time import sleep
from clearml import Task
import tqdm
task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100)):
sleep(1)
print('done') `
The above example code will output a line every 10 seconds (with the default console_cr_flush_period=10). Can you verify it works for you?
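For intuition, the flush behaviour can be approximated in a few lines: within each flush window, only the last \r-delimited frame of a line survives. This is just my mental model of what console_cr_flush_period does, not ClearML's actual implementation:

```python
def flush_cr(text):
    r"""Keep only the final \r frame of each line, approximating what a
    CR-aware log collector reports at the end of a flush window."""
    return "\n".join(line.split("\r")[-1] for line in text.split("\n"))

print(flush_cr("start\n0%\r50%\r100%\ndone"))  # → start / 100% / done
```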
I don't control tqdm (otherwise I would have already gone for Stef's suggestion); pytorch-lightning does, in this particular script 😞.
What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right?
I think they send some "odd" character instead of CR, otherwise I cannot explain the difference.
Can you point to a toy example demonstrating the same issue ?
Also I just tried the pytorch-lightning
RichProgressBar
(not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
Yey!
I found out that the lightning trainer has a progress_bar_refresh_rate
argument (default set to 1) which produces the spamming logs. If I set that to 10, I get 1/10th of the spam (but a janky progress bar in the console). I could set it to 0 to disable it, but that's not really a fix. What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right? 😊
Also I just tried the pytorch-lightning RichProgressBar
(not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
` # Development mode worker
worker {
# Status report period in seconds
report_period_sec: 2
# ping to the server - check connectivity
ping_period_sec: 30
# Log all stdout & stderr
log_stdout: true
# Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
# Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
console_cr_flush_period: 600
# compatibility feature, report memory usage for the entire machine
# default (false), report only on the running process and its sub-processes
report_global_mem_used: false
} `
(excerpt)
The UI shows the log as is (and as pasted above). In the console I'm getting correct output (a single tqdm progress line):
[2021-09-17 13:29:51,860][pytorch_lightning.utilities.distributed][INFO] - GPU available: True, used: True
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - TPU available: False, using: 0 TPU cores
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - IPU available: False, using: 0 IPUs
[2021-09-17 13:29:51,866][pytorch_lightning.accelerators.gpu][INFO] - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
Epoch 34: 100%|███████████████████████████████████████████████████████████| 361/361 [02:05<00:00, 2.89it/s, loss=0.151, v_num=9-29]
radu on vm-aimbrain-01 in experiments/runners/all via :snake: v3.8.5 via C volt
❯ grep flush ~/clearml.conf
# Carriage return (\r) support. If zero (0) \r treated as \n and flushed to backend
# Carriage return flush support in seconds, flush consecutive line feeds (\r) every X (default: 10) seconds
console_cr_flush_period: 600
What's the clearml version? Is this with the latest from GitHub?
Could it be you have two entries of "console_cr_flush_period" ?
This is adapted from one of the methods in their ProgressBar class:
` import sys
import time
from tqdm import tqdm

bar = tqdm(
    desc="Training",
    initial=1,
    position=1,
    disable=False,
    leave=False,
    dynamic_ncols=True,
    file=sys.stderr,
    smoothing=0)
with bar:
    for i in range(10):
        time.sleep(0.1)
        bar.update()
print('done') `
In the console this works as expected, but in a jupyter notebook this produces a scrolling log (because of the position=1 argument, which happens whenever the bar is not the main bar).
If I add the task before the call, the issue is fully reproduced. Finally!
2021-10-07 23:50:06 ClearML Task: created new task id=38cc10401fcc43cfa432b7ceed7df0cc
2021-10-07 22:50:06,194 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
2021-10-07 23:50:06 Training: 10%|█████████████████ | 1/10 [00:00<?, ?it/s] A 2021-10-07 23:50:06 ClearML new version available: upgrade to v1.1.2 is recommended! 2021-10-07 23:50:06 Training: 20%|████████████████████████████████▌ | 2/10 [00:00<00:00, 9.95it/s]A Training: 30%|████████████████████████████████████████████████▉ | 3/10 [00:00<00:00, 9.92it/s]A Training: 40%|█████████████████████████████████████████████████████████████████▏ | 4/10 [00:00<00:00, 9.90it/s]A Training: 50%|█████████████████████████████████████████████████████████████████████████████████▌ | 5/10 [00:00<00:00, 9.90it/s]A Training: 60%|█████████████████████████████████████████████████████████████████████████████████████████████████▊ | 6/10 [00:00<00:00, 9.91it/s]A Training: 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 7/10 [00:00<00:00, 9.91it/s]A Training: 80%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 8/10 [00:00<00:00, 9.91it/s]A Training: 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 9/10 [00:00<00:00, 9.91it/s]A Training: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 9.91it/s]A 2021-10-07 23:50:07 done 2021-10-07 23:50:07 2021-10-07 22:50:07,430 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis 2021-10-07 23:50:07 Training: 11it [00:01, 9.91it/s] 2021-10-07 23:50:12 2021-10-07 22:50:12,286 - clearml.Task - INFO - Finished repository detection and package analysis
This is the full script:
` import sys
from clearml import Task
import time
from tqdm import tqdm
task = Task.init(project_name='debug', task_name='test tqdm cr cl reproduced finally!')
bar = tqdm(
    total=10,
    desc="Training",
    initial=1,
    position=1,
    disable=False,
    leave=True,
    dynamic_ncols=True,
    file=sys.stderr,
    smoothing=0)
with bar:
    for i in range(10):
        time.sleep(0.1)
        bar.update()
print('done') `
Interesting, I don't get newlines in any of my consoles:
ClearML Task: overwriting (reusing) task id=38cc10401fcc43cfa432b7ceed7df0cc
2021-10-08 14:57:53,704 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
ClearML new version available: upgrade to v1.1.2 is recommended!███████████████████████████████████████████████████████████▏ | 4/10 [00:00<00:00, 9.80it/s] doneining: 11it [00:01, 9.87it/s] Training: 11it [00:01, 9.87it/s] 2021-10-08 14:57:54,949 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis 2021-10-08 14:57:59,936 - clearml.Task - INFO - Finished repository detection and package analysis
or (after installing the new clearml, to get rid of the upgrade-notice line):
ClearML Task: overwriting (reusing) task id=38cc10401fcc43cfa432b7ceed7df0cc
2021-10-08 14:59:03,762 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
doneining: 11it [00:01, 9.89it/s] Training: 11it [00:01, 9.89it/s] 2021-10-08 14:59:05,003 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis 2021-10-08 14:59:09,722 - clearml.Task - INFO - Finished repository detection and package analysis
In any case, do you have any suggestion of how I could at least hack tqdm to make it behave? Thanks 🙏
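One blunt hack, until there is a proper fix: wrap the stream tqdm writes to and strip the cursor-up escapes before they reach the log. This is only a sketch, under the assumption that the scrolling comes from those 1b 5b 41 sequences (the extra newline tqdm emits for positioned bars may still need separate handling):

```python
import io
import re
from tqdm import tqdm

class NoCursorUp(io.TextIOBase):
    """File-like wrapper that drops ANSI cursor-up escapes (hex 1b 5b 41)."""
    _UP = re.compile("\x1b\\[A")

    def __init__(self, stream):
        self.stream = stream

    def write(self, s):
        # strip ESC [ A before forwarding to the real stream
        return self.stream.write(self._UP.sub("", s))

    def flush(self):
        self.stream.flush()

buf = io.StringIO()
bar = tqdm(total=3, position=1, file=NoCursorUp(buf))
for _ in range(3):
    bar.update()
bar.close()
print("\x1b[A" in buf.getvalue())  # → False
```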
I might have found it: tqdm is sending { 1b 5b 41 }, a unicode arrow up?
https://github.com/horovod/horovod/issues/2367
Can you reproduce this behavior outside of lightning? or in a toy example (because I could not)
WittyOwl57 I can verify the issue reproduces! 🎉 !
And I know what happens: tqdm is sending an "up arrow" key. If you are running inside bash, that acts like a CR (i.e. it moves the cursor back to the beginning of the line), but when running inside other terminals (like PyCharm or the ClearML log) this "arrow key" is just a unicode character to print; it does nothing, and we end up with multiple lines.
Let me see if we can fix it 🙂
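The claim is easy to check with tqdm alone. A minimal sketch, capturing the output in a StringIO instead of a terminal, assuming a non-Windows platform where tqdm emits ESC [ A for the cursor-up move:

```python
import io
from tqdm import tqdm

buf = io.StringIO()
# position=1 makes tqdm move the cursor down before drawing and back up after
bar = tqdm(total=3, position=1, file=buf)
for _ in range(3):
    bar.update()
bar.close()

# The raw stream contains the 1b 5b 41 (ESC [ A) cursor-up sequence.
print("\x1b[A" in buf.getvalue())  # → True
```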
Okay, some progress, so what is the difference ?
Any chance the issue can be reproduced with a small toy code ?
Can you run the tqdm loop inside the code that exhibits the CR issue ? (maybe some initialization thing that is causing it to ignore the value?!)
WittyOwl57 this is what I'm getting on my console (notice the new lines! Not a single one is overwritten, as I would expect):
` Training: 10%|█ | 1/10 [00:00<?, ?it/s]
Training: 20%|██ | 2/10 [00:00<00:00, 9.93it/s]
Training: 30%|███ | 3/10 [00:00<00:00, 9.89it/s]
Training: 40%|████ | 4/10 [00:00<00:00, 9.87it/s]
Training: 50%|█████ | 5/10 [00:00<00:00, 9.87it/s]
Training: 60%|██████ | 6/10 [00:00<00:00, 9.88it/s]
Training: 70%|███████ | 7/10 [00:00<00:00, 9.88it/s]
Training: 80%|████████ | 8/10 [00:00<00:00, 9.88it/s]
Training: 90%|█████████ | 9/10 [00:00<00:00, 9.87it/s]
Training: 100%|██████████| 10/10 [00:00<00:00, 9.87it/s]
Training: 11it [00:01, 9.85it/s]
done
Process finished with exit code 0 `
In any case, do you have any suggestion of how I could at least hack tqdm to make it behave? Thanks
I think I know what the issue is: it seems tqdm is using unicode for the CR; this is the 1b 5b 41 sequence I see in the binary log.
Let me see if I can hack something for you to test 🙂
I mean: add the toy tqdm loop somewhere just before starting the lightning train function. I want to verify that it works, or whether there is something in the specific setup, happening in real time, that changes it.
I've been overly optimistic, it seems (based on the hints from this issue: https://stackoverflow.com/questions/59455268/how-to-disable-progress-bar-in-pytorch-lightning ). It really looked like it worked, but I was misled by the fact that the issue has a peculiar pattern to it: it doesn't appear at the beginning of each epoch. There seem to be a couple of extra CR / flushes at the beginning, and then a (puzzling) extra "Validation" log line seems to trigger the continuous scrolling. Anyway, my bad, that doesn't fix things 😢.