Are There Any Particular System Dependencies Needed To Enable

Answered

Are there any particular system dependencies needed to enable auto_resource_logging ? My understanding based on None is that this should be enabled by default but if I run an example logging script, I am able to see my scalar reported in the UI, but no system stats. I've tried both in my local machine's VSCode devcontainer and using an agent that's running an nvidia CUDA docker image.

  				
Posted one year ago by NuttyLobster9

Answers 15

Oh that is odd. Is this reproducible? @NuttyLobster9 what was the flow that required another Task.init?

  				
Posted one year ago by AgitatedDove14

To be clear Task.init() was called initially. I had to call it again later in the code in order to get the current task object instead of Task.current_task() , which only seems to work locally. That's the part that is not intuitive.

  				
Posted one year ago by NuttyLobster9

"And after having called Task.init() the second time, the automatic logging of resources and tensorboard plots works as well. I would recommend adding explanation to the docs for"

Oh yeah! You always need to call Task.init first. Task.current_task can be called from anywhere you like, but only after Task.init was called.

  				
Posted one year ago by AgitatedDove14
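
A minimal sketch of the pattern described in this answer, for anyone who wants a single entry point: ask for the current task, and call Task.init only when none is registered in the process. The name get_or_init_task and the task_cls parameter are hypothetical and not part of ClearML; task_cls exists only so the logic can be exercised with a stand-in class.

```python
from typing import Any, Optional


def get_or_init_task(project_name: str, task_name: str,
                     task_cls: Optional[Any] = None) -> Any:
    """Return the current task, calling init() first if none is registered.

    Hypothetical helper, not part of ClearML. `task_cls` defaults to
    clearml.Task and is a parameter only so a stand-in can be passed in.
    """
    if task_cls is None:
        # Imported lazily so this sketch loads even without clearml installed.
        from clearml import Task
        task_cls = Task

    # Per the thread, current_task() gives None until Task.init() has run.
    task = task_cls.current_task()
    if task is None:
        task = task_cls.init(project_name=project_name, task_name=task_name)
    return task
```

With the real library this would be called as get_or_init_task("clearml-examples", "try-to-make-logging-work"), reusing the project and task names from the example script in this thread.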

Hi @AgitatedDove14 , CLEARML_TASK_ID is set inside the agent's process, which is how I was able to get the task by running Task.get_task(environ["CLEARML_TASK_ID"]) . However I believe I've sorted out how to make both the resource logging and the tensorboard logging work in the agent. It seems that using Task.current_task() to get the task object does not work when running remotely, but calling Task.init() again does work. And after having called Task.init() the second time, the automatic logging of resources and tensorboard plots works as well. I would recommend adding explanation to the docs for None to explain this behavior since it's not very intuitive.

  				
Posted one year ago by NuttyLobster9

"there is a bug wherein both Task.current_task() and Logger.current_logger() return None ."

This is not a bug; it means something broke. The environment variable CLEARML_TASK_ID has to be set inside the agent's process.
How are you running it? (Also, please send the log 🙂 , you can DM it so it is not public here.)

  				
Posted one year ago by AgitatedDove14
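
A quick way to check the point about CLEARML_TASK_ID is to print whether the variable is visible from inside the running script. The sketch below uses only the standard library; the function names are hypothetical, and the only thing taken from the thread is the variable name itself.

```python
import os
from typing import Mapping, Optional


def clearml_task_id(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return CLEARML_TASK_ID if it is set to a non-empty value, else None."""
    env = os.environ if env is None else env
    return env.get("CLEARML_TASK_ID") or None


def describe_clearml_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a one-line message suitable for printing at the top of a task log."""
    task_id = clearml_task_id(env)
    if task_id is None:
        return "CLEARML_TASK_ID is not set in this process"
    return f"CLEARML_TASK_ID is set: {task_id}"
```

Calling print(describe_clearml_env()) as the first line of the script makes the result show up in the task's console log, both locally and in the agent.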

Hi @AgitatedDove14 , I've actually hit on something accidentally that might be a clue. I have noticed that when running inside an agent, there is a bug wherein both Task.current_task() and Logger.current_logger() return None . If these are being used by the clearml package under the hood, this could be the reason we aren't seeing the metrics.

As a workaround, I created this utility function, which works for explicit logging (though it doesn't cause the automatic logging to work):

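import os
from typing import Optional

from clearml import Task

def get_current_clearml_task() -> Optional[Task]:
    # Returns the current task object, if running in ClearML, with either local or
    # remote execution.
    if task_id := os.getenv("CLEARML_TASK_ID"):
        return Task.get_task(task_id)  # agent run: look the task up by its ID
    return Task.current_task()  # local run: task created by Task.init(), if any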

And then for logging, logger = task.get_logger() .

  				
Posted one year ago by NuttyLobster9

Sure. I can send it on Monday. Thank you.

  				
Posted one year ago by NuttyLobster9

"with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent."

This is odd, could you send the full Task log?

  				
Posted one year ago by AgitatedDove14

Hi @AgitatedDove14 , on the resource logging: I tried with a sleep test and it works when I'm running it from my local machine, but when I run remotely in an agent, I do not see resource logging.

And, similarly, with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent. For this, I've decided to just re-write the logging code to use ClearML's built-in logging methods, which work fine in the agent. Would still like to at least get resource logging.

  				
Posted one year ago by NuttyLobster9

"I still don't get resource logging when I run in an agent."

@NuttyLobster9 there should be no difference ... are we still talking about <30 sec? Or a sleep test? (No resource logging at all?)

"have a separate task that is logging metrics with tensorboard. When running locally, I see the metrics appear in the "scalars" tab in ClearML, but when running in an agent, nothing. Any suggestions on where to look?"

This is odd, and somewhat consistent with no logging actually happening? When you are manually reporting scalars, does everything work? Could it be for some reason the Task ?

  				
Posted one year ago by AgitatedDove14

Correction: it works when I am running the code in my local VSCode session. I still don't get resource logging when I run in an agent. 🤔 . And on a similar topic, I have a separate task that is logging metrics with tensorboard. When running locally, I see the metrics appear in the "scalars" tab in ClearML, but when running in an agent, nothing. Any suggestions on where to look?

  				
Posted one year ago by NuttyLobster9

Yes, that did make it work in this case, thank you.

  				
Posted one year ago by NuttyLobster9

Ah interesting, okay. I'll try adding a sleep in here for testing it out. Thanks

  				
Posted one year ago by NuttyLobster9

Hi @NuttyLobster9

"... but no system stats ..."

If the job is too short (I think under 30 seconds), it doesn't have enough time to collect stats. Basically, it collects them over a 30-second window, but the task ends before it sends them.
Does that make sense?

  				
Posted one year ago by AgitatedDove14
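
To test this explanation, the example script can be kept alive past the reporting window before it exits. The sketch below is one way to do that, under assumptions: the 30-second figure comes only from the "I think 30 seconds" estimate above, and wait_out_report_window, its margin_sec parameter, and run_example are hypothetical names. The ClearML calls inside run_example are the same ones used in the original example script.

```python
import time
from typing import Callable

# Assumption: roughly 30 seconds, taken from the estimate given in this answer.
ASSUMED_REPORT_WINDOW_SEC = 30.0


def wait_out_report_window(
    started_at: float,
    window_sec: float = ASSUMED_REPORT_WINDOW_SEC,
    margin_sec: float = 5.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep until window_sec + margin_sec have passed since started_at.

    Returns the number of seconds slept (0.0 if enough time already passed).
    """
    remaining = (window_sec + margin_sec) - (now() - started_at)
    if remaining <= 0:
        return 0.0
    sleep(remaining)
    return remaining


def run_example() -> None:
    """The thread's example script, with a wait added before it exits."""
    from random import randint

    from clearml import Task  # imported here so the helper above has no dependency

    started_at = time.monotonic()
    task = Task.init(
        project_name="clearml-examples", task_name="try-to-make-logging-work"
    )
    logger = task.get_logger()
    for i in range(10):
        logger.report_scalar(
            "example plot", series="random", value=randint(0, 100), iteration=i
        )
    # Keep the process alive long enough for a resource report to be sent.
    wait_out_report_window(started_at)
```

Calling run_example() locally should show whether machine stats appear once the task outlives the window; the now and sleep parameters are there only so the timing logic can be checked without actually waiting.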

Here's my example script:

from random import randint

from clearml import Task

if __name__ == "__main__":
    task: Task = Task.init(
        project_name="clearml-examples", task_name="try-to-make-logging-work"
    )
    task.execute_remotely(queue_name="5da90f42dd4c40edab972a4bef8eab04")
    logger = task.get_logger()
    for i in range(10):
        logger.report_scalar("example plot", series="random", value=randint(0, 100), iteration=i)

  				
Posted one year ago by NuttyLobster9
