Answered
Are There Any Particular System Dependencies Needed To Enable

Are there any particular system dependencies needed to enable auto_resource_logging? My understanding based on None is that this should be enabled by default. However, when I run an example logging script, I am able to see my scalar reported in the UI, but no system stats. I've tried both in my local machine's VSCode devcontainer and using an agent that's running an NVIDIA CUDA Docker image.

  
  
Posted 7 months ago

Answers 15


Correction: it works when I am running the code in my local VSCode session. I still don't get resource logging when I run in an agent. 🤔 On a similar topic, I have a separate task that is logging metrics with tensorboard. When running locally, I see the metrics appear in the "scalars" tab in ClearML, but when running in an agent, nothing. Any suggestions on where to look?

  
  
Posted 7 months ago

with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent.

This is odd, could you send the full Task log?

  
  
Posted 7 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> , on the resource logging: I tried with a sleep test and it works when I'm running it from my local machine, but when I run remotely in an agent, I do not see resource logging.

And, similarly, with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent. For this, I've decided to just re-write the logging code to use ClearML's built-in logging methods, which work fine in the agent. Would still like to at least get resource logging.

  
  
Posted 7 months ago

And after having called Task.init() the second time, the automatic logging of resources and tensorboard plots works as well. I would recommend adding explanation to the docs for

Oh yeah! You always need to call Task.init first. Task.current_task can be called from anywhere you like, but only after Task.init was called.

  
  
Posted 7 months ago

I still don't get resource logging when I run in an agent.

@<1533620191232004096:profile|NuttyLobster9> there should be no difference ... are we still talking about <30 sec? or a sleep test? (no resource logging at all?)

have a separate task that is logging metrics with tensorboard. When running locally, I see the metrics appear in the "scalars" tab in ClearML, but when running in an agent, nothing. Any suggestions on where to look?

This is odd, and somewhat consistent with no logging happening at all. When you are manually reporting scalars, does everything work? Could it be for some reason the Task ?

  
  
Posted 7 months ago

Yes, that did make it work in this case, thank you.

  
  
Posted 7 months ago

Hi @<1533620191232004096:profile|NuttyLobster9>

I am able to see my scalar reported in the UI, but no system stats. ...

If the job is too short (I think under 30 seconds), it doesn't have enough time to collect stats. Basically, it collects them over a 30 sec window, but the task ends before it sends them.
Does that make sense?
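
A minimal sketch of a sleep test along these lines, assuming the stats window really is about 30 seconds (an estimate from this reply, not a confirmed value). The helper name `report_slowly` and its parameters are hypothetical; `report` stands in for a small wrapper around something like `logger.report_scalar`.

```python
import time
from random import randint
from typing import Callable, List


def report_slowly(
    report: Callable[[int, int], None],
    iterations: int = 10,
    delay_seconds: float = 5.0,
) -> List[int]:
    """Report one random value per iteration, pausing after each report.

    With the defaults the loop runs for roughly 50 seconds, which is longer
    than the ~30 second window mentioned above (an assumption, not a
    documented value).
    """
    values = []
    for i in range(iterations):
        value = randint(0, 100)
        report(i, value)  # e.g. forwards to logger.report_scalar(...)
        values.append(value)
        time.sleep(delay_seconds)  # keep the task alive between reports
    return values
```

In the example script from this thread, it could be called after `task.get_logger()` as `report_slowly(lambda i, v: logger.report_scalar("example plot", series="random", value=v, iteration=i))`.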

  
  
Posted 7 months ago

Here's my example script:

from random import randint

from clearml import Task

if __name__ == "__main__":
    task: Task = Task.init(
        project_name="clearml-examples", task_name="try-to-make-logging-work"
    )
    task.execute_remotely(queue_name="5da90f42dd4c40edab972a4bef8eab04")
    logger = task.get_logger()
    for i in range(10):
        logger.report_scalar("example plot", series="random", value=randint(0, 100), iteration=i)
  
  
Posted 7 months ago

there is a bug wherein both Task.current_task() and Logger.current_logger() return None.

This is not a bug; it means something broke. The environment variable CLEARML_TASK_ID has to be set inside the agent's process.
How are you running it? (Also, please send the log 🙂 You can DM it so it is not public here.)
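
One way to check this from inside the running code is to print whether the variable is visible to the process. This is only a sketch: the helper name `find_agent_task_id` is hypothetical, and `CLEARML_TASK_ID` is the only name taken from the discussion.

```python
import os
from typing import Mapping, Optional


def find_agent_task_id(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value of CLEARML_TASK_ID, or None if it is unset or blank."""
    env = os.environ if env is None else env
    task_id = env.get("CLEARML_TASK_ID", "").strip()
    return task_id or None


if __name__ == "__main__":
    task_id = find_agent_task_id()
    if task_id is None:
        print("CLEARML_TASK_ID is not set in this process")
    else:
        print(f"CLEARML_TASK_ID={task_id}")
```

Running this at the top of the remote script would show up in the task's console log, which makes it easy to include when sharing the log.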

  
  
Posted 7 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> , I've actually hit on something accidentally that might be a clue. I have noticed that when running inside an agent, there is a bug wherein both Task.current_task() and Logger.current_logger() return None . If these are being used by the clearml package under the hood, this could be the reason we aren't seeing the metrics.

As a workaround, I created this utility function, which works for explicit logging (though it doesn't cause the automatic logging to work):

import os
from typing import Optional

from clearml import Task

def get_current_clearml_task() -> Optional[Task]:
    # Return the current task object, if running in ClearML, with either local
    # or remote execution. Inside an agent, CLEARML_TASK_ID identifies the task.
    if task_id := os.getenv("CLEARML_TASK_ID"):
        return Task.get_task(task_id)
    return Task.current_task()

And then for logging, logger = task.get_logger() .

  
  
Posted 7 months ago

To be clear, Task.init() was called initially. I had to call it again later in the code in order to get the current task object, instead of using Task.current_task() , which only seems to work locally. That's the part that is not intuitive.

  
  
Posted 7 months ago

Oh, that is odd. Is this reproducible? @<1533620191232004096:profile|NuttyLobster9> What was the flow that required another Task.init?

  
  
Posted 6 months ago

Ah interesting, okay. I'll try adding a sleep in here for testing it out. Thanks

  
  
Posted 7 months ago

Sure. I can send it on Monday. Thank you.

  
  
Posted 7 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> , CLEARML_TASK_ID is set inside the agent's process, which is how I was able to get the task by running Task.get_task(environ["CLEARML_TASK_ID"]) . However, I believe I've sorted out how to make both the resource logging and the tensorboard logging work in the agent. It seems that using Task.current_task() to get the task object does not work when running remotely, but calling Task.init() again does work. And after having called Task.init() the second time, the automatic logging of resources and tensorboard plots works as well. I would recommend adding explanation to the docs for None to explain this behavior since it's not very intuitive.
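
A rough sketch of the workaround described here, for anyone hitting the same thing. It reflects only what was observed in this thread, not confirmed ClearML behavior. The helper name `current_task_or_init` is hypothetical, and the task class is passed in as a parameter so the function can be tried without a ClearML server.

```python
from typing import Any


def current_task_or_init(task_cls: Any, project_name: str, task_name: str) -> Any:
    """Return task_cls.current_task(), falling back to task_cls.init(...).

    task_cls is expected to be clearml.Task (assumption: passed in by the
    caller rather than imported here).
    """
    task = task_cls.current_task()
    if task is None:
        # Workaround reported in this thread: calling init again returned the
        # task object when running remotely in an agent.
        task = task_cls.init(project_name=project_name, task_name=task_name)
    return task
```

With ClearML installed, usage would look like `from clearml import Task` followed by `task = current_task_or_init(Task, "clearml-examples", "try-to-make-logging-work")`.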

  
  
Posted 7 months ago