
Hi, I have a ClearML experiment that failed to load its scalar plots after a few hours of training. When I look at the logs locally with TensorBoard, everything seems to work fine. Any idea what's going on?
[screenshot not preserved]

  
  
Posted one year ago

Answers 10


Is there a way to retrieve ClearML error logs for situations like this?
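One way to check whether the training process itself printed errors around the time the plots froze is to pull the task's reported console output back from the server and filter it. The sketch below assumes clearml is installed and configured, and that your SDK version provides Task.get_task and Task.get_reported_console_output; find_error_lines and fetch_console_errors are hypothetical helpers, and the keyword list is an arbitrary choice. Errors raised while the web UI loads scalars would not show up here, since those happen in the browser and on the server rather than in the task's console.

```python
from typing import Iterable, List, Sequence


def find_error_lines(
    console_reports: Iterable[str],
    keywords: Sequence[str] = ("error", "exception", "traceback"),
) -> List[str]:
    """Hypothetical helper: return console lines mentioning any keyword (case-insensitive)."""
    hits = []
    for report in console_reports:
        # Each report can hold many lines of captured stdout/stderr
        for line in report.splitlines():
            lowered = line.lower()
            if any(k.lower() in lowered for k in keywords):
                hits.append(line.rstrip())
    return hits


def fetch_console_errors(task_id: str, number_of_reports: int = 100) -> List[str]:
    """Hypothetical helper: fetch recent console reports for a task and filter them.

    Assumes clearml is configured (clearml.conf or environment variables) and that
    Task.get_reported_console_output exists in the installed clearml version.
    """
    # Imported here so find_error_lines can be used without clearml installed
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    reports = task.get_reported_console_output(number_of_reports=number_of_reports)
    return find_error_lines(reports)
```

Usage: fetch_console_errors("<task id>") returns the matching lines; raise number_of_reports if the relevant output is older than the most recent reports.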

Sorry, I'm not quite sure I understand. I am calling Task.init() inside main. My plots load in ClearML correctly for the first few hours or so, but freeze after that.

Can you perhaps share a code example of how your code starts and what it imports?

All of the experiments in this particular project behave like this. The console works fine, and I'm still able to view debug images. Task.init() is called in main of the training script with a user-specified project and task name.

I don't think you can connect to a task that was not created using Task.init()

Hi @VivaciousCoyote85, is this something new, or do all experiments behave this way? Do you see the console logs? Can you share how your code runs Task.init()?

This is how the task gets created:

from clearml import Task

# Helpers defined elsewhere in the training code (not shown): args_to_tuples,
# unset_clearml_execute, find_current_packages, get_current_commit, bash_setup_string.

def create_clearml_task(
    project_name,
    task_name,
    script,
    args,
    docker_args="",
    docker_image_name="<docker image name>",
    add_task_init_call=True,
    requirements_file=None,
    **kwargs):
    print(
        "Creating task: project_name: {project_name}, task_name: {task_name}, script: {script} and args: \n {args}"
        .format(
            project_name=project_name,
            task_name=task_name,
            script=script,
            args=args,
        ))
    arg_tuples = args_to_tuples(args)
    # Remove the argument to execute on ClearML before queueing, otherwise the queued run
    # would keep calling remote execution recursively without ever doing the work.
    unset_clearml_execute(arg_tuples)
    # Default volume mounts and shared-memory size; extra docker_args are appended with a
    # separating space (plain concatenation would glue them onto "50G").
    base_docker_args = "-v /home:/home -v /data:/data -v /mnt:/mnt -v /etc/aws:/etc/aws --shm-size 50G"
    return Task.create(
        argparse_args=arg_tuples,
        project_name=project_name,
        task_name=task_name,
        script=script,
        add_task_init_call=add_task_init_call,
        repo='git@<repo>.git',
        # Use the local environment's packages unless a requirements file is given
        packages=find_current_packages() if requirements_file is None else None,
        requirements_file=requirements_file,
        docker=docker_image_name,
        commit=get_current_commit(),
        docker_bash_setup_script=bash_setup_string,
        docker_args=" ".join(filter(None, [base_docker_args, docker_args])),
        **kwargs)

===============================================

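import sys

# Entry point: if a task name and a queue (clearml_execute) were given, create the task and
# enqueue it instead of training locally. project_name, docker_img, requirements_file and
# config_dict are defined earlier in the script (not shown).
if args.clearml_taskname is not None and args.clearml_execute is not None:
    args_except_execute = {k: v for k, v in vars(args).items() if k != "clearml_execute"}
    task = create_clearml_task(project_name=project_name,
                               task_name=args.clearml_taskname,
                               script="train.py",
                               args=args_except_execute,
                               docker_image_name=docker_img,
                               requirements_file=requirements_file,
                               add_task_init_call=False)
    task.connect(config_dict)
    Task.enqueue(task, queue_name=args.clearml_execute)
    sys.exit(0)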

# inside main:
task = Task.init(project_name, clearml_taskname)
task.connect(config_dict)

I import Task from clearml, and I also use PyTorch Lightning's TensorBoardLogger.
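The post calls args_to_tuples and unset_clearml_execute without showing them. For anyone reproducing this setup, here is a minimal sketch of what such helpers might look like: the two names come from the post, but the bodies are hypothetical. It assumes Task.create's argparse_args takes (argument name, value) string pairs keyed by argparse destination names, and that unset_clearml_execute edits the list in place, since the post ignores its return value.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical: the argparse destination that selects the queue, as in args.clearml_execute
EXECUTE_ARG = "clearml_execute"


def args_to_tuples(args: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn a vars(args)-style dict into (name, value) string pairs."""
    pairs = []
    for name, value in args.items():
        if value is None:
            # Skip unset options so the remote run falls back to its argparse defaults
            continue
        pairs.append((name, str(value)))
    return pairs


def unset_clearml_execute(arg_tuples: List[Tuple[str, str]]) -> None:
    """Hypothetical helper: drop the queue argument in place so the queued run trains
    instead of enqueueing itself again."""
    arg_tuples[:] = [(name, value) for name, value in arg_tuples if name != EXECUTE_ARG]
```

Because every value goes through str(), booleans become "True"/"False", which may not round-trip cleanly for store_true flags; adjust the conversion if your parser uses those.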

This is what the console showed when I tried to load it:
[screenshot of the console output not preserved]

The easiest way to understand what's going on is to open your browser's Developer Tools (F12) while trying to load the scalars, and share the contents of the Network tab.
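To make the Network tab easier to share, browser DevTools can export the recorded requests as a HAR file (the exact menu label, such as "Save all as HAR", varies by browser and version). A minimal sketch, assuming a standard HAR 1.2 export, that lists requests which failed or were unusually slow; the function names are hypothetical and the 10-second threshold is an arbitrary assumption, not a ClearML limit.

```python
import json
from typing import Any, Dict, List


def failed_or_slow_requests(har: Dict[str, Any], slow_ms: float = 10_000.0) -> List[Dict[str, Any]]:
    """Pick out entries from a HAR export that failed (status 0 or >= 400) or took >= slow_ms."""
    flagged = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        status = entry.get("response", {}).get("status", 0)
        elapsed = entry.get("time", 0.0)  # total elapsed time in milliseconds
        # Status 0 usually means the request never completed (aborted, blocked, or timed out)
        if status == 0 or status >= 400 or elapsed >= slow_ms:
            flagged.append({
                "method": request.get("method", "?"),
                "url": request.get("url", "?"),
                "status": status,
                "time_ms": elapsed,
            })
    return flagged


def summarize_har_file(path: str, slow_ms: float = 10_000.0) -> List[Dict[str, Any]]:
    """Load a .har file from disk and return its failed or slow requests."""
    with open(path, encoding="utf-8") as f:
        return failed_or_slow_requests(json.load(f), slow_ms)
```

HAR files can include cookies and authorization headers, so review or sanitize the export before posting it publicly.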

I see.