
Hi,
I have started to receive the following error message:
"clearml_agent: ERROR: Instance with the same WORKER_ID is already running"
I believe this happens when a process spawns many (tens of) tasks.
What can I do? I need to spawn many tasks...
I'm running clearml version 1.0.4, and it is impossible to update at the moment.
Thanks a lot!
Ron

  
  
Posted 2 years ago

Answers 15


Thanks for your help and quick replies.

  
  
Posted 2 years ago

Hi TimelyPenguin76 ,

Making such a toy example will take a lot of effort.

For now I intend to debug it or circumvent the error with various tricks.

If it is possible to explain the cause of the error message above, or some details regarding it, I would very much appreciate it.

  
  
Posted 2 years ago

ArrogantBlackbird16 can you send a toy example so I can reproduce it on my side?

  
  
Posted 2 years ago

TimelyPenguin76 ?

  
  
Posted 2 years ago

This is why it is so weird!

  
  
Posted 2 years ago

I believe there is a single agent, single queue, for all tasks.

  
  
Posted 2 years ago

What is an agent?

  
  
Posted 2 years ago

Maybe I missed something here: does each process also create an agent to run the task?

  
  
Posted 2 years ago

TimelyPenguin76 Thanks for the reply.
I believe the way I start tasks is independent of this problem. Assuming my approach is legitimate in principle, it does not explain why I get the following error message. Note that the error only happens when I start multiple tasks. What is the cause of this error?
clearml_agent: ERROR: Instance with the same WORKER_ID [algo-lambda:gpu0] is already running

  
  
Posted 2 years ago

ArrogantBlackbird16 is file.py the file that contains the Task.init call?
I'm not sure I follow the flow. If you just want to create a template task in the system, then clone and enqueue it, you can use task.execute_remotely(queue_name="my_queue", clone=True). Can this solve the issue?
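For reference, a minimal sketch of what file.py could look like with the execute_remotely approach. It assumes the parameters arrive as `key=value` command-line arguments and that a queue named `my_queue` exists; `parse_params`, `main`, and the project/task names are hypothetical, while `Task.init`, `Task.connect` and `Task.execute_remotely` are the ClearML calls discussed in this thread.

```python
import sys


def parse_params(argv):
    """Parse hypothetical `key=value` arguments into a dict of strings."""
    params = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        params[key] = value
    return params


def main(argv):
    # Imported lazily so parse_params can be used without ClearML installed.
    from clearml import Task

    # Project and task names are placeholders.
    task = Task.init(project_name="my_project", task_name="template")
    # connect() returns the dict; on the agent it holds the (possibly edited) values.
    params = task.connect(parse_params(argv))
    # Clone this task, enqueue the clone on "my_queue", then exit the local
    # process (exit_process defaults to True).
    task.execute_remotely(queue_name="my_queue", clone=True)

    # Only reached when the cloned task is executed by an agent.
    print("running with", params)


if __name__ == "__main__":
    main(sys.argv[1:])
```

With this layout each invocation of file.py enqueues a clone and exits immediately, so the local side never keeps a long-running process per task.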

  
  
Posted 2 years ago

To create each subprocess, I use the following:

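```python
import os
import subprocess

# os.environ.copy() returns an independent plain dict; copy.copy(os.environ)
# can share state with the live environment of this process.
new_env = os.environ.copy()
# Drop the ClearML/Trains task variables inherited from the parent process
new_env.pop('TRAINS_PROC_MASTER_ID', None)
new_env.pop('TRAINS_TASK_ID', None)
new_env.pop('CLEARML_PROC_MASTER_ID', None)
new_env.pop('CLEARML_TASK_ID', None)
subprocess.Popen(cmd, env=new_env, shell=True)
```
where `cmd` is something like "python file.py <parameters>".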

Perhaps this somehow disrupts ClearML's operation in the subprocesses?
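A minimal standard-library sketch of the same spawning step, for comparison. It removes the four variables listed above from a plain-dict copy of the environment (presumably so each child creates its own task instead of reusing the parent's) and passes an argument list instead of `shell=True`, which avoids shell quoting issues. `make_clean_env` and `spawn_task_process` are hypothetical helpers, and this is not a confirmed fix for the WORKER_ID error.

```python
import os
import subprocess
import sys

# The variables removed in the snippet above.
CLEARML_ENV_VARS = (
    "TRAINS_PROC_MASTER_ID",
    "TRAINS_TASK_ID",
    "CLEARML_PROC_MASTER_ID",
    "CLEARML_TASK_ID",
)


def make_clean_env(base=None):
    """Return a plain-dict copy of `base` (default: os.environ) without the variables above."""
    env = dict(os.environ if base is None else base)
    for name in CLEARML_ENV_VARS:
        env.pop(name, None)
    return env


def spawn_task_process(script, params):
    """Start `python <script> <params...>` with the cleaned environment."""
    cmd = [sys.executable, script, *params]
    return subprocess.Popen(cmd, env=make_clean_env())
```

Usage would be along the lines of `procs = [spawn_task_process("file.py", [f"seed={i}"]) for i in range(10)]`, followed by `proc.wait()` on each handle. Because the parent's own `os.environ` is never modified, its task variables stay intact.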

  
  
Posted 2 years ago

TimelyPenguin76 SuccessfulKoala55
Do you have any idea what may cause this?
Is it possible that different tasks created together somehow have the same identifier?
Or am I missing something obvious?

  
  
Posted 2 years ago

Hi TimelyPenguin76 and SuccessfulKoala55 ,

My tasks are created by first creating many sub-processes, and then in each sub-process: initializing a task, connecting the task to some parameters, cloning the task, enqueueing the cloned task, and then killing the sub-process. When I do this with just a single sub-process, everything seems to work fine. When there are many sub-processes, I get the error message occasionally.
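Since each sub-process in this flow only clones and enqueues a task, one possible alternative is to run the clone-and-enqueue loop in a single process from an existing template task, so no extra processes are spawned. This is a sketch under stated assumptions: the template task ID, `my_queue`, and the parameter grid are placeholders; `build_overrides` and `enqueue_sweep` are hypothetical; and the `General/` prefix assumes the template's parameters were connected as a plain dict with `task.connect()`. The get/update/set pattern follows ClearML's manual parameter-search example. Whether this avoids the WORKER_ID error was not established in this thread.

```python
import itertools


def build_overrides(grid, section="General"):
    """Expand {"lr": [...], "batch": [...]} into one {"General/lr": ..., ...}
    dict per combination of values."""
    keys = sorted(grid)
    return [
        {f"{section}/{key}": value for key, value in zip(keys, combo)}
        for combo in itertools.product(*(grid[key] for key in keys))
    ]


def enqueue_sweep(template_task_id, grid, queue_name="my_queue"):
    # Imported lazily so build_overrides can be used without ClearML installed.
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    for i, overrides in enumerate(build_overrides(grid)):
        # Clone the template into a new draft task.
        cloned = Task.clone(source_task=template, name=f"{template.name} {i}")
        # Read all parameters, override the swept ones, write them back.
        params = cloned.get_parameters()
        params.update(overrides)
        cloned.set_parameters(params)
        # Put the clone on the queue for an agent to pick up.
        Task.enqueue(cloned, queue_name=queue_name)
```

For example, `enqueue_sweep("<template task id>", {"lr": [0.1, 0.01], "batch": [32, 64]})` would enqueue four clones from one process.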

Yes, I use a locally hosted server (SAIPS team).

  
  
Posted 2 years ago

ArrogantBlackbird16 when you say spawn, what exactly do you mean? Also, are you using a locally-hosted server?

  
  
Posted 2 years ago

Hi ArrogantBlackbird16 ,

How do you generate and run your tasks? Do you use the same flow as in the https://clear.ml/docs/latest/docs/fundamentals/agents_and_queues#agent-and-queue-workflow ? Some other automation?

  
  
Posted 2 years ago