Answered
Hi, I have started to receive the following error message:

Hi,
I have started to receive the following error message:
"clearml_agent: ERROR: Instance with the same WORKER_ID is already running"
I believe this happens when a process spawns many (tens) of tasks.
What can I do? I need to spawn many tasks...
I'm running clearml version 1.0.4, and it is impossible to update at the moment.
Thanks a lot!
Ron

  
  
Posted one year ago

Answers 15


Hi ArrogantBlackbird16,

How do you generate and run your tasks? Do you use the same flow as in https://clear.ml/docs/latest/docs/fundamentals/agents_and_queues#agent-and-queue-workflow, or some other automation?

  
  
Posted one year ago

Hi TimelyPenguin76 and SuccessfulKoala55,

My tasks are created by first spawning many sub-processes, and then in each sub-process: initializing a task, connecting the task to some parameters, cloning the task, enqueueing the cloned task, and then killing the sub-process. When I do this with just a single sub-process, everything seems to work fine. When there are many sub-processes, I occasionally get the error message.
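In code, each sub-process does roughly this (a minimal sketch; the project, task, queue, and parameter names are placeholders, not my actual values):

from clearml import Task

# Each sub-process: initialize a task, attach parameters, clone the task,
# and enqueue the clone; the sub-process is then killed.
params = {'learning_rate': 0.01}            # illustrative parameters
task = Task.init(project_name='my_project', task_name='template')
task.connect(params)
cloned = Task.clone(source_task=task)       # create a draft copy of the task
Task.enqueue(cloned, queue_name='default')  # hand the clone to the agent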

Yes, I use a locally hosted server (SAIPS team).

  
  
Posted one year ago

To create each subprocess, I use the following:

import os
import subprocess
from copy import copy

# Copy the parent environment and drop the ClearML/Trains task variables so
# the sub-process starts its own task instead of attaching to the parent's.
new_env = copy(os.environ)
new_env.pop('TRAINS_PROC_MASTER_ID', None)
new_env.pop('TRAINS_TASK_ID', None)
new_env.pop('CLEARML_PROC_MASTER_ID', None)
new_env.pop('CLEARML_TASK_ID', None)
subprocess.Popen(cmd, env=new_env, shell=True)

where cmd is something like "python file.py <parameters>".

Perhaps this somehow disrupts clearml operation in the sub-processes?

  
  
Posted one year ago

ArrogantBlackbird16, is file.py the file that contains the Task.init call?
I'm not sure I'm getting the flow. If you just want to create a template task in the system, then clone and enqueue it, you can use task.execute_remotely(queue_name="my_queue", clone=True). Can this solve the issue?
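For example (a minimal sketch; the project, task, and queue names are placeholders):

from clearml import Task

task = Task.init(project_name='my_project', task_name='template')
task.connect({'learning_rate': 0.01})  # illustrative parameters
# Clone this task and enqueue the clone on "my_queue"; exit_process=False
# keeps the current process alive (only allowed together with clone=True).
task.execute_remotely(queue_name='my_queue', clone=True, exit_process=False)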

  
  
Posted one year ago

TimelyPenguin76 SuccessfulKoala55
Do you have any idea what may cause this?
Is it possible that different tasks created together somehow have the same identifier?
Or am I missing something obvious?

  
  
Posted one year ago

ArrogantBlackbird16, when you say "spawn", what exactly do you mean? Also, are you using a locally-hosted server?

  
  
Posted one year ago

Hi TimelyPenguin76 ,

Making such a toy example will take a lot of effort.

For now I intend to debug it or circumvent the error with various tricks.
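One such trick I have in mind, assuming clearml-agent honors the CLEARML_WORKER_ID environment variable (an assumption on my part; I have not verified this on my version), is to give each agent instance a unique worker id so no two instances can collide on the same WORKER_ID:

import os
import subprocess

# Hypothetical workaround: launch the agent with a unique CLEARML_WORKER_ID
# so no two agent instances share the same WORKER_ID.
env = os.environ.copy()
env['CLEARML_WORKER_ID'] = 'algo-lambda:gpu0:%d' % os.getpid()
subprocess.Popen('clearml-agent daemon --queue default', env=env, shell=True)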

If it is possible to explain the cause of the error message above, or some details regarding it, I would very much appreciate it.

  
  
Posted one year ago

ArrogantBlackbird16, can you send a toy example so I can reproduce it on my side?

  
  
Posted one year ago

This is why it is so weird!

  
  
Posted one year ago

What is an agent?

  
  
Posted one year ago

TimelyPenguin76?

  
  
Posted one year ago

Thanks for your help and quick replies.

  
  
Posted one year ago

Maybe I missed something here: does each process also create an agent to run the task with?

  
  
Posted one year ago

I believe there is a single agent and a single queue for all tasks.

  
  
Posted one year ago

TimelyPenguin76, thanks for the reply.
I believe the way I start tasks is completely independent of this problem. Assuming my approach is in principle legitimate, it does not explain why I get the following error message. Note that the error only happens when I start multiple tasks. What is the cause of this error?
clearml_agent: ERROR: Instance with the same WORKER_ID [algo-lambda:gpu0] is already running

  
  
Posted one year ago