Answered
Hi, I Have Started To Receive The Following Error Message:

Hi,
I have started to receive the following error message:
"clearml_agent: ERROR: Instance with the same WORKER_ID is already running"
I believe this happens when a process spawns many (tens) of tasks.
What can I do? I need to spawn many tasks...
I'm running clearml version 1.0.4, and it is impossible to update at the moment.
Thanks a lot!
Ron

  
  
Posted 3 years ago

Answers 15


TimelyPenguin76 SuccessfulKoala55
Do you have any idea what may cause this?
Is it possible that different tasks created together somehow have the same identifier?
Or am I missing something obvious?

  
  
Posted 3 years ago

ArrogantBlackbird16 is file.py the file that contains the Task.init call?
I'm not sure I follow the flow. If you just want to create a template task in the system, then clone and enqueue it, you can use task.execute_remotely(queue_name="my_queue", clone=True). Would that solve the issue?

  
  
Posted 3 years ago

I believe there is a single agent, single queue, for all tasks.

  
  
Posted 3 years ago

This is why it is so weird!

  
  
Posted 3 years ago

TimelyPenguin76 ?

  
  
Posted 3 years ago

What is an agent?

  
  
Posted 3 years ago

Hi TimelyPenguin76 ,

Making such a toy example will take a lot of effort.

For now I intend to debug it or circumvent the error with various tricks.

If it is possible to explain the cause of the error message above, or some details regarding it, I would very much appreciate it.

  
  
Posted 3 years ago

ArrogantBlackbird16 can you send a toy example so I can reproduce it on my side?

  
  
Posted 3 years ago

Thanks for your help and quick replies.

  
  
Posted 3 years ago

TimelyPenguin76 Thanks for the reply.
I believe the way I start tasks is completely independent of this problem. Even assuming my approach is legitimate in principle, that does not explain why I get the following error message. Note that the error only happens when I start multiple tasks. What is the cause of this error?
clearml_agent: ERROR: Instance with the same WORKER_ID [algo-lambda:gpu0] is already running

  
  
Posted 3 years ago

ArrogantBlackbird16 when you say spawn, what exactly do you mean? Also, are you using a locally-hosted server?

  
  
Posted 3 years ago

Hi TimelyPenguin76 and SuccessfulKoala55 ,

My tasks are created by first creating many sub-processes, and then in each sub-process: initializing a task, connecting the task to some parameters, cloning the task, enqueueing the cloned task, then killing the sub-process. When I do this with just a single sub-process, everything seems to work fine. When there are many sub-processes, I get the error message occasionally.

Yes, I use a locally hosted server (SAIPS team).
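
For readers following along, the per-sub-process flow described above (initialize, connect parameters, clone, enqueue) might look roughly like the sketch below. This is not the original file.py, which was not posted; the function name, project name, task name, and queue name are all placeholders, and the code assumes a configured ClearML server.

```python
def create_and_enqueue(params, queue_name="my_queue"):
    """Sketch of one sub-process: init a task, connect params, clone, enqueue.

    All names below ("examples", "template", "my_queue") are hypothetical.
    """
    # Imported here so the module can be loaded without clearml installed.
    from clearml import Task

    # Register this process as a task on the server.
    task = Task.init(project_name="examples", task_name="template")

    # Attach the parameter dictionary to the task.
    task.connect(params)

    # Clone the task and put the clone on a queue for an agent to pick up.
    cloned = Task.clone(source_task=task, name="template (clone)")
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

Each sub-process would call something like create_and_enqueue({"lr": 0.01}) once and then exit.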

  
  
Posted 3 years ago

Maybe I missed something here: does each process also create an agent to run the task with?

  
  
Posted 3 years ago

To create each subprocess, I use the following:

import os
import subprocess
from copy import copy

new_env = copy(os.environ)
new_env.pop('TRAINS_PROC_MASTER_ID', None)
new_env.pop('TRAINS_TASK_ID', None)
new_env.pop('CLEARML_PROC_MASTER_ID', None)
new_env.pop('CLEARML_TASK_ID', None)
subprocess.Popen(cmd, env=new_env, shell=True)
Where cmd is something like "python file.py <parameters>"

Perhaps this somehow disrupts clearml operation in the sub processes?
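
The environment-scrubbing step above could also be packaged as a small helper, sketched here under the assumption that only the four variables named in the post need removing. The names CLEARML_ENV_VARS, scrubbed_env, and spawn are made up for illustration.

```python
import os
import subprocess

# The four variables removed in the post; nothing else is assumed.
CLEARML_ENV_VARS = (
    "TRAINS_PROC_MASTER_ID",
    "TRAINS_TASK_ID",
    "CLEARML_PROC_MASTER_ID",
    "CLEARML_TASK_ID",
)


def scrubbed_env(base=None):
    """Return a plain-dict copy of the environment without the task variables."""
    env = dict(os.environ if base is None else base)
    for name in CLEARML_ENV_VARS:
        env.pop(name, None)  # missing keys are ignored
    return env


def spawn(cmd):
    """Start cmd (a shell command string) with the scrubbed environment."""
    return subprocess.Popen(cmd, env=scrubbed_env(), shell=True)
```

Building the copy with dict(...) gives an ordinary dictionary, so removing keys from it leaves the parent process's own environment untouched. A copy made with copy.copy(os.environ) may not give the same isolation, which could be worth ruling out when debugging.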

  
  
Posted 3 years ago

Hi ArrogantBlackbird16 ,

How do you generate and run your tasks? Do you use the same flow as in https://clear.ml/docs/latest/docs/fundamentals/agents_and_queues#agent-and-queue-workflow, or some other automation?

  
  
Posted 3 years ago