Hi, Anyone Seen This Issue?

Hi, anyone seen this issue?
ERROR: for clearml-agent-services Cannot start service agent-services: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write "a *:* rwm" to "/sys/fs…0430fabab93806fc3494e4329c46f43b2d8d9b2a139de6/devices.allow: operation not permitted: unknown
Creating clearml-webserver ... done

Posted 3 years ago

No. When I run the pipeline from the console on my local machine, for some reason it launches on the clearml-services host, even though I specified the queue with the desired agent via pipe.set_default_execution_queue in my code.

Hi MelancholyElk85

However, when I clone the pipeline from web UI and launch it once again, it works. Is there a way to bypass this?

In both cases, are you seeing different behavior on the same machine running the agent (i.e. cloning from the UI vs. launching from code)?

The question remains though: why docker containers won't launch on services

Maybe something with the way it was launched by docker-compose?
(I'm assuming it will fail on any docker container regardless, right?!)

Yes, it works, thank you! The question remains though: why docker containers won't launch on services


yes that works


thx


I regularly run into the same problem when I launch pipelines locally (for remote execution)

However, when I clone the pipeline from web UI and launch it once again, it works. Is there a way to bypass this?


Solved. The problem was a stray space before the image name in the Image section of the web UI. I think you should probably strip the string before proceeding to the environment-building step, to keep this annoying issue from happening. Of course, users could double-check before launching, but this will come up every once in a while regardless.
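A minimal Python sketch of the suggested fix, assuming the image name arrives as a free-form string from the UI: strip surrounding whitespace, then reject anything that is still empty or still contains whitespace before building the `docker run` argument list. `normalize_image` and `build_docker_run_args` are hypothetical helpers, not clearml-agent functions, and the argument list is heavily simplified.

```python
# Hypothetical helpers: they illustrate the "strip the image string before
# building the environment" suggestion and are not clearml-agent code.


def normalize_image(image):
    """Return the image name with surrounding whitespace removed.

    Raises ValueError if nothing usable is left, or if whitespace remains
    inside the name (a docker image reference cannot contain spaces).
    """
    cleaned = (image or "").strip()
    if not cleaned:
        raise ValueError("docker image name is empty after stripping whitespace")
    if any(ch.isspace() for ch in cleaned):
        raise ValueError(f"docker image name contains whitespace: {cleaned!r}")
    return cleaned


def build_docker_run_args(image, command):
    """Build a heavily simplified `docker run` argument list."""
    return ["docker", "run", "-t", "--rm", normalize_image(image), *command]


# A value pasted into the UI with a stray leading space (hypothetical name).
print(build_docker_run_args(" my-registry/my-image:latest", ["bash"]))
# ['docker', 'run', '-t', '--rm', 'my-registry/my-image:latest', 'bash']
```

Raising an error, rather than only stripping, is a design choice: it turns an empty or malformed value into a clear message instead of a later `invalid reference format` from docker.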


On the machine where I build images? Docker version 20.10.8, build 3967b7d

On the machine running the docker-compose (i.e. the clearml-server)


What is the solution then? What exactly has helped?


1633204289496 clearml-services DEBUG docker: invalid reference format.

This is the strange message; it looks as if the execution command is not valid...

Yeah.. that should have worked ...
What's the exact error you are getting ?


This happens from the docker-compose. I set it up with docker-compose and everything else works; I have the web GUI too.

More specifically, there are 2 tasks with almost identical docker commands. The only difference is the image itself: the task with one image works, and the task with the other image fails. Both are valid images that launch fine on my laptop, and both exist in the registry. Do you have any ideas what could be wrong here?

Hmm, can you run:
`docker run -it allegroai/clearml-agent-services:latest`

```
1633204284443 clearml-services INFO Executing: ['docker', 'run', '-t', '-l', 'clearml-worker-id=clearml-services:service:58186f9e975f484683a364cf9ce69583', '-l', 'clearml-parent-worker-id=clearml-services', '-e', 'NVIDIA_VISIBLE_DEVICES=none', '-e', 'CLEARML_WORKER_ID=clearml-services:service:58186f9e975f484683a364cf9ce69583', '-e', 'CLEARML_DOCKER_IMAGE=', '-v', '/tmp/.clearml_agent.pgsygoh2.cfg:/root/clearml.conf', '-v', '/root/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/root/.clearml/pip-cache:/root/.cache/pip', '-v', '/root/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/root/.clearml/cache:/clearml_agent_cache', '-v', '/root/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', '', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=none $LOCAL_PYTHON -u -m clearml_agent execute --full-monitoring --id 58186f9e975f484683a364cf9ce69583']

1633204289496 clearml-services DEBUG docker: invalid reference format.
See 'docker run --help'.

1633204289546 clearml-services DEBUG Process failed, exit code 125
```
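An empty argument is easy to miss in a logged Python list, so it can help to render the list as a shell command. The sketch below uses an abbreviated, hand-copied subset of the arguments in the log above (most `-e`/`-v` options are dropped and the long bash script is replaced by a placeholder) together with the standard-library `shlex.join` (Python 3.8+), which quotes an empty element as `''`. In this log the element after `--rm`, where `docker run` expects the image name, is empty, which is consistent with the `invalid reference format` error.

```python
import shlex

# Abbreviated copy of the logged argument list; the bash script body is
# replaced with a placeholder string.
logged_argv = [
    "docker", "run", "-t",
    "-e", "CLEARML_DOCKER_IMAGE=",
    "--rm", "",
    "bash", "-c", "<script omitted>",
]

# shlex.join quotes each element, so the empty argument becomes visible as ''.
print(shlex.join(logged_argv))
# docker run -t -e CLEARML_DOCKER_IMAGE= --rm '' bash -c '<script omitted>'

# Positions of empty arguments; `docker run [OPTIONS] IMAGE [COMMAND] [ARG...]`
# takes the first positional argument after the options as the image.
empty_positions = [i for i, arg in enumerate(logged_argv) if arg == ""]
print(empty_positions)
# [6]
```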


what's the docker version?


MelancholyElk85, notice there are two separate queues: the pipeline controller queue (i.e. which agent runs the logic of the pipeline), and the default queue for the pipeline steps (i.e. the actual steps of the pipeline).
The default queue for the pipeline logic itself is `services`. You can change it with `pipeline.start(..., queue='another_q')`.
Makes sense?

LazyFox65, this seems like a Docker issue.
Can you run the docker container manually?

Yes, it looks like it fails on at least 2 different containers.

OK, good, now this works…

You mean the host where it works correctly? Ubuntu 20.04.3


getting somewhere!… 🙂


Oh, I need to ask the guy who deployed it


AgitatedDove14, I ran into this problem again. Are there any known issues with it? I don't remember what helped last time.

Yes, I'll try it straightaway
