Hi everyone!
I have a question about running a script inside a Docker container with ClearML:
I built an image containing all the requirements to run a Python script that takes its arguments via argparse.
When I run this container through ClearML with a wrapper script, it fails with:
clearml_agent: ERROR: Could not install task requirements!
when trying to install PyTorch and packaging, with this exact error message:

 Found PyTorch version torch==1.13.1 matching CUDA version 0
Collecting torch==1.13.1
  ERROR: HTTP error 403 while getting 

  ERROR: Could not install requirement torch==1.13.1 from 
 because of error 403 Client Error: Forbidden for url: 

ERROR: Could not install requirement torch==1.13.1 from 
 because of HTTP error 403 Client Error: Forbidden for url: 
 for URL 

clearml_agent: ERROR: Could not download wheel name of "
"
Requirement already satisfied: PyYAML==6.0 in /usr/local/lib/python3.6/site-packages (from -r /tmp/cached-reqsub0jbcq6.txt (line 1)) (6.0)
Requirement already satisfied: fastnumbers==3.2.1 in /usr/local/lib/python3.6/site-packages (from -r /tmp/cached-reqsub0jbcq6.txt (line 3)) (3.2.1)
Collecting lockfile==0.12.2
  Using cached lockfile-0.12.2-py2.py3-none-any.whl (13 kB)
ERROR: Could not find a version that satisfies the requirement packaging==23.1 (from -r /tmp/cached-reqsub0jbcq6.txt (line 5)) (from versions: 14.0, 14.1, 14.2, 14.3, 14.4, 14.5, 15.0, 15.1, 15.2, 15.3, 16.0, 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 17.0, 17.1, 18.0, 19.0, 19.1, 19.2, 20.0, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7, 20.8, 20.9, 21.0, 21.1, 21.2, 21.3)
ERROR: No matching distribution found for packaging==23.1 (from -r /tmp/cached-reqsub0jbcq6.txt (line 5))
clearml_agent: ERROR: Could not install task requirements!

Why is it trying to install these packages? My image is fully ready to run the required script. It doesn't need to run on a GPU anyway, so there is no need for these installations.

  
  
Posted one year ago

Answers 10


I would suggest structuring everything around the Task object. After you clone a task and enqueue it, the agent can take care of all the required packages and the environment. You can even set environment variables so that it doesn't try to create a new environment and instead uses the existing one in the Docker container.
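As a sketch of the environment-variable route, assuming the variable names documented for clearml-agent (`CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1` skips building a Python environment and relies on what is already installed in the image; `CLEARML_AGENT_SKIP_PIP_VENV_INSTALL`, pointing at an existing interpreter, is a related option), the hypothetical helper below renders such variables as docker `-e` arguments, for example for the task's container arguments. Please verify the names against the agent version you run.

```python
import shlex

# Assumption: variable name taken from the clearml-agent environment-variable docs;
# confirm it for the agent version you run.
SKIP_ENV = {"CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL": "1"}


def docker_env_arguments(env: dict) -> str:
    """Render a mapping as shell-quoted docker '-e KEY=VALUE' arguments (sorted by key)."""
    parts = []
    for key, value in sorted(env.items()):
        parts.extend(["-e", f"{key}={value}"])
    # Equivalent to shlex.join() but also works on Python < 3.8.
    return " ".join(shlex.quote(part) for part in parts)
```

For the variable above this yields `-e CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1`; values containing spaces are quoted so the string survives being split by a shell.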

  
  
Posted one year ago

@CostlyOstrich36
You're right, I do use a custom entry point in my Dockerfile.
Could you tell me whether you think this would work:

  • Set up an environment that can run this task end to end (the script will include Task.init).
  • Create a new image with the custom run command removed (FYI, the Dockerfile does not install clearml/clearml-agent).
  • Run the task from a Python script, which will create the task in the ClearML UI.
  • Clone the task and use the agent command (as written in my previous message) to spin up an agent that runs it with the newly created Docker image.
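Steps 3 and 4 can be sketched with the clearml SDK, assuming a recent clearml release in which `Task.set_base_docker` accepts `docker_image` and `docker_arguments` (older releases only take a single `docker_cmd` string). The function name is hypothetical. Setting the image on the task is shown as an alternative to relying only on the image passed to the agent's `--docker` flag, which, as far as I understand, acts as the default.

```python
def clone_and_enqueue(source_task_id: str, queue_name: str, docker_image: str,
                      docker_arguments: str = "") -> str:
    """Clone an existing task, attach a container image to the clone, and enqueue it.

    Returns the ID of the cloned task.
    """
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)
    # The clone starts as a draft, so its arguments can still be edited before it runs.
    cloned = Task.clone(source_task=source, name=f"{source.name} (clone)")
    # Assumption: set_base_docker() accepts these keyword arguments in your clearml version.
    cloned.set_base_docker(docker_image=docker_image,
                           docker_arguments=docker_arguments or None)
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

A call such as `clone_and_enqueue("<task id>", "maytar_test_q", "docker_image")` would then leave the clone waiting in the queue the agent from the earlier command listens on.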
  
  
Posted one year ago

@DangerousMole43, I think you're trying to use the agent for something it wasn't designed to do. As @SuccessfulKoala55 mentioned, the agent does not support running custom entry points. The idea is to clone tasks in the system and enqueue them; the agent then creates the required environment and runs the code by cloning the repository.

  
  
Posted one year ago

I can see from the console in the UI that part of the command it's trying to run is:
'echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean'

along with some other commands, and I'm trying to understand why my agent runs them. I've been going back and forth on the ClearML config, but nothing I change seems to have any effect.

  
  
Posted one year ago

Hi @DangerousMole43, how are you running the agent? By default, the agent does not use pre-packaged Docker images with a built-in script; the whole concept is for the agent to recreate the correct environment inside the container (hence installing the packages and cloning the code) and re-run your task there.

  
  
Posted one year ago

The agent does this automatically; it does not support running your custom entry point.

  
  
Posted one year ago

Well done! Out of curiosity, what did you end up doing?

  
  
Posted one year ago

Hi @SuccessfulKoala55,
First, I start the agent with this command:
clearml-agent daemon --queue maytar_test_q --docker docker_image --detached --cpu-only

As for the task itself: I have a bash script inside the container that executes a Python script (also located inside the container) that takes its arguments via argparse (so far, no ClearML involved). To create the task, I run a very basic Python script (outside the container) that initializes a ClearML task and accepts the same arguments via argparse. Then, once the task appears in the ClearML UI (where it completes successfully), I reset it and enqueue it to maytar_test_q. This is where it fails.
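A minimal sketch of the launcher described above, assuming the clearml package is installed where it runs; the option names and the project/task names are hypothetical placeholders for the real script's. `Task.init` is called before parsing because ClearML hooks argparse and records the parsed values as task hyperparameters, which can then be edited on a cloned task before it is enqueued.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options standing in for the real script's argparse arguments.
    parser = argparse.ArgumentParser(description="Register the containerized script as a ClearML task")
    parser.add_argument("--input-path", required=True,
                        help="input location as seen inside the container")
    parser.add_argument("--epochs", type=int, default=1)
    return parser


def main(argv=None):
    # Imported lazily so build_parser() can be used without clearml installed.
    from clearml import Task

    # Task.init runs before parse_args so ClearML can capture the argparse values.
    # The project and task names here are hypothetical.
    task = Task.init(project_name="containerized-scripts", task_name="run-inner-script")
    args = build_parser().parse_args(argv)
    return task, args
```

Calling `main()` once locally creates the task. As I understand ClearML's argparse integration, when an agent later executes a clone, the same `Task.init` call attaches to that clone and `parse_args` receives the values set in the UI.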

  
  
Posted one year ago

Thank you @CostlyOstrich36 and @SuccessfulKoala55!
I managed to get what I wanted using your inputs!

  
  
Posted one year ago

That's a bit of a problem for me, because I'm using a subprocess to run a Python script in the container, and the paths on my local machine differ from the ones inside the container.
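One way to decouple the wrapper from host-versus-container paths, sketched here as an assumption rather than a ClearML feature: read the inner script's location from an environment variable and fall back to the in-container path. The variable name `INNER_SCRIPT_PATH` and the default `/app/run_script.py` are hypothetical. The variable could be set locally for testing and passed into the container, for example as a docker `-e` argument.

```python
import os
import subprocess
import sys
from pathlib import Path

# Hypothetical variable name and default; adjust to where the script lives in the image.
SCRIPT_ENV_VAR = "INNER_SCRIPT_PATH"
DEFAULT_CONTAINER_SCRIPT = "/app/run_script.py"


def resolve_script_path(env=None) -> Path:
    """Return the inner script path: the env override if set, else the in-container default."""
    env = os.environ if env is None else env
    return Path(env.get(SCRIPT_ENV_VAR, DEFAULT_CONTAINER_SCRIPT))


def run_inner_script(args, env=None) -> subprocess.CompletedProcess:
    """Run the inner script with the interpreter running this wrapper; raise on failure."""
    script = resolve_script_path(env)
    return subprocess.run([sys.executable, str(script), *args], check=True)
```

Using `sys.executable` keeps the subprocess on the same interpreter as the wrapper, so when the agent runs the wrapper inside the container, the inner script also uses the container's Python.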

  
  
Posted one year ago