MelancholyElk85 if you're using add_function_step(), it has a 'docker' parameter. You can read more here:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#add_function_step
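A minimal sketch of what that could look like, assuming a recent clearml SDK where add_function_step() accepts 'docker'. The step function, the image name, and the project/pipeline names here are hypothetical placeholders, not taken from the thread:

```python
def double_rows(rows: int = 10) -> int:
    """Toy step body standing in for real work."""
    return rows * 2


def add_docker_step(pipe, image: str) -> None:
    """Register double_rows as a pipeline step that runs inside `image`."""
    pipe.add_function_step(
        name="double_rows",
        function=double_rows,
        function_kwargs={"rows": 10},
        function_return=["doubled"],
        docker=image,  # container image used for this step only
    )


def run_pipeline(image: str) -> None:
    """Build and enqueue the pipeline; needs clearml installed and configured."""
    # Imported lazily so the helpers above can be used without a ClearML server.
    from clearml.automation import PipelineController

    pipe = PipelineController(name="pipeline test", project="pipelines", version="0.0.1")
    add_docker_step(pipe, image)
    pipe.start(queue="services")
```

Calling run_pipeline("python:3.9-slim") (a placeholder image) would enqueue the controller on the "services" queue. Note that 'docker' here applies to the step, not to the controller task itself.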
MelancholyElk85, fair point 🙂
How do you initialize your tasks?
So, to summarize:
PipelineController works with the default image, but it incurs 4-5 min of overhead. It doesn't work with any other image.
I can open an issue on GitHub
PipelineController works with default image, but it incurs overhead 4-5 min
You can try to spin the "services" queue without docker support, if there is no need for containers it will accelerate the process.
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
This error is about failing to clone the pipeline code repo, how is that connected to changing the container ?!
Can you provide the full log?
I found out this happens with any other image except the default one, regardless of whether I set it with pipe._task.set_base_docker
The image is not needed to run the pipeline logic, I do it just to reduce overhead. Otherwise it would take too long to just build the default image on every launch
MelancholyElk85
After I set base docker for pipeline controller task, I cannot clone the repo...
What do you mean by that?
Also, how do you set the PipelineController base_docker_image (I'm assuming it is needed to run the pipeline logic?!, is that correct?)
PipelineController._task.set_base_docker
?? :good-thinking:
MelancholyElk85, did PipelineController._task.set_base_docker work?
You can try to spin the "services" queue without docker support, if there is no need for containers it will accelerate the process.
With pipe.start(queue='services'), it still tries to run some docker for some reason
`
1633799714110 kirillfish-ROG-Strix-G512LW-G512LW info ClearML Task: created new task id=a4b0fbc6a1454947a06be4e48eda6740 ClearML results page:
1633799714974 kirillfish-ROG-Strix-G512LW-G512LW info ClearML new version available: upgrade to v1.1.2 is recommended!
1633799726152 kirillfish-ROG-Strix-G512LW-G512LW info 2021-10-09 20:15:26,151 - clearml.Task - INFO - Waiting to finish uploads
1633799727482 kirillfish-ROG-Strix-G512LW-G512LW info 2021-10-09 20:15:27,482 - clearml.Task - INFO - Finished uploading
1633799731889 clearml-services INFO task a4b0fbc6a1454947a06be4e48eda6740 pulled from 10cb3fafea4940e8923adad408c23ab4 by worker clearml-services
1633799731967 clearml-services INFO Running Task a4b0fbc6a1454947a06be4e48eda6740 inside default docker: arguments: []
1633799732452 clearml-services INFO Executing: ['docker', 'run', '-t', '-l', 'clearml-worker-id=clearml-services:service:a4b0fbc6a1454947a06be4e48eda6740', '-l', 'clearml-parent-worker-id=clearml-services', '-e', 'NVIDIA_VISIBLE_DEVICES=none', '-e', 'CLEARML_WORKER_ID=clearml-services:service:a4b0fbc6a1454947a06be4e48eda6740', '-e', 'CLEARML_DOCKER_IMAGE=', '-v', '/tmp/.clearml_agent.pgsygoh2.cfg:/root/clearml.conf', '-v', '/root/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/root/.clearml/pip-cache:/root/.cache/pip', '-v', '/root/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/root/.clearml/cache:/clearml_agent_cache', '-v', '/root/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', '', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=none $LOCAL_PYTHON -u -m clearml_agent execute --full-monitoring --id a4b0fbc6a1454947a06be4e48eda6740'] `
This error is about failing to clone the pipeline code repo, how is that connected to changing the container ?!
Can you provide the full log?
I reset this task, will reproduce later
MelancholyElk85 Hi!
You can use a custom Docker image, if it's on Docker Hub, to reduce overhead. Regarding a set_base_docker() equivalent for PipelineController, let me check
I launch everything in docker mode, and since it builds an image on every run, it builds the default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 image, which incurs heavy overhead. What if I want to give it my custom lightweight image instead? The same way I do for all individual tasks
but the question was about the pipeline controller, not individual tasks.
AgitatedDove14
`
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
(repository='git@...', branch='main', commit_id='...', tag='', docker_cmd='registry.gitlab.com/...:...', entry_point='pipe.py', working_dir='.')
- Check if remote-worker has valid credentials [see worker configuration file]
`
I initialize tasks not as functions, but as scripts from different repositories, with different images
BTW: if you need you can do the following:
` from clearml import Task
from clearml.automation import PipelineController
task = Task.init(project_name='pipelines', task_name='pipeline test')
task.set_base_docker(...)
# the pipeline object is using the Current Task, hence docker image is set
pipe = PipelineController(...)
pipe.start() `
Of course, I use custom images all the time; the question was how to do it for a pipeline 😆 Setting private attributes directly doesn't look like good practice
After I set base docker for pipeline controller task, I cannot clone the repo...
Cloning base tasks and modifying their parameters
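For reference, that way of building a pipeline (steps cloned from existing base tasks, with parameters overridden per step) can be sketched roughly as below. This assumes PipelineController.add_step() with its base_task_project / base_task_name / parameter_override arguments; the helper name, step names, project names, and the parameter key in the usage line are all hypothetical:

```python
from typing import Dict, List, Optional


def add_cloned_step(
    pipe,
    name: str,
    base_project: str,
    base_task: str,
    overrides: Dict[str, object],
    parents: Optional[List[str]] = None,
) -> None:
    """Add a step that clones an existing task and overrides its parameters."""
    pipe.add_step(
        name=name,
        base_task_project=base_project,  # project holding the task to clone
        base_task_name=base_task,        # name of the task to clone
        parameter_override=overrides,    # e.g. {"Args/batch_size": 32}
        parents=parents or [],           # steps that must finish first
    )
```

With a real controller, something like add_cloned_step(pipe, "train", "my project", "train task", {"Args/batch_size": 32}, parents=["preprocess"]) would register the step. Each cloned task keeps the repository and container settings stored on its base task.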
With pipe.start(queue='services'), it still tries to run some docker for some reason
The services agent is always running with --docker:
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docker/services/entrypoint.sh#L16
Actually I think we should have it as an argument, so it is easier to control from docker-compose
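Until that is configurable, one option might be to start your own agent on the "services" queue without docker. A rough sketch of the difference between the two launch commands, assuming the standard clearml-agent daemon flags --queue and --docker; the helper itself is hypothetical, and it only builds and prints the command rather than running it:

```python
import shlex
from typing import List, Optional


def agent_command(queue: str = "services", docker_image: Optional[str] = None) -> List[str]:
    """Build a clearml-agent daemon command, with or without docker mode."""
    cmd = ["clearml-agent", "daemon", "--queue", queue]
    if docker_image is not None:
        # Docker mode: every task pulled from the queue runs in a container.
        cmd += ["--docker", docker_image]
    return cmd


# No --docker flag: tasks run directly on the host in a virtual environment.
print(shlex.join(agent_command()))
# Docker mode with an explicit default image (placeholder name).
print(shlex.join(agent_command(docker_image="python:3.9-slim")))
```

Without docker the controller task skips the container setup step, at the cost of running directly on the agent host.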
I'll be waiting for the full log to check the "git clone" issue