but the question was about the pipeline controller, not individual tasks.
So, to summarize:
PipelineController works with the default image, but it incurs 4-5 min of overhead. It doesn't work with any other image.
I can open an issue on GitHub
I found out this happens with any other image except the default one, regardless of whether I set it with pipe._task.set_base_docker
The image is not needed to run the pipeline logic, I do it just to reduce overhead. Otherwise it would take too long to just build the default image on every launch
After I set base docker for pipeline controller task, I cannot clone the repo...
AgitatedDove14
`
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
(repository='git@...', branch='main', commit_id='...', tag='', docker_cmd='registry.gitlab.com/...:...', entry_point='pipe.py', working_dir='.') - Check if remote-worker has valid credentials [see worker configuration file] `
PipelineController works with default image, but it incurs overhead 4-5 min
You can try to spin the "services" queue without docker support, if there is no need for containers it will accelerate the process.
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
This error is about failing to clone the pipeline code repo, how is that connected to changing the container ?!
Can you provide the full log?
With pipe.start(queue='services'), it still tries to run some docker for some reason
The services agent is always running with --docker:
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docker/services/entrypoint.sh#L16
Actually I think we should have it as an argument, so it is easier to control from docker-compose
I'll be waiting for the full log to check the "git clone" issue
MelancholyElk85, did PipelineController._task.set_base_docker work?
You can try to spin the "services" queue without docker support, if there is no need for containers it will accelerate the process.
With pipe.start(queue='services'), it still tries to run some docker for some reason
`
1633799714110 kirillfish-ROG-Strix-G512LW-G512LW info ClearML Task: created new task id=a4b0fbc6a1454947a06be4e48eda6740 ClearML results page:
1633799714974 kirillfish-ROG-Strix-G512LW-G512LW info ClearML new version available: upgrade to v1.1.2 is recommended!
1633799726152 kirillfish-ROG-Strix-G512LW-G512LW info 2021-10-09 20:15:26,151 - clearml.Task - INFO - Waiting to finish uploads
1633799727482 kirillfish-ROG-Strix-G512LW-G512LW info 2021-10-09 20:15:27,482 - clearml.Task - INFO - Finished uploading
1633799731889 clearml-services INFO task a4b0fbc6a1454947a06be4e48eda6740 pulled from 10cb3fafea4940e8923adad408c23ab4 by worker clearml-services
1633799731967 clearml-services INFO Running Task a4b0fbc6a1454947a06be4e48eda6740 inside default docker: arguments: []
1633799732452 clearml-services INFO Executing: ['docker', 'run', '-t', '-l', 'clearml-worker-id=clearml-services:service:a4b0fbc6a1454947a06be4e48eda6740', '-l', 'clearml-parent-worker-id=clearml-services', '-e', 'NVIDIA_VISIBLE_DEVICES=none', '-e', 'CLEARML_WORKER_ID=clearml-services:service:a4b0fbc6a1454947a06be4e48eda6740', '-e', 'CLEARML_DOCKER_IMAGE=', '-v', '/tmp/.clearml_agent.pgsygoh2.cfg:/root/clearml.conf', '-v', '/root/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/root/.clearml/pip-cache:/root/.cache/pip', '-v', '/root/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/root/.clearml/cache:/clearml_agent_cache', '-v', '/root/.clearml/vcs-cache:/root/.clearml/vcs-cache', '--rm', '', 'bash', '-c', 'echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update && apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /root/clearml.conf /root/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=none $LOCAL_PYTHON -u -m clearml_agent execute --full-monitoring --id a4b0fbc6a1454947a06be4e48eda6740'] `
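One detail worth reading off the log above: the image slot of the logged `docker run` command is an empty string (`'--rm', '', 'bash'`), and `CLEARML_DOCKER_IMAGE=` is empty as well. A minimal sketch of how to pull the image argument out of such a logged argument list follows. The helper name `find_image` is hypothetical, the flag set covers only the options visible in this log (it is not a general docker CLI parser), and the `logged` list is abbreviated from the log line, not a full copy.

```python
from typing import List, Optional

# Options seen in the logged command that consume the next argument.
# Assumption: only these flags matter here; other docker options are not handled.
FLAGS_WITH_VALUE = {"-l", "-e", "-v"}


def find_image(argv: List[str]) -> Optional[str]:
    """Return the first positional argument after `docker run` (the image slot)."""
    if argv[:2] != ["docker", "run"]:
        return None
    i = 2
    while i < len(argv):
        arg = argv[i]
        if arg in FLAGS_WITH_VALUE:
            i += 2  # skip the flag and its value
        elif arg.startswith("-"):
            i += 1  # boolean flag such as -t or --rm
        else:
            return arg  # first positional argument, possibly an empty string
    return None


# Abbreviated from the log line above; "..." stands for the long bash script.
logged = [
    "docker", "run", "-t",
    "-l", "clearml-parent-worker-id=clearml-services",
    "-e", "CLEARML_DOCKER_IMAGE=",
    "--rm", "", "bash", "-c", "...",
]
print(repr(find_image(logged)))
```

For this abbreviated command the function returns an empty string, which matches what the log shows in the image position.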
This error is about failing to clone the pipeline code repo, how is that connected to changing the container ?!
Can you provide the full log?
I reset this task, will reproduce later
BTW: if you need you can do the following:
` from clearml import Task
from clearml.automation import PipelineController
task = Task.init(project_name='pipelines', task_name='pipeline test')
task.set_base_docker(...)
# the pipeline object uses the current Task, hence the docker image is set
pipe = PipelineController(...)
pipe.start() `
MelancholyElk85
After I set base docker for pipeline controller task, I cannot clone the repo...
What do you mean by that?
Also, how do you set the PipelineController base_docker_image? (I'm assuming the image is needed to run the pipeline logic, is that correct?)
I launch everything in docker mode, and since it builds an image on every run, it builds the default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 image, which incurs heavy overhead. What if I want to give it my custom lightweight image instead? The same way I do for all individual tasks
PipelineController._task.set_base_docker
?? :good-thinking:
I initialize tasks not as functions, but as scripts from different repositories, with different images
MelancholyElk85 , fair point 🙂
How do you initialize your tasks?
MelancholyElk85 Hi!
You can use a custom Docker image if it's on Docker Hub to reduce overhead. Regarding a set_base_docker() equivalent for PipelineController, let me check
MelancholyElk85 if you're using add_function_step(), it has a 'docker' parameter. You can read more here:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#add_function_step
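For the function-step case, a minimal sketch of passing a per-step image through the `docker` parameter might look like the following. The project and pipeline names, the `double` step, and the `python:3.9-slim` image are placeholders chosen for illustration, not values from this thread. This sets the image for the step only, not for the controller task itself.

```python
def double(x):
    # Hypothetical step body; it would run remotely inside the step's container.
    return x * 2


def build_pipeline():
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml.automation import PipelineController

    pipe = PipelineController(
        name="pipeline test",  # placeholder pipeline name
        project="pipelines",   # placeholder project name
        version="0.0.1",
    )
    pipe.add_function_step(
        name="double",
        function=double,
        function_kwargs={"x": 21},
        function_return=["result"],
        docker="python:3.9-slim",  # assumed lightweight image for this step
    )
    return pipe


# Usage (needs a configured ClearML server and an agent on the queue):
# build_pipeline().start(queue="services")
```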
cloning base tasks and modifying their parameters
of course, I use custom images all the time, the question was how to do it for a pipeline 😆 setting private attributes directly doesn't look like good practice