I found this issue, but I'm not sure it's the same thing, as it's from a while back. I'm also not sure what the final solution there was.
Hi @<1654294828365647872:profile|GorgeousShrimp11> , are you running in docker mode?
Ah, ok thanks so much, I'll work on this now.
From the log it looks like there is no ssh installed on the image:
cloning: git@bitbucket.org:pendulum-systems-inc/repo.git
ssh -oBatchMode=yes: 1: ssh: not found
fatal: Could not read from remote repository.
Does your image have ssh installed? Can you run ssh from inside the container?
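A quick way to answer that from inside the container is to check whether the executables are on PATH. This is only a minimal sketch: the helper name `missing_tools` is made up for illustration, and it assumes the container has a Python interpreter available (which the python:3.9.9-slim base does).

```python
import shutil


def missing_tools(tools=("git", "ssh")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("git and ssh are both available")
```

Note that `ssh` is a shell command, not a Python name, so it has to be looked up like this (or run from a shell) rather than typed at the Python prompt.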
I'm just installing the required packages (one of which is an internal package on CodeArtifact).
Looks like that is indeed the issue. You need either a Docker image that already has ssh installed, or to install ssh in the image via the shell init script.
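As a rough sketch of the second option, the init script is just a list of shell lines that run when the container starts. The helper `ssh_setup_script` below is hypothetical, and the package name `openssh-client` assumes a Debian-based image such as python:3.9.9-slim. Where the lines go is also an assumption to check against your ClearML version: they could be listed under `agent.extra_docker_shell_script` in clearml.conf, or passed per task via the `docker_setup_bash_script` argument of `Task.set_base_docker`.

```python
def ssh_setup_script(packages=("openssh-client",)):
    """Build shell lines that install the given apt packages non-interactively.

    Assumes a Debian/Ubuntu-based image where apt-get is available.
    """
    package_list = " ".join(packages)
    return [
        "apt-get update",
        f"apt-get install -y --no-install-recommends {package_list}",
    ]


if __name__ == "__main__":
    # Print the lines so they can be pasted into the agent configuration.
    for line in ssh_setup_script():
        print(line)
```

Baking ssh into the image avoids reinstalling it on every run, so the init script is probably best kept for quick experiments.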
Sorry, I was trying to figure out how to do this. It looks like I can't, as I get an error: NameError: name 'ssh' is not defined
Yes, I started the agent in docker mode with clearml-agent daemon --queue direct-relief-forecasting --docker directrelief_ml_clearml --cpu-only -d
Ok, so the image needs ssh in order to clone the repo, not just the server.
This is my dockerfile:
# Use a slim Python image as the base image
FROM python:3.9.9-slim as builder
ARG CODEARTIFACT_AUTH_TOKEN
ARG POETRY_HTTP_BASIC_ARTIFACT_USERNAME
ARG POETRY_HTTP_BASIC_ARTIFACT_PASSWORD
# Set environment variables for Poetry
ENV POETRY_HOME=/opt/poetry \
POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_CREATE=false \
PATH="/opt/poetry/bin:$PATH"\
PYTHONPATH="/ml_pipeline:${PYTHONPATH}"
# Set the working directory in the container
WORKDIR /src
# Copy only the dependency files
COPY pyproject.toml poetry.lock ./
RUN apt-get update && apt-get upgrade -y \
&& apt-get install --no-install-recommends -y \
curl \
git \
# Install lightGBM as part of retina
# Reference:
&& apt-get install -y libgomp1 \
# Installing `poetry` package manager:
#
&& curl -sSL '
' | python - \
# Cleaning cache:
&& apt-get remove --purge --auto-remove -y curl \
&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
&& apt-get clean -y \
&& poetry install --no-interaction --no-ansi \
&& rm -rf /var/lib/apt/lists/*
I've figured out that this is because I have a config.yaml file with secrets in it in the repository. It is not committed to git, so when running remotely, the file is not present. Is the recommendation to put it in the Docker image, and would I then have to specify an entry point in the Dockerfile? I was hoping to get away with creating a Docker image containing only the installed packages for the agent, not the repository code as well. Is this not the recommended approach?
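One common pattern for this situation, shown here only as a sketch and not as a ClearML recommendation, is to read secrets from the file when it exists and fall back to environment variables when it does not. Everything named here is hypothetical: the function `load_secrets`, the `APP_` prefix, and the `parse` callable (for example `yaml.safe_load`, if PyYAML is installed). How the variables reach the container depends on how the agent is deployed.

```python
import os
from pathlib import Path


def load_secrets(path, required, env_prefix="APP_", parse=None):
    """Collect `required` keys from a config file, then from the environment.

    `parse` turns the file text into a dict. If the file is missing, or no
    parser is given, only environment variables are consulted.
    """
    config_file = Path(path)
    data = {}
    if parse is not None and config_file.is_file():
        data = dict(parse(config_file.read_text()))

    # Fill in anything the file did not provide from e.g. APP_DB_PASSWORD.
    for key in required:
        env_name = env_prefix + key.upper()
        if key not in data and env_name in os.environ:
            data[key] = os.environ[env_name]

    missing = [key for key in required if key not in data]
    if missing:
        raise KeyError(f"Missing secrets: {', '.join(missing)}")
    return data
```

With this shape the same code runs locally, where the uncommitted file is present, and remotely, where only the environment is, and it fails with an explicit message when neither source has a value.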
Hi @<1523701070390366208:profile|CostlyOstrich36> , I've gotten past the environment setup phase. I have the agent running in --services-mode for the pipeline, but the pipeline step is failing and I can't see why from the error message. When I run this locally it works (i.e. if I add PipelineDecorator.run_locally()).
Environment setup completed successfully
Starting Task Execution:
1710160150496 direct-relief:cpu:0:service:5062dce7ff1d49cfbaf36c96abe3282c DEBUG ClearML results page:
ClearML pipeline page:
Launching step [direct_relief_pipeline]
Launching step [direct_relief_pipeline]
1710161733002 direct-relief:cpu:0:service:9a0b0e5701684c529a3400512f65236b DEBUG Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.9/task_repository/direct-relief-forecasting.git/main_clearml.py", line 5, in <module>
executing_pipeline(path="src/conf/config.yaml")
File "/usr/local/lib/python3.9/site-packages/clearml/automation/controller.py", line 4459, in internal_decorator
raise triggered_exception
File "/usr/local/lib/python3.9/site-packages/clearml/automation/controller.py", line 4436, in internal_decorator
LazyEvalWrapper.trigger_all_remote_references()
File "/usr/local/lib/python3.9/site-packages/clearml/utilities/proxy_object.py", line 405, in trigger_all_remote_references
func()
File "/usr/local/lib/python3.9/site-packages/clearml/automation/controller.py", line 4095, in results_reference
raise ValueError(
ValueError: Pipeline step "direct_relief_pipeline", Task ID=2893f070b1954f909bb5eb578384dfc7 failed
1710161733030 direct-relief:cpu:0:service:9a0b0e5701684c529a3400512f65236b DEBUG Process failed, exit code 1
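The controller log above only reports that a step failed. The underlying error is most likely in the console log of the step's own task, whose ID appears in the ValueError message. The sketch below pulls the step name and task ID out of such a log. The regular expression is written against the message format shown in this traceback and may not match other ClearML versions, and `failed_steps` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   ValueError: Pipeline step "name", Task ID=<hex id> failed
STEP_FAILURE = re.compile(
    r'Pipeline step "(?P<step>[^"]+)", Task ID=(?P<task_id>[0-9a-f]+) failed'
)


def failed_steps(log_text):
    """Return (step_name, task_id) pairs for every failed step in the log."""
    return [
        (match.group("step"), match.group("task_id"))
        for match in STEP_FAILURE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        'ValueError: Pipeline step "direct_relief_pipeline", '
        "Task ID=2893f070b1954f909bb5eb578384dfc7 failed"
    )
    for step, task_id in failed_steps(sample):
        print(f"step={step} task_id={task_id}")
```

Opening the task with that ID in the web UI and reading its console output should show the traceback from the step itself.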