Answered
Hi There, We Have A Clearml Server And I Have Spun Up An Agent Using A Docker Image (As We Have Private Packages On Aws Codeartifact That Need Installing). I Have Created The

Hi there, we have a ClearML server and I have spun up an agent using a Docker image (as we have private packages on AWS CodeArtifact that need installing). I have created the clearml.conf file and tried various things, but I cannot get a ClearML Pipeline to run, as it cannot clone the repo via SSH.

This is the error message (repo names, etc. removed for privacy):

created virtual environment CPython3.9.9.final.0-64 in 661ms
  creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.9, clear=False, no_vcs_ignore=False, global=True)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==24.0, setuptools==69.1.0, wheel==0.42.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
cloning: git@bitbucket.org:pendulum-systems-inc/repo.git
ssh -oBatchMode=yes: 1: ssh: not found
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Repository cloning failed: Command '['clone', 'git@bitbucket.org:pendulum-systems-inc/repo.git', '/root/.clearml/vcs-cache/repo.git.92e80863adb216af60f1b9a6015fe061/repo.git', '--recursive', '--quiet']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@bitbucket.org:pendulum-systems-inc/repo.git', branch='feature/branch', commit_id='f359e2578b340edecc46baf2fbd183366e753e34', tag='', docker_cmd='docker_imgage', entry_point='clearml_pipeline/pipeline.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]
  • I have tried setting force_git_ssh_protocol: true and have added SSH keys (I can clone this repo and branch to the server outside of ClearML).
  • I have tried setting the above to false and adding a git_user and git_pass, but I get the exact same error message as above (the URL also does not change to https...).
  • I have cleared the vcs-cache.
  
  
Posted one month ago

Answers 12


Hi @<1654294828365647872:profile|GorgeousShrimp11> , are you running in docker mode?

  
  
Posted one month ago

Yes, I started the agent in docker mode with clearml-agent daemon --queue direct-relief-forecasting --docker directrelief_ml_clearml --cpu-only -d

  
  
Posted one month ago

Ok, so the image needs ssh in order to clone the repo, not just the server.

This is my dockerfile:

# Use a slim Python image as the base image
FROM python:3.9.9-slim as builder

ARG CODEARTIFACT_AUTH_TOKEN
ARG POETRY_HTTP_BASIC_ARTIFACT_USERNAME
ARG POETRY_HTTP_BASIC_ARTIFACT_PASSWORD

# Set environment variables for Poetry
ENV POETRY_HOME=/opt/poetry \
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=false \
    PATH="/opt/poetry/bin:$PATH"\
    PYTHONPATH="/ml_pipeline:${PYTHONPATH}"

# Set the working directory in the container
WORKDIR /src

# Copy only the dependency files
COPY pyproject.toml poetry.lock ./

RUN apt-get update && apt-get upgrade -y \
    && apt-get install --no-install-recommends -y \
    curl \
    git \
    # Install lightGBM as part of retina
    # Reference: 

    && apt-get install libgomp1 \
    # Installing `poetry` package manager:
    # 

    && curl -sSL '
' | python - \
    # Cleaning cache:
    && apt-get remove --purge --auto-remove -y curl \
    && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
    && apt-get clean -y \
    && poetry install --no-interaction --no-ansi \
    && rm -rf /var/lib/apt/lists/*
  
  
Posted one month ago

From the log it looks like there is no ssh installed on the image:

cloning: git@bitbucket.org:pendulum-systems-inc/repo.git
ssh -oBatchMode=yes: 1: ssh: not found
fatal: Could not read from remote repository.
  
  
Posted one month ago

I'm just installing the required packages (one of which is an internal package on CodeArtifact).

  
  
Posted one month ago

Sorry, I was trying to figure out how to do this. It looks like I can't, as I get an error: NameError: name 'ssh' is not defined

  
  
Posted one month ago

Does your image have ssh installed? Can you run ssh from inside the container?
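
One way to answer this from a Python session inside the container is to look for the ssh binary on PATH, rather than typing ssh at the Python prompt. The following is a minimal sketch using only the standard library; the helper name ssh_client_status is made up for illustration and is not part of ClearML.

```python
import shutil
import subprocess


def ssh_client_status():
    """Return (path, version) for the ssh client, or (None, None) if it is missing."""
    path = shutil.which("ssh")  # looks up the executable on PATH, like `which ssh`
    if path is None:
        return None, None
    # OpenSSH prints its version banner to stderr when called with -V.
    result = subprocess.run([path, "-V"], capture_output=True, text=True)
    return path, (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path, version = ssh_client_status()
    if path is None:
        print("ssh not found on PATH; git cannot clone over SSH from this image")
    else:
        print(f"ssh found at {path}: {version}")
```

If this reports that ssh is missing, git inside the container will fail with the same "ssh: not found" message shown in the agent log.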

  
  
Posted one month ago

Ah, ok thanks so much, I'll work on this now.

  
  
Posted one month ago

Hi @<1523701070390366208:profile|CostlyOstrich36> , I've gotten past the environment setup phase. I have the agent running in --services-mode for the pipeline, but the pipeline step is failing and I can't see why from the error message. When I run this locally it works (i.e., if I add PipelineDecorator.run_locally() ).

Environment setup completed successfully

Starting Task Execution:


1710160150496 direct-relief:cpu:0:service:5062dce7ff1d49cfbaf36c96abe3282c DEBUG ClearML results page: 

ClearML pipeline page: 

Launching step [direct_relief_pipeline]

Launching step [direct_relief_pipeline]

1710161733002 direct-relief:cpu:0:service:9a0b0e5701684c529a3400512f65236b DEBUG Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/task_repository/direct-relief-forecasting.git/main_clearml.py", line 5, in <module>
    executing_pipeline(path="src/conf/config.yaml")
  File "/usr/local/lib/python3.9/site-packages/clearml/automation/controller.py", line 4459, in internal_decorator
    raise triggered_exception
  File "/usr/local/lib/python3.9/site-packages/clearml/automation/controller.py", line 4436, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/usr/local/lib/python3.9/site-packages/clearml/utilities/proxy_object.py", line 405, in trigger_all_remote_references
    func()
  File "/usr/local/lib/python3.9/site-packages/clearml/automation/controller.py", line 4095, in results_reference
    raise ValueError(
ValueError: Pipeline step "direct_relief_pipeline", Task ID=2893f070b1954f909bb5eb578384dfc7 failed

1710161733030 direct-relief:cpu:0:service:9a0b0e5701684c529a3400512f65236b DEBUG Process failed, exit code 1
  
  
Posted one month ago

I've figured out that this is because the repository relies on a config.yaml file that contains secrets and is not committed to git. So, when running remotely, the file is not present. Is the recommendation to put this in the Docker image, and would I then have to specify an entry point in the Dockerfile? I was hoping to get away with a Docker image that only has the installed packages for the agent, without the repository code as well. Is this not the recommended approach?
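
For illustration, one common alternative to baking the file into the image is to read the secrets from environment variables whenever the untracked file is absent. This is only a sketch under stated assumptions: the variable names DB_USER and DB_PASSWORD and both helper functions are hypothetical, and how the variables reach the agent's container is left to the deployment.

```python
import os
from pathlib import Path

# Hypothetical secret names, used here only for illustration.
SECRET_NAMES = ("DB_USER", "DB_PASSWORD")


def secrets_from_env(names=SECRET_NAMES, env=None):
    """Collect the named secrets from environment variables, failing loudly if any are absent."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


def pick_secret_source(config_path="src/conf/config.yaml"):
    """Return "file" when the untracked config exists (local run), otherwise "env"."""
    return "file" if Path(config_path).is_file() else "env"
```

With something like this, a local run keeps using config.yaml, while a remote run fails early with a message naming the missing variables instead of a generic failed pipeline step.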

  
  
Posted one month ago

I found this issue, but I'm not sure if it's the same thing, as it's from a while back, and I'm also not sure what the final solution was.

  
  
Posted one month ago

Looks like that is indeed the issue. You need a Docker image that already has ssh, or you need to install ssh in the image via the shell init script.

  
  
Posted one month ago