Hi @<1523701168822292480:profile|ExuberantBat52> ! During local runs, tasks are not run inside the specified Docker container. You need to run your steps remotely. To do this, first create a queue, then run a clearml-agent instance bound to that queue. You also need to specify the queue in add_function_step. Note that the controller can still be run locally if you wish.
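As a rough sketch of the setup described above (not the poster's actual pipeline): the step is assigned to a queue through execution_queue, while the controller itself is started locally. The pipeline name, project name, queue name "docker_queue", the image "python:3.10", and the `double` step are all hypothetical placeholders, and a clearml-agent is assumed to be listening on that queue.

```python
def double(x):
    # Hypothetical step body; it executes on whichever agent serves the queue.
    return x * 2


def build_pipeline(queue="docker_queue"):
    # Imported here so the step function above stays usable without clearml installed.
    from clearml import PipelineController

    pipe = PipelineController(
        name="example_pipeline",  # placeholder name
        project="examples",       # placeholder project
        version="1.0.0",
    )
    pipe.add_function_step(
        name="double",
        function=double,
        function_kwargs={"x": 21},
        function_return=["result"],
        docker="python:3.10",     # placeholder image, only honored by an agent in Docker mode
        execution_queue=queue,    # the step is enqueued here instead of running locally
    )
    return pipe


if __name__ == "__main__":
    pipe = build_pipeline()
    # The controller logic runs in this process; the steps go to the remote queue.
    pipe.start_locally(run_pipeline_steps_locally=False)
```

With run_pipeline_steps_locally=False, only the controller runs on the local machine; each step waits in the queue until an agent picks it up.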
Thank you so much for your reply, will give that a shot!
Hi again @<1523701435869433856:profile|SmugDolphin23> ,
I was able to run the pipeline remotely on an agent, but I am still facing the same problem: the code breaks on the exact same step that requires the Docker container. Is there a way to debug what is happening? Currently, nothing in the logs indicates that the code is running inside the Docker container. Here are the Docker-related logs:
agent.docker_pip_cache = /home/amerii/.clearml/pip-cache
agent.docker_apt_cache = /home/amerii/.clearml/apt-cache.1
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
docker_cmd = 084736541379.dkr.ecr.eu-central-1.amazonaws.com/ap_pipeline:latest
entry_point = pipeline_get_alignments.py
working_dir = .
Here is my pipeline function step:
pipe.add_function_step(
    name="align_sequences",
    function=pipeline_get_alignments,
    function_kwargs={
        "X_train": "${map_sequences.X_train}",
        "X_test": "${map_sequences.X_test}",
    },
    function_return=["X_train", "X_test"],
    cache_executed_step=True,
    tags=["intaRNA"],
    docker="aws_account_id.dkr.ecr.eu-central-1.amazonaws.com/ap_pipeline:latest",
    docker_bash_setup_script="./docker_setup_script.sh",
    packages=packages,
    execution_queue=QUEUE,
)
What are some steps I can take to debug what is happening?
Never mind, I figured out the problem. I needed to specify the --docker flag when running the clearml-agent.
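For anyone landing here later, the fix can be sketched as the command line used to start the agent. The helper below only assembles the arguments for clearml-agent daemon with the --docker flag; the function name build_agent_command and the queue name "docker_queue" are hypothetical, and the command actually used in this thread was not posted.

```python
import subprocess


def build_agent_command(queue, default_image=None):
    # Start the agent in Docker mode so each task runs inside a container.
    cmd = ["clearml-agent", "daemon", "--queue", queue, "--docker"]
    if default_image:
        # Optional fallback image for tasks that do not specify their own.
        cmd.append(default_image)
    return cmd


if __name__ == "__main__":
    # "docker_queue" is a placeholder; use the queue passed as execution_queue.
    subprocess.run(build_agent_command("docker_queue"), check=True)
```

Without --docker the agent runs tasks in a virtual environment on the host, which would explain why the image set on the step appeared to be ignored.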