Answered
(A Regular Experiment Did Execute In The Remote Agent, I Only Get This With The Pipe)


  
  
Posted 2 years ago

Answers 23


Can you add a snippet please?

  
  
Posted 2 years ago

So when you run it standalone it works fine? How are you creating the pipeline?

  
  
Posted 2 years ago

I created the pipeline on another machine via an interactive Python shell. The pipeline is picked up by ClearML; I can see it in the web UI.

  
  
Posted 2 years ago

The error occurs on the worker node when it tries to initialize the environment for the pipeline.

  
  
Posted 2 years ago

Can you add a larger piece of the error/log? Do you have a code snippet that also reproduces this?

  
  
Posted 2 years ago

If I look at the code of ClearML's controller.py, I see that it expects additional code in a relative folder.
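
For illustration, a relative import like that only works when the module is loaded as part of its package; executed directly as a script it fails (a minimal sketch with made-up names, not ClearML's actual layout):

# mypkg/runner.py  (hypothetical package, for illustration only)
from .job import helper   # relative import: requires a parent package

# importing it as part of the package works:
#     python -c "import mypkg.runner"
# executing the file directly does not, because there is no parent package:
#     python mypkg/runner.py
#     ImportError: attempted relative import with no known parent package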

  
  
Posted 2 years ago

I do not get more information than I just showed

  
  
Posted 2 years ago

If I go to the folder mentioned in the error and then go one level up, I see no other packages present.

  
  
Posted 2 years ago

looks like it's missing some dependencies

  
  
Posted 2 years ago

(image attachment)

  
  
Posted 2 years ago

My worker node is not running in Docker; it is a Linux machine using a conda environment.

  
  
Posted 2 years ago

Can you add the full log and the dependencies detected in the original code? How are you building the pipeline?

  
  
Posted 2 years ago

Full console log of the worker:

No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue b5fe1e72614247f7a77e5f6cdac35580
No tasks in Queues, sleeping for 5.0 seconds
task 30ad27a7a1244b6e8aa722d81cb6015c pulled from b5fe1e72614247f7a77e5f6cdac35580 by worker NLEIN-315GNH2:0
Running task '30ad27a7a1244b6e8aa722d81cb6015c'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.sppvun4p.txt', '/tmp/.clearml_agent_out.sppvun4p.txt'
Current configuration (clearml_agent v1.4.1, location: /tmp/.clearml_agent.gss2zozj.cfg):

sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id = NLEIN-315GNH2:0
agent.worker_name = NLEIN-315GNH2
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/thermo/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/thermo/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/thermo/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/thermo/.clearml/pip-cache
agent.docker_apt_cache = /home/thermo/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.git_user = MichaelThermo
agent.default_python = 3.8
agent.cuda_version = 0
agent.cudnn_version = 0
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = https://api.clear.ml
api.web_server = https://app.clear.ml
api.files_server = https://files.clear.ml
api.credentials.access_key = HCH1PO3TF2EZY0226XUS
api.host = https://api.clear.ml

Executing task id [30ad27a7a1244b6e8aa722d81cb6015c]:
repository =
branch =
version_num =
tag =
docker_cmd =
entry_point = controller.py
working_dir = .

::: Python virtual environment cache is disabled. To accelerate spin-up time set agent.venvs_cache.path=~/.clearml/venvs-cache :::

created virtual environment CPython3.8.0.final.0-64 in 239ms
creator CPython3Posix(dest=/home/thermo/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/thermo/.local/share/virtualenv)
added seed packages: pip==22.3, setuptools==65.5.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.3
Uninstalling pip-22.3:
Successfully uninstalled pip-22.3
Successfully installed pip-20.1.1
Collecting Cython
Using cached Cython-0.29.32-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.32
Collecting attrs==21.4.0
Using cached attrs-21.4.0-py2.py3-none-any.whl (60 kB)
Collecting pathlib2==2.3.7.post1
Using cached pathlib2-2.3.7.post1-py2.py3-none-any.whl (18 kB)
Collecting six==1.16.0
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting clearml==1.7.2
Using cached clearml-1.7.2-py2.py3-none-any.whl (950 kB)
Collecting pyjwt<2.5.0,>=2.4.0; python_version > "3.5"
Using cached PyJWT-2.4.0-py3-none-any.whl (18 kB)
Collecting jsonschema>=2.6.0
Using cached jsonschema-4.17.0-py3-none-any.whl (83 kB)
Collecting urllib3>=1.21.1
Using cached urllib3-1.26.12-py2.py3-none-any.whl (140 kB)
Collecting psutil>=3.4.2
Using cached psutil-5.9.3-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (295 kB)
Collecting python-dateutil>=2.6.1
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting furl>=2.0.0
Using cached furl-2.1.3-py2.py3-none-any.whl (20 kB)
Collecting pyparsing>=2.0.3
Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting Pillow>=4.1.1
Using cached Pillow-9.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting numpy>=1.10
Using cached numpy-1.23.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Collecting PyYAML>=3.12
Using cached PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (701 kB)
Collecting requests>=2.20.0
Using cached requests-2.28.1-py3-none-any.whl (62 kB)
Collecting pkgutil-resolve-name>=1.3.10; python_version < "3.9"
Using cached pkgutil_resolve_name-1.3.10-py3-none-any.whl (4.7 kB)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0
Using cached pyrsistent-0.19.1-py3-none-any.whl (57 kB)
Collecting importlib-resources>=1.4.0; python_version < "3.9"
Using cached importlib_resources-5.10.0-py3-none-any.whl (34 kB)
Collecting orderedmultidict>=1.0.1
Using cached orderedmultidict-1.0.1-py2.py3-none-any.whl (11 kB)
Collecting charset-normalizer<3,>=2
Using cached charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2022.9.24-py3-none-any.whl (161 kB)
Collecting idna<4,>=2.5
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting zipp>=3.1.0; python_version < "3.10"
Using cached zipp-3.10.0-py3-none-any.whl (6.2 kB)
Installing collected packages: attrs, six, pathlib2, pyjwt, pkgutil-resolve-name, pyrsistent, zipp, importlib-resources, jsonschema, urllib3, psutil, python-dateutil, orderedmultidict, furl, pyparsing, Pillow, numpy, PyYAML, charset-normalizer, certifi, idna, requests, clearml
Successfully installed Pillow-9.3.0 PyYAML-6.0 attrs-21.4.0 certifi-2022.9.24 charset-normalizer-2.1.1 clearml-1.7.2 furl-2.1.3 idna-3.4 importlib-resources-5.10.0 jsonschema-4.17.0 numpy-1.23.4 orderedmultidict-1.0.1 pathlib2-2.3.7.post1 pkgutil-resolve-name-1.3.10 psutil-5.9.3 pyjwt-2.4.0 pyparsing-3.0.9 pyrsistent-0.19.1 python-dateutil-2.8.2 requests-2.28.1 six-1.16.0 urllib3-1.26.12 zipp-3.10.0
Adding venv into cache: /home/thermo/.clearml/venvs-builds/3.8
Running task id [30ad27a7a1244b6e8aa722d81cb6015c]:
[.]$ /home/thermo/.clearml/venvs-builds/3.8/bin/python -u /home/thermo/.clearml/venvs-builds/3.8/code/controller.py
Summary - installed python packages:
pip:

  • attrs==21.4.0
  • certifi==2022.9.24
  • charset-normalizer==2.1.1
  • clearml==1.7.2
  • Cython==0.29.32
  • furl==2.1.3
  • idna==3.4
  • importlib-resources==5.10.0
  • jsonschema==4.17.0
  • numpy==1.23.4
  • orderedmultidict==1.0.1
  • pathlib2==2.3.7.post1
  • Pillow==9.3.0
  • pkgutil-resolve-name==1.3.10
  • psutil==5.9.3
  • PyJWT==2.4.0
  • pyparsing==3.0.9
  • pyrsistent==0.19.1
  • python-dateutil==2.8.2
  • PyYAML==6.0
  • requests==2.28.1
  • six==1.16.0
  • urllib3==1.26.12
  • zipp==3.10.0

Environment setup completed successfully

Starting Task Execution:

Traceback (most recent call last):
File "/home/thermo/.clearml/venvs-builds/3.8/code/controller.py", line 20, in <module>
from .job import LocalClearmlJob, RunningJob, BaseJob
ImportError: attempted relative import with no known parent package

Leaving process id 1273
DONE: Running task '30ad27a7a1244b6e8aa722d81cb6015c', exit status 1
Process failed, exit code 1
No tasks in queue b5fe1e72614247f7a77e5f6cdac35580
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue b5fe1e72614247f7a77e5f6cdac35580

  
  
Posted 2 years ago

Hi John, I've done more experiments and found that this only happens if you try to run the pipeline remotely, directly from the Python interpreter.

  
  
Posted 2 years ago

However, I did notice another issue.

  
  
Posted 2 years ago

Initially, I had only one queue and one worker set up. If the pipeline's default execution queue is the same as the queue used in pipe.start('the queue'), it gets into a sort of deadlock and waits forever.
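
To illustrate, this is roughly the configuration that hangs (queue name is just an example): with a single worker serving the queue, the controller occupies that worker and the steps it enqueues to the same queue are never picked up.

from clearml import PipelineController

pipe = PipelineController(name="My Pipe", project="Gridsquare-Training", version="0.0.5")
pipe.add_step(name="pipe step 1", base_task_project="Gridsquare-Training", base_task_name="remo2")

pipe.set_default_execution_queue("myqueue")   # steps are enqueued here
pipe.start("myqueue")                         # controller is enqueued here too -> the single worker
                                              # runs the controller and the steps wait forever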

  
  
Posted 2 years ago

When I set up two queues and two workers, set the default execution queue to one queue, and use the other queue for pipe.start, it all works.

  
  
Posted 2 years ago

But the behavior is different depending on whether you kick it off from a local Jupyter notebook or from a Python script.

  
  
Posted 2 years ago

In the case of the local Jupyter notebook, I create the pipeline and, when I start it, everything works without having to add the notebook to git.

  
  
Posted 2 years ago

But if I run exactly the same code from a Python script (which also calls start on the pipeline), the worker node tries to check out the script from git and run that (or fails if you haven't checked it in yet).

  
  
Posted 2 years ago

The notebook behavior is indeed how I expect it to work; the behavior via the script is strange.
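
For what it's worth, if I read the ClearML docs correctly there is also a way to keep the controller in the local process while the steps still run on the agents, which would sidestep the need to have the launching script in git (I haven't verified this myself):

from clearml import PipelineController

pipe = PipelineController(name="My Pipe", project="Gridsquare-Training", version="0.0.5")
pipe.add_step(name="pipe step 1", base_task_project="Gridsquare-Training", base_task_name="remo2")
pipe.set_default_execution_queue("myqueue")

# run the controller logic in this local process; only the steps are sent to the queue
pipe.start_locally(run_pipeline_steps_locally=False)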

  
  
Posted 2 years ago

FYI, this is my pipeline script:

from clearml import PipelineController

pipe = PipelineController(name="My Pipe", project="Gridsquare-Training", version="0.0.5")
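# Both steps clone the existing 'remo2' task; step 2 waits for step 1 to finish.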
pipe.add_step(name="pipe step 1", base_task_project="Gridsquare-Training", base_task_name="remo2")
pipe.add_step(name="pipe step 2", base_task_project="Gridsquare-Training", base_task_name="remo2", parents=["pipe step 1"])

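# Steps are enqueued to 'myqueue'; the controller task itself goes to the 'service' queue.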
pipe.set_default_execution_queue("myqueue")
pipe.start("service")

  
  
Posted 2 years ago

(the 'remo2' task is an existing experiment)

  
  
Posted 2 years ago