Answered
Hi, While Running A Pipeline By Task, The Pipeline Is Stuck In 1St Stage Only. Console Is Showing

Hi, while running a pipeline by task, the pipeline gets stuck in the 1st stage. The console shows:
Configurations:
OrderedDict()
Overrides:
OrderedDict()

and it does not move forward. When I inspect the queue, the step is in the pending state. Can anyone help?

  
  
Posted 8 months ago

Answers 10


Environment setup completed successfully
Starting Task Execution:
2023-06-28 12:31:52
ClearML results page: None
2023-06-28 12:31:58
ClearML pipeline page: None
2023-06-28 12:32:05
Launching the next 1 steps
Launching step [stage_data]
2023-06-28 12:32:16
Launching step: stage_data
Parameters:
{'General/dataset_url': ' None '}
Configurations:
OrderedDict()
Overrides:
OrderedDict()

This is all I can find in the log. Or should I be looking somewhere else?

  
  
Posted 8 months ago

If the step is pending, it basically means nothing takes it from the queue and executes it - look at the agent's log and try and see what's going on (is it monitoring the queue? is it pulling the tasks?)
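
As a rough illustration of that check, here is a minimal sketch that lists which workers listen on a given queue. The helper names `workers_serving` and `fetch_workers`, the queue names, and the sample data are hypothetical. The `fetch_workers` part also assumes that each worker entry returned by the server exposes `id` and `queues` (each with a `name`), which should be verified against your server version.

```python
from typing import Dict, Iterable, List


def workers_serving(queue_name: str, workers: Iterable[Dict]) -> List[str]:
    """Return the ids of workers that listen on `queue_name`."""
    return [w["id"] for w in workers if queue_name in w.get("queues", [])]


def fetch_workers() -> List[Dict]:
    """Fetch registered workers from the ClearML server (needs a valid clearml.conf).

    Assumption: worker entries expose `id` and `queues`, and each queue entry
    exposes `name`. Check this against your server/API version.
    """
    # Imported lazily so the helper above works without clearml installed.
    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    return [
        {"id": w.id, "queues": [q.name for q in (w.queues or [])]}
        for w in client.workers.get_all()
    ]


# Hypothetical sample data standing in for the output of fetch_workers().
sample = [{"id": "si-sajalv:0", "queues": ["services"]}]

# An empty list means no agent pulls from that queue, so steps stay pending.
print(workers_serving("default", sample))
print(workers_serving("services", sample))
```

If the list for the queue the steps are enqueued into comes back empty, starting an agent on that queue (for example `clearml-agent daemon --queue <queue_name>`) should let the pending steps get picked up.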

  
  
Posted 8 months ago

yes

  
  
Posted 8 months ago

Hi @ArrogantButterfly10, do you have an agent monitoring the queue into which the pipeline steps are enqueued?

  
  
Posted 8 months ago

[notice] To update, run: pip install --upgrade pip

1687953244763 si-sajalv:0 ERROR User aborted: stopping task (3)

1687953245766 si-sajalv:0 DEBUG Current configuration (clearml_agent v1.5.2, location: /tmp/clearml.conf):
----------------------
sdk.storage.cache.default_base_dir = /clearml_agent_cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id = si-sajalv:0
agent.worker_name = si-sajalv
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.conda_env_as_base_docker = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = /root/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /app/C:/Users/sajal.vasal/.clearml/pip-cache
agent.docker_apt_cache = /app/C:/Users/sajal.vasal/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.git_user = hotshotdragon
agent.default_python = 3.10
agent.cuda_version = 122
agent.cudnn_version = 0
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = None
api.web_server = None
api.files_server = None
api.credentials.access_key = ***
api.host = None

Executing task id [080b31ec96124229bc5dc6a1955f58de]:
repository = None
branch = main
version_num = 4f79399df585f502828e8613cfc0688fe7cefada
tag =
docker_cmd = clearml-pipeline:0.4
entry_point = control.py
working_dir = .

  
  
Posted 8 months ago

@ArrogantButterfly10 can you attach the pipeline controller's log?

  
  
Posted 8 months ago

It was stuck here. I had to abort manually. All the tasks completed though.

  
  
Posted 8 months ago

You said the pipeline completed running (one stage? more than one stage?) but I don't see that in the log?

  
  
Posted 8 months ago

I'll explain what happened. I ran this code (" None "), so all the steps of the pipeline ran.

The individual steps of the pipeline ran, but when I look at the pipeline in the dashboard it keeps running and does not end, even after all the tasks have completed.
The log above is from the pipeline's console.

  
  
Posted 8 months ago

Hey, I seem to have resolved this issue, but I am stuck on another one.
Apparently, even after all the tasks of a pipeline have completed, the pipeline is still running, and I had to abort it manually. Am I missing some code to stop it after all the tasks have executed?

  
  
Posted 8 months ago