I'll explain what happened: I ran this code (" None "), and all the steps of the pipeline ran.
So the individual parts of the pipeline ran, but when I look at the pipeline in the dashboard it keeps running continuously and does not end, even after all the tasks have completed.
The part above is from the pipeline's console.
You said the pipeline completed running (one stage? more than one stage?) but I don't see that in the log?
It was stuck here. I had to abort manually. All the tasks completed though.
[notice] To update, run: pip install --upgrade pip
1687953244763 si-sajalv:0 ERROR User aborted: stopping task (3)
1687953245766 si-sajalv:0 DEBUG Current configuration (clearml_agent v1.5.2, location: /tmp/clearml.conf):
----------------------
sdk.storage.cache.default_base_dir = /clearml_agent_cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id = si-sajalv:0
agent.worker_name = si-sajalv
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.conda_env_as_base_docker = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = /root/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /app/C:/Users/sajal.vasal/.clearml/pip-cache
agent.docker_apt_cache = /app/C:/Users/sajal.vasal/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.git_user = hotshotdragon
agent.default_python = 3.10
agent.cuda_version = 122
agent.cudnn_version = 0
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = None
api.web_server = None
api.files_server = None
api.credentials.access_key = ***
api.host = None
Executing task id [080b31ec96124229bc5dc6a1955f58de]:
repository = None
branch = main
version_num = 4f79399df585f502828e8613cfc0688fe7cefada
tag =
docker_cmd = clearml-pipeline:0.4
entry_point = control.py
working_dir = .
@<1585078763312386048:profile|ArrogantButterfly10> can you attach the pipeline controller's log?
Hey, I seem to have resolved this issue, but I'm stuck on another one.
Apparently, even after all the tasks of the pipeline completed, the pipeline is still running and I had to abort it manually. Am I missing any code to stop it once all the tasks have executed?
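(For reference, a minimal sketch of a controller built from existing tasks; the project, task, and queue names below are placeholders, not the ones from this pipeline. The idea is that once the controller's main logic runs to the end, e.g. via start_locally(), or start() followed by wait() and stop() as in some older examples, the pipeline task should be marked completed instead of staying in the running state.)

from clearml import PipelineController

# Sketch only: project, task, and queue names are placeholders.
pipe = PipelineController(
    name="pipeline demo",
    project="examples",
    version="0.0.1",
)
pipe.add_step(
    name="stage_data",
    base_task_project="examples",           # placeholder project
    base_task_name="stage_data base task",  # placeholder base task
)

# Option A: run the controller logic in this process (steps are still
# enqueued for agents to execute); the call should block until all steps
# finish and then close the controller task.
pipe.start_locally()

# Option B (pattern from older ClearML examples; behavior can differ by
# SDK version): start, wait for all steps, then explicitly stop the
# controller so it is marked completed.
# pipe.start(queue="services")
# pipe.wait()
# pipe.stop()

print("pipeline finished")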
Environment setup completed successfully
Starting Task Execution:
2023-06-28 12:31:52
ClearML results page: None
2023-06-28 12:31:58
ClearML pipeline page: None
2023-06-28 12:32:05
Launching the next 1 steps
Launching step [stage_data]
2023-06-28 12:32:16
Launching step: stage_data
Parameters:
{'General/dataset_url': ' None '}
Configurations:
OrderedDict()
Overrides:
OrderedDict()
This is all I can find in the log, or am I looking at the wrong thing?
If the step is pending, it basically means nothing is taking it from the queue and executing it - look at the agent's log and try to see what's going on (is it monitoring the queue? Is it pulling the tasks?)
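(For reference, one rough way to check this programmatically is sketched below. It assumes the steps are enqueued into a queue named "default", and the field names follow the workers.get_all API response, so they may differ slightly between server versions. The agent's own console also prints which queues it pulls from when started with clearml-agent daemon --queue <queue_name>.)

from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_name = "default"  # assumption: the queue the pipeline steps are enqueued into

for worker in client.workers.get_all():
    # each worker entry lists the queues it is monitoring
    monitored = [getattr(q, "name", None) for q in (worker.queues or [])]
    if queue_name in monitored:
        print(f"agent '{worker.id}' is monitoring queue '{queue_name}'")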
Hi @<1585078763312386048:profile|ArrogantButterfly10> , do you have an agent monitoring the queue into which the pipeline steps are enqueued?