Hello Everyone, I Currently Have The Following Problem: I Have Rebuilt A Pipeline From The Example And Have Noticed That The Pipeline Is Registered Cleanly On ClearML But Is Not Executed By The Worker. In The Attachment You Can See The Pipeline From The Ex

Answered

Hello everyone, I currently have the following problem: I rebuilt a pipeline from the example and noticed that the pipeline is registered cleanly on ClearML but is not executed by the worker. In the attachment you can see the pipeline from the example taken from GitHub, as well as the logs of the worker and pictures of the platform. Thank you in advance for any ideas 🙂

  				
Posted 11 months ago by GleamingTiger28


Answers 33

I think the issue is that the language isn't set correctly

  				
Posted 11 months ago by SuccessfulKoala55

The agent is now running in WSL, and the language is therefore set correctly, as shown by the printenv command.

  				
Posted 11 months ago by GleamingTiger28

This is the actual dashboard view

  				
Posted 11 months ago by GleamingTiger28

Click on step_one and on Full details

  				
Posted 11 months ago by CostlyOstrich36

What I mean is that export is a Linux command, not a Windows one.

  				
Posted 11 months ago by CostlyOstrich36

No tasks in queue da8d82bfb4e540d5bd3de1562ef2d90f
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue da8d82bfb4e540d5bd3de1562ef2d90f
No tasks in Queues, sleeping for 5.0 seconds
task 2bc01ff2f85d40ea9e3b116a247d2837 pulled from da8d82bfb4e540d5bd3de1562ef2d90f by worker DESKTOP-BGDD0C2:0
Running task '2bc01ff2f85d40ea9e3b116a247d2837'
Storing stdout and stderr log to 'C:\Users\Stephan\AppData\Local\Temp.clearml_agent_out.chq2df83.txt', 'C:\Users\Stephan\AppData\Local\Temp.clearml_agent_out.chq2df83.txt'
Current configuration (clearml_agent v1.9.2, location: C:/Users/Stephan/AppData/Local/Temp/.clearml_agent.rb3w9_i_.cfg):

agent.worker_id = DESKTOP-BGDD0C2:0
agent.worker_name = DESKTOP-BGDD0C2
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10' and python_version <= '3.11'
agent.package_manager.pip_version.2 = >=23,<24.3 ; python_version >= '3.12'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.pip_legacy_resolver.0 = >=20.3,<24.3
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = nvidia
agent.package_manager.conda_channels.3 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = C:/Users/Stephan/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = C:/Users/Stephan/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = C:/Users/Stephan/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = C:/Users/Stephan/.clearml/pip-cache
agent.docker_apt_cache = C:/Users/Stephan/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
agent.default_docker.match_rules.0.image = python:3.6-bullseye
agent.default_docker.match_rules.0.arguments = --ipc=host
agent.default_docker.match_rules.0.match.script.binary = python3.6$
agent.default_docker.match_rules.1.image = python:3.7-bullseye
agent.default_docker.match_rules.1.arguments = --ipc=host
agent.default_docker.match_rules.1.match.script.binary = python3.7$
agent.default_docker.match_rules.2.image = python:3.8-bullseye
agent.default_docker.match_rules.2.arguments = --ipc=host
agent.default_docker.match_rules.2.match.script.binary = python3.8$
agent.default_docker.match_rules.3.image = python:3.9-bullseye
agent.default_docker.match_rules.3.arguments = --ipc=host
agent.default_docker.match_rules.3.match.script.binary = python3.9$
agent.default_docker.match_rules.4.image = python:3.10-bullseye
agent.default_docker.match_rules.4.arguments = --ipc=host
agent.default_docker.match_rules.4.match.script.binary = python3.10$
agent.default_docker.match_rules.5.image = python:3.11-bullseye
agent.default_docker.match_rules.5.arguments = --ipc=host
agent.default_docker.match_rules.5.match.script.binary = python3.11$
agent.default_docker.match_rules.6.image = python:3.12-bullseye
agent.default_docker.match_rules.6.arguments = --ipc=host
agent.default_docker.match_rules.6.match.script.binary = python3.12$
agent.enable_task_env = false
agent.sanitize_config_printout = ****
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venvs_cache = /root/.clearml/venvs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.git_user =
agent.git_pass = ****
agent.default_python = 3.12
agent.cuda_version = 126
agent.cudnn_version = 0
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.auth.token_expiration_threshold_sec = ****
api.api_server = None
api.web_server = None
api.files_server = None
api.credentials.access_key = FRNLH4XTZ910FA1AVSE0OWTPM0XXUA
api.credentials.secret_key = ****
api.host = None
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.secret = ****
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false

Executing task id [2bc01ff2f85d40ea9e3b116a247d2837]:
repository =
branch =
version_num =
tag =
docker_cmd = nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
entry_point = pff.py
working_dir = .

Warning: could not locate requested Python version 3.12, reverting to version 3.12

clearml_agent: ERROR: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte

  				
Posted 11 months ago by GleamingTiger28

Yes, multiple times. I also switched from docker mode to venv mode, but that does not help. The task always gets stuck in the execution phase.

  				
Posted 11 months ago by GleamingTiger28

Where exactly would you want to set these in the config?

  				
Posted 11 months ago by GleamingTiger28

The agent is running on Ubuntu 20.04.5 LTS inside WSL.

  				
Posted 11 months ago by GleamingTiger28

What's the console showing?

  				
Posted 11 months ago by CostlyOstrich36

OK, I will try.

  				
Posted 11 months ago by GleamingTiger28

Now running natively on Windows with Python 3.12

  				
Posted 11 months ago by GleamingTiger28

Yes, sure, just give me a second.

  				
Posted 11 months ago by GleamingTiger28

Seems stuck. Can you try restarting the agent? Also how did you run the agent?

  				
Posted 11 months ago by CostlyOstrich36

OK, but where and how can I investigate? I am using the official example.

  				
Posted 11 months ago by GleamingTiger28

Which Windows version and processor does that machine have? What version of Python are you using?

  				
Posted 11 months ago by CostlyOstrich36

Are you running a bash terminal in Windows?

  				
Posted 11 months ago by CostlyOstrich36

Can you try running it natively on Windows?

  				
Posted 11 months ago by CostlyOstrich36

If you're running on a Windows machine, Linux syntax such as export won't work. I'd suggest checking how to set environment variables on Windows.
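
A minimal sketch of the difference, written in Python so that it behaves the same on Windows, WSL, and Linux. In bash the syntax is export NAME=value, in PowerShell it is $env:NAME = "value", and in cmd.exe it is set NAME=value. The thread does not say which variable needs to be set, so PYTHONUTF8 (CPython's UTF-8 Mode switch, available since Python 3.7) is used here purely as an assumed example, and run_with_env is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_with_env(extra_env: dict, code: str) -> str:
    """Run a Python snippet in a child process with extra environment variables.

    Hypothetical helper: passing env= to subprocess works the same way on
    Windows and Linux, so no shell-specific syntax (export / set / $env:) is needed.
    """
    env = {**os.environ, **extra_env}  # copy the current environment, then override
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Assumption: PYTHONUTF8=1 is only an example variable, not a confirmed fix.
utf8_mode = run_with_env({"PYTHONUTF8": "1"}, "import sys; print(sys.flags.utf8_mode)")
print(utf8_mode)
```

If the child process reports 1, it saw the variable regardless of which shell started the parent process.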

  				
Posted 11 months ago by CostlyOstrich36

Then add a screenshot of the info section

  				
Posted 11 months ago by CostlyOstrich36

@CostlyOstrich36 this looks like the same behaviour.

  				
Posted 11 months ago by GleamingTiger28

Hey @CostlyOstrich36, I have one worker listening to the default queue. I copied the pipeline from the official GitHub example. I assume the pipeline controller is running on the same queue.

  				
Posted 11 months ago by GleamingTiger28

No, under Windows I use PowerShell, but this is the integrated WSL terminal.

  				
Posted 11 months ago by GleamingTiger28

This is the console log file

  				
Posted 11 months ago by GleamingTiger28

  				

@GleamingTiger28 this looks like a non-ASCII character somewhere interfering with Python's UTF-8 decoding.
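
To illustrate the kind of failure shown in the agent log, here is a small sketch. It assumes the offending byte comes from a Windows code page such as cp1252, where 0xfc is the letter "ü"; the thread does not confirm where the byte actually originates, and the sample bytes and the try_decode helper are made up for illustration.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the decoder's error message on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"ERROR: {exc}"


# Hypothetical sample: 38 ASCII bytes followed by "ü" encoded as cp1252 (byte 0xfc).
raw = b"x" * 38 + "\u00fc".encode("cp1252")

utf8_result = try_decode(raw, "utf-8")      # fails: 0xfc cannot start a UTF-8 sequence
cp1252_result = try_decode(raw, "cp1252")   # succeeds: 0xfc maps to "ü"

print(utf8_result)
print(repr(cp1252_result[-1]))
```

The same bytes decode cleanly once the matching encoding is used, which is why a non-ASCII character in a path, user name, or environment value is a plausible place to look.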

  				
Posted 11 months ago by SuccessfulKoala55

Check the queue, do you have step 1 enqueued?

  				
Posted 11 months ago by CostlyOstrich36

Hi @GleamingTiger28, how many workers do you have listening to the default queue, and is the pipeline controller running on the default queue as well?

  				
Posted 11 months ago by CostlyOstrich36

That's not in the ClearML configuration (it's actually not specific to ClearML, but related to the way Python loads strings). This should be a Windows Control Panel setting (or an env var if you're using WSL).
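
A quick way to see which encodings Python actually picks up from the OS settings or the environment is sketched below. It only uses standard-library calls; encoding_report is a hypothetical helper name, and whether changing these values resolves the agent error is not confirmed in this thread.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings the current Python process is using."""
    return {
        # True when UTF-8 Mode is active (for example via PYTHONUTF8=1 or -X utf8)
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding derived from the OS locale / Windows code page
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names and paths
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of the console stream, if one is attached
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default for str.encode() / bytes.decode(); always utf-8 on Python 3
        "default": sys.getdefaultencoding(),
    }


report = encoding_report()
for name, value in report.items():
    print(f"{name}: {value}")
```

Running this in the same terminal that starts the agent shows what that process inherits; a "preferred" value such as cp1252 would be consistent with a byte like 0xfc appearing in text that is later decoded as UTF-8.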

  				
Posted 11 months ago by SuccessfulKoala55

Still not working

  				
Posted 11 months ago by GleamingTiger28
