I'll explain what happened: I ran this code (" None "), and all the steps of the pipeline ran.
So the individual parts of the pipeline ran, but when I look at the pipeline in the dashboard it keeps running continuously and does not end, even after all the tasks have completed.
The part above is from the pipeline's console.
You said the pipeline completed running (one stage? more than one stage?) but I don't see that in the log?
It was stuck here. I had to abort manually. All the tasks completed though.
[notice] To update, run: pip install --upgrade pip
1687953244763 si-sajalv:0 ERROR User aborted: stopping task (3)
1687953245766 si-sajalv:0 DEBUG Current configuration (clearml_agent v1.5.2, location: /tmp/clearml.conf):
----------------------
sdk.storage.cache.default_base_dir = /clearml_agent_cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id = si-sajalv:0
agent.worker_name = si-sajalv
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.conda_env_as_base_docker = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = /root/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /app/C:/Users/sajal.vasal/.clearml/pip-cache
agent.docker_apt_cache = /app/C:/Users/sajal.vasal/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.git_user = hotshotdragon
agent.default_python = 3.10
agent.cuda_version = 122
agent.cudnn_version = 0
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = None
api.web_server = None
api.files_server = None
api.credentials.access_key = ***
api.host = None
Executing task id [080b31ec96124229bc5dc6a1955f58de]:
repository = None
branch = main
version_num = 4f79399df585f502828e8613cfc0688fe7cefada
tag =
docker_cmd = clearml-pipeline:0.4
entry_point = control.py
working_dir = .
@<1585078763312386048:profile|ArrogantButterfly10> can you attach the pipeline controller's log?
Hey, I seem to have resolved this issue, but I'm stuck on another one.
Apparently, even after all the tasks of the pipeline completed, the pipeline is still running and I had to abort it manually. Am I missing any code to stop it once all the tasks have executed?
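(For reference, a minimal sketch of a controller built from existing tasks; the project, task, and queue names below are placeholders, not the ones from this pipeline. The idea is that once the controller's main logic runs to the end, e.g. via start_locally(), or start() followed by wait() and stop() as in some older examples, the pipeline task should be marked completed instead of staying in the running state.)

from clearml import PipelineController

# Sketch only: project, task, and queue names are placeholders.
pipe = PipelineController(
    name="pipeline demo",
    project="examples",
    version="0.0.1",
)
pipe.add_step(
    name="stage_data",
    base_task_project="examples",           # placeholder project
    base_task_name="stage_data base task",  # placeholder base task
)

# Option A: run the controller logic in this process (steps are still
# enqueued for agents to execute); the call should block until all steps
# finish and then close the controller task.
pipe.start_locally()

# Option B (pattern from older ClearML examples; behavior can differ by
# SDK version): start, wait for all steps, then explicitly stop the
# controller so it is marked completed.
# pipe.start(queue="services")
# pipe.wait()
# pipe.stop()

print("pipeline finished")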
Environment setup completed successfully
Starting Task Execution:
2023-06-28 12:31:52
ClearML results page: None
2023-06-28 12:31:58
ClearML pipeline page: None
2023-06-28 12:32:05
Launching the next 1 steps
Launching step [stage_data]
2023-06-28 12:32:16
Launching step: stage_data
Parameters:
{'General/dataset_url': ' None '}
Configurations:
OrderedDict()
Overrides:
OrderedDict()
This is all I can find in the log, or am I looking at the wrong thing?
If the step is pending, it basically means nothing is taking it from the queue and executing it - look at the agent's log and try to see what's going on (is it monitoring the queue? Is it pulling the tasks?)
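(For reference, one rough way to check this programmatically is sketched below. It assumes the steps are enqueued into a queue named "default", and the field names follow the workers.get_all API response, so they may differ slightly between server versions. The agent's own console also prints which queues it pulls from when started with clearml-agent daemon --queue <queue_name>.)

from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_name = "default"  # assumption: the queue the pipeline steps are enqueued into

for worker in client.workers.get_all():
    # each worker entry lists the queues it is monitoring
    monitored = [getattr(q, "name", None) for q in (worker.queues or [])]
    if queue_name in monitored:
        print(f"agent '{worker.id}' is monitoring queue '{queue_name}'")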
Hi @<1585078763312386048:profile|ArrogantButterfly10> , do you have an agent monitoring the queue into which the pipeline steps are enqueued?