Hi @<1523704667563888640:profile|CooperativeOtter46> , are the agents inside the pods running in docker mode?
1703796129681 clearml-agent-clearml-agent-spot-4-l4 INFO task 2321c65829a640d59044420e601cbc5e pulled from 380ce092ddd342c3aa1e5fdbcc3b1479 by worker clearml-agent-clearml-agent-spot-4-l4
1703796131110 clearml-agent-clearml-agent-spot-4-l4 DEBUG Running kubectl encountered an error: Warning: autopilot-workload-defaulter:Autopilot added tolerations matching: cloud.google.com/gke-spot Warning: autopilot-default-resources-mutator:The max supported TerminationGracePeriodSeconds is 25 seconds when using toleration of cloud.google.com/gke-spot=true:NoSchedule. Defaulting down from configured 30 seconds to 25 seconds.
1703796467352 clearml-agent-clearml-agent-spot-4-l4:2321c65829a640d59044420e601cbc5e DEBUG Process failed, exit code 1
1703796467443 clearml-agent-clearml-agent-spot-4-l4:2321c65829a640d59044420e601cbc5e DEBUG Using environment access key CLEARML_API_ACCESS_KEY=*********
Using environment secret key CLEARML_API_SECRET_KEY=********
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.a59wddfv.cfg):
----------------------
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.http.default_method = POST
api.api_server =
api.web_server =
api.files_server =
api.host =
api.credentials.access_key = ******
agent.worker_id = clearml-agent-clearml-agent-spot-4-l4:2321c65829a640d59044420e601cbc5e
agent.worker_name = clearml-id-2321c65829a640d59044420e601cbc5e
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /root/.clearml/pip-cache
agent.docker_apt_cache = /root/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.disable_task_docker_override = false
agent.default_python = 3.8
agent.cuda_version = 118
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = /workspace/data
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.google.storage.project = trigo-back-office
sdk.google.storage.credentials_json = /etc/gcp-sa-secret-volume/sa_json
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
Executing task id [2321c65829a640d59044420e601cbc5e]:
repository = *****
branch = move-to-cloud
version_num = fe944ec30abf0e6fbf176797eeca98265ff6cc7a
tag =
docker_cmd = nvidia/cuda:11.8.0-runtime-ubuntu20.04
entry_point = ******
working_dir = .
Python executable with version '3.9' requested by the Task, not found in path, using '/usr/bin/python3' (v3.8.10) instead
created virtual environment CPython3.8.10.final.0-64 in 317ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==23.3.1, setuptools==69.0.2, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Looks like it's not running in docker mode.
Otherwise you'd have the 'docker run' command at the start of the log.
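If it helps, here is a rough way to apply that check to a saved console log. It is only a heuristic sketch: the function name `looks_like_docker_mode` and the `head_lines` cutoff are made up for illustration and are not part of clearml-agent.

```python
def looks_like_docker_mode(log_text: str, head_lines: int = 50) -> bool:
    """Heuristic: True if a 'docker run' command shows up near the top of the log.

    head_lines is an arbitrary cutoff chosen for this sketch, not an agent setting.
    """
    head = log_text.splitlines()[:head_lines]
    return any("docker run" in line for line in head)


# Two lines taken from the log above: no 'docker run', so this looks like venv mode.
venv_log = (
    "Executing task id [2321c65829a640d59044420e601cbc5e]:\n"
    "created virtual environment CPython3.8.10.final.0-64 in 317ms"
)
print(looks_like_docker_mode(venv_log))  # False
```

A log that starts by building a virtual environment directly, as this one does, is what you would expect to see when the agent is not wrapping the task in a container itself.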
Oh ok. Any way to do something similar?
@<1523704667563888640:profile|CooperativeOtter46> , this should be handled by the k8s glue agent spawning the pods - how are you setting up this script?