Answered
Hi, I'm running an agent in k8s using the glue. I'm trying to set the setup shell script to install some packages through apt but I don't think it even runs. I don't see anything in the console logs that says it does, at least. Is there any way I can debug

Hi,
I'm running an agent in k8s using the glue. I'm trying to set the setup shell script to install some packages through apt, but I don't think it even runs. I don't see anything in the console logs that says it does, at least.
Is there any way I can debug this issue?

  
  
Posted 10 months ago

Answers 8


Hi @CooperativeOtter46, are the agents inside the pods running in docker mode?

  
  
Posted 10 months ago

I think so. How can I verify?

  
  
Posted 10 months ago

Can you add a full log of an experiment?

  
  
Posted 10 months ago

1703796129681 clearml-agent-clearml-agent-spot-4-l4 INFO task 2321c65829a640d59044420e601cbc5e pulled from 380ce092ddd342c3aa1e5fdbcc3b1479 by worker clearml-agent-clearml-agent-spot-4-l4
1703796131110 clearml-agent-clearml-agent-spot-4-l4 DEBUG Running kubectl encountered an error: Warning: autopilot-workload-defaulter:Autopilot added tolerations matching: cloud.google.com/gke-spotWarning: autopilot-default-resources-mutator:The max supported TerminationGracePeriodSeconds is 25 seconds when using toleration of cloud.google.com/gke-spot=true:NoSchedule. Defaulting down from configured 30 seconds to 25 seconds.
1703796467352 clearml-agent-clearml-agent-spot-4-l4:2321c65829a640d59044420e601cbc5e DEBUG Process failed, exit code 1
1703796467443 clearml-agent-clearml-agent-spot-4-l4:2321c65829a640d59044420e601cbc5e DEBUG Using environment access key CLEARML_API_ACCESS_KEY=*********
Using environment secret key CLEARML_API_SECRET_KEY=********
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.a59wddfv.cfg):
----------------------
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.http.default_method = POST
api.api_server = 

api.web_server = 

api.files_server = 

api.host = 

api.credentials.access_key = ******
agent.worker_id = clearml-agent-clearml-agent-spot-4-l4:2321c65829a640d59044420e601cbc5e
agent.worker_name = clearml-id-2321c65829a640d59044420e601cbc5e
agent.force_git_ssh_protocol = false
agent.python_binary = 
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /root/.clearml/pip-cache
agent.docker_apt_cache = /root/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.default_python = 3.8
agent.cuda_version = 118
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = /workspace/data
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key = 
sdk.aws.s3.region = 
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.google.storage.project = trigo-back-office
sdk.google.storage.credentials_json = /etc/gcp-sa-secret-volume/sa_json
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = 

sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false

Executing task id [2321c65829a640d59044420e601cbc5e]:
repository = *****
branch = move-to-cloud
version_num = fe944ec30abf0e6fbf176797eeca98265ff6cc7a
tag = 
docker_cmd = nvidia/cuda:11.8.0-runtime-ubuntu20.04
entry_point = ******
working_dir = .

Python executable with version '3.9' requested by the Task, not found in path, using '/usr/bin/python3' (v3.8.10) instead
created virtual environment CPython3.8.10.final.0-64 in 317ms
  creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==23.3.1, setuptools==69.0.2, wheel==0.42.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
  
  
Posted 10 months ago

Looks like it's not running in docker mode 🙂
Otherwise you'd have the 'docker run' command at the start of the log
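
For anyone who wants to run the same check on a saved console log, here is a minimal sketch in Python. It only looks for a `docker run` invocation anywhere in the text. The function names are hypothetical, and the exact way the agent prints the command is an assumption, so the pattern also accepts the two words separated by quotes and commas (as in a printed argument list).

```python
import re
from typing import List

# Assumption: the agent prints the docker invocation either as plain text
# ("docker run ...") or as a printed argument list ("['docker', 'run', ...]").
# The exact log format is not confirmed in this thread.
DOCKER_RUN_PATTERN = re.compile(r"\bdocker\b['\",\s]+\brun\b")


def find_docker_run_lines(log_text: str) -> List[str]:
    """Return every log line that appears to contain a 'docker run' command."""
    return [line for line in log_text.splitlines() if DOCKER_RUN_PATTERN.search(line)]


def looks_like_docker_mode(log_text: str) -> bool:
    """Heuristic: True if at least one line contains a 'docker run' command."""
    return bool(find_docker_run_lines(log_text))


if __name__ == "__main__":
    # Two lines taken from the log posted above; neither is a docker invocation.
    sample = (
        "docker_cmd = nvidia/cuda:11.8.0-runtime-ubuntu20.04\n"
        "Process failed, exit code 1\n"
    )
    print(looks_like_docker_mode(sample))
```

Note that the `docker_cmd = ...` line in the task summary does not count as a match, since it only names the image and is not a `docker run` command.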

  
  
Posted 10 months ago

Setup shell script works in docker mode

  
  
Posted 10 months ago

oh ok 🙂 any way to do something similar?

  
  
Posted 10 months ago

@CooperativeOtter46, this should be handled by the k8s glue agent spawning the pods - how are you setting up this script?

  
  
Posted 10 months ago
708 Views