do you use docker ? if yes, then you may want to try modifying extra_docker_shell_script in agent config file
SubstantialElk6 is this the pip to install the agent, or the pip the agent is using to install the packages for the specific experiment ?
See here:
https://pip.pypa.io/en/stable/user_guide/#environment-variables
Pass these environment variables as part of the YAML template you are using with the k8s.
Should work for both 🙂
Thanks AgitatedDove14 , unfortunately it didn't take effect.
SubstantialElk6 this is odd, how are they passed ? what's the exact setup ?
i passed it through the yaml as follows.apiVersion: v1 kind: Pod spec: containers: - image: clearml-agent:latest" env: - name: PIP_INDEX_URL value: "
" - name: PIP_TRUSTED_HOST value: "192.168.56.253" - name: PIP_FIND_LINKS value: "
" - name: GIT_SSL_NO_VERIFY value: true resources: requests: cpu: "2" memory: "2Gi" limits: nvidia.com/gpu: 1 restartPolicy: Always
This is the top output of python3 k8s_glue_example.py --queue gpu --overrides-yaml custom.yml --namespace default
Found pod container requests=['nvidia.com/gpu=1'] limits=['memory=2Gi', 'cpu=2'] Removing containers section: [{'image': 'clearml-agent:latest"', 'env': [{'name': 'PIP_INDEX_URL', 'value': '
'}, {'name': 'PIP_TRUSTED_HOST', 'value': '192.168.56.253'}, {'name': 'PIP_FIND_LINKS', 'value': '
'}, {'name': 'GIT_SSL_NO_VERIFY', 'value': True}], 'resources': {'requests': {'cpu': '2', 'memory': '2Gi'}, 'limits': {'nvidia.com/gpu': 1}}}] Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf): ----------------------
So these (PIP_INDEX_URL) weren't used when clearml starts running pip.
Hmm, I think you should use --template-yaml
What's the diff between template-yaml and --overrides-yaml? I used the latter to ensure the gpu is passed in.
The --template-yaml allows you to use foll k8s YAML template (the overrides is just overrides, which do not include most of the configuration options. we should probably deprecate it
Hi, i changed it, but it still point to https://files.pythonhosted.org/packages .
Can you see that the environment is actually being passed ?
Do you mean this?Removing containers section: [{'image': 'clearml-agent:latest"', 'env': [{'name': 'PIP_INDEX_URL', 'value': '
'},
Hmm yes this is exactly what should not happen 🙂
Let me check it
Hey SubstantialElk6 ,
Can you show us the top output you get when using the template-yaml instead of overrides-yaml?
Hi, this is what i got. No mention of the env variables.
` Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.web_server =
api.api_server =
api.files_server =
api.credentials.access_key = UKATBKH60Z73SJZIXOIW
api.host =
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = true
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id =
agent.worker_name = master-node
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version =
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = true
agent.package_manager.conda_channels.0 = defaults
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = pytorch
agent.package_manager.torch_nightly = false
agent.package_manager.force_repo_requirements_txt = true
agent.package_manager.priority_packages.0 = cython
agent.package_manager.priority_packages.1 = numpy
agent.package_manager.priority_packages.2 = setuptools
agent.venvs_dir = /home/jax/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/jax/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/jax/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = true
agent.docker_pip_cache = /home/jax/.clearml/pip-cache
agent.docker_apt_cache = /home/jax/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.1-runtime-ubuntu18.04
agent.default_docker.arguments.0 = --env GIT_SSL_NO_VERIFY=true
agent.enable_task_env = false
agent.git_user =
agent.default_python = 3.7
agent.cuda_version = 110
agent.cudnn_version = 0
Worker "master-node:0" - Listening to queues:
+----------------------------------+------+-------+
| id | name | tags |
+----------------------------------+------+-------+
| c6f22020435d4fa680e805f530d0078c | gpu | |
+----------------------------------+------+-------+
No tasks in queue c6f22020435d4fa680e805f530d0078c
No tasks in Queues, sleeping for 5.0 seconds `
I'm also noticing a lot of this while the k8s glue is running.Ex: Expecting value: line 1 column 1 (char 0) K8S Glue pods monitor: Failed parsing kubectl output:
I did another test by runningkubectl exec pod-name -- echo $PIP_INDEX_URL
and it returned nothing. So the env are not passed to the container at all.