JitteryCoyote63 I think that without specifically adding torch to the requirements, the agent will not be able to automatically resolve the correct cuda/torch version. Basically you should add torch to the requirements.txt file, and provide it to Task create, or use Task.force_requirements_env_freeze
Hi
could you please share the logs for that issue (without the cred 🙂 ) ?
Hi JitteryCoyote63
So that I could simply do
task._update_requirements(".[train]")
but when I do this, the clearml agent (latest version) does not try to grab the matching cuda version, it only takes the cpu version. Is it a known bug?
The easiest way to go about is to add:Task.add_requirements("torch", "==1.11.0") task = Task.init(...)
Then it will auto detect your custom package, and will always add the torch version. The main issue with relying on the package requirements, is that the "torch" package will not be automatically get listed in the "installed packages" and the agent will not know it needs to resolve it (when pip resolves torch, it will default to the wrong cpu/cuda version)
wdyt?
that would work for pytorch and clearml yes, but what about my local package?
You can force to install only the packages that you need using a requirements.txt file. Type into what you want the agent to install (pytorch and eventually clearml). Then call that function before Task.init :Task.force_requirements_env_freeze(force=True, requirements_file='path/to/requirements.txt')
you can freeze your local env and thus get all the packages installed. With pip (on linux) it would be something like that :
pip freeze > requirements.txt
(doc here https://pip.pypa.io/en/stable/cli/pip_freeze/ )
Sure! Here are the relevant parts:
` ...
Current configuration (clearml_agent v1.2.3, location: /tmp/.clearml_agent.3m6hdm1_.cfg):
...
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /root/.clearml/pip-cache
agent.docker_apt_cache = /root/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = /root/.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = /root/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.default_python = 3.8
agent.cuda_version = 114
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
...
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
...
Executing task id [1eaf2297b5d04d848bcffc0c6f69cbef]:
...
Python executable with version '3.7' requested by the Task, not found in path, using '/clearml_agent_venv/bin/python3' (v3.8.10) instead
created virtual environment CPython3.8.10.final.0-64 in 358ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==22.0.4, setuptools==62.1.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Using cached repository in "/root/.clearml/vcs-cache/my-package.git.32940bc4e1fe7ef7cdafd7e48f8cf5db/my-package.git"
From
asd..asd1 support-torch-1.11 -> origin/support-torch-1.11
Note: switching to '...'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at asd1 udpate torch to 1.11
type: git
url:
branch: HEAD
commit: ...
root: /root/.clearml/venvs-builds/3.8/task_repository/my-package.git
Applying uncommitted changes
1654604141272 my-agent: DEBUG Collecting pip==20.2.3
Using cached pip-20.2.3-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.0.4
Uninstalling pip-22.0.4:
Successfully uninstalled pip-22.0.4
Successfully installed pip-20.2.3
Collecting Cython
Using cached Cython-0.29.30-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.30
Processing /root/.clearml/venvs-builds/3.8/task_repository/my-package.git
1654604151334 my-agent: DEBUG Collecting torch==1.11.0
Downloading torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB)
...
Collecting clearml~=1.1
Using cached clearml-1.4.1-py2.py3-none-any.whl (794 kB)
...
Building wheels for collected packages: my-package
Building wheel for my-package (setup.py): started
Building wheel for my-package (setup.py): finished with status 'done'
Created wheel for my-package: filename=my-package-2.0.0-py3-none-any.whl size=75530 sha256=7d93799e726038f157f4c3368dabb48d61f0345735872a2182c7972c6417ea90
Stored in directory: /root/.cache/pip/wheels/ec/90/87/b8fe9839dc83b5284f4b7580ad30086619632bd4af172c09cc
Installing collected packages: ..., torch, my-package
Successfully installed my-package-2.0.0 ... torch-1.11.0
Adding venv into cache: /root/.clearml/venvs-builds/3.8
Running task id [1d08bceaf2297b5ffc0c6f4d8469cbef]:
[.]$ /root/.clearml/venvs-builds/3.8/bin/python -u devops/train.py
Summary - installed python packages:
pip:
- my-package @ file:///root/.clearml/venvs-builds/3.8/task_repository/my-package-repo.git
... - clearml==1.4.1
... - torch==1.11.0
...
Environment setup completed successfully
Starting Task Execution:
... `
Hi NonchalantHedgehong19 , thanks for the hint! what should be the content of the requirement file then? Can I specify my local package inside? how?
Hi AgitatedDove14 , initially I was doing this, but then I realised that with the approach you suggest all the packages of the local environment also end up in the “installed packages”, while in reality I only need the dependencies of the local package. That’s why I use _update_requirements
, with this approach only the package required will be installed in the agent
hey H4dr1en
you just specify the packages that you want to be installed (no need to specify the dependancies) and the version if needed.
Something like :
pytorch==1.10.0