
Hi, I have a local package that I use to train my models. To start training, I have a script that calls task._update_requirements([".", "torch==1.11.0"]).
In my setup.py I declared:
extras_require={"train": ["torch==1.11.0"]}
so that I could simply do task._update_requirements(".[train]"), but when I do this, the clearml agent (latest version) does not try to grab the matching CUDA version, it only takes the CPU version. Is it a known bug?

  
  
Posted 2 years ago

Answers 10


Hi,
could you please share the logs for that issue (without the credentials 🙂)?

  
  
Posted 2 years ago

Sure! Here are the relevant parts:
` ...
Current configuration (clearml_agent v1.2.3, location: /tmp/.clearml_agent.3m6hdm1_.cfg):

...
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /root/.clearml/pip-cache
agent.docker_apt_cache = /root/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = /root/.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = /root/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.default_python = 3.8
agent.cuda_version = 114
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
...
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
...

Executing task id [1eaf2297b5d04d848bcffc0c6f69cbef]:
...

Python executable with version '3.7' requested by the Task, not found in path, using '/clearml_agent_venv/bin/python3' (v3.8.10) instead
created virtual environment CPython3.8.10.final.0-64 in 358ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==22.0.4, setuptools==62.1.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

Using cached repository in "/root/.clearml/vcs-cache/my-package.git.32940bc4e1fe7ef7cdafd7e48f8cf5db/my-package.git"
From
asd..asd1 support-torch-1.11 -> origin/support-torch-1.11
Note: switching to '...'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

git switch -c <new-branch-name>

Or undo this operation with:

git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at asd1 udpate torch to 1.11
type: git
url:
branch: HEAD
commit: ...
root: /root/.clearml/venvs-builds/3.8/task_repository/my-package.git
Applying uncommitted changes

1654604141272 my-agent: DEBUG Collecting pip==20.2.3
Using cached pip-20.2.3-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.0.4
Uninstalling pip-22.0.4:
Successfully uninstalled pip-22.0.4
Successfully installed pip-20.2.3
Collecting Cython
Using cached Cython-0.29.30-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.30
Processing /root/.clearml/venvs-builds/3.8/task_repository/my-package.git

1654604151334 my-agent: DEBUG Collecting torch==1.11.0
Downloading torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB)
...
Collecting clearml~=1.1
Using cached clearml-1.4.1-py2.py3-none-any.whl (794 kB)
...
Building wheels for collected packages: my-package
Building wheel for my-package (setup.py): started
Building wheel for my-package (setup.py): finished with status 'done'
Created wheel for my-package: filename=my-package-2.0.0-py3-none-any.whl size=75530 sha256=7d93799e726038f157f4c3368dabb48d61f0345735872a2182c7972c6417ea90
Stored in directory: /root/.cache/pip/wheels/ec/90/87/b8fe9839dc83b5284f4b7580ad30086619632bd4af172c09cc

Installing collected packages: ..., torch, my-package

Successfully installed my-package-2.0.0 ... torch-1.11.0
Adding venv into cache: /root/.clearml/venvs-builds/3.8
Running task id [1d08bceaf2297b5ffc0c6f4d8469cbef]:
[.]$ /root/.clearml/venvs-builds/3.8/bin/python -u devops/train.py
Summary - installed python packages:
pip:

Environment setup completed successfully

Starting Task Execution:
... `

  
  
Posted 2 years ago

Hi JitteryCoyote63

So that I could simply do

task._update_requirements(".[train]")

but when I do this, the clearml agent (latest version) does not try to grab the matching cuda version, it only takes the cpu version. Is it a known bug?

The easiest way to go about it is to add:
Task.add_requirements("torch", "==1.11.0")
task = Task.init(...)
Then it will auto-detect your custom package and will always add the torch version. The main issue with relying on the package requirements is that the "torch" package will not automatically get listed in the "installed packages", so the agent will not know it needs to resolve it (when pip resolves torch, it will default to the wrong cpu/cuda version).
wdyt?
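For clarity, the same suggestion as a small self-contained sketch (project and task names are placeholders):

from clearml import Task

# register torch explicitly so it ends up in "installed packages";
# must be called before Task.init
Task.add_requirements("torch", "==1.11.0")
task = Task.init(project_name="my-project", task_name="train")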

  
  
Posted 2 years ago

Hi AgitatedDove14, initially I was doing this, but then I realised that with the approach you suggest all the packages of the local environment also end up in the "installed packages", while in reality I only need the dependencies of the local package. That's why I use _update_requirements; with this approach only the required packages will be installed by the agent.

  
  
Posted 2 years ago

You can force the agent to install only the packages that you need using a requirements.txt file. Type into it the packages you want the agent to install (pytorch and, if needed, clearml). Then call this function before Task.init:
Task.force_requirements_env_freeze(force=True, requirements_file='path/to/requirements.txt')
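A minimal sketch of that ordering (the requirements file path and its contents are only placeholders):

# path/to/requirements.txt could contain, e.g.:
#   torch==1.11.0
#   clearml~=1.1

from clearml import Task

# must be called before Task.init so the agent installs only what is listed
Task.force_requirements_env_freeze(force=True, requirements_file='path/to/requirements.txt')
task = Task.init(project_name="my-project", task_name="train")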

  
  
Posted 2 years ago

Hi NonchalantHedgehong19, thanks for the hint! What should be the content of the requirements file then? Can I specify my local package inside? How?

  
  
Posted 2 years ago

Hey H4dr1en,
you just specify the packages that you want to be installed (no need to specify the dependencies) and the version if needed.
Something like:

torch==1.10.0

  
  
Posted 2 years ago

That would work for pytorch and clearml, yes, but what about my local package?

  
  
Posted 2 years ago

You can freeze your local env and thus capture all the installed packages. With pip (on Linux) it would be something like this:
pip freeze > requirements.txt
(docs here: https://pip.pypa.io/en/stable/cli/pip_freeze/ )
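For illustration, freezing an env that has the local package installed would yield pinned lines like these (versions taken from the log above; the exact form of the local-package line depends on how it was installed, e.g. it may appear as a file:// path instead of a version pin):

my-package==2.0.0
torch==1.11.0
clearml==1.4.1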

  
  
Posted 2 years ago

JitteryCoyote63 I think that without specifically adding torch to the requirements, the agent will not be able to automatically resolve the correct cuda/torch version. Basically, you should add torch to the requirements.txt file and provide it at Task creation, or use Task.force_requirements_env_freeze.

  
  
Posted 2 years ago