Hi, I have a local package that I use to train my models. To start training, I have a script that calls `task._update_requirements([".", "torch==1.11.0"])`.
In my setup.py I declared:
`extras_require={"train": ["torch==1.11.0"]}`
so that I could simply do `task._update_requirements(".[train]")`, but when I do this, the clearml agent (latest version) does not try to grab the matching CUDA version, it only takes the CPU version. Is it a known bug?
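For reference, the relevant fragment of the setup.py described above would look roughly like this (a sketch: the package name and version are taken from the build log later in the thread, everything else is an assumption):

```python
# setup.py fragment (sketch). The "train" extra groups the heavy training
# dependencies, so `pip install .[train]` -- and therefore
# task._update_requirements(".[train]") -- should pull in the pinned torch build.
extras_require = {"train": ["torch==1.11.0"]}

# This mapping is then passed to setuptools.setup(), e.g.:
#   from setuptools import setup, find_packages
#   setup(name="my-package", version="2.0.0", packages=find_packages(),
#         extras_require=extras_require)
```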

Posted one year ago

Answers 10

Sure! Here are the relevant parts:
` ...
Current configuration (clearml_agent v1.2.3, location: /tmp/.clearml_agent.3m6hdm1_.cfg):

agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /root/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /root/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /root/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /root/.clearml/pip-cache
agent.docker_apt_cache = /root/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = /root/.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = /root/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.default_python = 3.8
agent.cuda_version = 114
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512

Executing task id [1eaf2297b5d04d848bcffc0c6f69cbef]:

Python executable with version '3.7' requested by the Task, not found in path, using '/clearml_agent_venv/bin/python3' (v3.8.10) instead
created virtual environment CPython3.8.10.final.0-64 in 358ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==22.0.4, setuptools==62.1.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

Using cached repository in "/root/.clearml/vcs-cache/my-package.git.32940bc4e1fe7ef7cdafd7e48f8cf5db/my-package.git"
asd..asd1 support-torch-1.11 -> origin/support-torch-1.11
Note: switching to '...'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

git switch -c <new-branch-name>

Or undo this operation with:

git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at asd1 udpate torch to 1.11
type: git
branch: HEAD
commit: ...
root: /root/.clearml/venvs-builds/3.8/task_repository/my-package.git
Applying uncommitted changes

1654604141272 my-agent: DEBUG Collecting pip==20.2.3
Using cached pip-20.2.3-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.0.4
Uninstalling pip-22.0.4:
Successfully uninstalled pip-22.0.4
Successfully installed pip-20.2.3
Collecting Cython
Using cached Cython-0.29.30-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.30
Processing /root/.clearml/venvs-builds/3.8/task_repository/my-package.git

1654604151334 my-agent: DEBUG Collecting torch==1.11.0
Downloading torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB)
Collecting clearml~=1.1
Using cached clearml-1.4.1-py2.py3-none-any.whl (794 kB)
Building wheels for collected packages: my-package
Building wheel for my-package (setup.py): started
Building wheel for my-package (setup.py): finished with status 'done'
Created wheel for my-package: filename=my-package-2.0.0-py3-none-any.whl size=75530 sha256=7d93799e726038f157f4c3368dabb48d61f0345735872a2182c7972c6417ea90
Stored in directory: /root/.cache/pip/wheels/ec/90/87/b8fe9839dc83b5284f4b7580ad30086619632bd4af172c09cc

Installing collected packages: ..., torch, my-package

Successfully installed my-package-2.0.0 ... torch-1.11.0
Adding venv into cache: /root/.clearml/venvs-builds/3.8
Running task id [1d08bceaf2297b5ffc0c6f4d8469cbef]:
[.]$ /root/.clearml/venvs-builds/3.8/bin/python -u devops/train.py
Summary - installed python packages:

Environment setup completed successfully

Starting Task Execution:
... `

Posted one year ago

could you please share the logs for that issue (without the credentials 🙂)?

Posted one year ago

Hi JitteryCoyote63

So that I could simply do `task._update_requirements(".[train]")` but when I do this, the clearml agent (latest version) does not try to grab the matching CUDA version, it only takes the CPU version. Is it a known bug?

The easiest way to go about is to add:
The easiest way to go about it is to add:
`Task.add_requirements("torch", "==1.11.0")`
`task = Task.init(...)`
Then it will auto-detect your custom package and will always add the torch version. The main issue with relying on the package requirements is that the "torch" package will not automatically get listed in the "installed packages", so the agent will not know it needs to resolve it (when pip resolves torch on its own, it defaults to the wrong cpu/cuda version).
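A minimal sketch of that call order (the project/task names below are placeholders, not from the thread); the key point is that `add_requirements` must run before `Task.init`:

```python
def start_training_task():
    """Sketch: pin torch explicitly so the agent resolves the matching CUDA build."""
    from clearml import Task  # requires `pip install clearml`

    # Must be called *before* Task.init, otherwise the pin is not recorded
    # in the task's "installed packages".
    Task.add_requirements("torch", "==1.11.0")

    # Placeholder project/task names.
    return Task.init(project_name="examples", task_name="train")
```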

Posted one year ago

Hi AgitatedDove14, initially I was doing this, but then I realised that with the approach you suggest, all the packages of the local environment also end up in the "installed packages", while in reality I only need the dependencies of the local package. That's why I use `_update_requirements`; with this approach only the required packages will be installed by the agent.

Posted one year ago

You can force the agent to install only the packages that you need by using a requirements.txt file. List in it what you want the agent to install (pytorch and, if needed, clearml). Then call this function before Task.init:
Task.force_requirements_env_freeze(force=True, requirements_file='path/to/requirements.txt')
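Put together, that flow might look like this (a sketch; the file path and project/task names are placeholders):

```python
def start_task_with_pinned_requirements(requirements_file="path/to/requirements.txt"):
    """Sketch: make the agent install only what the requirements file lists."""
    from clearml import Task  # requires `pip install clearml`

    # Must be called before Task.init.
    Task.force_requirements_env_freeze(force=True, requirements_file=requirements_file)

    # Placeholder project/task names.
    return Task.init(project_name="examples", task_name="train")
```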

Posted one year ago

hey H4dr1en
you just specify the packages that you want to be installed (no need to specify their dependencies), and the version if needed.
Something like:


Posted one year ago

Hi NonchalantHedgehong19, thanks for the hint! What should be the content of the requirements file then? Can I specify my local package inside? How?

Posted one year ago

JitteryCoyote63 I think that without specifically adding torch to the requirements, the agent will not be able to automatically resolve the correct cuda/torch version. Basically you should add torch to the requirements.txt file and provide it to Task.create, or use `Task.force_requirements_env_freeze`.
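Based on the suggestions in this thread, the requirements file could then be as small as the following (the exact contents are an assumption, not confirmed by the thread; the local package itself is fetched from the git repo by the agent, so it is not listed here):

```
torch==1.11.0
clearml
```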

Posted one year ago

you can freeze your local env and thus get all the installed packages. With pip (on Linux) it would be something like:
pip freeze > requirements.txt
(docs here: https://pip.pypa.io/en/stable/cli/pip_freeze/ )

Posted one year ago

That would work for pytorch and clearml, yes, but what about my local package?

Posted one year ago