Hello, I'm following the tutorial of

Answered

Hello, I'm following the tutorial at https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/image/hyperparameter_search/ . As I understand it, the optimizer puts experiments into a queue and lets workers execute them. However, I don't know how to set up a worker. Do I need to install https://github.com/allegroai/clearml-agent ?
Is it possible to run the hyperparameter optimization without installing clearml-agent?
Thanks.
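For context, here is a minimal sketch of the two ways the optimizer can be launched, assuming a recent `clearml` release. `HyperParameterOptimizer.start()` enqueues every trial on `execution_queue`, so nothing runs until a `clearml-agent` is listening on that queue. Newer `clearml` versions also provide `start_locally()`, which runs the trials on the local machine without an agent; treat its availability as an assumption and check your installed version. The project name, hyperparameter names, metric title/series and queue name below are placeholders, not values confirmed by the tutorial, and `make_optimizer`/`launch` are hypothetical helper names.

```python
from typing import Any


def make_optimizer(template_task_id: str, queue: str = "default") -> Any:
    """Build an optimizer around an existing base task (requires `pip install clearml`)."""
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Task
    from clearml.automation import (
        HyperParameterOptimizer,
        RandomSearch,
        UniformIntegerParameterRange,
    )

    # The controller task, created the way the tutorial notebook does it (placeholder names).
    Task.init(
        project_name="Hyper-Parameter Optimization",
        task_name="HPO controller",
        task_type=Task.TaskTypes.optimizer,
        reuse_last_task_id=False,
    )
    return HyperParameterOptimizer(
        base_task_id=template_task_id,
        # Placeholder names: use the hyperparameters shown on your base task.
        hyper_parameters=[
            UniformIntegerParameterRange("General/number_of_epochs", min_value=2, max_value=6, step_size=2),
            UniformIntegerParameterRange("General/batch_size", min_value=2, max_value=12, step_size=2),
        ],
        # Placeholder metric: use a scalar title/series the base task actually reports.
        objective_metric_title="accuracy",
        objective_metric_series="total",
        objective_metric_sign="max",
        optimizer_class=RandomSearch,
        execution_queue=queue,
        max_number_of_concurrent_tasks=2,
        total_max_jobs=10,
    )


def launch(optimizer: Any, use_agent: bool) -> str:
    """Start the search through a queue (needs an agent) or fully locally."""
    if use_agent:
        # Trials are only enqueued; an agent must pull them from `execution_queue`.
        optimizer.start()
        return "queued"
    # Assumption: start_locally() exists only in newer clearml releases.
    optimizer.start_locally()
    return "local"
```

In both modes you would typically follow up with `optimizer.wait()` and `optimizer.stop()` before reading results.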

  				
Posted 3 years ago by ScaryBluewhale66


Answers 11

AbruptCow41 Hi, thanks.

I followed this tutorial: https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/image/hyperparameter_search/ , but it didn't say anything about adding a repository.

Also, what I executed as the base experiment is https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/image_classification_CIFAR10.ipynb , not image_classification_CIFAR10.py. Does the https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer object need a Python file, rather than a Jupyter notebook, as the base experiment? Does that mean I need to create a separate image_classification_CIFAR10.py?
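If you do want a standalone image_classification_CIFAR10.py next to the notebook, note that a .ipynb file is plain JSON whose code cells can be concatenated into a script. The helper below is a minimal standard-library sketch of that idea, similar in spirit to `jupyter nbconvert --to script` but not what ClearML itself runs; `notebook_to_script` is a hypothetical name, and it simply comments out IPython magics (`%...`) and shell escapes (`!...`) rather than translating them.

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path: str, py_path: str) -> str:
    """Write the code cells of a .ipynb file to a .py file and return the script text."""
    nb = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat v4 stores a cell's source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # Comment out IPython magics and shell escapes, which are not valid Python.
        lines = [
            "# " + ln if ln.lstrip().startswith(("%", "!")) else ln
            for ln in source.splitlines()
        ]
        chunks.append("\n".join(lines))
    script = "\n\n".join(chunks) + "\n"
    Path(py_path).write_text(script, encoding="utf-8")
    return script
```

For example, `notebook_to_script("image_classification_CIFAR10.ipynb", "image_classification_CIFAR10.py")`; the resulting file still has to be committed and pushed if an agent is expected to find it in the repository.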

  				
Posted 3 years ago by ScaryBluewhale66

agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = /root/.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = /root/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.default_python = 3.8
agent.cuda_version = 114
agent.cudnn_version = 82
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.s3.use_credentials_chain = false
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = https://api.clear.ml
api.web_server = https://app.clear.ml
api.files_server = https://files.clear.ml
api.credentials.access_key = ONEZX58CCBZ406Y8TOGH
api.host = https://api.clear.ml
Executing task id [edb6696a04a44a9c8dcf1ff809a2163c]:
repository = https://github.com/gradient-ai/PyTorch.git
branch = main
version_num = e500ee439cb2876509879952d5e05941c055eb89
tag =
docker_cmd =
entry_point = image_classification_CIFAR10.py
working_dir = ctbc
created virtual environment CPython3.8.12.final.0-64 in 251ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==22.0.4, setuptools==60.9.3, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Using cached repository in "/root/.clearml/vcs-cache/PyTorch.git.06b588f3f90da68db82fd021c1fba016/PyTorch.git"
Note: switching to 'e500ee439cb2876509879952d5e05941c055eb89'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at e500ee4 Deleted in lieu of new PyTorch File
type: git
url: https://github.com/gradient-ai/PyTorch.git
branch: HEAD
commit: e500ee439cb2876509879952d5e05941c055eb89
root: /root/.clearml/venvs-builds/3.8/task_repository/PyTorch.git
clearml_agent: ERROR: [Errno 2] No such file or directory: '/root/.clearml/venvs-builds/3.8/task_repository/PyTorch.git/ctbc/image_classification_CIFAR10.py'

  				
Posted 3 years ago by ScaryBluewhale66

CostlyOstrich36 Thanks.
I installed ClearML-Agent to run it. However, I ran into another issue.

It shows the following error:

clearml_agent: ERROR: [Errno 2] No such file or directory: '/root/.clearml/venvs-builds/3.8/task_repository/PyTorch.git/ctbc/image_classification_CIFAR10.py'

I've executed https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/image_classification_CIFAR10.ipynb before executing https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/hyperparameter_search.ipynb .

  				
Posted 3 years ago by ScaryBluewhale66

Hi! The error says that the agent is looking for the ctbc/image_classification_CIFAR10.py file in your repo.
When you created the task you were inside a git repo, so ClearML assumed that all your files in it were committed and pushed. However, your repo https://github.com/gradient-ai/PyTorch.git doesn’t contain these files.
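The failing path in the log is just the checkout root joined with the task's working_dir (ctbc) and entry_point (image_classification_CIFAR10.py), as the repository/working_dir/entry_point lines printed before the error suggest. The sketch below reproduces that join and checks whether a local clone actually contains the file; `expected_entry_path` and `missing_entry_point` are hypothetical helpers written for illustration, not agent internals.

```python
import posixpath
from pathlib import Path


def expected_entry_path(repo_root: str, working_dir: str, entry_point: str) -> str:
    """Join the logged fields into the path the agent tries to run."""
    # working_dir is relative to the repository root, entry_point to working_dir.
    return posixpath.join(repo_root, working_dir, entry_point)


def missing_entry_point(repo_checkout: str, working_dir: str, entry_point: str) -> bool:
    """Return True if a local checkout lacks the file the task would execute."""
    return not (Path(repo_checkout) / working_dir / entry_point).is_file()
```

For example, after `git clone https://github.com/gradient-ai/PyTorch.git` and `git checkout e500ee439cb2876509879952d5e05941c055eb89`, calling `missing_entry_point("PyTorch", "ctbc", "image_classification_CIFAR10.py")` should return True if the file was indeed never pushed.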

  				
Posted 3 years ago by AbruptCow41

mmm, can you try the following:
1. Create a new folder with no git repo, and copy those two notebooks into it.
2. Launch the notebook with the base task and copy the task ID.
3. Launch the notebook with the hyperopt task, modifying the TEMPLATE_TASK_ID variable accordingly.
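The first step only helps if the new folder really sits outside any repository: when the base task is created inside a git checkout, ClearML records that repository and the agent later clones it, which is what failed here. A quick standard-library check to run in the new folder before launching the base notebook is sketched below; `find_git_root` is a hypothetical helper, not part of ClearML.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: str) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains .git, else None."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        # .git is a directory in normal clones and a file in worktrees/submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```

Running `print(find_git_root("."))` from the new folder should print `None`; any other result means the notebooks are still inside a repository.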

  				
Posted 3 years ago by AbruptCow41

CostlyOstrich36 https://app.clear.ml/projects/aa8fcc0a06b045868b019a551ae073b4/experiments/edb6696a04a44a9c8dcf1ff809a2163c/output/execution

  				
Posted 3 years ago by ScaryBluewhale66

So you need to push the Python files as well.

  				
Posted 3 years ago by AbruptCow41

ScaryBluewhale66 , Hi 🙂

You would need to install ClearML-Agent to run it.
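Once installed (`pip install clearml-agent`, then `clearml-agent init` to store credentials), a worker is a `clearml-agent daemon` process listening on the queue the optimizer sends trials to. The sketch below builds and launches that command from Python; it assumes the queue is named `default`, which must match the optimizer's `execution_queue`, and `agent_daemon_cmd`/`start_worker` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def agent_daemon_cmd(queue: str = "default", docker: bool = False) -> List[str]:
    """Build the command that starts a ClearML worker listening on `queue`."""
    cmd = ["clearml-agent", "daemon", "--queue", queue]
    if docker:
        # Run each task inside a Docker container instead of a local virtualenv.
        cmd.append("--docker")
    return cmd


def start_worker(queue: str = "default") -> subprocess.Popen:
    """Launch the worker as a child process; assumes `clearml-agent init` was already run."""
    if shutil.which("clearml-agent") is None:
        raise RuntimeError("clearml-agent not found; install it with `pip install clearml-agent`")
    return subprocess.Popen(agent_daemon_cmd(queue))
```

Running the same command directly in a terminal (`clearml-agent daemon --queue default`) is equivalent; the worker should then appear under Workers & Queues in the web UI.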

  				
Posted 3 years ago by CostlyOstrich36

Can you provide more of the log before the error as well?

  				
Posted 3 years ago by CostlyOstrich36

There’s only a Jupyter notebook.

  				
Posted 3 years ago by AbruptCow41

The issue may already be fixed, but I'd like to leave a comment on it.

I encountered the same issue when the cloned task was executed.

I checked the log on the client node where the .ipynb was executed, and during task creation it displayed the WARNING message "ClearML Could not read Jupyter Notebook: No template sub-directory with name 'script' found in the following paths ...".

I installed 'nbconvert' on the client node and the issue was gone.
If you create tasks from a Jupyter notebook, it is probably a good idea to install 'nbconvert'.
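Based on this report, a cheap guard is to check that nbconvert is importable in the notebook's kernel before calling `Task.init(...)`. The sketch below only checks that the package can be found; it will not catch an installation whose templates are missing, which is what the warning literally describes, so reinstalling nbconvert may still be needed. `nbconvert_available` and `warn_if_nbconvert_missing` are hypothetical helper names.

```python
import importlib.util


def nbconvert_available() -> bool:
    """Return True if nbconvert can be imported in the current kernel's environment."""
    return importlib.util.find_spec("nbconvert") is not None


def warn_if_nbconvert_missing() -> None:
    """Print a hint to run before Task.init() when nbconvert is not installed."""
    if not nbconvert_available():
        print(
            "nbconvert not found: install it (e.g. `pip install nbconvert`) so the "
            "notebook's code can be captured for remote execution"
        )
```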

  				
Posted 2 years ago by StraightParrot3
