Hi all! I want to run my task remotely on an agent, but I'm having trouble with the requirements setup. I have a

Answered

Hi all!

I want to run my task remotely on an agent, but I'm having trouble with the requirements setup.

I have a requirements.txt with many packages to install, and the last line is "." (which means "install my package from this repo").
My package install involves a torch.utils.cpp_extension.CUDAExtension called, let's say, 'cuda_ext'.

But it seems the agent is not correctly installing my packages.

The logs show "Successfully built package" and "Successfully installed package", but the "Summary - installed python packages: ..." section does not list my package.
The task then fails with an ImportError: 'cuda_ext' not found.
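One way to check this from inside the task itself, rather than from the agent's summary, is to ask Python whether the distribution is visible in the running environment. The following is a minimal sketch, not part of the original post: `installed_version` and the name `my_package` are hypothetical placeholders, and `importlib.metadata` assumes Python 3.8 or newer (older interpreters would need the `importlib_metadata` backport).

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if pip never installed it."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "my_package" is a placeholder; use the name declared in your setup.py.
    print(installed_version("my_package"))
```

If this prints a version but the import still fails, the package was installed and the problem is more likely in how Python resolves the import.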

Any tips? I've spent a lot more time than I would like on this 😞

  				
Posted one year ago by PlainSeaurchin97


Answers 14

Thanks for the help anyway!

  				
Posted one year ago by PlainSeaurchin97

And here is the repo: None

  				
Posted one year ago by PlainSeaurchin97

Can't you do that in the docker bash script?

  				
Posted one year ago by SuccessfulKoala55

Also, if you check the logs my package is actually built at step 4:

2023-05-03 10:07:58
Building wheels for collected packages: softgroup
  Building wheel for softgroup (setup.py) ...

Looks like the -e flag is ignored. But it should work either way 🤔

  				
Posted one year ago by PlainSeaurchin97

I attached three logs:

local_console_output: how I set up my local task. Important commands: the apt install step, which installs the same dependencies as the docker_setup_bash_script, and pip install -r requirements.txt
local_task_output: ClearML experiment console log. The error "the following arguments are required: config" is the expected behavior
remote_task_output: ClearML experiment console log obtained when I clone the local task and enqueue it for remote execution. Notice that the behavior is different: I get ImportError: cannot import name 'ops' from 'softgroup.ops' (/root/.clearml/venvs-builds/3.7/task_repository/SoftGroup.git/softgroup/ops/__init__.py)
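The path in that ImportError points into the cloned repository rather than into site-packages, which suggests checking where the package resolves from and whether a compiled extension sits next to it. The snippet below is a rough diagnostic sketch using only the standard library; `locate_package` is a hypothetical helper name, and `softgroup.ops` is simply the package named in the traceback. Note that `find_spec` on a dotted name imports the parent package first.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def locate_package(name):
    """Return (directory, compiled_extension_files) for a package, or (None, []) if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when the parent of a dotted name cannot be imported.
        return None, []
    if spec is None or not spec.submodule_search_locations:
        # Not importable at all, or a plain module rather than a package.
        return None, []
    pkg_dir = Path(next(iter(spec.submodule_search_locations)))
    if not pkg_dir.is_dir():
        return str(pkg_dir), []
    # Platform-specific suffixes of compiled modules, e.g. ".so" or ".pyd" variants.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    compiled = sorted(p.name for p in pkg_dir.iterdir() if p.name.endswith(suffixes))
    return str(pkg_dir), compiled


if __name__ == "__main__":
    print(locate_package("softgroup.ops"))
```

A directory inside the repository checkout together with an empty list of compiled files would match the symptom described here: the import finds the source tree, which has no built extension in it.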

  				
Posted one year ago by PlainSeaurchin97

Hi PlainSeaurchin97, can you share the full log and an example of how the requirements file looks?

  				
Posted one year ago by SuccessfulKoala55

Looks like it was a Python thing, not a ClearML thing!

ClearML correctly installs the . from requirements.txt, but the project from the working directory was conflicting with the installed package, so Python couldn't find the compiled extension.

With some small changes to my repo, everything works.
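The conflict described above can be reproduced without ClearML or CUDA. When a script is run, its own directory comes before site-packages on sys.path, so a source folder with the same name as an installed package is imported first. The sketch below is an illustration with made-up names (`demo_shadow_pkg`, `checkout`, `site`), not the poster's actual repo layout or fix; it builds two copies of a package in a temporary directory and shows that search order alone decides which one is imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_pkg(root, marker):
    """Create a tiny package under root whose ORIGIN attribute records which copy it is."""
    pkg = Path(root) / "demo_shadow_pkg"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"ORIGIN = {marker!r}\n")


def import_origin(search_path):
    """Import demo_shadow_pkg with search_path placed first on sys.path and report the copy found."""
    saved = list(sys.path)
    sys.modules.pop("demo_shadow_pkg", None)
    sys.path[:0] = search_path
    importlib.invalidate_caches()
    try:
        return importlib.import_module("demo_shadow_pkg").ORIGIN
    finally:
        # Restore the interpreter state so the demo has no side effects.
        sys.path[:] = saved
        sys.modules.pop("demo_shadow_pkg", None)


with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp) / "checkout"  # stands in for the repo working directory
    site = Path(tmp) / "site"          # stands in for site-packages
    make_pkg(checkout, "source checkout")
    make_pkg(site, "installed copy")
    first = import_origin([str(checkout), str(site)])
    second = import_origin([str(site), str(checkout)])

print(first)   # source checkout
print(second)  # installed copy
```

In the situation from this thread, the "source checkout" copy would be the one without the compiled extension, which is why the import fails even though the install succeeded.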

  				
Posted one year ago by PlainSeaurchin97

👍

  				
Posted one year ago by SuccessfulKoala55

In what order does the agent do things?
I assumed it was:

1. Start the docker container
2. Run the docker setup bash script
3. Pull the repo, check out the commit, apply changes
4. Install pip requirements

In this case, I wouldn't have the correct version of the repo at the time the setup bash script runs.

  				
Posted one year ago by PlainSeaurchin97

Not sure if I can, because of some proprietary stuff in the code.

But I'll try writing a minimal working example on Monday!

  				
Posted one year ago by PlainSeaurchin97

Basically: locally, when I run pip install -r requirements.txt, the softgroup.ops package is installed correctly, but not on the remote worker.

I install the softgroup.ops package via the last line in requirements.txt, i.e. pip install -e .

  				
Posted one year ago by PlainSeaurchin97

I need to pip-install the package because I need to build some CUDA extensions.

  				
Posted one year ago by PlainSeaurchin97

Is there any way I can do something equivalent to -e . in the agent context?

  				
Posted one year ago by PlainSeaurchin97

I don't think -e . will work when running from the agent context

  				
Posted one year ago by SuccessfulKoala55
