Hi Everyone,

Answered

I am trying to get the hyperparameter tuning part to work. When I change the hyperparameters in the Allegro web app, I enqueue the experiment and run the Trains agent. The newly cloned experiment goes into the running state and then I get the error “git diff failed”.

  				
Posted 5 years ago by TastyOwl44

Answers 30

Yes. You can see the agent's configuration in the experiment's log; all values are printed there.

  				
Posted 5 years ago by SuccessfulKoala55

I just checked, and I do see some uncommitted changes under execution.

  				
Posted 5 years ago by TastyOwl44

The agent uses the same configuration file.

  				
Posted 5 years ago by SuccessfulKoala55

I need to showcase this to my senior tomorrow

  				
Posted 5 years ago by TastyOwl44

Whenever the agent runs an experiment, it installs all required packages in a virtual environment to make sure the experiment is executed exactly as expected.

  				
Posted 5 years ago by SuccessfulKoala55

I did that with the agent; without making any changes, it works.

  				
Posted 5 years ago by TastyOwl44

Yes 😂

  				
Posted 5 years ago by TastyOwl44

Good 🙂 I see you still have issues with your CUDA installation

  				
Posted 5 years ago by SuccessfulKoala55

And how do I change the agent configuration file?

  				
Posted 5 years ago by TastyOwl44

But then how did the normal run, i.e. without cloning, work fine?

  				
Posted 5 years ago by TastyOwl44

Anyway, from what I see in the logs, it shows agent.default_python = 3.7, cuda = 100, cudnn = 75.

  				
Posted 5 years ago by TastyOwl44
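
For anyone comparing these logged values against their own machine, here is a minimal Python sketch. It assumes the values appear in the log as plain `key = value` lines, one per line, which is only inferred from the post above; the real log layout may differ. `parse_agent_values` and `sample_log` are hypothetical names used for illustration and are not part of Trains.

```python
import re
import sys


def parse_agent_values(log_text):
    """Collect `key = value` pairs such as `agent.default_python = 3.7`.

    Assumption: each setting sits on its own line in `key = value` form.
    """
    pattern = re.compile(r"^\s*([\w.]+)\s*=\s*(\S+)\s*$")
    values = {}
    for line in log_text.splitlines():
        match = pattern.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


# Sample text built from the values quoted in the post, not a real log excerpt.
sample_log = """
agent.default_python = 3.7
cuda = 100
cudnn = 75
"""

values = parse_agent_values(sample_log)

# Compare the logged default Python version with the interpreter running this script.
local_python = f"{sys.version_info.major}.{sys.version_info.minor}"
python_matches = values.get("agent.default_python") == local_python
```

If `python_matches` is false, the agent is configured for a different Python version than the one used to run the script locally, which is worth checking before looking further.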

Yes very much

  				
Posted 5 years ago by TastyOwl44

Is there a difference in the uncommitted changes section before your changes and after?

  				
Posted 5 years ago by SuccessfulKoala55

It fails only when I make changes.

  				
Posted 5 years ago by TastyOwl44

Are you running the agent on the same machine?

  				
Posted 5 years ago by SuccessfulKoala55

However, all packages are cached, so it won't download them again.

  				
Posted 5 years ago by SuccessfulKoala55

I can see your failed experiments on the demo server, but I can't see any completed experiment from which they were cloned...

  				
Posted 5 years ago by SuccessfulKoala55

And are these the correct values?

  				
Posted 5 years ago by SuccessfulKoala55

So now what do I do?

  				
Posted 5 years ago by TastyOwl44

You can first try to run your experiment again (not by cloning it and running it in the agent, but by executing it again locally). If you like, you can copy the example and run it from another folder that is not located inside a git repo.

  				
Posted 5 years ago by SuccessfulKoala55
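
To check whether a folder is "located inside a git repo", as suggested above, one rough approach is to walk up the parent directories and look for a `.git` entry. The sketch below does only that. `find_git_root` is a hypothetical helper, not part of Trains, and it ignores less common setups such as a repository selected through the `GIT_DIR` environment variable.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains a `.git` entry.

    Returns None when no ancestor has one, i.e. the folder is not inside a git
    work tree as far as this simple check can tell.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        # `.git` is usually a directory, but it is a file in worktrees and submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```

Calling `find_git_root(Path.cwd())` from the folder where the script lives returns the enclosing repository root, or `None` if the folder is outside any repository.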

https://www.allegro.ai/docs/deploying_trains/trains_agent_install_configure/

  				
Posted 5 years ago by SuccessfulKoala55

However, will it always install all the packages again and again? Is there any workaround for that?

  				
Posted 5 years ago by TastyOwl44

Did you make sure the agent's default Python version and CUDA / cuDNN versions are configured correctly?

  				
Posted 5 years ago by SuccessfulKoala55

It worked finally 🙂

  				
Posted 5 years ago by TastyOwl44

I know, they represent the changes you made to the example script

  				
Posted 5 years ago by SuccessfulKoala55

It would be great if these issues were explained in more detail in the documentation, though the documentation is pretty good overall.

  				
Posted 5 years ago by TastyOwl44

I only see the trains conf file.

  				
Posted 5 years ago by TastyOwl44

I ran it outside the examples folder and it works.

  				
Posted 5 years ago by TastyOwl44

In the conf file?

  				
Posted 5 years ago by TastyOwl44

  				
