Hey, I was wondering how I can do hparams tuning with Trains? Couldn't find anything in the documentation

What should have happened is that the experiments should have been pending (i.e. in a queue).
(Not sure why they are not.)
You can manually send them for execution: right-click an experiment in the table, select enqueue, and select the default queue (this is the one the trains-agent pulls from by default).

  				
Posted 3 years ago by AgitatedDove14

I manually sent one to the queue, and it started running but failed. Apparently trains can't access my git repository.
I tried docker-compose -f down, then
export TRAINS_AGENT_GIT_USER=(my_user)
export TRAINS_AGENT_GIT_PASS=(my_pass)
and then docker-compose -f up, but I get the same error.

  				
Posted 3 years ago by ShaggyHare67

ShaggyHare67 could you send the console log trains-agent outputs when you run it?

Now the trains-agent is running my code but it is unable to import trains

Do you have the package "trains" listed under "installed packages" in your experiment?

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 Quick update: apparently the base (template) experiment we ran with the main model (somewhere between 2 weeks and 1 month ago) did show installed packages, but now it doesn't. Nothing changed in the trains settings or the trains-server settings, so I wonder what could cause that?

  				
Posted 3 years ago by ShaggyHare67

AgitatedDove14 Sadly I cannot send the console log because it is on a different computer (and on a different, closed network). But according to the log it is able to clone the repository and execute the right py file, and then it crashes on the line where I import trains.

The experiment has no packages identified under installed packages. So obviously that is the problem, but as I stated in my previous comment, I am trying to make it run on /home/user/miniconda/envs36/bin/python. Or am I missing something?

  				
Posted 3 years ago by ShaggyHare67

So obviously that is the problem

Correct.
ShaggyHare67 how come the "installed packages" are now empty?
They should be filled in automatically when executing locally!
Any chance someone mistakenly deleted them?
Regarding the python environment, trains-agent creates a new clean venv for every experiment. If you need to, you can set in your trains.conf:
agent.package_manager.system_site_packages: true
https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L55
This will cause the newly created venv to inherit the packages from the system, meaning it should have the trains package if it is already installed.
What do you think?

  				
Posted 3 years ago by AgitatedDove14

ShaggyHare67

Now the trains-agent is running my code but it is unable to import trains ...

So what you are saying is that you spin up the trains-agent inside a docker, but in venv mode?

On the server I have both python (2.7) and python3,

Hmm, make sure that you run the agent with python3 trains-agent; this way it will use python3 for the experiments.

  				
Posted 3 years ago by AgitatedDove14

ShaggyHare67 are you saying the problem is that trains fails to discover the packages in the manual execution?

  				
Posted 3 years ago by AgitatedDove14

I see in the UI are 5 drafts

What's the status of these 5 experiments? Draft?

  				
Posted 3 years ago by AgitatedDove14

It's supposed to be running. How can I double-check?
(I'm using my own Trains server)

  				
Posted 3 years ago by ShaggyHare67

Yes

  				
Posted 3 years ago by ShaggyHare67

Which panel?
https://demoapp.trains.allegro.ai/workers-and-queues/workers

My screen is the same as this one (except in the available workers I only have trains-services)

  				
Posted 3 years ago by ShaggyHare67

ShaggyHare67 notice that the services queue is designed to run CPU-based tasks like monitoring etc.
For the actual training you need to run your trains-agent on a GPU machine.
Did you run trains-agent init? It will walk you through the configuration (git user/pass included).
If you want to add them manually, you can see an example of the configuration file in the link below.
You can find your own configuration file at ~/trains.conf
https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf

  				
Posted 3 years ago by AgitatedDove14

Hi ShaggyHare67,
Yes, the trains.conf created by trains-agent is basically an extension of the one the trains package uses (specifically, it adds a section for the agent).
I'm assuming you are running the agent on the same development machine.
I guess the easiest is to rename trains.conf to trains.conf.old and run trains-agent init
(No need to worry, the trains package supports it, so the newly generated configuration file will work just fine)

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 Yes, that's exactly what I have when I create the UniformParameterRange(), but it's still not found as a hyper-parameter.
I am using the learning rate and the other parameters in the model when I train, by calling keras.optimizers.Adam(...) with all the Adam configs.
I wish I could send you the code, but it's on another network that is not exposed to the public.

I'm completely lost.

Edit:
It looks like this:

opt = Adam(**configs['training_configuration']['optimizer_params']['Adam'])
model.compile(optimizer=opt, ........more params......)

Configs:
....more params....
training_configuration:
  optimizer_params:
    Adam:
      learning_rate: 0.1
      decay: 0
.....more params....

At the beginning of the code I call task.connect(configs['training_configuration'], name="Train"), and I do see the right params under Train in the UI.
Later, in the hparams script, I do: UniformParameterRange('Train/optimizer_params/Adam/learning_rate', ....the rest of the min max step params.....)
(with the rest of the code like in the example)

The thing is, on each of the drafts in the UI I do see the right parameter being updated under Train/optimizer_params/Adam/learning_rate, with the step and everything. But the script says it can't find the hyper-parameter, and it also finishes really quickly, so I know it's not really doing anything.

  				
Posted 3 years ago by ShaggyHare67

(BTW: draft means they are in edit mode, i.e. before execution, then they should be queued (i.e. pending) then running then completed)
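
As a rough illustration of that order, here is a small plain-Python sketch. The Status names simply mirror the words used above and next_status is a hypothetical helper; neither is taken from the trains API, and the server may define more states (failed, for example).

```python
from enum import Enum


class Status(Enum):
    # Labels taken from the description above, not from the trains API.
    DRAFT = "draft"          # edit mode, before execution
    PENDING = "pending"      # enqueued, waiting for an agent to pull it
    RUNNING = "running"
    COMPLETED = "completed"


# Expected order for an experiment that runs without errors.
LIFECYCLE = [Status.DRAFT, Status.PENDING, Status.RUNNING, Status.COMPLETED]


def next_status(current):
    """Return the status expected after `current`, or None at the end."""
    index = LIFECYCLE.index(current)
    if index + 1 < len(LIFECYCLE):
        return LIFECYCLE[index + 1]
    return None


print(next_status(Status.DRAFT).value)
```

An experiment that stays in draft never made the first step, so nothing has put it in a queue yet.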

  				
Posted 3 years ago by AgitatedDove14

Things to check:
1. task.connect is called before the dictionary is actually used.
2. Just in case, do configs['training_configuration'] = task.connect(configs['training_configuration']).
3. Add print(configs['training_configuration']) after the task.connect call, to make sure the parameters were passed correctly.
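
To show why the print is worth adding, here is a stand-in written in plain Python with no trains calls. apply_overrides is a hypothetical helper that only imitates the idea of values edited in the UI replacing the defaults in the dictionary; it is not how trains implements it. If the code keeps reading a dictionary that never received the overrides, every clone trains with the same defaults.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with slash-delimited overrides applied."""
    result = copy.deepcopy(config)
    for path, value in overrides.items():
        *parents, leaf = path.split("/")
        node = result
        for key in parents:
            # Walk down the nested dictionaries, creating levels if missing.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


training_configuration = {
    "optimizer_params": {"Adam": {"learning_rate": 0.1, "decay": 0}}
}

# Imitates a clone whose learning rate was changed by the optimizer.
overridden = apply_overrides(
    training_configuration,
    {"optimizer_params/Adam/learning_rate": 0.01},
)

# The equivalent of the suggested print: check what the code will really use.
print(overridden["optimizer_params"]["Adam"])
```

In the real script, the dictionary to print is the one you pass to Adam(...), after the connect call.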

  				
Posted 3 years ago by AgitatedDove14

ShaggyHare67 I'm just making sure I understand the setup:
1. First "manual" run of the base experiment. It creates an experiment in the system, and you see all the hyper-parameters under the General section.
2. trains-agent is running on a machine.
3. The HPO example is executed with the above hyper-parameters as the optimization parameters.
4. HPO creates clones of the original experiment, with different configurations (verified in the UI).
5. trains-agent executes said experiments, and they are not completed.
But it seems the parameters are not being changed.
Correct?

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 I run docker-compose up for trains-server on my server.
On that server I run my own docker (which contains all my code and my packages), and that is also where I run the trains-agent daemon --gpus all command.

How can I make the trains-agent run the python that I normally run (located in /home/user/miniconda/envs/36/bin/python)?
I tried editing the trains-agent conf and changed python_binary=/home/user/miniconda/envs36/bin/python, but it didn't solve it.
I also tried setting package_manager.system_site_packages=true, which didn't work either.

  				
Posted 3 years ago by ShaggyHare67

AgitatedDove14 So I managed to get trains-agent to access the git repository, but now I'm facing another issue:

The trains-server is running on a remote server (which I ssh to). On that server I have my own docker, which is where I write the code, and it is also where I run the trains-agent commands.
Now the trains-agent is running my code, but it is unable to import trains inside the code (and potentially more packages).

Any idea?
On the server I have both python (2.7) and python3. Maybe it is automatically running the python command (and not python3), so it doesn't have the package?
Also, will the code run by trains be executed on the server (where trains-server is running) or inside my docker (where trains-agent is running)?

Note that I don't have root access on the server (only in my docker), so changing things like re-installing, etc., is not possible.
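
One way to narrow this down, sketched here under the assumption that a few lines can be added temporarily at the very top of the script: print which interpreter is executing it and whether the package can be imported. describe_interpreter is a hypothetical helper, not part of trains, and it sticks to calls that exist in both python 2.7 and python3.

```python
import sys


def describe_interpreter(package="trains"):
    """Report the running interpreter and whether `package` can be imported."""
    info = {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
    }
    try:
        __import__(package)
        info["importable"] = True
    except ImportError:
        info["importable"] = False
    return info


# Shows up in the experiment console log when the agent runs the script.
print(describe_interpreter())
```

Comparing this output between a manual run and an agent run should show whether the two use the same python.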

  				
Posted 3 years ago by ShaggyHare67

More detailed instructions:
https://github.com/allegroai/trains-agent#installing-the-trains-agent

  				
Posted 3 years ago by AgitatedDove14

I also tried editing package_manager.system_site_packages=true which didn't work

Sadly that didn't do the trick, I wonder how come I don't have the installed packages?
I guarantee no one has deleted them, and it's a bit weird since I can run my code normally, it's just that trains doesn't discover them for some reason.

  				
Posted 3 years ago by ShaggyHare67

AgitatedDove14 When I ran trains-agent init it said there's already an init file, and when I open it, it begins with # TRAINS SDK configuration file and looks a little different from the config file you sent. How should I handle this?

  				
Posted 3 years ago by ShaggyHare67

Now it's running and I do see gpu:gpuall in the available workers, but running the script still produces the "Could not find request hyper-parameters..." message.
Also, the optimizers are still in draft (except the main one, which was created for them).

  				
Posted 3 years ago by ShaggyHare67

Yes, this seems like the problem, you do not have an agent (trains-agent) connected to your server.
The agent is responsible for pulling the experiments and executing them.
pip install trains-agent
trains-agent init
trains-agent daemon --gpus all

  				
Posted 3 years ago by AgitatedDove14

Go to the Workers & Queues page: right side panel, 3rd icon from the top.

  				
Posted 3 years ago by AgitatedDove14

You should see your agent there.

  				
Posted 3 years ago by AgitatedDove14

ShaggyHare67 in the HPO the learning rate parameter should be (based on the above):
General/training_config/optimizer_params/Adam/learning_rate
Notice the "General" prefix (and note that it is case sensitive)
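
To make the naming concrete, here is a minimal sketch in plain Python of how a nested dictionary maps to slash-delimited parameter names. flatten_params is a hypothetical helper written for this illustration, not a trains function, and the section argument stands in for whatever section name the UI shows. The point is only that the string given to the HPO has to match the full path exactly, including case.

```python
def flatten_params(nested, section):
    """Flatten a nested dict into {"section/path/to/key": value} entries."""
    flat = {}

    def walk(node, prefix):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            if isinstance(value, dict):
                walk(value, path)   # descend into the nested level
            else:
                flat[path] = value  # leaf value: record its full name

    walk(nested, section)
    return flat


example = {"optimizer_params": {"Adam": {"learning_rate": 0.1, "decay": 0}}}

# Print every full parameter name the way it would have to be spelled.
for name in flatten_params(example, section="General"):
    print(name)
```

Comparing names produced this way against what the experiment shows in the UI is a quick way to spot a wrong prefix or a case mismatch.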

  				
Posted 3 years ago by AgitatedDove14

Did you run trains-agent?

  				
Posted 3 years ago by AgitatedDove14

Regarding step #5, I'm not sure how to check it. What I see in the UI are 5 drafts (concurrent_tasks is set to 5) and the "main" task in charge of them. They are clones of the original base experiment with different configurations (although they're not really clones: only the configs are cloned; the artifacts, output model and results aren't).

And for the things to check: yup, it's like that, and still the same error.

  				
Posted 3 years ago by ShaggyHare67
