
Hi!
In my project I need to run a lot of experiments on different subsets of my trainset, collect scores and perform some calculations based on them. I have main.py, from which I call train_on_subset(subset) many times. I want to 1) gather statistics for every call of train_on_subset, and 2) use the trains agent to queue those calls.

I came across a number of difficulties trying to connect my code to the trains framework. First, I perform many experiments in one process, so I can't create a new task in every call of train_on_subset using Task.init, but I still need to track the progress of each experiment separately. Second, I cannot move train_on_subset to a separate .py file and run it as a console script, because I need to push a lot of parameters into it, including the model, and I also need to get my score back to process it later in main.py.

Please let me know what the best practice / architecture is for connecting your framework to my project. Any advice is highly appreciated. Thanks in advance!

  
  
Posted 3 years ago

Answers 5


Hi UpsetCrocodile10

First, I perform many experiments in one process, ...

How about this one:
https://github.com/allegroai/trains/issues/230#issuecomment-723503146
Basically you could utilize create_function_task
This means you have Task.init() on the main "controller", and each train_on_subset call becomes a "function task". Then the controller can wait on them and collect the data (like the HPO does).

Basically:
```
from time import sleep
from trains import Task

controller_task = Task.init(...)
children = []
for i, s in enumerate(my_subset):
    # create a draft Task that will run my_train_func on this subset
    child = controller_task.create_function_task(
        my_train_func, arguments=s, func_name='subset_{}'.format(i))
    children.append(child)

for child in children:
    # refresh the child's state and read back its reported metrics
    child.reload()
    print(child.get_last_scalar_metrics())
    sleep(5.0)
```

What do you think?
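For reference, a minimal sketch of what my_train_func could look like so that get_last_scalar_metrics() has something to return. The scalar reporting uses the Trains Logger; the idea that train_on_subset returns a numeric score and is importable from main.py is an assumption here:

```
from trains import Logger
from main import train_on_subset  # the existing training routine (assumed importable)

def my_train_func(arguments):
    # train on the given subset and get its score back
    score = train_on_subset(arguments)
    # report the score as a scalar so the controller can read it later
    # via child.get_last_scalar_metrics()
    Logger.current_logger().report_scalar(
        title='score', series='subset', value=score, iteration=0)
    return score
```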

  
  
Posted 3 years ago

UpsetCrocodile10

Does this method expect my_train_func to be in the same file as Task.init() ?

As long as you import it and you can pass it, it should work.
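For example, the function can live in its own module and be imported where the controller is created (the module and project names below are just placeholders):

```
from trains import Task
from subset_training import my_train_func  # hypothetical module holding the function

controller_task = Task.init(project_name='subset experiments', task_name='controller')
child = controller_task.create_function_task(
    my_train_func, arguments=[0, 1, 2], func_name='subset_0')
```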

Child exp gets aborted immediately ...

It seems it cannot find the file main.py. It assumes all the code is part of a single repository, is that the case? What do you have under the "Execution" tab for the experiment?

  
  
Posted 3 years ago

Hi UpsetCrocodile10

execute them and return scalars.

This should be a good start (I hope 🙂 )
```
for child in children:
    # put the Task into an execution queue
    Task.enqueue(child, queue_name='my_queue_here')
    # wait for the task to finish
    child.wait_for_status(status=['completed'])
    # reload all the metrics
    child.reload()
    # get the metrics
    print(child.get_last_scalar_metrics())
```
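Note that for the enqueued child tasks to actually execute, a trains-agent has to be listening on that queue, for example one started with `trains-agent daemon --queue my_queue_here` (the queue name is just a placeholder).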

  
  
Posted 3 years ago

AgitatedDove14
Thanks, this works great!
Does this method expect my_train_func to be in the same file as Task.init() ? The child exp gets aborted immediately after starting, with some strange exception in my case.

  
  
Posted 3 years ago

Hi AgitatedDove14
This is exactly what I needed, thank you a lot!

One problem I have with this function is that it creates drafts, but I need it to execute them and return scalars. Is this possible?

Thanks again.

  
  
Posted 3 years ago