Answered
Hi, I Am New Here, Can I Ask Questions On Trains-Server Also?

Hi, I am new here, can I ask questions about trains-server also?

  
  
Posted 4 years ago

Answers 17


For now we are using AWS Batch for running those experiments, because that way we don't have to keep machines waiting around for jobs.
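For context, submitting a containerized training run to AWS Batch from Python looks roughly like the sketch below; the job, queue, and job definition names are placeholders, not anything from this thread.

import boto3

# Submit a containerized training job to an existing AWS Batch job queue.
# "my-training-queue" and "my-training-jobdef" are placeholder names.
batch = boto3.client("batch")
response = batch.submit_job(
    jobName="trains-experiment-example",
    jobQueue="my-training-queue",
    jobDefinition="my-training-jobdef",
    containerOverrides={"command": ["python", "train.py", "--epochs", "10"]},
)
print(response["jobId"])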

  
  
Posted 4 years ago

CooperativeFox72 of course, anything trains related, this is the place 🙂
Fire away

  
  
Posted 4 years ago

I haven't tried trains-agent yet, does it support using AWS Batch?

  
  
Posted 4 years ago

I am running trains-server on AWS with your AMI (instance type t3.large).

The server runs very well and works great, until we start to run more trainings in parallel (around 20).
Then the UI becomes very slow and we often get timeouts.
Can upgrading the instance type help here, or is there some limit on parallel runs?

  
  
Posted 4 years ago

Hi CooperativeFox72,
From the backend guys, long story short: upgrade your machine => more CPU cores, more processes. It's that easy 🙂

  
  
Posted 4 years ago

CooperativeFox72 btw, are you guys running those 20 experiments manually or through trains-agent?

  
  
Posted 4 years ago

I will go over the examples

  
  
Posted 4 years ago

AgitatedDove14 Maybe I need to change something here: apiserver.conf
to increase the number of workers?

  
  
Posted 4 years ago

It manages the scheduling process, so there is no need to package your code or worry about building dockers, etc. It also has an AWS autoscaler that spins up EC2 instances based on the number of jobs you have in the execution queue and the limit of your budget (obviously spinning down machines that are idle).
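Purely as an illustration of that scaling logic (this is not the actual trains autoscaler code, and every helper below is a stand-in):

MAX_INSTANCES = 10  # budget cap on concurrent EC2 instances

def pending_jobs_in_queue(queue_name):
    # Stand-in: in reality this would query the trains execution queue.
    return 0

def launch_instance():
    # Stand-in: in reality this would call EC2 to spin up a machine.
    return {"id": "i-placeholder", "idle": False}

def scale(instances, queue_name="default"):
    # Spin down machines that are no longer running jobs...
    instances = [i for i in instances if not i["idle"]]
    # ...and spin up new ones while jobs are waiting, within the budget cap.
    target = min(pending_jobs_in_queue(queue_name), MAX_INSTANCES)
    while len(instances) < target:
        instances.append(launch_instance())
    return instances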

  
  
Posted 4 years ago

Let me check... I think you might need to docker exec
Anyhow, I would start by upgrading the server itself.
Sounds good?

  
  
Posted 4 years ago

Thanks, I will upgrade the server for now and will let you know.

  
  
Posted 4 years ago

The cool thing about using trains-agent is that you can change any experiment's parameters and automate the process, so you get hyper-parameter optimization out of the box, and you can build complicated pipelines:
https://github.com/allegroai/trains/tree/master/examples/optimization/hyper-parameter-optimization
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
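For illustration, the task-piping pattern from the second link boils down to roughly the sketch below (project, task, queue, and parameter names are placeholders, and exact argument names may differ between trains versions):

from trains import Task

# Take an existing experiment as a template, clone it, change a hyper-parameter,
# and push the clone to a queue that a trains-agent is listening on.
template = Task.get_task(project_name="examples", task_name="my_base_experiment")
cloned = Task.clone(source_task=template, name="my_base_experiment (lr=0.001)")
cloned.set_parameters({"learning_rate": 0.001})  # must match the template's parameter names
Task.enqueue(cloned.id, queue_name="default")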

  
  
Posted 4 years ago

OHH nice, I thought it was just some kind of job queue for machines that are already up and running

  
  
Posted 4 years ago

OHH nice, I thought it was just some kind of job queue for machines that are already up and running

It's much more than that, it's a way of life 🙂
But seriously now, it allows you to use any machine as part of your cluster and send jobs for execution from the web UI (any machine, even just a standalone GPU machine under your desk, or any cloud GPU instance, or any mix of the two :)

Maybe I need to change something here: 

apiserver.conf

Not sure, I'm still waiting on an answer... It might not be exposed in the configuration file. Give me an hour or two

  
  
Posted 4 years ago

Thanks, I will upgrade my instance type and then add more workers. Where do I need to configure that?

  
  
Posted 4 years ago

Thanks!! You are the best.
I will give it a try when the runs finish.

  
  
Posted 4 years ago

CooperativeFox72 yes, 20 experiments in parallel means that you always have at least 20 connections coming from different machines, and then you have the UI adding on top of that. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more worker processes, you just need to make sure the machine has enough memory to support it.
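For a rough sense of scale, a generic rule of thumb (the usual "2 x cores + 1" gunicorn-style heuristic, not an official trains-server setting) looks like this; the per-worker memory figure is an assumption:

import multiprocessing
import os

# Suggest a worker-process count from the CPU count, then sanity-check it
# against total RAM (Linux only), assuming roughly 0.5 GB per worker.
cores = multiprocessing.cpu_count()
suggested = 2 * cores + 1
total_ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3
workers = min(suggested, int(total_ram_gb // 0.5))
print(f"suggested API server worker processes: {workers}")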

  
  
Posted 4 years ago