Unanswered
Hey guys, another question about deploying my own Trains server. I have a trains-server deployed on my K8s cluster using the Trains Helm chart (which is awesome). Now I want to create a deployment running trains-agent as specified in the [Trains-Helm Repo


Hey FriendlySquid61 and SuccessfulKoala55. I followed your guidance and am back with the results.
First of all, I changed the host URLs to follow the format of the default agentservices values in the Helm chart.
Now they look like this:
`agent:
  numberOfTrainsAgents: 1
  nvidiaGpusPerAgent: 0
  defaultBaseDocker: "nvidia/cuda"
  agentVersion: ""
  # made the hosts into k8s dns
  trainsApiHost: " "
  trainsWebHost: " "
  trainsFilesHost: " "
  trainsGitUser: null
  trainsGitPassword: null
  trainsAccessKey: null
  trainsSecretKey: null
  awsAccessKeyId: null
  awsSecretAccessKey: null
  awsDefaultRegion: null
  azureStorageAccount: null
  azureStorageKey: null`

Turns out that this does the same thing as the full k8s DNS names that I wrote, since the agents are in the same trains namespace as the server. So basically I just used the long version before. I also reduced the number of agents in the deployment to 1 and ran my manual dummy-agent so that I can control the `trains-agent daemon` call.
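
To make the short versus long host names concrete, here is a minimal Python sketch of how Kubernetes service DNS names are put together. The service name `apiserver-service`, the namespace `trains`, and port 8008 are placeholders I picked for illustration, not the values from my chart, and `cluster.local` is only the usual default cluster domain.

```python
def service_url(service, port, namespace=None,
                cluster_domain="cluster.local", scheme="http"):
    """Build a URL for a Kubernetes Service.

    With namespace=None the short service name is used, which only
    resolves from pods running in the same namespace as the Service.
    """
    if namespace is None:
        host = service
    else:
        host = f"{service}.{namespace}.svc.{cluster_domain}"
    return f"{scheme}://{host}:{port}"


# Hypothetical service name, namespace, and port, for illustration only.
short = service_url("apiserver-service", 8008)
full = service_url("apiserver-service", 8008, namespace="trains")
print(short)
print(full)
```

Both forms should point at the same Service as long as the caller runs in that namespace; from any other namespace only the long form is expected to resolve.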

With this config, the agents still see themselves as connected. When I run trains-agent list from my dummy agent, this is what I get:
` root@dummy-agent:/# trains-agent list
workers:
  - company:
      id: d1bd92a3b039400cbafc60a7a5b1e52b
      name: trains
    id: trains-agent-584dfcc6cd-fxvkb:gpuall
    ip: 172.31.15.68
    key: worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___trains-agent-584dfcc6cd-fxvkb:gpuall
    last_activity_time: '2020-11-08T12:22:25.157024'
    last_report_time: '2020-11-08T12:22:25.157024'
    queues:
      - id: e3f7b34cbc1f4a0199045d5504b85b18
        name: default
        num_tasks: 0
    register_time: '2020-11-08T12:07:49.649695'
    register_timeout: 600
    tags: []
    user:
      id: tests
      name: tests
  - company:
      id: d1bd92a3b039400cbafc60a7a5b1e52b
      name: trains
    id: dummy-agent:gpuall
    ip: 172.31.43.220
    key: worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___dummy-agent:gpuall
    last_activity_time: '2020-11-08T12:22:37.414504'
    last_report_time: '2020-11-08T12:22:37.414504'
    queues:
      - id: e3f7b34cbc1f4a0199045d5504b85b18
        name: default
        num_tasks: 0
    register_time: '2020-11-08T12:22:34.382837'
    register_timeout: 600
    tags: []
    user:
      id: tests
      name: tests
  - company:
      id: d1bd92a3b039400cbafc60a7a5b1e52b
      name: trains
    id: trains-services
    ip: 172.31.0.170
    key: worker_d1bd92a3b039400cbafc60a7a5b1e52b___tests___trains-services
    last_activity_time: '2020-11-08T12:22:42.412209'
    last_report_time: '2020-11-08T12:22:42.412209'
    queues:
      - id: a0c0ab0fa2f94186abf265cd376f4530
        name: services
        num_tasks: 0
    register_time: '2020-11-08T12:07:36.447078'
    register_timeout: 600
    tags: []
    user:
      id: tests
      name: tests`

I tried creating a new queue in the UI called oneone. However, when I run the following command, I get this error:

`root@dummy-agent:/# TRAINS_DOCKER_SKIP_GPUS_FLAG=1 TRAINS_AGENT_K8S_HOST_MOUNT=/root/.trains:/root/.trains trains-agent daemon --dock "nvidia/cuda" --force-current-version --queue oneone

trains_agent: ERROR: Could not find queue with name/id "oneone"`

It doesn't recognize the queue named oneone. However, if I run the same command with `--queue default` instead, it runs properly, and another process running `trains-agent list` can see it connected (this is what I showed you above).
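
As a side note, the worker listing already contains queue IDs for any queue that has a worker attached. Below is a small Python sketch that collects them; the `workers` data is transcribed by hand from the listing above (trimmed to the relevant fields), and the helper name `queues_seen_by_workers` is just something I made up for illustration.

```python
# Worker -> queue data copied by hand from the `trains-agent list` output,
# keeping only the fields needed here.
workers = [
    {"id": "trains-agent-584dfcc6cd-fxvkb:gpuall",
     "queues": [{"id": "e3f7b34cbc1f4a0199045d5504b85b18", "name": "default"}]},
    {"id": "dummy-agent:gpuall",
     "queues": [{"id": "e3f7b34cbc1f4a0199045d5504b85b18", "name": "default"}]},
    {"id": "trains-services",
     "queues": [{"id": "a0c0ab0fa2f94186abf265cd376f4530", "name": "services"}]},
]


def queues_seen_by_workers(workers):
    """Return {queue name: queue id} for every queue some worker listens to."""
    seen = {}
    for worker in workers:
        for queue in worker.get("queues", []):
            seen[queue["name"]] = queue["id"]
    return seen


queue_ids = queues_seen_by_workers(workers)
print(queue_ids)
print("oneone" in queue_ids)
```

This only shows queues that already have a worker, so oneone being absent here does not by itself say whether the server knows about that queue.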

I also tried to enqueue a task to the default queue, since the agent CLI shows both the agent deployment and my dummy agent listening to the default queue. However, the task I enqueued stays in the pending state.

On a related note, I tried to look at the trains-server API reference to see how I can get the queue ID instead of the name, but that page in your docs seems to be broken:
https://allegro.ai/docs/references/trains_api_ref/trains_api_ref.html
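
For what it's worth, this is roughly what I was hoping to do with the API, written as a Python sketch using only the standard library. It assumes the API server exposes `auth.login` (HTTP basic auth with the access key and secret key, returning a token under `data.token`) and `queues.get_all` (returning queues under `data.queues`). I could not check any of this against the reference page, so treat the endpoint names, payloads, and response shapes as assumptions.

```python
import base64
import json
import urllib.request


def _post(url, headers, payload=None):
    """POST a JSON payload and return the decoded JSON reply."""
    body = json.dumps(payload or {}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_queue_ids(reply, name):
    """Pick the IDs of queues called exactly `name` out of a reply dict."""
    queues = reply.get("data", {}).get("queues", [])
    return [q["id"] for q in queues if q.get("name") == name]


def find_queue_ids(api_host, access_key, secret_key, name):
    """Log in, then look up queue IDs by name (assumed endpoints, see above)."""
    basic = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    login = _post(f"{api_host}/auth.login", {"Authorization": f"Basic {basic}"})
    token = login["data"]["token"]
    reply = _post(
        f"{api_host}/queues.get_all",
        {"Authorization": f"Bearer {token}"},
        {"name": name},
    )
    return extract_queue_ids(reply, name)
```

If those assumptions hold, calling `find_queue_ids` with the API host URL, the credentials, and "oneone" would return an empty list when the server really has no queue by that name.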

Let me know what you think, and thanks again for all your help.

  
  
Posted 4 years ago
150 Views
0 Answers