Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
DilapidatedParrot58
Moderator
42 Questions, 205 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

186 × Eureka!
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
3 years ago
0 Votes
14 Answers
1K Views
0 Votes 14 Answers 1K Views
hey guys the first time I'm seeing this behavior I'm adding a new user to /opt/trains/config/apiserver.conf and restarting the containers. all old users are ...
4 years ago
0 Votes
8 Answers
1K Views
0 Votes 8 Answers 1K Views
3 years ago
0 Votes
7 Answers
1K Views
0 Votes 7 Answers 1K Views
any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?
3 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
2 years ago
0 Votes
3 Answers
1K Views
0 Votes 3 Answers 1K Views
2 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
hey guys, here I am again with another question 😃 after the latest update, I’m getting this error when I’m trying to compare scalars for more than 10 experi...
4 years ago
0 Votes
11 Answers
1K Views
0 Votes 11 Answers 1K Views
3 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
hey guys, a question about monthly worker_stats indices each of them takes up about 1gb for us. do we really need to keep all of them? is there any way to fr...
4 years ago
0 Votes
27 Answers
1K Views
0 Votes 27 Answers 1K Views
hey guys, I keep getting trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?...
3 years ago
0 Votes
8 Answers
1K Views
0 Votes 8 Answers 1K Views
downloading output artifacts from S3 by clicking on the download button next to Model URL was great, but since we moved from AWS to Yandex.Cloud, this featur...
2 years ago
0 Votes
16 Answers
1K Views
0 Votes 16 Answers 1K Views
yo guys, I'm getting Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(, 'Connection to O...
4 years ago
Show more results questions
0 Is Is Possible To Pass Custom

works like a charm!

2 years ago
4 years ago
0 I'M Probably Stupid, But How Do I Specify Worker Name? Usecase - I Want To Create Two Workers Using The Same Gpu, And New Worker Just Overwrites The Old One

our GPUs are 48GB, so it's quite wasteful to only run one job per GPU
yeah, I'm aware of that, I would have to make sure they don't fail to infamous CUDA out of memory, but still

4 years ago
0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer

3 years ago
3 years ago
0 I'M Probably Stupid, But How Do I Specify Worker Name? Usecase - I Want To Create Two Workers Using The Same Gpu, And New Worker Just Overwrites The Old One

thanks! I need to read all parts of documentation really carefully =) for some reason, couldn't find this section

4 years ago
0 Here I Am Again... Can'T Find How To Create A Custom Queue

LOL
wow 😃
I was trying to find how to create a queue using CLI 😃

4 years ago
0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration

but for the metrics, I explicitly pass the number of epoch that my training is currently on. it'ls kind of weird that it adds offset to the values that are explicitly reported, no?

3 years ago
0 I Updated Trains-Server Today, And Now It'S Very Unstable, Web Interface Randomly Stops Working. Anyone Had The Same Problem? I'Ve Never Had Any Problems With Updating The Server Before

I've already pulled new images from trains-server, let's see if the initial issue occurs again. thank for the fast response guys!

4 years ago
0 Hey Guys, I Keep Getting "Failed Parsing Task Parameter" Warning For The Arguments Such As This One:

not necessarily, there are rare cases when container keeps running after experiment is stopped or aborted

will do!

3 years ago
0 Hey Guys, I'M Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of Pytorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

great, this helped, thanks! I simply added https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html to trains.conf, and it seems to be working

I now have another problem, my code is looking for some additional files in the root folder of the project. I tried adding a Docker layer:
ADD file.pkl /root/.trains/venvs-builds/3.6/task_repository/project.git/extra_data/

but trains probably rewrites the folder when cloning the repo. is there any workaround?

4 years ago
0 I Updated Trains-Server Today, And Now It'S Very Unstable, Web Interface Randomly Stops Working. Anyone Had The Same Problem? I'Ve Never Had Any Problems With Updating The Server Before

I decided to restart the containers one more time, this is what I got.

I had to restart Docker service to remove the containers

4 years ago
0 Hey Guys, I'M Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of Pytorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

I added the link just in case anyway 😃

also, is there any way to install a repo that we clone as a package. we often use absolute imports and do "pip install -e ." to utilize it
sorry there are so many questions, we just really want to migrate to trains-agent)

4 years ago
0 Hey Guys, I Keep Getting "Failed Parsing Task Parameter" Warning For The Arguments Such As This One:

on the side note, is there any way to automatically give more meaningful names to the running docker containers?

3 years ago
0 Hey Guys, A Question About Monthly Worker_Stats Indices Each Of Them Takes Up About 1Gb For Us. Do We Really Need To Keep All Of Them? Is There Any Way To Free Up The Space?

yeah, backups take much longer, and we had to increase our EC2 instance volume size twice because of these indices

got it, thanks, will try to delete older ones

4 years ago
Show more results compactanswers