DilapidatedParrot58
Moderator
42 Questions, 205 Answers
  Active since 10 January 2023
  Last activity 2 years ago

Reputation

0

Badges 1

186 × Eureka!
0 Hi

python3 slack_alerts.py --channel trains-alerts --slack_api "OUR_KEY" --include_completed_experiments --include_manual_experiments

4 years ago
0 Hi

all our workers went down after starting the slack bot, is that expected?

4 years ago
0 Hey Guys, I Keep Getting

thank you 😃

4 years ago
0 Hey Guys, I Keep Getting

do you have any idea why the cleanup task keeps failing then? (it used to work before the update)

4 years ago
0 Feature Request: Clearml Prints Github Token In The Log, When There Is "Repository Not Found" Error. It Would Be Nice If Could Hide It

in order to use private repositories for our experiments I add agent.git_user and agent.git_pass options to clearml.conf when launching agents

if someone accidentally tries to launch an experiment from a non-existent repo, ClearML will print
fatal: repository 'https://username:token@github.com/our_organization/non_existing_repo.git/' not found

exposing the real token
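
Until something like this is built in, the kind of masking being asked for could look roughly like the sketch below. It is not ClearML code: `redact_credentials` and the regex are hypothetical names made up for illustration, and it assumes the credentials always appear in the `user:token@host` form shown in the error above.

```python
import re

# Matches the "user:password@" part of an http(s) URL.
# Assumption: credentials never contain '/', '@' or whitespace.
_CREDENTIALS = re.compile(r"(https?://)[^/\s:@]+:[^/\s@]+@")


def redact_credentials(line: str) -> str:
    """Replace credentials embedded in any URL in `line` with a placeholder."""
    return _CREDENTIALS.sub(r"\1***:***@", line)


msg = (
    "fatal: repository "
    "'https://username:token@github.com/our_organization/non_existing_repo.git/' "
    "not found"
)
print(redact_credentials(msg))
```

URLs without embedded credentials pass through unchanged, so a filter like this could be applied to every log line before it is reported.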

3 years ago
0 Hi

new icons are slick, it would be even better if you could upload custom icons for the different projects

4 years ago
0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

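from clearml import Task

# args.task_id and model come from the surrounding resume script (not shown)
task = Task.get_task(task_id=args.task_id)
task.mark_started()
task.set_parameters_as_dict(
    {
        "General": {
            "checkpoint_file": model.url,
            "restart_optimizer": False,
        }
    }
)
# reset the initial-iteration offset to zero
task.set_initial_iteration(0)
task.mark_stopped()
# re-enqueue the task using the queue recorded in its execution data
Task.enqueue(task=task, queue_name=task.data.execution.queue)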

3 years ago
0 Hey Guys, I Keep Getting

nope, the old cleanup task fails with trains_agent: ERROR: Could not find task id=e7725856e9a04271aab846d77d6f7d66 (for host: )
Exception: 'Tasks' object has no attribute 'id'

weirdly enough, curl http://apiserver:8008 from inside the container works

4 years ago
0 Hey Guys, I Keep Getting

new version worked

4 years ago
0 I Updated Trains-Server Today, And Now It'S Very Unstable, Web Interface Randomly Stops Working. Anyone Had The Same Problem? I'Ve Never Had Any Problems With Updating The Server Before

I decided to restart the containers one more time, this is what I got.

I had to restart Docker service to remove the containers

4 years ago
0 Hey Guys, I Keep Getting

nice, thanks! I'll check if it solves the issue first thing tomorrow in the morning

4 years ago
0 Hey Guys, I'M Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of Pytorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

it also happens sometimes during the run when tensorboard is trying to write smth to the disk and there are multiple experiments running. so it must be smth similar to the scenario you're describing, but I have no idea how it can happen since I'm running four separate workers

4 years ago
0 Hey Guys, I Keep Getting

yeah, we did. let me check if explicitly setting credentials helps

4 years ago
0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

not sure what you mean. I used to do task.set_initial_iteration(task.get_last_iteration()) in the task resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer
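
One possible explanation for the weird plots, as a toy illustration only. It assumes the value passed to set_initial_iteration is added as an offset to every step reported afterwards. That is an assumption about the backend, not something confirmed in this thread, and `plotted_steps` and the numbers are hypothetical:

```python
def plotted_steps(initial_iteration, reported_steps):
    """Steps as they would appear on the plot under the offset assumption."""
    # Assumption: each reported step is shifted by the initial-iteration offset.
    return [initial_iteration + step for step in reported_steps]


last_iteration = 10            # hypothetical: the run crashed after epoch 10
resumed_epochs = [11, 12, 13]  # training code passes global_step=epoch

# Offset set to the last iteration AND absolute epochs passed: steps jump ahead.
shifted = plotted_steps(last_iteration, resumed_epochs)

# Offset set to 0, absolute epochs passed: steps continue where they stopped.
aligned = plotted_steps(0, resumed_epochs)

print(shifted, aligned)
```

If that assumption holds, combining a non-zero offset with an absolute global_step counts the earlier epochs twice, which would be consistent with set_initial_iteration(0) being used in the resume script posted elsewhere on this page.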

3 years ago
0 Hey Guys, I Keep Getting

default docker-compose

4 years ago