Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
ThickKitten19
Moderator
3 Questions, 9 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

9 × Eureka!
0 Votes
12 Answers
1K Views
0 Votes 12 Answers 1K Views
2 years ago
0 Votes
1 Answers
544 Views
0 Votes 1 Answers 544 Views
7 months ago
0 Votes
6 Answers
995 Views
0 Votes 6 Answers 995 Views
one year ago
0 Hello, Everyone! I Have A Question Regarding Clearml Features. We Run Into The Situation When Some Of The Agents That Are Working On A Hpo Die Due To Variable Reasons. Some Workers Go Offline Or Resources Need Temporarily Be Detached For Other Needs. Thu

I see! Then the command clearml-agent execute --id <task_id here> should reload the reported scalars and the task needs to reload last checkpoints only, right?

That's good question too! We didn't figure out the best way of continuing for both the grid and optuna. Can you suggest something?

2 years ago
0 Hello, Everyone! I Have A Question Regarding Clearml Features. We Run Into The Situation When Some Of The Agents That Are Working On A Hpo Die Due To Variable Reasons. Some Workers Go Offline Or Resources Need Temporarily Be Detached For Other Needs. Thu

AgitatedDove14 I am not restarting the agent itself, I just need to be able continue the experiment from the same progress point. It can be a different agent. In fact, I am just loading the progress to another agent within the available queue.

2 years ago
0 Hello, Everyone! I Have A Question Regarding Clearml Features. We Run Into The Situation When Some Of The Agents That Are Working On A Hpo Die Due To Variable Reasons. Some Workers Go Offline Or Resources Need Temporarily Be Detached For Other Needs. Thu

AgitatedDove14 Let me clarify I think you have misunderstood me.

The main reason we need the above mentioned functionality is because there are some experiments that need to run for a long time. Let's say weeks.
However, the importance of the experiment is low so when other, more important experiments appear. We need to temporarily pause(kill or something else) running HPO task and reassign the resource for other needs.
Later, when more important experiments has been completed, we can conti...

2 years ago
0 Hello, Everyone! I Have A Question Regarding Clearml Features. We Run Into The Situation When Some Of The Agents That Are Working On A Hpo Die Due To Variable Reasons. Some Workers Go Offline Or Resources Need Temporarily Be Detached For Other Needs. Thu

Quick question when you say the HPO Task, you mean the HPO controller logic Task (i.e. the one launching the training jobs), or do you mean the actual training job itself (i.e. running with a specific set of parameters decided by the HPO controlling task) ?

AgitatedDove14 Sorry, my bad! By HPO task I mean the actual training job itself.
We run the HPO controller logic Task on a separate cpu only machine, so we can think that this task is always on. Only the training jobs can go ...

2 years ago
0 Hello, Everyone! I Have A Question Regarding Clearml Features. We Run Into The Situation When Some Of The Agents That Are Working On A Hpo Die Due To Variable Reasons. Some Workers Go Offline Or Resources Need Temporarily Be Detached For Other Needs. Thu

Hi AgitatedDove14 I get the reported scalars from the web using
model_task = Task.get_task(task_id=model_task_id) scalars = model_task.get_reported_scalars()then register each of the scalars with something like
logger.report_scalar(title=metric_key, series=series_val['name'], value=y, iteration=x)Then you have reported scalars to which I am able to append rest of the model training reports.
Workers are running across multiple machines and you can monitor if a task is dead by looking...

2 years ago
one year ago