JitteryCoyote63
Moderator
214 Questions, 1021 Answers
Active since 10 January 2023
Last activity 7 months ago

Reputation: 0
Badges: 1
979 × Eureka!
Hi, I Am Trying To Use OmegaConf With task.connect_configuration And I Get The Following Error:

Yes, that’s what I did initially, but eventually I decided it added too much complexity for little benefit. I’d rather drop OmegaConf and, if ClearML supports it out of the box one day, take advantage of it then.

2 years ago
Hi, Is It Possible To Disable Some Of The System Metrics Monitored? And Also Downsample The Rate Of Logging?

AgitatedDove14 I see that the default is sample_frequency_per_sec=2., but in the UI I don't see that resolution (i.e. it logs every ~120 iterations, corresponding to ~30 secs). What is the difference with report_frequency_sec=30.?
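A minimal sketch of the coarsest control over these metrics, assuming the standard clearml SDK: passing auto_resource_monitoring=False to Task.init turns machine-metric reporting off entirely (the project and task names below are placeholders; finer-grained filtering of individual metrics is not shown).

```python
# Hedged sketch: disable resource monitoring altogether via Task.init.
from clearml import Task

task = Task.init(
    project_name="examples",         # placeholder
    task_name="no-system-metrics",   # placeholder
    auto_resource_monitoring=False,  # turn off CPU/GPU/memory/network reporting
)
```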

3 years ago
Hey There, Happy New Year To All Of You

Hi AgitatedDove14, thanks for the answer! I will try adding multiprocessing_context='forkserver' to the DataLoader. In the issue you linked, nirraviv mentioned that forkserver was slower and shared a link to another issue https://github.com/pytorch/pytorch/issues/15849#issuecomment-573921048 where someone implemented a fast variant of the DataLoader to overcome the speed problem.
Did you experience any drop in performance using forkserver? If yes, did you test the variant suggested i...
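A minimal sketch of the change discussed above, assuming a standard PyTorch DataLoader; the dataset, batch size, and worker count are placeholders.

```python
# Sketch: pass multiprocessing_context="forkserver" to the DataLoader so worker
# processes are spawned by a fork server instead of a plain fork().
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(128, 3), torch.randn(128, 1))  # placeholder data
loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=4,
    multiprocessing_context="forkserver",
)
```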

3 years ago
Hi There,

OK, interestingly, using matplotlib.use('agg') it doesn't leak (idea from here).
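A minimal sketch of that workaround: select the non-interactive Agg backend before pyplot is imported, so figures are rendered off-screen, and close them explicitly.

```python
# Sketch: force the Agg backend before importing pyplot, then close the figure.
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
fig.savefig("plot.png")
plt.close(fig)  # release the figure so it is not kept alive
```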

one year ago
Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

Here is the function used to create the task:

```python
def schedule_task(parent_task: Task,
                  task_type: str = None,
                  entry_point: str = None,
                  force_requirements: List[str] = None,
                  queue_name="default",
                  working_dir: str = ".",
                  extra_params=None,
                  wait_for_status: bool = False,
                  raise_on_status: Iterable[Task.TaskStatusEnum] = (Task.TaskStatusEnum.failed, Task.Ta...
```

4 years ago
Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

In the Execution tab I see the old commit; in the logs I see an empty branch and the old commit.

4 years ago
Hey Again

Hi SuccessfulKoala55, can the new accounts (password-protected) have the same names?

4 years ago
Hello, I Would Like To Use Spot Instances Together With The AWS Autoscaler To Train Models With PyTorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By AWS). Is There Anything Already Built

The jump in the loss when resuming at iteration 31 is probably another issue. For now I can conclude that:

- I need to set sdk.development.report_use_subprocess = false
- I need to call task.set_initial_iteration(0)
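A minimal sketch of those two changes, assuming the clearml SDK and its clearml.conf file; the project and task names are placeholders.

```python
# Sketch of the two conclusions above.
# In clearml.conf (key path taken from the answer above):
#   sdk.development.report_use_subprocess: false
from clearml import Task

task = Task.init(project_name="examples", task_name="spot-resume")  # placeholders
task.set_initial_iteration(0)  # do not offset reported iterations when the run resumes
```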

3 years ago
Hi Again, My ClearML API Server Is Having A Memory Leak. Each Time I Restart It, Its RAM Consumption Grows Until Getting OOM; It Is Not Killed And Makes The EC2 Instance Crash

Well, I still see some ES errors in the logs:

```
clearml-apiserver | [2021-07-07 14:02:17,009] [9] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 65750ms, msg=General data error: err=('500 document(s) failed to index.', [{'index': {'_index': 'events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c2068648d2fe5da975665985f44c20b6', 'status':..., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not...
```

3 years ago
Hey There, For A While Now I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'd Say 1 Out Of 5 Experiments). The Symptoms

> Any chance this is reproducible?

Unfortunately not at the moment; I could not find a reproducible scenario. If I clone a task that was stuck and start it, it might not be stuck.

> How many processes do you see running (i.e. ps -Af | grep python)?

I will check that when the next one is blocked 👍

> What is the training framework? Is it multiprocess? How are you launching the process itself? Is it Linux OS? Is it running inside a specific container?

I train with p...

2 years ago
Hi Again, My ClearML API Server Is Having A Memory Leak. Each Time I Restart It, Its RAM Consumption Grows Until Getting OOM; It Is Not Killed And Makes The EC2 Instance Crash

SuccessfulKoala55 Thanks to that, I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e. I suspect that the loss, which is logged every iteration, is responsible for most of the documents logged, and I want to make sure of that.
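A hedged sketch of one way to count documents for a single series directly against Elasticsearch; the index name is taken from the error above, while the field names ("task", "metric") and the metric name "loss" are assumptions about the events schema, not verified against a real cluster.

```python
# Hypothetical sketch: count documents for one scalar series in Elasticsearch.
# Field names "task"/"metric" and the metric value "loss" are assumptions.
import requests

index = "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b"
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"task": "<task_id>"}},  # placeholder task ID
                {"term": {"metric": "loss"}},     # assumed metric name
            ]
        }
    }
}

resp = requests.get(f"http://localhost:9200/{index}/_count", json=query)
print(resp.json()["count"])
```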

3 years ago
Got Some Errors While Running The Migration Script From ES5 To ES7:

I should also rename the /opt/trains/data/elastic_migrated_2020-08-11_15-27-05 folder to /opt/trains/data/elastic before running the migration tool, right?

4 years ago
Hi There! Is There An Easy Way To Retrieve The Site-Packages Directory That Was Created By An Agent From Inside A Task? Eg.

> The part where I'm lost is why would you need the path to the temp venv the agent creates/uses?

Let's say my task is calling a bash script, and that bash script is calling another Python program. I want that last Python program to be executed with the environment that was created by the agent for this specific task.
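A hedged sketch of one way to recover those paths from inside the task process itself, using only the standard library: the running interpreter is the one the agent created, so its executable and site-packages paths can be handed to the bash script. The environment variable names below are made up for illustration.

```python
# Sketch: expose the agent-created interpreter and its site-packages to child scripts.
# AGENT_PYTHON / AGENT_SITE_PACKAGES are hypothetical names used only for illustration.
import os
import sys
import sysconfig

os.environ["AGENT_PYTHON"] = sys.executable
os.environ["AGENT_SITE_PACKAGES"] = sysconfig.get_paths()["purelib"]

# A bash script launched from this process (e.g. via subprocess) inherits these
# variables and can run: "$AGENT_PYTHON" other_program.py
```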

2 years ago
Hello, I Would Like To Use Spot Instances Together With The AWS Autoscaler To Train Models With PyTorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By AWS). Is There Anything Already Built

By the way, I monkey patched ignite's function global_step_from_engine to print the iteration and passed the modified function to ClearMLLogger.attach_output_handler(…, global_step_transform=patched_global_step_from_engine(engine)). It prints the correct iteration number when calling ClearMLLogger.OutputHandler.__call__.
```python
def __call__(self, engine: Engine, logger: ClearMLLogger, event_name: Union[str, Events]) -> None:

    if not isinstance(logger, ClearMLLogger):
        ...
```
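A minimal sketch of the monkey patch described above, assuming ignite's global_step_from_engine helper (its import location has moved between ignite versions) and a hypothetical wrapper name patched_global_step_from_engine.

```python
# Hypothetical sketch of patched_global_step_from_engine: wrap ignite's transform
# so each lookup also prints the step it returns.
from ignite.handlers import global_step_from_engine  # location varies across ignite versions


def patched_global_step_from_engine(engine):
    inner = global_step_from_engine(engine)

    def wrapper(_engine, event_name):
        step = inner(_engine, event_name)
        print(f"global step for {event_name}: {step}")
        return step

    return wrapper
```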
3 years ago