JitteryCoyote63
Moderator
215 Questions, 1023 Answers
Active since 10 January 2023
Last activity 3 months ago

Reputation: 0
Badges: 981 × Eureka!
0 Could You Please Explain A Bit More How Trains Adapts The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

agent.package_manager.type = pip
...
Using base prefix '/home/machine1/miniconda3/envs/py36'
New python executable in /home/machine1/.trains/venvs-builds/3.6/bin/python3.6
Also creating executable in /home/machine1/.trains/venvs-builds/3.6/bin/python
Installing setuptools, pip, wheel...
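For illustration, a rough sketch of the general idea (an assumption about the approach, not clearml-agent's actual resolution code): detect the local CUDA release via nvcc and map it to a torch wheel tag, falling back to the CPU wheel when no CUDA is found.

` import re
import subprocess

def detect_cuda_version():
    # Best-effort CUDA detection via 'nvcc --version'; returns e.g. '10.1' or None
    try:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        return None
    match = re.search(r"release (\d+\.\d+)", out)
    return match.group(1) if match else None

cuda = detect_cuda_version()
# e.g. a '+cu101' torch wheel when CUDA 10.1 is found, the CPU wheel otherwise
wheel_tag = "cu" + cuda.replace(".", "") if cuda else "cpu"
print("torch wheels would be resolved with the '%s' tag" % wheel_tag) `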

5 years ago
0 Hi, Together With

The experiment finished completely this time again

5 years ago
0 Hey, I Hope This Is The Right Place To Ask. We're A Small Data Science Team That Wants To Log Everything About Our Ml Models. Looking Around On The Internet, Mostly Mlflow Is Being Recommended, But Occasionally The Name Trains Pops Up. According To You,

I would let the trains team answer this in detail, but as a user moving from MLflow to trains, I can share the following insights:

MLflow and trains overlap when it comes to having a system with a nice web UI to compare/log experiments/models/metrics. But MLflow lacks a crucial feature IMO, which is ML/DevOps: using MLflow, you have to take care of the whole maintenance of your machines, design the interactions between them, etc. This is where trains shines, it provides these features out-of-t...

5 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'd Say 1 Out Of 5 Experiments). The Symptoms

You mean you "aborted the task" from the UI?

Yes exactly

I'm assuming from the leftover processes?

Most likely yes, but I don't see how clearml would have an impact here; I am more inclined to think it is a pytorch dataloader issue, although I don't see why.

From the log I see the agent is running in venv mode.
Hmm, please try with the latest clearml-agent (the others should not have any effect).

yes in venv mode, I'll try with the latest version as well

2 years ago
0 Hi, I Am Getting The Following Errors In The Experiments I Am Currently Running:

can it be that the merge op takes so much filesystem cache that the rest of the system becomes unresponsive?

4 years ago
0 Hi There,

clearml doesn't change the matplotlib backend under the hood, right? Just making sure 😄
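A quick way to verify it on your side (a minimal check in your own script; the project/task names are placeholders):

` import matplotlib
print("backend before Task.init:", matplotlib.get_backend())

from clearml import Task
task = Task.init(project_name="debug", task_name="backend-check")
print("backend after Task.init:", matplotlib.get_backend()) `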

2 years ago
0 Hi Guys, I Had Several Times Now The Following Errors Popping In Agents While Executing A Task:

yes, here is the error (the space at the end of the line is there)
` Applying uncommitted changes
Executing: ('git', 'apply'): b'error: corrupt patch at line 13\n'
Failed applying diff
trains_agent: ERROR: Failed applying git diff:
diff --git a/configs/2.2.2_from_scratch.yaml b/configs/2.2.2_from_scratch.yaml
index 9fece48..5816f78 100644
--- a/configs/2.2.2_from_scratch.yaml
+++ b/configs/2.2.2_from_scratch.yaml
@@ -136,7 +136,7 @@ data_processing:
optimizer:
type: 'RMSprop'
args:

  • lr: 2.5e...
4 years ago
0 Hi Guys, Is A Task Updating Its Status To 'Complete' Before It Finishes Uploading Its Artifacts/Metrics In The Background?

No, I want to launch the second step after the first one is finished and all its artifacts are uploaded
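A rough sketch of that kind of gating from a controller script (the template task ids and queue name below are placeholders, and the polling interval is arbitrary); step 1 itself can call task.flush(wait_for_uploads=True) before it ends to make sure its artifacts are uploaded first:

` import time
from clearml import Task

# Enqueue step 1, poll until it reaches a final status, then enqueue step 2
step_one = Task.clone(source_task="<step-1-template-id>")
Task.enqueue(step_one, queue_name="default")

while True:
    step_one.reload()  # refresh the task data from the server
    if step_one.get_status() in ("completed", "failed", "stopped"):
        break
    time.sleep(30)

if step_one.get_status() == "completed":
    step_two = Task.clone(source_task="<step-2-template-id>")
    Task.enqueue(step_two, queue_name="default") `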

5 years ago
0 I Am Wondering, Is It Possible To Schedule A Task To Run At A Certain Time In A Periodic Fashion, Aka Cron Style... Thinking Of Having A Monitoring Task To Be Run Routinely ... I Could Use A Cron On One Of The Servers But Prefer To Run It On Trains As Then I Am

I don't think there is an example for this use case in the repo currently, but the code should be fairly simple (below is a rough draft of what it could look like)
` import time
from clearml import Task

controller_task = Task.init(...)
controller_task.execute_remotely(queue_name="services", clone=False, exit_process=True)

while True:
    periodic_task = Task.clone(template_task_id)
    # Change parameters of {periodic_task} if necessary
    Task.enqueue(periodic_task, queue_name="default")
    time.sleep(TRIGGER_TASK_INTERVAL_SECS) `

5 years ago
0 Hello There, I Would Like To Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Running The Task In Docker). What Signal Should I Listen For In The Task?

Also maybe we are not on the same page - by cleanup, I mean killing a detached subprocess on the machine executing the agent.
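A minimal sketch of that kind of cleanup, under the assumption that the agent stops a non-docker task by signalling its process (both the signal choice and the helper script name are assumptions here):

` import atexit
import signal
import subprocess
import sys

# Hypothetical detached helper started by the task; it should not outlive the task
worker = subprocess.Popen(["python", "my_background_worker.py"])

def _kill_worker():
    # Terminate the helper if it is still running
    if worker.poll() is None:
        worker.terminate()

def _on_stop(signum, frame):
    _kill_worker()
    sys.exit(0)

atexit.register(_kill_worker)            # normal interpreter exit
signal.signal(signal.SIGTERM, _on_stop)  # assumed stop signal sent by the agent
signal.signal(signal.SIGINT, _on_stop) `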

4 years ago
0 Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built

btw I monkey patched ignite's function global_step_from_engine to print the iteration and passed the modified function to ClearMLLogger.attach_output_handler(…, global_step_transform=patched_global_step_from_engine(engine)). It prints the correct iteration number when calling ClearMLLogger.OutputHandler.__call__.
` def __call__(self, engine: Engine, logger: ClearMLLogger, event_name: Union[str, Events]) -> None:

    if not isinstance(logger, ClearMLLogger):
  ...
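For reference, a stripped-down version of that monkey patch could look roughly like this (a sketch of the idea, not the exact code used), which is then passed as global_step_transform=patched_global_step_from_engine(engine):

` from ignite.contrib.handlers import global_step_from_engine

def patched_global_step_from_engine(engine):
    # Wrap the stock transform so every lookup also prints the step it returns
    original = global_step_from_engine(engine)

    def wrapper(_engine, event_name):
        step = original(_engine, event_name)
        print("global step for %s: %s" % (event_name, step))
        return step

    return wrapper `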
4 years ago
0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

Interesting idea! (I assume for reporting only, not configuration)

Yes, for reporting only - also to understand which version the agent uses to decide which torch wheel to download

Regarding the cuda check with nvcc - I'm not saying this is a perfect solution, I just mentioned that this is how it is currently done.
I'm actually not sure if there is an easy way to get it from the nvidia-smi interface, worth checking though ...

Ok, but when nvcc is not ava...
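To illustrate the nvidia-smi idea (a sketch only, assuming a driver recent enough to print the supported CUDA version in its header; not what clearml-agent currently does):

` import re
import subprocess

def cuda_version_from_nvidia_smi():
    # Parse the 'CUDA Version: X.Y' field from nvidia-smi's header
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    match = re.search(r"CUDA Version:\s*([\d.]+)", out)
    return match.group(1) if match else None

print(cuda_version_from_nvidia_smi())  # e.g. '11.1' `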

4 years ago
0 Hi, I Am Getting The Following Errors In The Experiments I Am Currently Running:

Here is (left) the data disk (/opt/clearml) and (right) the OS disk

4 years ago