AgitatedDove14
Moderator
49 Questions, 8054 Answers
  Active since 10 January 2023
  Last activity 9 months ago

Reputation

0

Badges 1

25 × Eureka!
one year ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

There seems to be a problem with multiprocessing: Although I stopped the task,

You mean you "aborted the task" from the UI?

  • There is a memory leak somewhere, please see the screenshot of datadog memory consumption

I'm assuming from the leftover processes?

Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1

From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)

one year ago
0 Any Specific Reason For Modelling Experiments As Separate Tasks Rather Than A Single Entity With Multiple Runs?

Yes, experiments are standalone as they do not have to have any connecting thread.
When would you say it is a new "run" vs. a new "experiment"? When you change a parameter? Change data? Change code?
If you want to "bucket them", use projects 🙂 It is probably the easiest now that we have support for nested projects.

3 years ago
0 Has Anyone Successfully Deployed Clearml On A Kube Cluster Utilizing Istio? I Don’T See Any Mention Of Istio In The Docs.

I'm working on creating a custom config with istio

That is awesome! Let me know if we could help 🙂
Also please consider PRing it, I'm sure other users will appreciate the option

3 years ago
0 Encountered An Odd Bug. Upon Attempting To Write Images To Clearml (3D Projected, Matplotlib),

It seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:

2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads

And there's no sign of updates on the dashboard

Hmm, that could point to an issue uploading the last images (which are larger than regular scalars). Could you try flushing and waiting?
i.e.
from time import sleep
task.flush()
sleep(45)

3 years ago
0 Hi All, I Am Testing The New

named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?

This is deprecated... it was a test of a package that can update pip venvs, but it was never stable; we will remove it in the next version.

Yes, I guess. Since pipelines are designed to be executed remotely it may be pointless to enable an output_uri parameter in the PipelineDecorator.componen...

3 years ago
0 Quick Question, Can Trains Log Keras Loss Values And/Or Metrics Automatically? Or Would I Have To Attach A Tensorboard Callback?

ElegantCoyote26 I don't think Keras logs it anywhere unless you have TB, so nowhere to take the data from...
In short, yes you have to have TB :)

4 years ago
0 Hi All—First Off, Thanks For Being Such A Helpful And Thorough Group Of People. I Learn A Ton Just Searching Through The Channel For Problems. I’M Seeing A Weird Issue. I Have A Conda Env On My Linux Machine, And I Can Successfully Run A Training Script

(torchvision vs. cuda compatibility, will work on that),

The agent will pull the correct torch based on the cuda version that is available at runtime (or configured via the clearml.conf)

3 years ago
0 Fatal: Could Not Read From Remote Repository. Please Make Sure You Have The Correct Access Rights And The Repository Exists.

I don't think so. it is solved by installing openssh-client to the docker image or by adding deploy token to the cloning url in web ui

You can also have the token (token==password) configured as the default user/pass in your agent's clearml.conf
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L19

3 years ago
0 Cloning: Origin Repository Cloning Failed: 'Nonetype' Object Has No Attribute 'Startswith' Trains_Agent: Error: Failed Cloning Repository. 1) Make Sure You Pushed The Requested Commit: (Repository='Origin', Branch='Master', Commit_Id='051A8418Cf1D85F392

MysteriousBee56 there is no way to tell the trains-agent to pull from local copy of your repository...
You might be able to hack it, if you copy the entire local repo to the trains-agent version control cache. would that help you?

4 years ago
0 Hi, I Have Another Problem

That depends on what you have installed 🙂

4 years ago
0 Crazy Idea:

I see, good point. It does look like mostly boiler plate code, not sure where it actually runs the python command, but I'm sure it is there (python.ts, but could not locate who is actually using it)

one year ago
0 Hi, I Have A Question Regarding The Aws_Autoscaler: It Usually Takes ~Hours To Get A Gpu Instance Nowadays. I Was Thinking, It Would Be Much More Interesting To Stop The Instances (Clearml-Agents) Instead Of Terminating Them Once They Are Inactive, So Tha

instead of terminating them once they are inactive, so that they could be available immediately when they are needed.

JitteryCoyote63 I think you can increase the IDLE timeout on the autoscaler and achieve the same behavior, no?

2 years ago
0 Hi, I Have Another Problem

(since you are using venv mode, if the cuda is not detected at startup time, it will not install the GPU version, as it has no CUDA support)
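A quick sanity check before starting the agent is to confirm the NVIDIA tooling is visible at all from the same shell. The sketch below is only a rough proxy and is not the agent's own detection logic; the helper name `nvidia_smi_on_path` is hypothetical, and finding `nvidia-smi` on the PATH does not guarantee a working CUDA setup.

```python
import shutil
from typing import Callable, Optional


def nvidia_smi_on_path(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Return True if an `nvidia-smi` executable can be found on the PATH.

    Assumption: this only checks for the presence of the NVIDIA CLI tool.
    It does not reproduce how clearml-agent detects the CUDA version.
    """
    return which("nvidia-smi") is not None


if __name__ == "__main__":
    print("nvidia-smi found:", nvidia_smi_on_path())
```

If this prints False in the environment where the agent starts, that would be consistent with the agent not seeing CUDA at startup.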

4 years ago
0 Hi, I'Ve Recently Upgraded To 0.15.1 From 0.14.2, And For Some Reason A Code That Previously Worked In Which I'M Getting The Tags Of A Model Using

PompousBeetle71 notice that starting with this version, when you set model tags they will be stored as user tags, which you can change and edit in the UI. So if you still need the system tags you have to access them directly.

4 years ago
0 Running Into A Strange Issue—

Seems correct.
I'm assuming something is wrong with the key/secret quoting ?!
Could you generate another one and test it ?
(you can have multiple key/secrets on the same user)

3 years ago
0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 Mb Per Second As Its Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 Kb Per Second. Does Anyone Have An Idea About This?

None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
from time import time
task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
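To compare the measured time against the 50-100 MB/s seen in the minio browser, the elapsed time can be converted to a throughput figure. This is a minimal stdlib-only sketch; `timed` and `throughput_mb_per_s` are hypothetical helper names, not part of clearml, and the upload itself is passed in as a plain callable.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(action: Callable[[], T]) -> Tuple[T, float]:
    """Run action() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into MB/s (1 MB = 1024 * 1024 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / (1024 * 1024) / seconds
```

For example, wrapping the upload and the final flush in one function and passing it to `timed`, then feeding `os.path.getsize('/tmp/localfile')` and the elapsed seconds to `throughput_mb_per_s`, gives a number directly comparable to the browser upload rate.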

4 years ago
0 What Could Be The Reason For My Package To Not Be Loading Under The "Installed Packages"? I Have A

So if everything works you should see "my_package" package in the "installed packages"
the assumption is that if you do:
pip install "my_package"
It will set "pandas" as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
Make sense ?
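To see what pip would pull in for a given package, you can inspect the requirements it declares in its installed metadata. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `declared_dependencies` is hypothetical, and "my_package" is just the placeholder name from above.

```python
from importlib import metadata
from typing import List


def declared_dependencies(package: str) -> List[str]:
    """Return the requirement strings an installed package declares.

    Returns an empty list if the package declares no requirements
    or is not installed in the current environment.
    """
    try:
        return list(metadata.requires(package) or [])
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    # Placeholder package name; replace with your own package.
    print(declared_dependencies("my_package"))
```

If "pandas" shows up in that list for your package, pip will resolve it on the agent side even though only "my_package" appears under "installed packages".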

3 years ago