Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
215 Questions, 1023 Answers
  Active since 10 January 2023
  Last activity 3 months ago

Reputation

0

Badges 1

981 × Eureka!
0 Votes
7 Answers
2K Views
0 Votes 7 Answers 2K Views
3 years ago
0 Votes
6 Answers
2K Views
0 Votes 6 Answers 2K Views
Hi there, maybe this was already asked but I don't remember: Would it be possible to have the clearml-agent switch between docker mode and virtualenv mode at...
2 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
2 years ago
0 Votes
12 Answers
2K Views
0 Votes 12 Answers 2K Views
Hi, I encounter a weird behavior: I have a task A that schedules a task B. Task B is executed on an agent, but with an old commit 🤔 although the branch is p...
5 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, I am getting an error while running task.mark_stopped() , any idea why? (clearml 1.0.2, clearml-agent 1.0.0, python 3.6) File "/home/machine/.clearml/ven...
4 years ago
Show more results questions
0 Hi, I Have Another Problem

OK but nowhere I specified that, I just checked my trains.conf file

5 years ago
0 Hi Guys, I Got A Very Unexpected Error Today On In One Of My Agents:

Unfortunately this is difficult to reproduce... Neverthless it would be important to me to be robust against it, because if this error happens in a task in the middle of my pipeline, the whole process fails.

This binds to another wider topic I think: How to "skip" tasks if they already run (a mechanism similar to what [ https://luigi.readthedocs.io/en/stable/ ] offers). That would allow to restart the pipeline and skip tasks until the point where the task failed

5 years ago
0 Hi, I Have Another Problem

I just started one and it wrote:
...

5 years ago
0 Hi, I Have Another Problem

I specified a torch @ https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl and it didn't detect the link, it tried to install latest version: 1.6.0

5 years ago
0 Hi, I Have Another Problem

I don't know why it didn't detect it in first place

5 years ago
0 Hi, I Have Another Problem

btw shoulnd't it be CUDA_VERSION=11.0 ?

5 years ago
0 Hi Guys, Coming This Time To Share An Idea Of A Killer Feature For Clearml

I also discovered https://h2oai.github.io/wave/ last week, would be awesome to be able to deploy it in the same manner

4 years ago
0 Hey Guys, I Am Setting Up A New Machine With Two Rtx 3070 Gpus Where I Created Two Agents (One For Each Gpu). On Both Agents, My Experiments Fail With Error:

Also, from https://lambdalabs.com/blog/install-tensorflow-and-pytorch-on-rtx-30-series/ :

As of 11/6/2020, you can't pip/conda install a TensorFlow or PyTorch version that runs on NVIDIA's RTX 30 series GPUs (Ampere). These GPUs require CUDA 11.1, and the current TensorFlow/PyTorch releases aren't built against CUDA 11.1. Right now, getting these libraries to work with 30XX GPUs requires manual compilation or NVIDIA docker containers.

But what wheel is downloading trains in that case?

4 years ago
0 Hi, I Have Another Problem

ho, that might be it then, thanks!

5 years ago
0 Hi, I Have Another Problem

thanks, I will do that

5 years ago
0 Hi, I Am Getting The Following Errors In The Experiments I Am Currently Running:

Thanks! With this I’ll probably be able to reduce the cluster size to be on the safe side for a couple of months at least :)

4 years ago
0 Hi, I Deleted All Archived Experiments In A Project And I Just Realized All Experiments Of All Projects Were Deleted (Clearml Server V1.0.0)

Restarting the server ( docker-compose down then docker-compose up ) solved the problem 😌 All experiments are back

4 years ago
0 Hi, How Does

There was no possible cache, the agent was running on a new ec2 instance

2 years ago
0 Hi There,

Ok interestingly using matplotlib.use('agg') it doesn't leak (idea from here )
image

2 years ago
0 Hi, I Would Like To Follow-Up In This

I checked the server code diffs between 1.1.0 (when it was working) and 1.2.0 (when the bug appeared) and I saw many relevant changes that can introduce this bug > https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0

3 years ago
0 Hi There,

Ok so what is the value that is set when it is run by the agent? agg ?

2 years ago
0 Hi There,

Ok no it only helps if as far as I don't log the figures. If I log the figures, I will still run into the same problem

2 years ago
0 Hi There,

With a large enough number of iterations in the for loop, you should see the memory grow over time

2 years ago
0 Hi, With Clearml-Agent 1.5.1, I Tried To Run An Experiment Within A Docker With Image Python3:8 And It Failed Executing The Task While Trying To Call Python3.9. I Am Not Sure Why It'S Using Python3.9, Since The Agent.Default_Python Is 3.8 And The Image Is

I think my problem is that I am launching an experiment with python3.9 and I expect it to run in the agent with python3.8. The inconsistency is from my side, I should fix it and create the task with python3.8 with:
task.data.script.binary = "python3.8" task._update_script(convert_task.data.script)Or use python:3.9 when starting the agent

2 years ago
0 Hi There,

Early debugging signals show that auto_connect_frameworks={'matplotlib': False, 'joblib': False} seem to have a positive impact - it is running now, I will confirm in a bit

2 years ago
0 Hi There,

Yes that was my assumption as well, it could be several causes to be honest now that I see that also matplotlib itself is leaking 😄

2 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Any chance this is reproducible ?

Unfortunately not at the moment, I could find a reproducible scenario. If I clone a task that was stuck and start it, it might not be stuck

How many processes do you see running (i.e. ps -Af | grep python) ?

I will check that when the next one will be blocked 👍

What is the training framework? is it multiprocess ? how are you launching the process itself? is it Linux OS? is it running inside a specific container ?

I train with p...

3 years ago
4 years ago
Show more results compactanswers