JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes 30 Answers 1K Views
Hi, together with ElegantKangaroo44 we found two unexpected behaviors in task.models['output']: the input model of the task is included in the list. The best...
4 years ago
0 Votes 8 Answers 943 Views
3 years ago
0 Votes 17 Answers 1K Views
Hello, I am trying to retrieve a simple dict artifact uploaded in a previous task with task.upload_artifact("my_dict", dict(foo="bar")) in a second task. I t...
4 years ago
0 Votes 9 Answers 1K Views
Hi, I want to upgrade clearml server from 1.1 to 1.2 (self hosted). I have the following setup: /dev/nvme0n1p1 30G 21G 8.9G 70% / <- This is where /opt/clear...
2 years ago
0 Votes 5 Answers 960 Views
Hello, I have a small question regarding UI: Currently, in the artifacts section of a task, the FILE PATH displayed for artifacts stored in s3 are displayed ...
4 years ago
0 Votes 11 Answers 1K Views
Hi, some properties of the Task object are not listed in the documentation (such as task.parent, which is not clear whether it is the parent task object itse...
4 years ago
0 Votes 12 Answers 1K Views
3 years ago
0 Votes 5 Answers 1K Views
Hi, from within an experiment, how can I intercept the signal that the experiment was aborted and execute a cleanup function? I tried to intercept SIGINT and...
2 years ago
0 Votes 30 Answers 1K Views
3 years ago
0 Votes 4 Answers 1K Views
Hey, I have one question regarding the cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server mac...
4 years ago
0 Votes 16 Answers 1K Views
Got some errors while running migration script from ES5 to ES7: 2020-08-11 15:21:50,130 Running on: Linux 2020-08-11 15:21:50,227 Docker allocated memory: 16...
4 years ago
0 Votes 13 Answers 1K Views
2 years ago
0 Votes 2 Answers 1K Views
Looks like trains-agent 0.16 doesn't support the documented --install-globally parameter; it is only available for the trains-agent build command. Would it be possible t...
4 years ago
0 Votes 16 Answers 951 Views
Hey, I have a problem with the following task: def main(args): config = yaml.load(open(args.config)) if __name__ == '__main__': parser = argparse.ArgumentPar...
4 years ago
0 Votes 12 Answers 1K Views
2 years ago
0 Votes 7 Answers 1K Views
2 years ago
0 Votes 4 Answers 1K Views
The “Manage queue” option in the right tab on a queued experiment is broken in v1.0 (it does nothing)
3 years ago
0 Votes 17 Answers 1K Views
3 years ago
0 Votes 5 Answers 1K Views
Hi, I would like to report something else weird in the clearml-agent 1.5.1 running in docker mode: In the logs, when it dumps its config, it writes: docker_c...
one year ago
0 Votes 4 Answers 958 Views
Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...
4 years ago
0 Votes 10 Answers 1K Views
3 years ago
0 Votes 5 Answers 1K Views
2 years ago
0 Votes 6 Answers 1K Views
3 years ago
0 Votes 25 Answers 990 Views
Hi, I have another problem 😅 in one of my agents, one experiment started without torch using the GPU. In the logs of the experiment shared below, we can see that...
4 years ago
0 Votes 30 Answers 1K Views
Hello, I am getting ValueError: Could not get access credentials for ' s3://my-bucket ' , check configuration file ~/trains.conf but I did specify them in my...
4 years ago
0 Votes 5 Answers 941 Views
Hi, I have a long running experiment that was running on AWS instance that got killed after ~4 days with the following reason: STATUS REASON: Forced stop (no...
2 years ago
0 Votes 2 Answers 928 Views
3 years ago
0 Votes 7 Answers 1K Views
Hi, I recently updated clearml-server to 1.7 and since today I am getting a lot of the following errors on any experiment (I didn't have this error before): 1...
2 years ago
0 Votes 18 Answers 1K Views
Hey there, I would like to increase the ulimit for the number of files opened at the same time in an EC2 instance. According to this https://stackoverflow.com...
3 years ago
0 Votes 13 Answers 988 Views
Hello, in the following context: controller_task = Task.init(...) # This will clone the parent task, enqueue and wait for finished status data_processing_tas...
4 years ago
0 Hi There,

OK, no: it only helps as long as I don't log the figures. If I log the figures, I still run into the same problem.

one year ago
0 Hi, I Am Trying To Use The Clearml-Agent In Docker Mode To Run An Experiment, But It Seems To Fail Passing The Clearml.Conf File To The Docker Container:

in my clearml.conf, I only have:
sdk.aws.s3.region = eu-central-1
sdk.aws.s3.use_credentials_chain = true
agent.package_manager.pip_version = "==20.2.3"

one year ago
0 Hi There,

Is it exactly agg or something different?

one year ago
0 Hi There,

Well, no luck: using matplotlib.use('agg') in my training codebase doesn't fix the memory leak.

one year ago
0 Hi, Did Anyone Experiment With Running On The Aws Autoscaler On Spots And Knows Whether There Is Configuration For Retry Policy When Spot Get Evacuated Mid-Job?

Hi there, yes, I was able to make it work with some glue code:
- Save your model, optimizer and scheduler every epoch.
- Have a separate thread that periodically polls the instance metadata and checks whether the instance is marked for stop; if so, add a custom tag, e.g. TO_RESUME.
- Have a service that periodically pulls failed experiments with the tag TO_RESUME from the queue, force-marks them as stopped instead of failed, and reschedules them with the last checkpoint as an extra parameter.
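The second step (the watcher thread) can be sketched as a daemon thread that polls the EC2 instance metadata service. This is a minimal sketch under stated assumptions: it queries the spot/instance-action metadata path, which AWS documents as returning 404 until an interruption is scheduled; it assumes IMDSv1 is enabled (IMDSv2-only instances need a session token first); and the helper names are hypothetical. The tagging itself is left to a callback.

```python
import threading
import urllib.error
import urllib.request
from typing import Callable, Optional

# EC2 metadata path that answers 200 only once a spot interruption is scheduled
# (404 otherwise). Assumption: IMDSv1 is enabled on the instance.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def spot_interruption_scheduled(timeout: float = 1.0) -> bool:
    """Return True if the metadata service reports a pending spot interruption."""
    try:
        with urllib.request.urlopen(SPOT_ACTION_URL, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # 404 (nothing scheduled), timeout, or not running on EC2
        return False


def start_interruption_watcher(
    on_interrupt: Callable[[], None],
    check: Callable[[], bool] = spot_interruption_scheduled,
    poll_interval: float = 5.0,
    stop_event: Optional[threading.Event] = None,
) -> threading.Thread:
    """Poll `check` in a daemon thread and call `on_interrupt` once it returns True."""
    if stop_event is None:
        stop_event = threading.Event()

    def _loop() -> None:
        while not stop_event.is_set():
            if check():
                on_interrupt()  # e.g. tag the running task for the resume service
                return
            stop_event.wait(poll_interval)  # sleep, but wake early if asked to stop

    thread = threading.Thread(target=_loop, daemon=True)
    thread.start()
    return thread
```

Injecting the check keeps the thread testable off EC2. On a real instance the default check applies, and the callback could tag the task through ClearML's Task.add_tags, e.g. lambda: task.add_tags(["TO_RESUME"]).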

3 years ago
0 Hi There,

Yes, that was my assumption as well. To be honest, it could have several causes, now that I see that matplotlib itself is also leaking 😄

one year ago
0 Hi, I Would Like To Bring Awareness

RuntimeError: CUDA error: no kernel image is available for execution on the device

one year ago
0 Hi There,

For me it is definitely reproducible 😄 but the codebase is quite large and I cannot share it. The gist is the following:

import matplotlib.pyplot as plt
import numpy as np
from clearml import Task
from tqdm import tqdm

task = Task.init("Debug memory leak", "reproduce")

def plot_data():
    fig, ax = plt.subplots(1, 1)
    t = np.arange(0., 5., 0.2)
    ax.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
    return fig

for i in tqdm(range(1000), total=1000):
    fig = plot_data()
  ...
one year ago
0 Hi There,

Early debugging signals show that auto_connect_frameworks={'matplotlib': False, 'joblib': False} seem to have a positive impact - it is running now, I will confirm in a bit

one year ago
0 Hi, I Updated To Clearml-Server 1.4.0 And I Am Uncomfortable With The New Table/Detail View, Is There A Way To Disable It And Use The Previous One (On Click -> Open Details)?

DeterminedCrab71 This is the behaviour of holding Shift while selecting in Gmail; if ClearML could reproduce this, that would be perfect!

2 years ago
0 Hi There,

Update: I isolated one of the causes, a memory leak in matplotlib itself, and opened an issue on their repo here

one year ago
0 Hi, I Would Like To Bring Awareness

Oh, it seems like it is not synced, thank you for noticing (it will be taken care of immediately)

Thank you!

does not contain a specific wheel for cuda117 on x86, they use the pip default one

Yes, so indeed they don't provide support for earlier CUDA versions on the latest torch versions. But I should still be able to install torch==1.11.0+cu115 even if I have cu117; that is what clearml-agent was doing before.
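The expectation above (an older CUDA build such as cu115 should still be usable on a cu117 machine) amounts to a simple fallback rule: take the newest available CUDA tag that is not newer than the installed one. The sketch below illustrates that rule only; it is not clearml-agent's actual resolution logic. It assumes PyTorch-style tags where the last digit is the minor version (cu115 is 11.5), and the helper names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple


def cuda_tag_to_version(tag: str) -> Tuple[int, int]:
    """Convert a PyTorch-style CUDA tag to (major, minor), e.g. 'cu115' -> (11, 5)."""
    match = re.fullmatch(r"cu(\d{2,3})", tag)
    if match is None:
        raise ValueError(f"unrecognised CUDA tag: {tag!r}")
    digits = match.group(1)
    # Assumption: last digit = minor version, remaining digits = major version
    return int(digits[:-1]), int(digits[-1])


def pick_cuda_wheel(available: Iterable[str], installed: str) -> Optional[str]:
    """Return the newest available CUDA tag not newer than `installed`, or None."""
    target = cuda_tag_to_version(installed)
    compatible = [tag for tag in available if cuda_tag_to_version(tag) <= target]
    return max(compatible, key=cuda_tag_to_version, default=None)
```

For example, pick_cuda_wheel(["cu102", "cu113", "cu115"], "cu117") returns "cu115". Whether an older-toolkit wheel actually runs also depends on the installed driver, which this sketch does not check.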

one year ago
0 Hi There,

With a large enough number of iterations in the for loop, you should see the memory grow over time
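To measure that growth rather than eyeball RAM, one option is to sample Python's traced heap every N iterations with the standard-library tracemalloc module. This is a sketch that assumes the leak goes through Python's allocator: tracemalloc does not see memory allocated directly by native code, so a process-RSS measurement (for example with psutil) may be needed instead. The helper name traced_memory_samples is hypothetical.

```python
import tracemalloc
from typing import Callable, List


def traced_memory_samples(step: Callable[[], object], iterations: int = 1000,
                          sample_every: int = 100) -> List[int]:
    """Run `step` repeatedly; return the traced heap size (bytes) every `sample_every` runs."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    samples: List[int] = []
    try:
        for i in range(1, iterations + 1):
            step()
            if i % sample_every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append(current)
    finally:
        if started_here:
            tracemalloc.stop()  # only stop tracing that this helper started
    return samples


# Hypothetical usage with the repro's plot_data():
#   import matplotlib.pyplot as plt
#   print(traced_memory_samples(plot_data))
#   print(traced_memory_samples(lambda: plt.close(plot_data())))
```

A steadily increasing list of samples points to memory that is never released; a roughly flat one suggests it is freed between iterations.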

one year ago
0 Hi There,

Adding back ClearML logging with matplotlib.use('agg') uses more RAM, but nothing that suspicious

one year ago
0 Hey, Clearml Team! When Can We Expect An Updated Roadmap? Last One Is From August

automatically promote models to be served from within clearml

Yes!

3 years ago
0 Hi, I Would Like To Bring Awareness

When running my training code

one year ago
0 Hi, I Would Like To Bring Awareness

So the wheel that was working for me was this one: [torch-1.11.0+cu115-cp38-cp38-linux_x86_64.whl](https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp38-cp38-linux_x86_64.whl)
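The wheel that worked encodes everything relevant in its file name. Per the wheel naming convention (PEP 427) that is distribution, version with an optional local label such as +cu115, Python tag, ABI tag and platform. A small parser makes it easy to check which CUDA build a URL points to. This is a sketch: WheelTag and parse_wheel_filename are hypothetical names, and only the standard five- or six-part file names are handled.

```python
import posixpath
from typing import NamedTuple
from urllib.parse import unquote, urlparse


class WheelTag(NamedTuple):
    name: str
    version: str   # public version, e.g. "1.11.0"
    local: str     # local version label, e.g. "cu115" ("" if absent)
    python: str
    abi: str
    platform: str


def parse_wheel_filename(name_or_url: str) -> WheelTag:
    """Split a wheel file name (or a URL ending in one) into its tags."""
    # Works for bare file names too: urlparse() then returns the whole string as the path.
    filename = unquote(posixpath.basename(urlparse(name_or_url).path))  # %2B -> "+"
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        parts = parts[:2] + parts[3:]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    dist, version, py_tag, abi_tag, plat_tag = parts
    public, _, local = version.partition("+")
    return WheelTag(dist, public, local, py_tag, abi_tag, plat_tag)
```

To pin exactly this wheel, pip accepts a PEP 508 direct reference in a requirements file, e.g. torch @ https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp38-cp38-linux_x86_64.whl. Whether clearml-agent keeps such a line unchanged when it resolves requirements is worth verifying in your own setup.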

one year ago
0 Hi, If I Am Starting My Training With The Following Command:

Hi AgitatedDove14, I investigated further and got rid of a separate bug. I was able to get ignite’s events fired, but still no scalars are logged 😞
There is definitely something going wrong with scalar reporting from multiple processes, because if my ignite callback is the following:

def log_loss(engine):
    idist.barrier()  # Sync all processes
    device = idist.device()
    print("IDIST", device)
    from clearml import Task
    Task.current_task().get_logger().r...

3 years ago
0 Hi, If I Am Starting My Training With The Following Command:

I fixed it and will push a fix to pytorch-ignite 🙂

3 years ago
0 Hi, I Have Another Problem

AgitatedDove14 one last question: how can I enforce a specific wheel to be installed?

4 years ago
0 Hello, In The Following Context:

That said, you might have accessed the artifacts before any of them were registered

I called task.wait_for_status() to make sure the task is done

4 years ago
0 Hi, I Have Another Problem

agent.cuda_version = 0
agent.cudnn_version = 0

4 years ago
0 Hello, In The Following Context:

Thanks AgitatedDove14!
Could we add this task.refresh() to the docs? It might be helpful for other users as well 🙂 OK! Maybe there is a middle ground: for artifacts already registered, simply return the entry, and for artifacts that don't exist yet, contact the server to retrieve them.
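The middle ground proposed above (serve known artifacts locally, refresh from the server only on a miss) is a lazy-refresh lookup. The sketch below is a generic illustration, not ClearML's implementation. LazyArtifacts is a hypothetical class, and the fetch callback stands in for whatever refreshes the task state. With ClearML one might pass something like lambda: (task.reload(), task.artifacts)[1], which is an untested assumption.

```python
from typing import Any, Callable, Dict, Mapping


class LazyArtifacts:
    """Serve locally known entries; on a miss, refresh once via `fetch_all` before failing."""

    def __init__(self, fetch_all: Callable[[], Mapping[str, Any]]):
        self._fetch_all = fetch_all
        self._entries: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        if key not in self._entries:
            # The artifact may have been registered after our last sync: refresh once
            self._entries.update(self._fetch_all())
        return self._entries[key]  # still missing after the refresh -> KeyError
```

The design keeps the common case (already registered artifacts) free of server calls, while a KeyError still surfaces for artifacts that genuinely do not exist.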

4 years ago
0 Hi, I Have Another Problem

agent.cuda_version = 110
agent.cudnn_version = 0

4 years ago
0 Hello, In The Following Context:

This is the issue, I will make sure wait_for_status() calls reload at the end, so when the function returns you have the updated object

That sounds awesome! It will definitely fix my problem 🙂

In the meantime, I now do:
task.wait_for_status()
task._artifacts_manager.flush()
task.artifacts["output"].get()
But I still get KeyError: 'output'. Is that expected? Will it work if I replace the second line with task.refresh()?

4 years ago
0 Hi, I Have Another Problem

I have CUDA 11.0 installed, but on another machine that also has 11.0 installed, trains downloads torch for CUDA 10.1. I guess this is because no wheel exists for torch==1.3.1 with CUDA 11.0.

4 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

Because it lives behind a VPN and GitHub workers don’t have access to it

2 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

I’d like to move to a setup where I don’t need these tricks

2 years ago