Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
2 Answers
938 Views
0 Votes 2 Answers 938 Views
First link in hyperparameter optimization page is broken > https://allegro.ai/docs/examples/examples_hyperparam_opt/
4 years ago
0 Votes
11 Answers
1K Views
0 Votes 11 Answers 1K Views
Hi, coming back with the venv caching: with the following setting: I call Task._update_requirements(["."]) setup.py has the following install_requires=["my-p...
3 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
2 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
Hi, is it possible to specify the required version of python for a Task that is different from the python running the clearml-agent? Example: my clearml-agen...
2 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hi, a small bug (not really a bug) in the autoscaler: I have p3.2xlarge instances that take a long time to shutdown. With polling_interval_time_min=1 , the a...
3 years ago
0 Votes
4 Answers
960 Views
0 Votes 4 Answers 960 Views
Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...
4 years ago
0 Votes
2 Answers
925 Views
0 Votes 2 Answers 925 Views
Hey there πŸ™‚ Still my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...
3 years ago
0 Votes
30 Answers
987 Views
0 Votes 30 Answers 987 Views
Could you please explain a bit more how trains adapt the torch version depending on the installed cuda version? Here is my setup: cuda 102 installed and corr...
4 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
Hey again 😁 Is it possible to run multiple agents on the same machine? And with some in services mode?
4 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Hi, I encounter the following bug with clearml 0.17.5rc2: When I start a task locally and that task raises cuda out of memory, the command returns but the pr...
3 years ago
0 Votes
12 Answers
1K Views
0 Votes 12 Answers 1K Views
3 years ago
0 Votes
12 Answers
1K Views
0 Votes 12 Answers 1K Views
2 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
Hey, I have one question regarding the cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server mac...
4 years ago
0 Votes
12 Answers
1K Views
0 Votes 12 Answers 1K Views
Hi, I deleted some archived experiments in clearml server 1.0 and the popup in the dashboard showed “the following artifacts were not deleted”, with a list o...
3 years ago
0 Votes
3 Answers
1K Views
0 Votes 3 Answers 1K Views
Hi, I see that there is a new parameter in aws autoscaler: max_spin_up_time_min - What is the difference with max_idle_time_min ?
aws
3 years ago
0 Votes
7 Answers
1K Views
0 Votes 7 Answers 1K Views
Hi, I deleted all archived experiments in a project and I just realized all experiments of all projects were deleted (clearml server v1.0.0) πŸ€”
3 years ago
0 Votes
17 Answers
1K Views
0 Votes 17 Answers 1K Views
Hello, I am trying to retrieve a simple dict artifact uploaded in a previous task with task.upload_artifact("my_dict", dict(foo="bar")) in a second task. I t...
4 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
Hi there, is it possible to configure the clearml-agent to run some commands before running each experiment it launches? Eg. echo "test" > "test.txt" && <-- ...
3 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
Hi, I am using the aws autoscaler and getting the following error while trying to spin up spot instances: 2021-08-16 17:18:48 Spinning new instance type=v100...
3 years ago
0 Votes
3 Answers
967 Views
0 Votes 3 Answers 967 Views
Hi, I have several long running experiments failing with Process failed, exit code -9 and no other error with clearml 1.0.4 and clearml-agent 1.0.0, what cou...
3 years ago
0 Votes
13 Answers
2K Views
0 Votes 13 Answers 2K Views
Hi, I am trying to use the clearml-agent in docker mode to run an experiment, but it seems to fail passing the clearml.conf file to the docker container: Exe...
one year ago
0 Votes
7 Answers
1K Views
0 Votes 7 Answers 1K Views
Hi, I recently updated clearml-server to 1.7 and I am getting a lot of the following errors since today on any experiment (I didn't had this error before): 1...
2 years ago
0 Votes
10 Answers
888 Views
0 Votes 10 Answers 888 Views
Hi, just want to report a small bug in the clearml dashboard: after queuing an experiment, if I change the experiment queue, then go back to the experiment I...
3 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hi, is it possible to start a clearml-agent (not in docker mode) on a machine with a gpu, but enforce the clearml-agent to not “see” the gpu? So that the exp...
3 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Hi, how can I easily start a shell script from within an experiment and have its logs (stdin/err) logged in clearml?
2 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hi, where can I find the logs of trains-agent by default?
4 years ago
0 Votes
1 Answers
906 Views
0 Votes 1 Answers 906 Views
Hey there πŸ™‚ Would in the WebUI, on an experiment CONFIGURATION tab, for a specific parameter, would it be possible not show its value as a single string whe...
2 years ago
0 Votes
1 Answers
976 Views
0 Votes 1 Answers 976 Views
Hi, would it be possible to parse torch requirement when it’s part of the extras_require dict? In my code, I have the following: train_task._update_requireme...
3 years ago
0 Votes
5 Answers
984 Views
0 Votes 5 Answers 984 Views
Hi there, I would like to report a bug with the resizing of the columns in the projects view: it doesn’t work as expected. Please look at the behavior of the...
3 years ago
0 Votes
19 Answers
1K Views
0 Votes 19 Answers 1K Views
I guess one experiment is running backwards in time πŸ˜„
2 years ago
Show more results questions
0 Hi There,

Well no luck - using matplotlib.use('agg') in my training codebase doesn't solve the mem leak

one year ago
0 Hi, Did Anyone Experiment With Running On The Aws Autoscaler On Spots And Knows Whether There Is Configuration For Retry Policy When Spot Get Evacuated Mid-Job?

Hi there, yes I was able to make it work with some glue code:
Save your model, optimizer, scheduler every epoch Have a separate thread that periodically pulls the instance metadata and check if the instance is marked for stop, in this case, add a custom tag eg. TO_RESUME Have a services that periodically pulls failed experiments from the queue with the tag TO_RESUME, force marking them as stopped instead of failed and reschedule them with as extra-param the last checkpoint

3 years ago
0 Hi There,

Yes that was my assumption as well, it could be several causes to be honest now that I see that also matplotlib itself is leaking πŸ˜„

one year ago
0 Hi, I Would Like To Bring Awareness

RuntimeError: CUDA error: no kernel image is available for execution on the device

one year ago
0 Hi There,

For me it is definitely reproducible πŸ˜„ But the codebase is quite large, I cannot share. The gist is the following:

import matplotlib.pyplot as plt
import numpy as np
from clearml import Task
from tqdm import tqdm

task = Task.init("Debug memory leak", "reproduce")

def plot_data():
    fig, ax = plt.subplots(1, 1)
    t = np.arange(0., 5., 0.2)
    ax.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
    return fig

for i in tqdm(range(1000), total=1000):
    fig = plot_data()
  ...
one year ago
0 Hi, I Updated To Clearml-Server 1.4.0 And I Am Uncomfortable With The New Table/Detail View, Is There A Way To Disable It And Use The Previous One (On Click -> Open Details)?

DeterminedCrab71 This is the behaviour of holding shift while selecting in Gmail, if ClearML could reproduce this, that would be perfect!

2 years ago
0 Hi, I Would Like To Bring Awareness

oh seems like it is not synced, thank you for noticing (it will be taken care immediately)

Thank you!

does not contain a specific wheel for cuda117 to x86, they use the pip defualt one

Yes so indeed they don't provide support for earlier cuda versions on latest torch versions. But I should still be able to install torch==1.11.0+cu115 even if I have cu117. Before that is what the clearml-agent was doing

one year ago
0 Hi There,

With a large enough number of iterations in the for loop, you should see the memory grow over time

one year ago
0 Hi There,

Adding back clearml logging with matplotlib.use('agg') , uses more ram but not that suspicious
image

one year ago
0 Hey, Clearml Team! When Can We Expect An Updated Roadmap? Last One Is From August

automatically promote models to be served from within clearml

Yes!

3 years ago
0 Hi, I Would Like To Bring Awareness

When running my training code

one year ago
0 Hi, I Would Like To Bring Awareness

So the wheel that was working for me was this one: [torch-1.11.0+cu115-cp38-cp38-linux_x86_64.whl](https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp38-cp38-linux_x86_64.whl)
image

one year ago
0 Hi, If I Am Starting My Training With The Following Command:

Hi AgitatedDove14 , I investigated further and got rid of a separate bug. I was able to get ignite’s events fired, but still no scalars logged 😞
There is definitely something wrong going on with the reporting of scalars using multi processes, because if my ignite callback is the following:

` def log_loss(engine):
idist.barrier(). # Sync all processes
device = idist.device()
print("IDIST", device)
from clearml import Task
Task.current_task().get_logger().r...

3 years ago
0 Hi, If I Am Starting My Training With The Following Command:

I fixed, will push a fix in pytorch-ignite πŸ™‚

3 years ago
0 Hi, I Have Another Problem

AgitatedDove14 one last question: how can I enforce a specific wheel to be installed?

4 years ago
0 Hello, In The Following Context:

That said, you might have accessed the artifacts before any of them were registered

I called task.wait_for_status() to make sure the task is done

4 years ago
0 Hi, I Have Another Problem

agent.cuda_version = 0 agent.cudnn_version = 0

4 years ago
0 Hello, In The Following Context:

Thanks AgitatedDove14 !
Could we add this task.refresh() on the docs? Might be helpful for other users as well πŸ™‚ OK! Maybe there is a middle ground: For artifacts already registered, returns simply the entry and for artifacts not existing, contact server to retrieve them

4 years ago
0 Hi, I Have Another Problem

agent.cuda_version = 110 agent.cudnn_version = 0

4 years ago
0 Hello, In The Following Context:

This is the issue, I will make sure wait_for_status() calls reload at the ends, so when the function returns you have the updated object

That sounds awesome! It will definitely fix my problem πŸ™‚

In the meantime: I now do:
task.wait_for_status() task._artifacts_manager.flush() task.artifacts["output"].get()But I still get KeyError: 'output' ... Was that normal? Will it work if I replace the second line with task.refresh () ?

4 years ago
0 Hi, I Have Another Problem

I have 11.0 installed but on another machine with 11.0 installed as well, trains downloads torch for cuda 10.1, I guess this is because no wheel exists for torch==1.3.1 and cuda 11.0

4 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

Because it lives behind a VPN and github workers don’t have access to it

2 years ago
2 years ago
0 Hi, If I Am Starting My Training With The Following Command:

And I am wondering if only the main process (rank=0) should attach the ClearMLLogger or if all the processes within the node should do that

3 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

even if I move the Github workers internally where they could have access to the prod server, I am not sure I would like that, because it would pile up test data in the prod server that is not necessary

2 years ago
0 Hi, I Just Updated Clearml Server 1.0 Using

Hi SuccessfulKoala55 , How can I now if I log in in this free access mode? I assume it is since in the login page I only see login field, not password field

3 years ago
Show more results compactanswers