Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
215 Questions, 1023 Answers
  Active since 10 January 2023
  Last activity 3 months ago

Reputation

0

Badges 1

981 × Eureka!
0 Votes
28 Answers
2K Views
0 Votes 28 Answers 2K Views
Hi, I am trying to use omegaconf with task.connect_configuration and I get the following error: >>> OmegaConf.create(task.connect_configuration(config_dict))...
3 years ago
0 Votes
7 Answers
2K Views
0 Votes 7 Answers 2K Views
Hi, I deleted all archived experiments in a project and I just realized all experiments of all projects were deleted (clearml server v1.0.0) πŸ€”
4 years ago
0 Votes
13 Answers
2K Views
0 Votes 13 Answers 2K Views
Hey there, Is it possible for a clearml pipeline step to log a folder instead of numpy/pickle objects? Looking at the docs, monitor_artifacts could be what I...
3 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Looks like trains-agent 0.16 doesn't support --install-globally documented parameter -> Only available for trains-agent build command. Would it be possible t...
5 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hi, is there a way to control after how much time an agent that went down is removed from the web-ui? I find the current value too high for my needs
2 years ago
0 Votes
27 Answers
2K Views
0 Votes 27 Answers 2K Views
Hi there, I found a memory leak in Logger.report_matplotlib_figure . I was constantly running out of memory when training my models so I decided to spend som...
2 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
⚠️ Hi there, I recently updated clearml server to 1.7.0, and found the following critical regression: When I reset an experiment, it is actually deleted 😡 ,...
2 years ago
0 Votes
18 Answers
2K Views
0 Votes 18 Answers 2K Views
Hello there, I would like to do run cleanup code in case the user aborts one task from the dashboard (the agent is not using the task in docker). What signal...
4 years ago
0 Votes
9 Answers
2K Views
0 Votes 9 Answers 2K Views
Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates and send to the queue an experiment, the command ...
4 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others
3 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hi, a small bug (not really a bug) in the autoscaler: I have p3.2xlarge instances that take a long time to shutdown. With polling_interval_time_min=1 , the a...
4 years ago
0 Votes
1 Answers
2K Views
0 Votes 1 Answers 2K Views
4 years ago
0 Votes
30 Answers
2K Views
0 Votes 30 Answers 2K Views
Hi, I just updated clearml-server to 1.1.0 and got the following error when starting it with docker-compose: clearml-apiserver | [2021-08-02 13:37:09,852] [8...
4 years ago
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Hi, I would like to use pytorch3d==0.5.0 with torch==1.9.1 on cuda version 110, locally it works, but the clearml agent fails setting up the environment with...
4 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Are the env variables passed to trains-agent available in experiments run by this trains-agent?
5 years ago
0 Votes
12 Answers
2K Views
0 Votes 12 Answers 2K Views
Hi there! Is there an easy way to retrieve the site-package directory that was created by an agent from inside a task? Eg. task = Task.init(...) task.add_req...
2 years ago
0 Votes
4 Answers
2K Views
0 Votes 4 Answers 2K Views
Hey there, is there a way to access the trains configuration programmatically at runtime in a task (the configuration that is dumped by the agent in the logs...
5 years ago
0 Votes
12 Answers
2K Views
0 Votes 12 Answers 2K Views
Hey, would it possible to add an option to make task.upload_artifact() blocking? (Not running in background)
5 years ago
0 Votes
10 Answers
2K Views
0 Votes 10 Answers 2K Views
Hi, on clearml-server 1.5.0, in scalar graphs, the new default value is “Show closest data on hover”. Would it be possible to make it automatically set to “C...
3 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi guys, since I am done with implementing the AWS autoscaler, I would like to share some pain points that I encountered in the process with the hope that th...
aws
4 years ago
0 Votes
1 Answers
2K Views
0 Votes 1 Answers 2K Views
Hi, I encounter the following bug with clearml 0.17.5rc2: When I start a task locally and that task raises cuda out of memory, the command returns but the pr...
4 years ago
0 Votes
4 Answers
2K Views
0 Votes 4 Answers 2K Views
Hey again 😁 Is it possible to run multiple agents on the same machine? And with some in services mode?
5 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi quick question: does Task.connect_configuration support OmegaConf DictConfig objects? ie. Can I do: config = train_task.connect_configuration(OmegaConf.lo...
3 years ago
0 Votes
13 Answers
3K Views
0 Votes 13 Answers 3K Views
Hi, I am trying to use the clearml-agent in docker mode to run an experiment, but it seems to fail passing the clearml.conf file to the docker container: Exe...
2 years ago
0 Votes
10 Answers
2K Views
0 Votes 10 Answers 2K Views
Hi, another bug to report with the aws_auto_scaler using 1.1.2: Traceback (most recent call last): File "aws_autoscaler.py", line 297, in main() File "aws_au...
4 years ago
0 Votes
18 Answers
2K Views
0 Votes 18 Answers 2K Views
Hi, kudos for the 0.15 guys! I am having an issue related to git auth: I have an issue with trains-agent (0.15): it does not use git creds while trying to cl...
5 years ago
0 Votes
7 Answers
2K Views
0 Votes 7 Answers 2K Views
Hi, is there a way to get some stats about the use of workers? I would like to know, over the past 3 months: Number of training hours per user Number of trai...
4 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hello there, is there a parameter to configure the number of columns rendered in the preview area of the CSV artifacts? (some of them are truncated with “…”)
4 years ago
0 Votes
10 Answers
2K Views
0 Votes 10 Answers 2K Views
Hey guys, I am setting up a new machine with two rtx 3070 GPUs where I created two agents (one for each GPU). On both agents, my experiments fail with error:...
4 years ago
0 Votes
15 Answers
2K Views
0 Votes 15 Answers 2K Views
Hi, how can I get the logs from the pytorch ignite early stopping handler to be logged in clearml?
4 years ago
Show more results questions
0 Hi There, I Have A Bit Of A Problem With Aws Secrets: I Pass Keys As Env Var To Clearml-Agents To Retrieve Data From A Bucket In Us-East-1 But I Use A Bucket To Store Task Artifacts In A Bucket In Eu-Central-1. So When I Pass Aws Keys As Env Vars, The Tas

Yes, I stayed with an older version for a compatibility reason I cannot remember now πŸ˜„ - just tested with 1.1.2 and it’s the same
I tried specifying the bucket directly in my clearml.conf, same problem. I guess clearml just reads from the env vars first

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

Yes, thanks! In my case, I was actually using TrainsSaver from pytorch-ignite with a local path, then I understood looking at the code that under the hood it actually changed the output_uri of the current task, thats why my previous_task.output_uri = " s3://my_bucket " had no effect (it was placed BEFORE the training)

5 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

Oops, I spoke to fast, the json is actually not saved in s3

5 years ago
0 Hi, In The Context Of Multi-Gpu Training, Is

if I want to resume a training on multi gpu, I will need to call this function on each process to send the weights to each gpu

3 years ago
4 years ago
0 Hi, How Can I Get The Logs From The Pytorch Ignite Early Stopping Handler To Be Logged In Clearml?

AgitatedDove14 So in the https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping class I see that some infos are logged (in the __call__ function), and I would like to have these infos logged by clearml

4 years ago
0 Hi Guys For The Aws Auto-Scaler I Need To Access Aws Ssm Or Create .Env File Locally When Using The Init Script. Has Anyone Done This?

Try to spin up the instance of that type manually in that region to see if it is available

4 years ago
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

So most likely trains was masking the original error, it might be worth investigating to help other users in the future

5 years ago
0 Hey, I Have A Problem With The Following Task:

Thanks for the explanations,
Yes that was the case This is also what I would think, although I double checked yesterday:I create a task on my local machine with trains 0.16.2rc0 This task calls task.execute_remotely() The task is sent to an agent running with 0.16 The agent install trains 0.16.2rc0 The agent runs the task, clones it and enqueues the cloned task The cloned task fails because it has no hyper-parameters/args section (I can seen that in the UI) When I clone the task manually usin...

5 years ago
0 Hi, Together With

using trains RC, trains-agent 0.15.0

5 years ago
0 Hi There, I Used

The task object

3 years ago
0 Hi, I Am Getting The Following Errors In The Experiments I Am Currently Running:

ha sorry it’s actually the number of shards that increased

4 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi @<1523701205467926528:profile|AgitatedDove14> , I want to circule back on this issue. This is still relevant and I could collect the following on an ec2 instance running a clearml-agent running a stuck task:

  • There seems to be a problem with multiprocessing: Although I stopped the task, there are still so many processes forked from the main training process. I guess these are zombies. Please check the htop tree.
  • There is a memory leak somewhere, please see the screenshot of datadog mem...
2 years ago
0 Hi, Although

Does that mean that agents do not read this parameter?

4 years ago
0 Hey, What Is The Exact Difference Between

I hitted enter too fast ^^
Installing them globally via
$ pip install numpy opencv torch will install locally with warning:
Defaulting to user installation because normal site-packages is not writeable , therefore the installation will take place in ~/.local/lib/python3.6/site-packages , instead of the default one. Will this still be considered as global site-packages and still be included in experiments envs? From what I tested it does

5 years ago
0 Hi There, I Have A Problem With Pyjwt: I Am Using

You already fixed the problem with pyjwt in the newest version of clearml/clearml-agents, so all good πŸ˜„

4 years ago
0 Hi, If I Am Starting My Training With The Following Command:

ok, so even if that guy is attached, it doesn’t report the scalars

3 years ago
4 years ago
0 Hi Guys, With The New Venv Caching Available In Clearml, I Have The Following Problem: I Force My Pip Requirements To Be:

yes, in the code, i do:
task._wait_for_repo_detection() REQS_TASK = ["torch==1.3.1", "pytorch-ignite @ git+ ", "."] task._update_requirements(REQS_TASK) task.execute_remotely(queue_name=args.queue, clone=False, exit_process=True)

4 years ago
0 Hi There

Please wait a few mins - This example is not valid, I will share a new one soon

5 years ago
0 Hi Folks, Is It Possible To Use An Aws P3 Instance (Which As Several Gpus) With One Agent Per Gpu, All Controlled Through Clearml Aws Autoscheduler? So Clearml Aws Autoscheduler Would Know In Advance How Much Agents To Start In The Instances (Can Be An Op

Notice the last line should not have

--docker

Did you meant --detached ?

I also think we need to make sure we monitor all agents (this is important as this is the trigger to spin down the instance)

That's what I though yea, no problem, it was rather a question, if I encounter the need for that, I will adapt and open a PR πŸ™‚

4 years ago
0 Hi, Although

so the task they execute must have clearml installed?

4 years ago
0 Hi Guys, With The New Venv Caching Available In Clearml, I Have The Following Problem: I Force My Pip Requirements To Be:

AgitatedDove14 The first time it installs and create the cache for the env, the second time it fails with:
Applying uncommitted changes ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found. clearml_agent: ERROR: Command '['/home/user/.clearml/venvs-builds.1/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqsmncaxx45.txt']' returned non-zero exit status 1.

4 years ago
Show more results compactanswers