JitteryCoyote63
Moderator
215 Questions, 1023 Answers
  Active since 10 January 2023
  Last activity 3 months ago

Reputation

0

Badges 1

981 × Eureka!
0 Votes
3 Answers
2K Views
hi guys, is it possible to spin up two agents on one GPU? Something like trains-agent daemon --gpus 0 --queue default & trains-agent daemon --gpus 0 --queue ...
4 years ago
0 Votes
19 Answers
2K Views
2 years ago
0 Votes
30 Answers
2K Views
Hi again, my clearml api-server has a memory leak. Each time I restart it, its RAM consumption grows until it hits OOM, is not killed and makes the ec2 ...
4 years ago
0 Votes
9 Answers
2K Views
Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates an experiment and sends it to the queue, the command ...
4 years ago
0 Votes
3 Answers
2K Views
Hello there, is there a parameter to configure the number of columns rendered in the preview area of the CSV artifacts? (some of them are truncated with “…”)
4 years ago
0 Votes
5 Answers
2K Views
Hi, I have an error with clearml-agent 1.5.1 when importing tensorflow 2.10 from tensorflow.python.client._pywrap_tf_session import * File "/root/.clearml/ve...
2 years ago
0 Votes
3 Answers
2K Views
Hi, I am trying to update the aws_autoscaler to the latest version on the master branch. I simply changed the commit id in the experiment and ran it, this ga...
4 years ago
0 Votes
30 Answers
2K Views
Hi, I am getting the following errors in the experiments I am currently running: 2021-06-25 17:11:47,911 - clearml.Metrics - ERROR - Action failed <504/0: ev...
4 years ago
0 Votes
7 Answers
2K Views
Hi, I think there is a small bug in the Experiment running time column of the workers-and-queues/workers page: they do not match the time reported in the exp...
3 years ago
0 Votes
3 Answers
2K Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? Or should I make sure only the first process calls it first, then the others?
3 years ago
0 Votes
4 Answers
2K Views
Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...
5 years ago
0 Votes
10 Answers
2K Views
Hi, I have a local package that I use to train my models. To start training, I have a script that calls task._update_requirements([".", "torch==1.11.0"]) . I...
3 years ago
0 Votes
26 Answers
2K Views
Hi, I would like to follow up on this https://clearml.slack.com/archives/CTK20V944/p1646123127790389 happening on clearml server 1.2.0 (self hosted on a sing...
3 years ago
0 Votes
2 Answers
2K Views
Hi, how can I search an old experiment based on its commit hash?
2 years ago
0 Votes
30 Answers
2K Views
Hi, if I am starting my training with the following command: python -u -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --config configs/tra...
3 years ago
0 Votes
25 Answers
2K Views
Hi, I have another problem 😅 in one of my agents, one experiment started without torch using GPU. In the logs of the experiment shared below, we can see that...
5 years ago
0 Votes
30 Answers
3K Views
Hi, I am giving clearml-session another try and I am blocked by the following error, shown when the CLI tries to establish the tunneling: Starting SSH tunnel W...
3 years ago
0 Votes
0 Answers
2K Views
Hello, Pytorch 1.8 was released, bringing AMD wheels with it > pip install torch -f https://download.pytorch.org/whl/rocm4.0.1/torch_stable.html Is ClearML s...
4 years ago
0 Votes
4 Answers
2K Views
3 years ago
0 Votes
29 Answers
2K Views
Hi, although https://github.com/allegroai/clearml/issues/181 is resolved, clearml-agent (0.17.2) still logs tqdm iterations as different lines, is there some...
4 years ago
0 Votes
2 Answers
2K Views
Hey there again, I am not sure I understand the difference between StorageManager and StorageHelper. Which one should I use?
5 years ago
0 Votes
3 Answers
2K Views
Hi, I am considering making automated backups of my clearml-server using Amazon EBS snapshots. Should I be concerned with the same problem described here > h...
4 years ago
0 Votes
19 Answers
2K Views
I guess one experiment is running backwards in time 😄
3 years ago
0 Votes
17 Answers
2K Views
Hi, I updated to clearml-server 1.4.0 and I am uncomfortable with the new Table/Detail view, is there a way to disable it and use the previous one (on click ...
3 years ago
0 Votes
2 Answers
2K Views
Hi, is it possible to get an artifact from a Task and force not using local cache? The task itself updated the artifact in the meantime and I cannot get the ...
4 years ago
0 Votes
13 Answers
3K Views
Hi, I am trying to use the clearml-agent in docker mode to run an experiment, but it seems to fail passing the clearml.conf file to the docker container: Exe...
2 years ago
0 Votes
4 Answers
2K Views
Hey, I have one question regarding the cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server mac...
5 years ago
0 Votes
2 Answers
2K Views
Hey there 🙂 Still on my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...
4 years ago
0 Votes
5 Answers
2K Views
Hi, I am using clearml with pytorch-ignite and its EarlyStopping handler. I would like to log the counter of the patience of this handler, how can I do that?
4 years ago
0 Votes
23 Answers
2K Views
Hi, I started a trains-agent (0.15) in services mode (full command: trains-agent daemon --services-mode --detached --queue services --create-queue --docker u...
5 years ago
0 Hi Guys, Any Plan To Integrate The

Both ^^, I already adapted the code for GCP and I was planning to adapt to Azure now

5 years ago
0 Hi, I Attached An Iam Role To An Ec2 Instance To Grant Access To An S3 Bucket. The Ec2 Instance Is Running A Clearml-Agent (V1.1.0). I Didn’t Specify Any Key/Secret For Clearml. The Tasks Fail With The Following Error:

There is no need to add creds on the machine, since the EC2 instance has an attached IAM profile that grants access to s3. Boto3 is able to retrieve the files from the s3 bucket

4 years ago
0 Hi Again, I Am Trying To Make The Aws Autoscaler Work With Ec2 Instances, But It Fails To Setup The Agent In The Machine: The Logs Of The User-Data Script Show That It Fails Updating The Machine (See Below)

so what worked for me was the following startup userscript:
` #!/bin/bash
sleep 120
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential...

4 years ago
0 Hello There, I Would Like To Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Using The Task In Docker). What Signal Should I Listen For In The Task?

The cleanup service is awesome, but it would require having another agent running in services mode on the same machine, which I would rather avoid

4 years ago
0 Hi, Together With

(It would be nice to have all the Pypi releases tagged in github btw)

5 years ago
0 Hey, Clearml Team! When Can We Expect An Updated Roadmap? Last One Is From August

automatically promote models to be served from within clearml

Yes!

4 years ago
0 I Guess One Experiment Is Running Backwards In Time

I hit F12 to check projects.get_all_ex but nothing is fired, I guess the web ui is just frozen in some weird state

3 years ago
4 years ago
0 Hi, I Restarted My Clearml-Server (1.1.0) And The Login Page Always Redirects Me To The Login Page. I Am Using Fixed Users In Config Files. In The Logs Of The Api Server I Can See:

SuccessfulKoala55 I found the issue thanks to you: I changed the domain a bit but didn’t update the apiserver.auth.cookies.domain setting - I did it, restarted and now it works 🙂 Thanks!

4 years ago
0 Hi, I Face A Strange Behavior From The Clearml-Agent: It’s Running In Services Mode, Not In Docker Mode, Cpu Only. I Want To Execute Two Tasks On This Service Agent. One Works, The Other Always Fails After Being Enqueued And Picked By The Agent With The E

Oof now I cannot start the second controller in the services queue on the same second machine, it fails with
` Processing /tmp/build/80754af9/cffi_1605538068321/work
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/cffi_1605538068321/work'
clearml_agent: ERROR: Could not install task requirements!
Command '['/home/machine/.clearml/venvs-builds.1.3/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r'...

4 years ago
3 years ago
0 Hi, With Clearml-Agent 1.5.1, I Tried To Run An Experiment Within A Docker With Image Python3:8 And It Failed Executing The Task While Trying To Call Python3.9. I Am Not Sure Why It's Using Python3.9, Since The Agent.Default_Python Is 3.8 And The Image Is

I have a mental model of the clearml-agent as a module to spin up my code somewhere, and the python version running my code should not depend on the python version running the clearml-agent (especially for experiments running in containers)

2 years ago
0 Hi, Some Properties Of The Task Object Are Not Listed In The Documentation (Such As Task.Parent, Which Is Not Clear Whether It Is The Parent Task Object Itself Or The Id Of The Parent Task).

Yes, actually that's what I am doing, because I have a task C depending on tasks A and B. Since a Task cannot have two parents, I retrieve one task id (task A) as the parent id and the other one (ID of task B) as a hyper-parameter, as you described 👍

5 years ago
0 Hi There, I Have A Problem With Pyjwt: I Am Using

but the post_packages does not reinstall the version 1.7.1

4 years ago
0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

In execution tab, I see old commit, in logs, I see an empty branch and the old commit

5 years ago
0 Hey There, Happy New Year To All Of You

Hi AgitatedDove14 , so I ran 3 experiments:
One with my current implementation (using "fork")
One using "forkserver"
One using "forkserver" + the DataLoader optimization
I sent you the results via PM, here are the outcomes:
fork -> 101 mins, low RAM usage (5 GB constant), almost no IO
forkserver -> 123 mins, high RAM usage (16 GB, fluctuations), high IO
forkserver + DataLoader optimization -> 105 mins, high RAM usage (from 28 GB to 16 GB), high IO
CPU/GPU curves are the same for the 3 experiments...

4 years ago
0 Hi, If I Am Starting My Training With The Following Command:

I opened an issue ( https://github.com/pytorch/ignite/issues/2343 ) in ignite’s repo and a PR ( https://github.com/pytorch/ignite/pull/2344 ), could you please have a look? There might be a bug in clearml Task.init in distributed envs

3 years ago
0 Hi There, I Am Running A Clearml-Agent In Services Mode (With Docker) On A Machine With Two Disks: One With The Os (8Go, 91% Space Used) And One For The Data (100Go, 40% Space Used). When Executing The Auto-Scaler Task In This Agent, I Get The Following E

/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only

4 years ago
0 Hi, If I Am Starting My Training With The Following Command:

And I am wondering if only the main process (rank=0) should attach the ClearMLLogger or if all the processes within the node should do that

3 years ago
0 Hi, If I Am Starting My Training With The Following Command:

AgitatedDove14 yes! I now realise that the ignite events callbacks seem to not be fired (I tried to print a debug message on a custom Events.ITERATION_COMPLETED) and I cannot see it logged

3 years ago
0 Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

Now I am trying to restart the cluster with docker-compose and specifying the last volume, how can I do that?

4 years ago
0 Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

I mean, inside a parent, do not show the project [parent] if there is nothing inside

3 years ago
0 Could You Please Explain A Bit More How Trains Adapt The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

I now have a different question: when installing torch from wheels files, I am guaranteed to have the corresponding cuda library and cudnn together right?

5 years ago