Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
10 Answers
1K Views
0 Votes 10 Answers 1K Views
3 years ago
0 Votes
26 Answers
1K Views
0 Votes 26 Answers 1K Views
Hi, I would like to follow-up in this https://clearml.slack.com/archives/CTK20V944/p1646123127790389 happening on clearml server 1.2.0 (self hosted on a sing...
2 years ago
0 Votes
7 Answers
976 Views
0 Votes 7 Answers 976 Views
Hi, I think there is a small bug in the Experiment running time column of the workers-and-queues/workers page: they do not match the time reported in the exp...
3 years ago
0 Votes
6 Answers
1K Views
0 Votes 6 Answers 1K Views
Hi, is it possible to specify the required version of python for a Task that is different from the python running the clearml-agent? Example: my clearml-agen...
2 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
Hi again, it seems like the aws autoscaler is not spinning instances with the EBS configuration I configured. Here is the configuration: resource_configurati...
3 years ago
0 Votes
13 Answers
982 Views
0 Votes 13 Answers 982 Views
Hello, in the following context: controller_task = Task.init(...) # This will clone the parent task, enqueue and wait for finished status data_processing_tas...
4 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Hi, I think I found a small bug: Clone an experiment Enqueue it on a queue with no workers Delete the queue Try to Dequeue the experimentThe last operation w...
3 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Looks like trains-agent 0.16 doesn't support --install-globally documented parameter -> Only available for trains-agent build command. Would it be possible t...
4 years ago
0 Votes
3 Answers
1K Views
0 Votes 3 Answers 1K Views
Hi, I see that there is a new parameter in aws autoscaler: max_spin_up_time_min - What is the difference with max_idle_time_min ?
aws
3 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
Hi, I have an error with clearml-agent 1.5.1 when importing tensorflow 2.10 from tensorflow.python.client._pywrap_tf_session import * File "/root/.clearml/ve...
one year ago
0 Votes
7 Answers
997 Views
0 Votes 7 Answers 997 Views
Hi, one more question: When creating a task with Task.init(), we can specify the task_type . Now when using Task.clone(), we cannot specify the task_type (is...
4 years ago
0 Votes
3 Answers
927 Views
0 Votes 3 Answers 927 Views
Hi, I am getting an error while running task.mark_stopped() , any idea why? (clearml 1.0.2, clearml-agent 1.0.0, python 3.6) File "/home/machine/.clearml/ven...
3 years ago
0 Votes
3 Answers
988 Views
0 Votes 3 Answers 988 Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others
3 years ago
0 Votes
16 Answers
1K Views
0 Votes 16 Answers 1K Views
Got some errors while running migration script from ES5 to ES7: 2020-08-11 15:21:50,130 Running on: Linux 2020-08-11 15:21:50,227 Docker allocated memory: 16...
4 years ago
0 Votes
11 Answers
1K Views
0 Votes 11 Answers 1K Views
Are the various task types available in 0.15? I am getting > 2020-06-09 12:58:53,287 - trains.Task - WARNING - Retrying, previous request failed : 'custom' i...
4 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
2 years ago
0 Votes
18 Answers
959 Views
0 Votes 18 Answers 959 Views
Hi Guys, I had several times now the following errors poping in agents while executing a task: trains_agent: ERROR: Failed applying git diff: I attached the ...
4 years ago
0 Votes
7 Answers
1K Views
0 Votes 7 Answers 1K Views
2 years ago
0 Votes
8 Answers
1K Views
0 Votes 8 Answers 1K Views
Hi guys, is a Task updating its status to 'Complete' before finishing to upload its artifacts/metrics in the background?
4 years ago
0 Votes
0 Answers
1K Views
0 Votes 0 Answers 1K Views
Hello, Pytorch 1.8 was released, bringing AMD wheels with it > pip install torch -f https://download.pytorch.org/whl/rocm4.0.1/torch_stable.html Is ClearML s...
3 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
3 years ago
0 Votes
12 Answers
1K Views
0 Votes 12 Answers 1K Views
2 years ago
0 Votes
3 Answers
968 Views
0 Votes 3 Answers 968 Views
Hey! Would it be possible to tag the RC releases in the different repos? So that one knows what is inside?
4 years ago
0 Votes
4 Answers
958 Views
0 Votes 4 Answers 958 Views
Hey there, happy new year to all of you 🍾 I have several tasks that are stuck while training a model with pytorch/ignite, more precisely right after uploadi...
3 years ago
0 Votes
1 Answers
933 Views
0 Votes 1 Answers 933 Views
Small error in doc: https://allegro.ai/docs/references/trains_agent_ref/#daemon The detach parameter is shown in the command as --detached while it is listed...
4 years ago
0 Votes
1 Answers
993 Views
0 Votes 1 Answers 993 Views
Hi, I encounter the following bug with clearml 0.17.5rc2: When I start a task locally and that task raises cuda out of memory, the command returns but the pr...
3 years ago
0 Votes
11 Answers
975 Views
0 Votes 11 Answers 975 Views
Hi guys, following up on this https://allegroai-trains.slack.com/archives/CTK20V944/p1599135173096200?thread_ts=1599125260.076600&cid=CTK20V944 : I have a pi...
4 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
2 years ago
0 Votes
26 Answers
1K Views
0 Votes 26 Answers 1K Views
Hi, I attached an IAM role to an ec2 instance to grant access to an s3 bucket. The ec2 instance is running a clearml-agent (v1.1.0). I didn’t specify any key...
aws
3 years ago
0 Votes
1 Answers
910 Views
0 Votes 1 Answers 910 Views
Hi there, would it be possible to add some Neural Architecture Search example, as for the HyperParameter Optimizer examples?
3 years ago
Show more results questions
0 Hi, Some Properties Of The Task Object Are Not Listed In The Documentation (Such As Task.Parent, Which Is Not Clear Whether It Is The Parent Task Object Itself Or The Id Of The Parent Task).

Yes, actually thats what I am doing, because I have a task C depending on tasks A and B. Since a Task cannot have two parents, I retrieve one task id (task A) as the parent id and the other one (ID of task B) as a hyper-parameter, as you described 👍

4 years ago
0 Hello There, I Would Like To Do Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Using The Task In Docker). What Signal Should I Listen For In The Task?

The task requires this service, so the task starts it on the machine - Then I want to make sure the service is closed by the task upon completion/failure/abortion

4 years ago
0 Hi Guys, Coming This Time To Share An Idea Of A Killer Feature For Clearml

as it's also based on pytorch-ignite!

I am not sure to understand, what is the link with pytorch-ignite?

We're in the brainstorming phase of what are the best approaches to integrate, we might pick your brain later on

Awesome, I'd be happy to help!

3 years ago
0 Hey Guys, I Am Setting Up A New Machine With Two Rtx 3070 Gpus Where I Created Two Agents (One For Each Gpu). On Both Agents, My Experiments Fail With Error:

Hi AgitatedDove14 , coming by after a few experiments this morning:
Indeed torch 1.3.1 does not support cuda, I tried with 1.7.0 and it worked, BUT trains was not able to pick the right wheel when I updated the torch req from 1.3.1 to 1.7.0: It downloaded wheel for cuda version 101. But in the experiment log, the agent correctly reported the cuda version (111). I then replaced the torch==1.7.0 with the direct https link to the torch wheel for cuda 110, and that worked (I also tried specifyin...

4 years ago
0 Hey Again

Very cool! Run two train-agent daemons, one per GPU on the same machine, with default Nvidia/CUDA Docker This is close to my use case, I just would like to run these two daemons not with docker, would that be possible? I should just remove the --docker nvidia/cuda param right?

4 years ago
0 Hey Again

trains-agent daemon --gpus 0 --queue default & trains-agent daemon --gpus 1 --queue default &

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

Ho the object is actually available in previous_task.artifacts

4 years ago
0 Hi, Did Anyone Experiment With Running On The Aws Autoscaler On Spots And Knows Whether There Is Configuration For Retry Policy When Spot Get Evacuated Mid-Job?

Hi there, yes I was able to make it work with some glue code:
Save your model, optimizer, scheduler every epoch Have a separate thread that periodically pulls the instance metadata and check if the instance is marked for stop, in this case, add a custom tag eg. TO_RESUME Have a services that periodically pulls failed experiments from the queue with the tag TO_RESUME, force marking them as stopped instead of failed and reschedule them with as extra-param the last checkpoint

3 years ago
0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

 you mean “docker” was not installed and it did not throw an error ?

Yes docker was not installed in the machine

Yes you must make sure the docker can mount a persistent folder for you to work on.

Ok, it would be nice to have a --user-folder-mounted that do the linking automatically

3 years ago
0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

So I installed docker, added user to group allowed to run docker (not to have to run with sudo, otherwise it fails), then ran these two commands and it worked

3 years ago
0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

I got some progress TimelyPenguin76 , Now the task runs and I get the error from docker:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

3 years ago
0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

So that I don’t loose what I worked on when stopping the session, and if I need to, I can ssh to the machine and directly access the content inside the user folder

3 years ago
0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

Alright I have a followup question then: I used the param --user-folder “~/projects/my-project”, but any change I do is not reflected in this folder. I guess I am in the docker space, but this folder is not linked to my the folder on the machine. Is it possible to do so?

3 years ago
0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

` Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '--network', 'host', '-e', 'CLEARML_WORKER_ID=office:worker-0:docker', '-e', 'CLEARML_DOCKER_IMAGE=nvidia/cuda:10.1-runtime-ubuntu18.04 --network host', '-v', '/home/user/.gitconfig:/root/.gitconfig', '-v', '/tmp/.clearml_agent.toc3_yks.cfg:/root/clearml.conf', '-v', '/tmp/clearml_agent.ssh.1dsz4bz8:/root/.ssh', '-v', '/home/user/.clearml/apt-cache.2:/var/cache/apt/archives', '-v', '/home/user/.clearml/pip-cache:/root/.cache/pip', '...

3 years ago
0 Hi Guys, I Had Several Times Now The Following Errors Poping In Agents While Executing A Task:

yes, here is the error (the space at the end of the line is there)
` Applying uncommitted changes
Executing: ('git', 'apply'): b'error: corrupt patch at line 13\n'
Failed applying diff
trains_agent: ERROR: Failed applying git diff:
diff --git a/configs/2.2.2_from_scratch.yaml b/configs/2.2.2_from_scratch.yaml
index 9fece48..5816f78 100644
--- a/configs/2.2.2_from_scratch.yaml
+++ b/configs/2.2.2_from_scratch.yaml
@@ -136,7 +136,7 @@ data_processing:
optimizer:
type: 'RMSprop'
args:

  • lr: 2.5e...
4 years ago
Show more results compactanswers