JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 2 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
13 Answers
631 Views
Hello, in the following context: controller_task = Task.init(...) # This will clone the parent task, enqueue and wait for finished status data_processing_tas...
3 years ago
0 Votes
22 Answers
659 Views
Hi there, I used clearml-task to send a script to be executed remotely. When being executed remotely Task.current_task() returns None, how should I get the c...
2 years ago
0 Votes
6 Answers
648 Views
Hi, Is there a way to stop a clearml-agent from within an experiment? Or block it to prevent it running any other task?
3 years ago
0 Votes
13 Answers
765 Views
Hey there, Is it possible for a clearml pipeline step to log a folder instead of numpy/pickle objects? Looking at the docs, monitor_artifacts could be what I...
2 years ago
0 Votes
0 Answers
641 Views
(sorry I pinned the message accidentally 😅)
3 years ago
0 Votes
6 Answers
703 Views
Hi, how does agent.enable_git_ask_pass work? I am using the clearml-agent in docker mode and my experiment is stuck at downloading a private dependency: Clo...
one year ago
0 Votes
26 Answers
824 Views
Hi, I attached an IAM role to an ec2 instance to grant access to an s3 bucket. The ec2 instance is running a clearml-agent (v1.1.0). I didn’t specify any key...
aws
2 years ago
0 Votes
4 Answers
717 Views
2 years ago
0 Votes
10 Answers
734 Views
3 years ago
0 Votes
2 Answers
720 Views
3 years ago
0 Votes
4 Answers
651 Views
Hi, what happens exactly when I execute the following command: trains-agent daemon --gpus 0 --queue default & In my code, how to know which GPU to choose insi...
3 years ago
0 Votes
4 Answers
669 Views
Hi there, I am trying to start an agent in services mode with trains-server being on localhost (but not started together with the docker-compose!). My trains...
3 years ago
0 Votes
17 Answers
701 Views
3 years ago
0 Votes
2 Answers
622 Views
Hi, in the AWS AutoScaler, I am getting the following warning: Warning! exception occurred: APIError: code 400/1004: Worker is not registered: worker=aws:A10...
3 years ago
0 Votes
2 Answers
744 Views
Hi, I recently updated my clearml to 1.1.2 and some code that was working before now behaves completely differently: I am using the following to log debug sampl...
2 years ago
0 Votes
10 Answers
736 Views
Hi, on clearml-server 1.5.0, in scalar graphs, the new default value is “Show closest data on hover”. Would it be possible to make it automatically set to “C...
2 years ago
0 Votes
4 Answers
665 Views
Hi guys, I got a very unexpected error today in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...
3 years ago
0 Votes
3 Answers
651 Views
Hi quick question: does Task.connect_configuration support OmegaConf DictConfig objects? ie. Can I do: config = train_task.connect_configuration(OmegaConf.lo...
2 years ago
0 Votes
5 Answers
617 Views
Hi, I have a long running experiment that was running on AWS instance that got killed after ~4 days with the following reason: STATUS REASON: Forced stop (no...
2 years ago
0 Votes
9 Answers
667 Views
Hi, I want to upgrade clearml server from 1.1 to 1.2 (self hosted). I have the following setup: /dev/nvme0n1p1 30G 21G 8.9G 70% / <- This is where /opt/clear...
2 years ago
0 Votes
1 Answers
584 Views
Hey there 🙂 In the WebUI, on an experiment CONFIGURATION tab, for a specific parameter, would it be possible not to show its value as a single string whe...
2 years ago
0 Votes
11 Answers
684 Views
Are the various task types available in 0.15? I am getting > 2020-06-09 12:58:53,287 - trains.Task - WARNING - Retrying, previous request failed : 'custom' i...
4 years ago
0 Votes
2 Answers
621 Views
Hey there! I would like to use the function task.set_project in the following way: my_task.set_project("Top level project/second level project") `` Top level...
one year ago
0 Votes
8 Answers
742 Views
Hi guys, does a Task update its status to 'Complete' before it finishes uploading its artifacts/metrics in the background?
3 years ago
0 Votes
5 Answers
778 Views
Hi, I would like to use pytorch3d==0.5.0 with torch==1.9.1 on cuda version 110, locally it works, but the clearml agent fails setting up the environment with...
2 years ago
0 Votes
5 Answers
621 Views
Hi, I am using clearml with pytorch-ignite and its EarlyStopping handler. I would like to log the counter of the patience of this handler, how can I do that?
2 years ago
0 Votes
5 Answers
611 Views
Hi, is it possible to disable some of the system metrics monitored? and also downsample the rate of logging?
3 years ago
0 Votes
12 Answers
760 Views
2 years ago
0 Votes
3 Answers
648 Views
Hey! Would it be possible to tag the RC releases in the different repos? So that one knows what is inside?
4 years ago
0 Votes
1 Answers
694 Views
Hi, there is a small bug with auto-refreshing in the DEBUG SAMPLES Tab of the Web UI: If it is ON, then it will always force the first series to be displayed...
2 years ago
0 Hi There,

I think that somehow somewhere a reference to the figure is still living, so plt.close("all") and gc cannot free the figure and it ends up accumulating. I don't know where yet

one year ago
0 Hi There,

This is what I get with mprof on this snippet above (I killed the program after the bar reaches 100%, otherwise it hangs trying to upload all the figures)
image

one year ago
0 Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

I mean, inside a parent, do not show the project [parent] if there is nothing inside

2 years ago
0 Hi, In The Aws Autoscaler, Is It Possible To Specify Multiple Regions (Availability_Zone)? I Currently Use Eu-West-1A, And Would Like To Start Using Eu-West-1B And Eu-West-1C. I Tried Specifying A List In Availability_Zone Parameter, But Without Success:

yea I just realized that you would also need to specify different subnets, etc… not sure how easy it is 😞 But it would be very valuable, on-demand GPU instances are so hard to spin up nowadays in aws 😄

2 years ago
0 Hey There, Is It Possible For A Clearml Pipeline Step To Log A Folder Instead Of Numpy/Pickle Objects? Looking At The Docs,

So if all artifacts are logged in the pipeline controller task, I need the last task to access all the artifacts from the pipeline task. I need to execute something like PipelineController.get_artifact() in the last step task

2 years ago
0 Hi, If I Am Starting My Training With The Following Command:

AgitatedDove14 If I call explicitly task.get_logger().report_scalar("test", str(parse_args.local_rank), 1., 0) , this will log as expected one value per process, so reporting works
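
For reference, a minimal sketch of that per-process check, assuming a logger object with the same `report_scalar(title, series, value, iteration)` call as above. The helper `report_rank_marker` and the `RecordingLogger` stand-in are hypothetical names used only so the snippet runs without a server; in a real run the logger would come from `task.get_logger()`.

```python
class RecordingLogger:
    """Hypothetical stand-in for a ClearML logger; it only records calls."""

    def __init__(self):
        self.calls = []

    def report_scalar(self, title, series, value, iteration):
        self.calls.append((title, series, value, iteration))


def report_rank_marker(logger, local_rank, value=1.0, iteration=0):
    """Report one scalar per process, using the local rank as the series name."""
    logger.report_scalar(
        title="test", series=str(local_rank), value=value, iteration=iteration
    )


# Simulate four processes going through the same reporting code path.
logger = RecordingLogger()
for rank in range(4):
    report_rank_marker(logger, rank)

# One series per rank is expected if reporting works in every process.
print(sorted(series for _, series, _, _ in logger.calls))
```

If every process reports, the "test" plot should contain one series per local rank.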

2 years ago
0 Is It Possible To Run An Agent, Listen To The Services Queue Without Using Docker?

Alright, so the steps would be:

trains-agent build --docker nvidia/cuda --id myTaskId --target base_env_services

That would create me a base docker image base_env_services . Then how should I ensure that trains-agent uses that base image for the services queue? My guess is:

trains-agent daemon --services-mode --detached --queue services --create-queue --docker base_env_services --cpu-only

Would that work?

4 years ago
0 Hey Guys, I Am Trying To Plan What I Need To Do In Order To Efficiently Use Clearml With Spot Instances 1) Detecting When Spot Instance Is Down And Experiment Is Aborted 2) Extracting S3 Address Of The Latest Checkpoint From Clearml Api 3) Starting New E

Hi DilapidatedDucks58 , I did that already, but I am reusing the same experiment instead of merging two experiments. Step 4 can be seen as:
Update the experiment status to stopped (if it is failed, you won’t be able to re-enqueue it)
Set a parameter of that task to point to the latest checkpoint and load it (you can also infer it directly: I simply add a tag resume to the task and check at runtime if this tag exists; if yes, I fetch the latest checkpoint of the task)
Use https://clea...
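
The runtime tag check could look roughly like the sketch below. It is only an illustration under assumptions: `pick_resume_checkpoint` is a hypothetical helper, the checkpoint list is assumed to be ordered oldest to newest, and the bucket paths are made up. In a real task the tags and checkpoint URIs would come from the task object (for example its tags and output models), which is not shown here.

```python
def pick_resume_checkpoint(tags, checkpoint_uris, resume_tag="resume"):
    """Return the latest checkpoint URI if the task carries the resume tag.

    `checkpoint_uris` is assumed to be ordered oldest to newest.
    Returns None when the tag is missing or there is no checkpoint yet.
    """
    if resume_tag not in tags or not checkpoint_uris:
        return None
    return checkpoint_uris[-1]


# Fresh run: no tag on the task, so training starts from scratch.
print(pick_resume_checkpoint([], ["s3://my-bucket/ckpt_1.pt"]))

# Re-enqueued run: the tag is present, so the newest checkpoint is selected.
print(
    pick_resume_checkpoint(
        ["resume"], ["s3://my-bucket/ckpt_1.pt", "s3://my-bucket/ckpt_2.pt"]
    )
)
```

Keeping the decision in a small pure function makes it easy to test without a server or a spot instance.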

2 years ago
0 Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

I killed both trains-agent and restarted one to have a clean start. This way it correctly spins up docker containers for services tasks. So the bug probably appears when an error occurs while setting up a task: it cannot go back to the main task. I would need to do some tests to validate that hypothesis though

4 years ago
0 Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

Alright, I had a look in the /tmp/.trains_agent_daemon_outabcdef.txt logs, not many insights from here. For the moment, I simply started a new trains-agent daemon in services mode and I will wait to see what happens.

4 years ago
0 Hi, One More Question: When Creating A Task With Task.Init(), We Can Specify The

Thanks for the hack! The use case is the following: I have a controller that creates training/validation/testing tasks by cloning (so that the parent task id is properly set to the controller). Otherwise I could simply create these tasks with Task.init, but then I would need to set manually the parent task for each one of these tasks, probably with a similar hack, right?

4 years ago
0 Hi, One More Question: When Creating A Task With Task.Init(), We Can Specify The

correct, you could also use Task.create that creates a Task but does not do any automagic.

Yes, I didn't use it so far because I didn't know what to expect since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

and saved locally, which is why the second task, not executed in the same machine, cannot access the file

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

So when I create a task using `task = Task.init(project_name=config.get("project_name"), task_name=config.get("task_name"), task_type=Task.TaskTypes.training, output_uri="s3://my-bucket")` locally, the artifact is correctly logged remotely, but when I create the task remotely (from an agent) the artifact is logged locally (on the agent machine, not on s3)

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

It seems that around here, a Task that is created using init remotely in the main process gets its output_uri parameter ignored

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

even if I explicitly use previous_task.output_uri = "s3://my_bucket", it is ignored and still saves the json file locally

4 years ago
0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

Setting it after the training correctly updated the task and I was able to store artifacts remotely

4 years ago
0 Hi, If I Am Starting My Training With The Following Command:

AgitatedDove14 I think it’s on me to take the pytorch distributed example in the clearml repo and try to reproduce the bug, then pass it over to you 🙂

2 years ago
0 Hi There,

Disclaimer: I didn't check this will reproduce the bug, but that's all the components that should reproduce it: a for loop creating figures and clearml logging them

one year ago
0 Hey There, Does Trains Support

No worries! I asked more to be informed, I don't have a real use-case behind. This means that you guys internally catch the argparser object somehow right? Because you could also simply use sys argv to find the parameters, right?

4 years ago
0 Hi There,

Ok to be fair I get the same curve even when I remove clearml from the snippet, not sure why

one year ago
0 Hi, If I Am Starting My Training With The Following Command:

AgitatedDove14 Same problem with clearml==1.1.5rc2 😞, I also tried with backend==gloo, still same problem

2 years ago
0 Hi, If I Am Starting My Training With The Following Command:

The main issue is the task_logger.report_scalar() not reporting the scalars

2 years ago
0 Hi, If I Am Starting My Training With The Following Command:

ok, so even if that guy is attached, it doesn’t report the scalars

2 years ago
0 Hi, If I Am Starting My Training With The Following Command:

Hi AgitatedDove14, I investigated further and got rid of a separate bug. I was able to get ignite’s events fired, but still no scalars logged 😞
There is definitely something wrong going on with the reporting of scalars using multi processes, because if my ignite callback is the following:

` def log_loss(engine):
    idist.barrier()  # Sync all processes
    device = idist.device()
    print("IDIST", device)
    from clearml import Task
    Task.current_task().get_logger().r...

2 years ago
0 Hi, If I Am Starting My Training With The Following Command:

And I am wondering if only the main process (rank=0) should attach the ClearMLLogger or if all the processes within the node should do that

2 years ago