Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
1 Answers
965 Views
0 Votes 1 Answers 965 Views
Hi there, is it safe to use ClearML (trains >= 0.17) with the trains ignite handler? Should we wait for the update on their side?
3 years ago
0 Votes
12 Answers
926 Views
0 Votes 12 Answers 926 Views
Hey, would it possible to add an option to make task.upload_artifact() blocking? (Not running in background)
4 years ago
0 Votes
2 Answers
923 Views
0 Votes 2 Answers 923 Views
Hi, is it possible to get an artifact from a Task and force not using local cache? The task itself updated the artifact in the meantime and I cannot get the ...
3 years ago
0 Votes
2 Answers
926 Views
0 Votes 2 Answers 926 Views
3 years ago
0 Votes
3 Answers
961 Views
0 Votes 3 Answers 961 Views
Hi ClearML team members! Is there any progress made on the clearml-serving repo? I’d love to start using it but I lack a straightforward get started example....
3 years ago
0 Votes
5 Answers
979 Views
0 Votes 5 Answers 979 Views
Hi there, I would like to report a bug with the resizing of the columns in the projects view: it doesn’t work as expected. Please look at the behavior of the...
3 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
Hi, I would like to report something else weird in the clearml-agent 1.5.1 running in docker mode: In the logs, when it dumps its config, it writes: docker_c...
one year ago
0 Votes
15 Answers
1K Views
0 Votes 15 Answers 1K Views
Hi, I restarted my clearml-server (1.1.0) and the login page always redirects me to the login page. I am using fixed users in config files. In the logs of th...
3 years ago
0 Votes
4 Answers
950 Views
0 Votes 4 Answers 950 Views
Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...
4 years ago
0 Votes
3 Answers
1K Views
0 Votes 3 Answers 1K Views
Hi, I am trying to update the aws_autoscaler to the latest version on the master branch. I simply changed the commit id in the experiment and run it, this ga...
3 years ago
0 Votes
5 Answers
941 Views
0 Votes 5 Answers 941 Views
Hi guys, I would like to start using the AWS autoscaler shipped in trains. I need to create a IAM user to get and I would like to know what are the minimal p...
4 years ago
0 Votes
30 Answers
978 Views
0 Votes 30 Answers 978 Views
Could you please explain a bit more how trains adapt the torch version depending on the installed cuda version? Here is my setup: cuda 102 installed and corr...
4 years ago
0 Votes
7 Answers
1K Views
0 Votes 7 Answers 1K Views
2 years ago
0 Votes
1 Answers
976 Views
0 Votes 1 Answers 976 Views
Hi there, I moved my ClearML server from US to EU and now I am trying to setup the AWS autoscaler with the different architecture that I have now. So far I u...
3 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
Hi, is it possible to pass environment variables to agents created by the AWS AutoScaler service?
3 years ago
0 Votes
3 Answers
977 Views
0 Votes 3 Answers 977 Views
hi guys, is it possible to spin up two agents on one GPU? Something like trains-agent daemon --gpus 0 --queue default & trains-agent daemon --gpus 0 --queue ...
3 years ago
0 Votes
1 Answers
966 Views
0 Votes 1 Answers 966 Views
Hi, I have a clearml-agent (1.1.2) in a g4dn.4xlarge AWS instance (with one T4 GPU), that reports agent.cuda_version = 0 agent.cudnn_version = 0and does not ...
2 years ago
0 Votes
14 Answers
1K Views
0 Votes 14 Answers 1K Views
3 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
4 years ago
0 Votes
1 Answers
900 Views
0 Votes 1 Answers 900 Views
Hey there πŸ™‚ Would in the WebUI, on an experiment CONFIGURATION tab, for a specific parameter, would it be possible not show its value as a single string whe...
2 years ago
0 Votes
10 Answers
1K Views
0 Votes 10 Answers 1K Views
Hi, on clearml-server 1.5.0, in scalar graphs, the new default value is “Show closest data on hover”. Would it be possible to make it automatically set to “C...
2 years ago
0 Votes
10 Answers
882 Views
0 Votes 10 Answers 882 Views
Hi, just want to report a small bug in the clearml dashboard: after queuing an experiment, if I change the experiment queue, then go back to the experiment I...
3 years ago
0 Votes
17 Answers
1K Views
0 Votes 17 Answers 1K Views
3 years ago
0 Votes
4 Answers
967 Views
0 Votes 4 Answers 967 Views
Hey there, is there a way to access the trains configuration programmatically at runtime in a task (the configuration that is dumped by the agent in the logs...
4 years ago
0 Votes
11 Answers
1K Views
0 Votes 11 Answers 1K Views
Hi, some properties of the Task object are not listed in the documentation (such as task.parent, which is not clear whether it is the parent task object itse...
4 years ago
0 Votes
10 Answers
1K Views
0 Votes 10 Answers 1K Views
Hey guys, I am setting up a new machine with two rtx 3070 GPUs where I created two agents (one for each GPU). On both agents, my experiments fail with error:...
4 years ago
0 Votes
18 Answers
975 Views
0 Votes 18 Answers 975 Views
Hello there, I would like to do run cleanup code in case the user aborts one task from the dashboard (the agent is not using the task in docker). What signal...
4 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hi there, congrats for releasing v1 πŸ˜„ I observed that with pytorch ignite (4.2.0), the metrics of the validation engines are delayed by one epoch. I am not ...
3 years ago
0 Votes
5 Answers
936 Views
0 Votes 5 Answers 936 Views
Hi, I have a long running experiment that was running on AWS instance that got killed after ~4 days with the following reason: STATUS REASON: Forced stop (no...
2 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hi, a small bug (not really a bug) in the autoscaler: I have p3.2xlarge instances that take a long time to shutdown. With polling_interval_time_min=1 , the a...
3 years ago
Show more results questions
0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

trains-agent-1: runs an experiment for a long time (>12h). Picks a new experiment on top of the long one running trains-agent-2: runs only one experiment at a time, normal trains-agent-3: runs only one experiment at a time, normalIn total: 4 experiments running for 3 agents

4 years ago
0 Hi There, I Recently Updated Clearml Server To 1.7.0, And Found The Following

Hey @<1523701205467926528:profile|AgitatedDove14> , Actually I just realised that I was confused by the fact that when the task is reset, because of the sorting it disappears, making it seem like it was deleted. I think it's a UX issue: When I click on reset.

  • The pop shows "Deleting 100%"
  • The task disappears in the list of tasks because of the sortingThis led me to thing that there was a bug and the task was deleted
2 years ago
0 Hi There,

Ok to be fair I get the same curve even when I remove clearml from the snippet, not sure why

one year ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

In my github action, I should just have a dummy clearml server and run the task there, connecting to this dummy clearml server

2 years ago
0 Hi, Together With

Alright, experiment finished properly (all models uploaded). I will restart it to check again, but seems like the bug was introduced after that

4 years ago
0 Hi There

As to why: This is part of the piping that I described in a previous message: Task B requires an artifact from task A, so I pass the name of the artifact as a parameter of task B, so that B knows what artifact from A it should retrieve

4 years ago
0 Hi There

Please wait a few mins - This example is not valid, I will share a new one soon

4 years ago
0 Hello, ~3 Months Ago I Created A Trains-Server In A Machine With 30Gb Of Disk Space. Today I Wasn'T Able To Connect To Trains-Server, So I Checked The Server And Found That The Disk Full. I Ran:

Thanks SuccessfulKoala55 !
Maybe you could add to your docker-compose file an option for limiting the size of the logs, since there is no limit by default, their size will grow for ever, which doesn't sound ideal https://docs.docker.com/compose/compose-file/#logging

4 years ago
2 years ago
0 Hi There

More context:
trains, trains-agent and trains-server all 0.16 Session.api_version -> 2.9 (both when executed in trains-agent and in local script)

4 years ago
0 Hi There

I just read, I do have the trains version 0.16 and the experiment is created with that version

4 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

You mean you "aborted the task" from the UI?

Yes exactly

I'm assuming from the leftover processes ?

Most likely yes, but I don't see how clearml would have an impact here, I am more inclined to think it would be a pytorch dataloader issue, although I don't see why

From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)

yes in venv mode, I'll try with the latest version as well

one year ago
0 Hello, In The Following Context:

Downloading the artifacts is done only when actually calling get()/get_local_copy()

Yes, I rather meant: reproduce this behavior even for getting metadata on the artifacts πŸ™‚

4 years ago
0 Hello, In The Following Context:

That said, you might have accessed the artifacts before any of them were registered

I called task.wait_for_status() to make sure the task is done

4 years ago
0 Hi There

Yes, in the Task being executed in the agents, I have:
from trains import Task task = Task.init(...) task.get_logger().report_text(str(task.get_parameters()))

4 years ago
0 Hi, Together With

Exactly

4 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi @<1523701205467926528:profile|AgitatedDove14> , I want to circule back on this issue. This is still relevant and I could collect the following on an ec2 instance running a clearml-agent running a stuck task:

  • There seems to be a problem with multiprocessing: Although I stopped the task, there are still so many processes forked from the main training process. I guess these are zombies. Please check the htop tree.
  • There is a memory leak somewhere, please see the screenshot of datadog mem...
one year ago
0 Hi, I Have Another Problem

python3 -m trains_agent --config-file "~/trains.conf" daemon --queue default --log-level DEBUG --detached --gpus 1 > ~/trains-agent.startup.log 2>&1

4 years ago
0 Hi, I Have Another Problem

AgitatedDove14 one last question: how can I enforce a specific wheel to be installed?

4 years ago
0 Hi, From Within An Experiment, How Can I Intercept The Signal That The Experiment Was Aborted And Execute A Cleanup Function? I Tried To Intercept Sigint And Sigterm, Unsuccessfully:

Hi SuccessfulKoala55 , thanks for the idea! the function isn’t called with atexit.register() though, maybe the way the agent kills the task is not supported by atexit

2 years ago
0 Hi, Where Can I Find The Server Parameter To Control When The Server Is Unregistering An Agent After Not Receiving Updates? Currently It'S Quite Long (30Mins) And This Prevents The Autoscaler From Launching A New Agent

Yes it would be very valuable to be able to tweak that param, currently it's quite annoying because it's set to 30 mins, so when a worker is killed by the autoscaler, I have to wait 30 mins before the autoscaler spins up a new machine because the autoscaler thinks there is already enough agents available, while in reality the agent is down

one year ago
0 Hi, One More Question: When Creating A Task With Task.Init(), We Can Specify The

Thanks for the hack! The use case is the following: I have a controler that creates training/validation/testing tasks by cloning (so that the parent task id is properly set to the controler). Otherwise I could simply create these tasks with Task.init, but then I would need to set manually the parent task for each one of these tasks, probably with a similar hack, right?

4 years ago
Show more results compactanswers