JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
1 Answers
901 Views
Is it possible to shut down the clearml server, upgrade to v1, and restart it while experiments are running? Or is it dancing with the devil? 😄
3 years ago
0 Votes
4 Answers
983 Views
Hi guys, I got a very unexpected error today in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...
4 years ago
0 Votes
3 Answers
1K Views
one year ago
0 Votes
3 Answers
1K Views
Hey there, does trains support click? (entry points defined with that library)
4 years ago
0 Votes
14 Answers
1K Views
3 years ago
0 Votes
12 Answers
1K Views
3 years ago
0 Votes
1 Answers
971 Views
Hi there, is it safe to use ClearML (trains >= 0.17) with the trains ignite handler? Should we wait for the update on their side?
3 years ago
0 Votes
9 Answers
1K Views
Hi, I want to upgrade clearml server from 1.1 to 1.2 (self hosted). I have the following setup: /dev/nvme0n1p1 30G 21G 8.9G 70% / <- This is where /opt/clear...
2 years ago
0 Votes
6 Answers
1K Views
Hi, is it possible to specify the required version of python for a Task that is different from the python running the clearml-agent? Example: my clearml-agen...
2 years ago
0 Votes
5 Answers
1K Views
Quick question: How can I clone a task and change the cloned task type? I see no Task.set_type() function
4 years ago
0 Votes
1 Answers
1K Views
3 years ago
0 Votes
7 Answers
978 Views
Hi, I think there is a small bug in the Experiment running time column of the workers-and-queues/workers page: they do not match the time reported in the exp...
3 years ago
0 Votes
6 Answers
1K Views
3 years ago
0 Votes
5 Answers
976 Views
Hi there! I have a question regarding S3 access: I created an S3 user with read/write access but not delete, and trains seems to require delete permissions (...
4 years ago
0 Votes
6 Answers
1K Views
2 years ago
0 Votes
10 Answers
1K Views
Hi, another bug to report with the aws_auto_scaler using 1.1.2: Traceback (most recent call last): File "aws_autoscaler.py", line 297, in main() File "aws_au...
3 years ago
0 Votes
17 Answers
1K Views
3 years ago
0 Votes
3 Answers
991 Views
Hi, in the context of multi-GPU training, is Model.get_local_copy() multi-process safe? Or should I make sure that only the first process calls it first, then the others?
3 years ago
0 Votes
5 Answers
1K Views
Hi again, it seems like the aws autoscaler is not spinning instances with the EBS configuration I configured. Here is the configuration: resource_configurati...
3 years ago
0 Votes
2 Answers
946 Views
Hi guys; another idea: would be very cool to have a mattermost alert (monitor task), just like the one for Slack. Have a nice week-end all 👋
3 years ago
0 Votes
4 Answers
955 Views
Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...
4 years ago
0 Votes
2 Answers
1K Views
Hey there again, I am not sure I understand the difference between StorageManager and StorageHelper. Which one should I use?
4 years ago
0 Votes
4 Answers
1K Views
Hi there, I am trying to start an agent in services mode with trains-server being on localhost (but not started together with the docker-compose!). My trains...
4 years ago
0 Votes
4 Answers
976 Views
Hey there, is there a way to access the trains configuration programmatically at runtime in a task (the configuration that is dumped by the agent in the logs...
4 years ago
0 Votes
10 Answers
1K Views
Hey guys, I am setting up a new machine with two rtx 3070 GPUs where I created two agents (one for each GPU). On both agents, my experiments fail with error:...
4 years ago
0 Votes
0 Answers
1K Views
Hi all, Would it be possible to make the aws autoscaler log each scale in/out operation in the console to help debugging/understanding the course of events?
3 years ago
0 Votes
4 Answers
1K Views
Hey, I have one question regarding the cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server mac...
4 years ago
0 Votes
13 Answers
998 Views
4 years ago
0 Votes
5 Answers
947 Views
Hi guys, I would like to start using the AWS autoscaler shipped in trains. I need to create an IAM user to get and I would like to know what are the minimal p...
4 years ago
0 Votes
9 Answers
1K Views
Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates an experiment and sends it to the queue, the command ...
3 years ago
3 years ago
0 Hi, On Clearml-Server 1.5.0, In Scalar Graphs, The New Default Value Is “Show Closest Data On Hover”. Would It Be Possible To Make It Automatically Set To “Compare Data On Hover” When Comparing Multiple Experiments?

I’m not too fond of many user configurations, it’s confusing.

100% agree, nevertheless, how much is too many? Currently, there are only two settings in the user preferences category, so one more wouldn’t hurt?

however, clearml is open source, nothing stops you from adding the code and sending a PR

I’d be super happy to contribute yes! Nevertheless, I am not sure where to start: clearml-server repo? clearml-web repo?

2 years ago
0 Hi, I Have Another Problem

btw shouldn't it be CUDA_VERSION=11.0 ?

4 years ago
0 Hi, Kudos For The 0.15 Guys! I Am Having An Issue Related To Git Auth: I Have An Issue With Trains-Agent (0.15): It Does Not Use Git Creds While Trying To Clone A Private Repo:

Done! Also I tried to use git cache ( https://git-scm.com/docs/git-credential-cache ) as a workaround (hoping that the first time it clones the experiment repo, it caches the creds for subsequent clones), but I then get a different error: fatal: unable to find a suitable socket path; use --socket

4 years ago
0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

Yes that’s what I did initially, but eventually I decided that it’s too much complexity added for nothing really, I’d rather drop omegaconf and if one day clearml supports it out of the box take advantage of it

2 years ago
0 Hi, Although

TimelyPenguin76 clearml 0.17.5 and clearml-agent 0.17.2

3 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi AgitatedDove14 , sorry somehow this message got lost 😄
clearml version is the latest at the time, 1.7.1
Yes, I always see the "model uploaded completed" for such stuck tasks
I am using python 3.8.10

2 years ago
0 Hi, Is It Possible To Disable Some Of The System Metrics Monitored? And Also Downsample The Rate Of Logging?

AgitatedDove14 I see that the default is sample_frequency_per_sec=2., but in the UI I do not see that resolution (i.e. it logs every ~120 iterations, corresponding to ~30 secs). What is the difference with report_frequency_sec=30.?
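
One possible reading of the two parameters, as a sketch only: assuming the monitor takes a reading sample_frequency_per_sec times per second and publishes one averaged point every report_frequency_sec seconds, the UI would receive one point per ~30 secs whatever the sampling rate. The simulate_monitor function below is hypothetical and does not call ClearML.

```python
from statistics import mean


def simulate_monitor(readings, sample_frequency_per_sec=2.0, report_frequency_sec=30.0):
    """Average raw readings into one reported point per report window.

    Hypothetical model, not ClearML code: `readings` is a list of raw
    samples assumed to be taken at `sample_frequency_per_sec`.
    """
    samples_per_report = int(sample_frequency_per_sec * report_frequency_sec)
    reported = []
    for start in range(0, len(readings), samples_per_report):
        window = readings[start:start + samples_per_report]
        if len(window) == samples_per_report:  # skip an incomplete last window
            reported.append(mean(window))
    return reported


# Two minutes of samples at 2 Hz gives 240 raw readings.
raw = [float(i % 10) for i in range(240)]
points = simulate_monitor(raw)
print(len(points))  # number of points that would reach the UI in this model
```

Under that assumption, raising the sampling rate only changes how many readings go into each average, while report_frequency_sec sets the resolution seen in the plots.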

3 years ago
0 Hey There, Happy New Year To All Of You

Hi AgitatedDove14 , thanks for the answer! I will try adding multiprocessing_context='forkserver' to the DataLoader. In the issue you linked, nirraviv mentioned that forkserver was slower and shared a link to another issue https://github.com/pytorch/pytorch/issues/15849#issuecomment-573921048 where someone implemented a fast variant of the DataLoader to overcome the speed problem.
Did you experience any performance drop using forkserver? If yes, did you test the variant suggested i...

3 years ago
0 Hi There,

Ok interestingly using matplotlib.use('agg') it doesn't leak (idea from here )

one year ago
0 Hi, I Would Like To Report Something Else Weird In The Clearml-Agent 1.5.1 Running In Docker Mode: In The Logs, When It Dumps Its Config, It Writes:

Hi SuccessfulKoala55 , not really wrong, rather I don't understand it, the docker image with the args after it

one year ago
0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

here is the function used to create the task:

` def schedule_task(parent_task: Task,
                  task_type: str = None,
                  entry_point: str = None,
                  force_requirements: List[str] = None,
                  queue_name="default",
                  working_dir: str = ".",
                  extra_params=None,
                  wait_for_status: bool = False,
                  raise_on_status: Iterable[Task.TaskStatusEnum] = (Task.TaskStatusEnum.failed, Task.Ta...

4 years ago
0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

In execution tab, I see old commit, in logs, I see an empty branch and the old commit

4 years ago
0 Hey Again

Hi SuccessfulKoala55 , Can the new accounts (password-protected) have the same names?

4 years ago
0 Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built

The jump in the loss when resuming at iteration 31 is probably another issue -> for now I can conclude that:
I need to set sdk.development.report_use_subprocess = false
I need to call task.set_initial_iteration(0)
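
To illustrate why the iteration offset matters, here is a minimal sketch. It assumes that, when a task is continued, the logger adds an initial-iteration offset to every reported iteration; the FakeLogger class and resume_training function are hypothetical and only mimic that behaviour, they are not the ClearML API. If the training loop already restores its own absolute iteration from a checkpoint, a non-zero offset shifts the curve, and resetting it to 0 keeps the x-axis aligned.

```python
class FakeLogger:
    """Hypothetical stand-in for a task logger (not the ClearML API)."""

    def __init__(self, initial_iteration=0):
        self.initial_iteration = initial_iteration
        self.points = []

    def set_initial_iteration(self, offset):
        self.initial_iteration = offset

    def report_scalar(self, iteration, value):
        # Assumed behaviour: the offset is added to every reported iteration.
        self.points.append((self.initial_iteration + iteration, value))


def resume_training(logger, checkpoint_iteration, steps):
    # The training loop restores its absolute iteration from the checkpoint.
    first = checkpoint_iteration + 1
    for iteration in range(first, first + steps):
        logger.report_scalar(iteration, value=1.0 / iteration)


# Interrupted at iteration 30; the continued task starts with offset 30.
shifted = FakeLogger(initial_iteration=30)
resume_training(shifted, checkpoint_iteration=30, steps=3)

aligned = FakeLogger(initial_iteration=30)
aligned.set_initial_iteration(0)  # mirrors the set_initial_iteration(0) call above
resume_training(aligned, checkpoint_iteration=30, steps=3)

print([it for it, _ in shifted.points])
print([it for it, _ in aligned.points])
```

In this model the first logger reports at iterations 61 to 63 while the second continues at 31 to 33, right after the checkpoint.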

3 years ago
0 Hi Again, My Clearml Api-Server Is Having A Memory Leak. Each Time I Restart It, Its Ram Consumption Grows Until Getting Oom, Is Not Killed And Make The Ec2 Instance Crash

well I still see some ES errors in the logs
` clearml-apiserver | [2021-07-07 14:02:17,009] [9] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 65750ms, msg=General data error: err=('500 document(s) failed to index.', [{'index': {'_index': 'events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c2068648d2fe5da975665985f44c20b6', 'status':..., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not...

3 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Any chance this is reproducible ?

Unfortunately not at the moment, I could not find a reproducible scenario. If I clone a task that was stuck and start it, it might not get stuck

How many processes do you see running (i.e. ps -Af | grep python) ?

I will check that when the next one is blocked 👍

What is the training framework? is it multiprocess ? how are you launching the process itself? is it Linux OS? is it running inside a specific container ?

I train with p...

2 years ago