JitteryCoyote63
Moderator
215 Questions, 1023 Answers
  Active since 10 January 2023
  Last activity 3 months ago

Reputation

0

Badges 1

981 × Eureka!
0 Votes
3 Answers
2K Views
⚠️ Hi there, I recently updated clearml server to 1.7.0, and found the following critical regression: When I reset an experiment, it is actually deleted 😡 ,...
2 years ago
0 Votes
3 Answers
2K Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? Or should I make sure only the first process calls it first, then the others?
3 years ago
0 Votes
2 Answers
2K Views
Hi there, congrats for releasing v1 😄 I observed that with pytorch ignite (4.2.0), the metrics of the validation engines are delayed by one epoch. I am not ...
4 years ago
0 Votes
2 Answers
1K Views
4 years ago
0 Votes
8 Answers
2K Views
4 years ago
0 Votes
5 Answers
2K Views
3 years ago
0 Votes
5 Answers
2K Views
Hey there, since which version does clearml stop connecting to the demo server by default?
4 years ago
0 Votes
4 Answers
2K Views
Hi guys, I got a very unexpected error today in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...
5 years ago
0 Votes
23 Answers
2K Views
Hi, I would like to bring awareness to this issue, which impacts my work as I cannot install the older version of torch (1.11.0)
2 years ago
0 Votes
0 Answers
2K Views
Hi, I encountered a bug on clearml-server 1.0.1: I tried to add in a project page a custom column in +HYPER PARAMETERS > Args > queue and got an error pop up...
4 years ago
0 Votes
12 Answers
2K Views
Hi, I deleted some archived experiments in clearml server 1.0 and the popup in the dashboard showed “the following artifacts were not deleted”, with a list o...
4 years ago
0 Votes
18 Answers
2K Views
Hey there, I would like to increase the ulimit for the number of files opened at the same time in a ec2 instance. According to this https://stackoverflow.com...
4 years ago
0 Votes
0 Answers
2K Views
Hi all, Would it be possible to make the aws autoscaler log each scale in/out operation in the console to help debugging/understanding the course of events?
4 years ago
0 Votes
26 Answers
2K Views
Hi, I attached an IAM role to an ec2 instance to grant access to an s3 bucket. The ec2 instance is running a clearml-agent (v1.1.0). I didn’t specify any key...
aws
4 years ago
0 Votes
25 Answers
2K Views
Hi, I have another problem 😅 in one of my agents, one experiment started without torch using GPU. In the logs of the experiment shared below, we can see that...
5 years ago
0 Votes
16 Answers
2K Views
Got some errors while running migration script from ES5 to ES7: 2020-08-11 15:21:50,130 Running on: Linux 2020-08-11 15:21:50,227 Docker allocated memory: 16...
5 years ago
0 Votes
5 Answers
2K Views
Hi there, I would like to report a bug with the resizing of the columns in the projects view: it doesn’t work as expected. Please look at the behavior of the...
4 years ago
0 Votes
26 Answers
2K Views
Hi, I would like to follow up on this https://clearml.slack.com/archives/CTK20V944/p1646123127790389 happening on clearml server 1.2.0 (self hosted on a sing...
3 years ago
0 Votes
4 Answers
2K Views
The “Manage queue” option in the right tab on a queued experiment is broken in v1.0 (it does nothing)
4 years ago
0 Votes
11 Answers
2K Views
Hi, I have a question regarding the aws-autoscaler: am I understanding correctly that: max_idle_time_min=5 max_spin_up_time_min=10 polling_interval_time_min=...
4 years ago
0 Votes
3 Answers
2K Views
Hey guys, quick question: is there a tool function to know if a task id is valid? Not verifying that the task itself exists, just that the task id is the cor...
5 years ago
0 Votes
27 Answers
2K Views
Hi there, I found a memory leak in Logger.report_matplotlib_figure. I was constantly running out of memory when training my models so I decided to spend som...
2 years ago
0 Votes
10 Answers
2K Views
Hi, another bug to report with the aws_auto_scaler using 1.1.2: Traceback (most recent call last): File "aws_autoscaler.py", line 297, in main() File "aws_au...
4 years ago
0 Votes
3 Answers
2K Views
Hello there, is there a parameter to configure the number of columns rendered in the preview area of the CSV artifacts? (some of them are truncated with “…”)
4 years ago
0 Votes
6 Answers
2K Views
Hi there, maybe this was already asked but I don't remember: Would it be possible to have the clearml-agent switch between docker mode and virtualenv mode at...
2 years ago
0 Votes
9 Answers
2K Views
Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates an experiment and sends it to the queue, the command ...
4 years ago
0 Votes
0 Answers
2K Views
Hello, Pytorch 1.8 was released, bringing AMD wheels with it > pip install torch -f https://download.pytorch.org/whl/rocm4.0.1/torch_stable.html Is ClearML s...
4 years ago
0 Votes
2 Answers
2K Views
Hi, is it possible to get an artifact from a Task and force not using local cache? The task itself updated the artifact in the meantime and I cannot get the ...
4 years ago
0 Votes
7 Answers
2K Views
Hi, is there a way to get some stats about the use of workers? I would like to know, over the past 3 months: Number of training hours per user Number of trai...
4 years ago
0 Votes
8 Answers
2K Views
Hi guys, does a Task update its status to 'Complete' before it finishes uploading its artifacts/metrics in the background?
5 years ago
0 Hi, I Deleted Some Archived Experiments In Clearml Server 1.0 And The Popup In The Dashboard Showed “The Following Artifacts Were Not Deleted”, With A List Of Files That Are Under

SuccessfulKoala55 They do have the right filepath, eg:
https://***.com:8081/my-project-name/experiment_name.b1fd9df5f4d7488f96d928e9a3ab7ad4/metrics/metric_name/predictions/sample_00000001.png

4 years ago
0 Hi, I Would Like To Follow-Up In This

Ok AgitatedDove14 SuccessfulKoala55 I made some progress in my investigation:
I can exactly pinpoint the change that introduced the bug, it is the one changing the endpoint "events.get_task_log", min_version="2.9"
In the firefox console > Network, I can edit an events.get_task_log and change the URL from …/api/v2.9/events.get_task_log to …/api/v2.8/events.get_task_log (to use the endpoint "events.get_task_log", min_version="1.7" ) and then all the logs are ...

3 years ago
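
The manual URL edit described in this answer (swapping the API version segment of a request) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `pin_api_version` and the host `clearml.example.com` are hypothetical, and only the change from `/api/v2.9/` to `/api/v2.8/` comes from the post.

```python
import re


def pin_api_version(url: str, version: str) -> str:
    """Replace the first /api/vX.Y/ segment of a URL with the given version.

    Hypothetical helper for illustration; not part of the ClearML SDK.
    """
    return re.sub(r"/api/v\d+(?:\.\d+)*/", f"/api/v{version}/", url, count=1)


# Hypothetical host; the endpoint name and the two versions are taken from the post.
original = "https://clearml.example.com/api/v2.9/events.get_task_log"
print(pin_api_version(original, "2.8"))
```

URLs without an `/api/vX.Y/` segment are returned unchanged, so the helper is safe to apply to any request URL.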
0 Hi, Coming Back With The Venv Caching: With The Following Setting:

yes, in setup.py I have:
..., install_requires= [ "my-private-dep @ git+ ", ... ], ...

4 years ago
0 Hi There,

Hi @AgitatedDove14 @EnthusiasticShrimp49, the issue above seemed to be the memory leak and it looks like there is no problem from clearml side.
I trained successfully without mem leak with num_workers=0 and I am now testing with num_workers=8.
Sorry for the false positive :man-bowing:

2 years ago
0 Hi, Coming Back With The Venv Caching: With The Following Setting:

ok, so there is no way to cache it and detect when the ref changes?

4 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

even if I move the Github workers internally where they could have access to the prod server, I am not sure I would like that, because it would pile up test data in the prod server that is not necessary

3 years ago
0 Hi There, I Have A Problem With Pyjwt: I Am Using

I can ssh into the agent and:
source /trains-agent-venv/bin/activate
(trains_agent_venv) pip show pyjwt
Version: 1.7.1

4 years ago
0 Hi, I Would Like To Bring Awareness

🚀 Thanks @AgitatedDove14!

2 years ago
0 Hey, What Is The Exact Difference Between

AgitatedDove14 I now tested with a real experiment, it works, but I saw two issues:
It first doesn't detect torch, downloads it, but then says that it is already installed so it doesn't install it.
One of the dependencies of my repository is another repository (repo-2 in the logs). Both my repositories require numpy. When installing the first repository, it says Requirement already satisfied: numpy in /home/workeruser/.local/lib/python3.6/site-packages. Correct. But then it says `...

5 years ago
0 Hi There, I Have Several Experiments Hanging/Stuck In The Middle Or At The End Of The Training, With The Last Message Logged Being:

Hi @SuccessfulKoala55, I was able to find the issue: I was creating a queue and worker subprocess that were not properly cleaned up

one year ago
0 Hi, If I Am Starting My Training With The Following Command:

For the moment this is what I would be inclined to believe

3 years ago
0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

I am not using hydra, I am reading the conf with:
config_dict = read_yaml(conf_yaml_path)
config = OmegaConf.create(task.connect_configuration(config_dict))

3 years ago
0 Hi, I Want To Upgrade Clearml Server From 1.1 To 1.2 (Self Hosted). I Have The Following Setup:

Also I can simply delete the /elastic_7 folder, I don’t use it anymore (I have a remote ES cluster). In that case, I guess I would have enough space?

3 years ago
0 I Guess One Experiment Is Running Backwards In Time

Sorry, I refreshed the page and it’s gone 😅

3 years ago
0 Hello, I Have Some Problems With Allegro. I Run A Programm And Then I Saw It On The Trains Server. But Now I Change Something With The Code And I Pushed It Again. Now I Cloned It. But The Old Code Was Executed. How Can I Run The New Code I Pushed?

On the cloned experiment, which by default is created in draft mode, you can change the commit to point to either a specific commit or the latest commit of the branch

4 years ago
0 Hi There,

Well no luck - using matplotlib.use('agg') in my training codebase doesn't solve the mem leak

2 years ago
0 Hi, I Would Like To Bring Awareness

I wouldn't do it, this is less code to maintain from your side and honestly too much auto magic makes it difficult for the user to control the environment (i.e. to understand what happens behind the scenes). I am not sure what switching back will solve, here the wheel should have been correct, it's just the architecture of the card that is incompatible

2 years ago
0 Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built

If the reporting is done on a subprocess, I can imagine that the task.set_initial_iteration(0) call is only effective in the main process, not in the subprocess used for reporting. Could it be the case?

4 years ago