Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
215 Questions, 1023 Answers
  Active since 10 January 2023
  Last activity 3 months ago

Reputation

0

Badges 1

981 × Eureka!
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hi, in the AWS AutoScaler, I am getting the following warning: Warning! exception occurred: APIError: code 400/1004: Worker is not registered: worker=aws:A10...
4 years ago
0 Votes
30 Answers
2K Views
0 Votes 30 Answers 2K Views
Hi again, my clearml api-server is having a memory leak. Each time I restart it, its ram consumption grows until getting OOM, is not killed and make the ec2 ...
4 years ago
0 Votes
16 Answers
2K Views
0 Votes 16 Answers 2K Views
Hi guys, coming this time to share an idea of a killer feature for ClearML 🚀 I am pretty sure you guys already heard of https://www.streamlit.io/ , which is...
4 years ago
0 Votes
1 Answers
2K Views
0 Votes 1 Answers 2K Views
4 years ago
0 Votes
30 Answers
2K Views
0 Votes 30 Answers 2K Views
Hi, is it possible to pass environment variables to agents created by the AWS AutoScaler service?
4 years ago
0 Votes
9 Answers
2K Views
0 Votes 9 Answers 2K Views
Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates and send to the queue an experiment, the command ...
4 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hi, a small bug (not really a bug) in the autoscaler: I have p3.2xlarge instances that take a long time to shutdown. With polling_interval_time_min=1 , the a...
4 years ago
0 Votes
11 Answers
2K Views
0 Votes 11 Answers 2K Views
Hi guys, following up on this https://allegroai-trains.slack.com/archives/CTK20V944/p1599135173096200?thread_ts=1599125260.076600&cid=CTK20V944 : I have a pi...
5 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hello there, is there a parameter to configure the number of columns rendered in the preview area of the CSV artifacts? (some of them are truncated with “…”)
4 years ago
0 Votes
5 Answers
2K Views
0 Votes 5 Answers 2K Views
Hi, I have an error with clearml-agent 1.5.1 when importing tensorflow 2.10 from tensorflow.python.client._pywrap_tf_session import * File "/root/.clearml/ve...
2 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
First link in hyperparameter optimization page is broken > https://allegro.ai/docs/examples/examples_hyperparam_opt/
5 years ago
0 Votes
8 Answers
2K Views
0 Votes 8 Answers 2K Views
4 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Quick question: Why does clearml-server 1.15.0 api-server python package require ES 8.12.0 but the docker-compose references ES 7.17.18?
one year ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, I am trying to update the aws_autoscaler to the latest version on the master branch. I simply changed the commit id in the experiment and run it, this ga...
4 years ago
0 Votes
4 Answers
2K Views
0 Votes 4 Answers 2K Views
Hi guys, I got a very unexpected error today on in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...
5 years ago
0 Votes
30 Answers
2K Views
0 Votes 30 Answers 2K Views
Hi, I am getting the following errors in the experiments I am currently running: 2021-06-25 17:11:47,911 - clearml.Metrics - ERROR - Action failed <504/0: ev...
4 years ago
0 Votes
7 Answers
2K Views
0 Votes 7 Answers 2K Views
Hi, I think there is a small bug in the Experiment running time column of the workers-and-queues/workers page: they do not match the time reported in the exp...
3 years ago
0 Votes
8 Answers
2K Views
0 Votes 8 Answers 2K Views
Hi guys, is a Task updating its status to 'Complete' before finishing to upload its artifacts/metrics in the background?
5 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others
3 years ago
0 Votes
4 Answers
2K Views
0 Votes 4 Answers 2K Views
Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...
5 years ago
0 Votes
10 Answers
2K Views
0 Votes 10 Answers 2K Views
Hi, I have a local package that I use to train my models. To start training, I have a script that calls task._update_requirements([".", "torch==1.11.0"]) . I...
3 years ago
0 Votes
26 Answers
2K Views
0 Votes 26 Answers 2K Views
Hi, I would like to follow-up in this https://clearml.slack.com/archives/CTK20V944/p1646123127790389 happening on clearml server 1.2.0 (self hosted on a sing...
3 years ago
0 Votes
1 Answers
2K Views
0 Votes 1 Answers 2K Views
Hi, how can I easily start a shell script from within an experiment and have its logs (stdin/err) logged in clearml?
3 years ago
0 Votes
2 Answers
2K Views
0 Votes 2 Answers 2K Views
Hi, how can I search an old experiment based on its commit hash?
2 years ago
0 Votes
3 Answers
2K Views
0 Votes 3 Answers 2K Views
Hi, is clearml-server compatible with latest versions of ES ( > 7.6.2)?
4 years ago
0 Votes
30 Answers
2K Views
0 Votes 30 Answers 2K Views
Hi, if I am starting my training with the following command: python -u -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --config configs/tra...
3 years ago
0 Votes
25 Answers
2K Views
0 Votes 25 Answers 2K Views
Hi, I have another problem 😅 in one of my agent, one experiment started without torch using GPU. In the logs of the experiment shared below, we can see that...
5 years ago
0 Votes
16 Answers
2K Views
0 Votes 16 Answers 2K Views
Hello, ~3 months ago I created a trains-server in a machine with 30gb of disk space. Today I wasn't able to connect to trains-server, so I checked the server...
4 years ago
0 Votes
12 Answers
2K Views
0 Votes 12 Answers 2K Views
Hi there! Is there an easy way to retrieve the site-package directory that was created by an agent from inside a task? Eg. task = Task.init(...) task.add_req...
2 years ago
0 Votes
30 Answers
3K Views
0 Votes 30 Answers 3K Views
Hi, I am giving another try to clearml-session and I am blocked at the current error shown when the CLI try to establish the tunneling: Starting SSH tunnel W...
3 years ago
Show more results questions
0 Hi, I Have Another Bug To Report For Clearml-Server 1.2 (Self Hosted) In The Console Logs Of An Experiments, I Cannot See The Latest Logs. Eg My Experiment Is Done, But I Can Only See The Logs Of To The Installation Of The Packages. If I Download The Log

CostlyOstrich36 I updated both agents to 1.1.2 and still go the same problem unfortunately. Since I can download the full log file from the Web UI, I guess the agents are reporting correctly?
Could it be that the elasticsearch does not return all the requested logs when it is queried from the WebUI to display it in the console?
Now that I think about it, I remember that on the changelog of the clearml-server 1.2.0 the following is listed:
` Fix UI Workers & Queues and Experiment Table pages ...

3 years ago
0 Hi, I Am Currently Using

I can live with the current setup for now

2 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi AgitatedDove14 , sorry somehow this message got lost 😄
clearml version is the latest at the time, 1.7.1 Yes, I always see the "model uploaded completed" for such stuck tasks I am using python 3.8.10

3 years ago
0 Hi, If I Am Starting My Training With The Following Command:

AgitatedDove14 If I call explicitly task.get_logger().report_scalar("test", str(parse_args.local_rank), 1., 0) , this will log as expected one value per process, so reporting works

3 years ago
0 Hello, In The Following Context:

Looking at the source code, it seems like I should do:
data_processing_task._artifact_manager.flush() to make sure to have the latest version of artifacts in the task, right?

5 years ago
0 Hi, I Would Like To Follow-Up In This

Ok AgitatedDove14 SuccessfulKoala55 I made some progress in my investigation:
I can exactly pinpoint the change that introduced the bug, it is the one changing the endpoint "events.get_task_log", min_version="2.9"
In the firefox console > Network, I can edit an events.get_task_log and change the URL from …/api/v2.9/events.get_task_log to …/api/v2.8/events.get_task_log (to use the endpoint "events.get_task_log", min_version="1.7" ) and then all the logs are ...

3 years ago
0 Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

I mean, inside a parent, do not show the project [parent] if there is nothing inside

3 years ago
0 Hi, Together With

(It would be nice to have all the Pypi releases tagged in github btw)

5 years ago
5 years ago
5 years ago
4 years ago
0 Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

Is it safe to turn off replication while a reindex operation is happening? the reindexing is rather slow and I am wondering if turning of replication will speed up the process

4 years ago
0 Hi There, I Have A Problem With Pyjwt: I Am Using

so most likely one hard requirement installs the version 2 of pyjwt while setting up the experiment

4 years ago
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

AgitatedDove14 This seems to be consistent even if I specify the absolute path to /home/user/trains.conf

5 years ago
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

So most likely trains was masking the original error, it might be worth investigating to help other users in the future

5 years ago
0 Hi, Although

Nice! What is the default value?

4 years ago
0 Hi Again, My Clearml Api-Server Is Having A Memory Leak. Each Time I Restart It, Its Ram Consumption Grows Until Getting Oom, Is Not Killed And Make The Ec2 Instance Crash

well I still see some ES errors in the logs
` clearml-apiserver | [2021-07-07 14:02:17,009] [9] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 65750ms, msg=General data error: err=('500 document(s) failed to index.', [{'index': {'_index': 'events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c2068648d2fe5da975665985f44c20b6', 'status':..., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not...

4 years ago
0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

` resource_configurations {
A100 {
instance_type = "p3.2xlarge"
is_spot = false
availability_zone = "us-east-1b"
ami_id = "ami-04c0416d6bd8e4b1f"
ebs_device_name = "/dev/xvda"
ebs_volume_size = 100
ebs_volume_type = "gp3"
}
}

queues {
aws_a100 = [["A100", 15]]
}

extra_trains_conf = """
agent.package_manager.system_site_packages = true
agent.package_manager.pip_version = "==20.2.3"
"""

extra_vm_bash_script = """

sudo apt-get install -y libsm6 libxext6 libx...

4 years ago
0 Hi, I Would Like To Bring Awareness

I think we should switch back, and have a configuration to control which mechanism the agent uses , wdyt? (edited)

That sounds great!

2 years ago
0 Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

Probably 6. I think because of some reason, it did not go back to main trains-agent. Nevertheless I am not sure, because a second task could start. It could also be that the second was aborted for some reason while installing task requirements (not system requirements, so executing the trains-agent setup within the docker container) and therefore again it couldn't go back to main trains-agent. But ps -aux shows that the trains-agent is stuck running the first experiment, not the second...

5 years ago
0 Hi There, Maybe This Was Already Asked But I Don'T Remember: Would It Be Possible To Have The Clearml-Agent Switch Between Docker Mode And Virtualenv Mode At Runtime, Depending On The Experiment

Yea so I assume that training my models using docker will be slightly slower so I'd like to avoid it. For the rest using docker is convenient

2 years ago
0 Hi Guys, Any Plan To Integrate The

Both ^^, I already adapted the code for GCP and I was planning to adapt to Azure now

5 years ago
0 Hi Guys, Any Plan To Integrate The

Ho I wasn't aware of that new implementation, was it introduced silently? I don't remember reading it in the release notes! To answer your question: no, for gcp I used the old version, but for azure I will use this one, maybe send a PR if code is clean 👍

5 years ago
Show more results compactanswers