Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 9 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hey there ๐Ÿ™‚ Still my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...
3 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Hi, would it be possible to parse torch requirement when it’s part of the extras_require dict? In my code, I have the following: train_task._update_requireme...
3 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
Hey again ๐Ÿ˜ Is it possible to run multiple agents on the same machine? And with some in services mode?
4 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
Hi, what happens exactly when I execute the following command: trains-agent daemon --gpus 0 --queue default &In my code, how to know which GPU to choose insi...
4 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Hi there, any plan/benefit to support virtualenv= 20 ?
4 years ago
0 Votes
10 Answers
1K Views
0 Votes 10 Answers 1K Views
3 years ago
0 Votes
11 Answers
1K Views
0 Votes 11 Answers 1K Views
Hey, I moved my trains-server to another machine, zipping the /opt/trains/data folder as described in the docs https://allegro.ai/docs/deploying_trains/train...
4 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
Quick question: How can I clone a task and change the cloned task type? I see no Task.set_type() function
4 years ago
0 Votes
19 Answers
1K Views
0 Votes 19 Answers 1K Views
Hi again, I am trying to make the aws autoscaler work with ec2 instances, but it fails to setup the agent in the machine: the logs of the user-data script sh...
3 years ago
0 Votes
13 Answers
1K Views
0 Votes 13 Answers 1K Views
Hello, in the following context: controller_task = Task.init(...) # This will clone the parent task, enqueue and wait for finished status data_processing_tas...
4 years ago
0 Votes
28 Answers
1K Views
0 Votes 28 Answers 1K Views
Hi, I am trying to use omegaconf with task.connect_configuration and I get the following error: >>> OmegaConf.create(task.connect_configuration(config_dict))...
2 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Another one: What is the difference between Task.connect() and Task.set_parameter?
4 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
First link in hyperparameter optimization page is broken > https://allegro.ai/docs/examples/examples_hyperparam_opt/
4 years ago
0 Votes
27 Answers
1K Views
0 Votes 27 Answers 1K Views
4 years ago
0 Votes
20 Answers
1K Views
0 Votes 20 Answers 1K Views
Hello, I have an error while installing git dependencies of local package: So far I used task. update _requirements(“[.]“) with my local package referencing ...
3 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
3 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
Hi again, it seems like the aws autoscaler is not spinning instances with the EBS configuration I configured. Here is the configuration: resource_configurati...
4 years ago
0 Votes
18 Answers
1K Views
0 Votes 18 Answers 1K Views
Hi, kudos for the 0.15 guys! I am having an issue related to git auth: I have an issue with trains-agent (0.15): it does not use git creds while trying to cl...
4 years ago
0 Votes
12 Answers
1K Views
0 Votes 12 Answers 1K Views
Hi, I deleted some archived experiments in clearml server 1.0 and the popup in the dashboard showed “the following artifacts were not deleted”, with a list o...
3 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Hi, is it possible to start a clearml-agent (not in docker mode) on a machine with a gpu, but enforce the clearml-agent to not “see” the gpu? So that the exp...
3 years ago
0 Votes
30 Answers
1K Views
0 Votes 30 Answers 1K Views
Hi, if I am starting my training with the following command: python -u -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --config configs/tra...
3 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
2 years ago
0 Votes
4 Answers
1K Views
0 Votes 4 Answers 1K Views
Hi there, I think there is a bug with clearml sdk v0.17.5rc2: when running a task locally, the dashboard doesnt not shows the task as finished once the task ...
4 years ago
0 Votes
5 Answers
1K Views
0 Votes 5 Answers 1K Views
aws
3 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Hi, how can I easily start a shell script from within an experiment and have its logs (stdin/err) logged in clearml?
2 years ago
0 Votes
2 Answers
1K Views
0 Votes 2 Answers 1K Views
Congrats on the clearml-serving 0.9.0 release! I’ll try it for sure!
2 years ago
0 Votes
1 Answers
1K Views
0 Votes 1 Answers 1K Views
Hi there, would it be possible to add some Neural Architecture Search example, as for the HyperParameter Optimizer examples?
4 years ago
0 Votes
23 Answers
1K Views
0 Votes 23 Answers 1K Views
Hi, I started a trains-agent (0.15) in services mode (full command: trains-agent daemon --services-mode --detached --queue services --create-queue --docker u...
4 years ago
0 Votes
17 Answers
1K Views
0 Votes 17 Answers 1K Views
Hello, I am trying to retrieve a simple dict artifact uploaded in a previous task with task.upload_artifact("my_dict", dict(foo="bar")) in a second task. I t...
4 years ago
0 Votes
9 Answers
1K Views
0 Votes 9 Answers 1K Views
Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates and send to the queue an experiment, the command ...
3 years ago
Show more results questions
0 Hi, I Am Currently Using

Yes, I switched to that, thanks!

2 years ago
0 Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

my docker-compose for the master node of the ES cluster is the following:
` version: "3.6"
services:

elasticsearch:
container_name: clearml-elastic
environment:
ES_JAVA_OPTS: -Xms2g -Xmx2g
bootstrap.memory_lock: "true"
cluster.name: clearml-es
cluster.initial_master_nodes: clearml-es-n1, clearml-es-n2, clearml-es-n3
cluster.routing.allocation.node_initial_primaries_recoveries: "500"
cluster.routing.allocation.disk.watermark.low: 500mb
clust...

3 years ago
0 Hi There, I Used

AgitatedDove14 No, should I?

3 years ago
0 Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

The weird thing is that the second experiment started immediatly, correctly in a docker container, but failed with User aborted: stopping task (3) at some point (while installing the packages). The error message is suprizing since I did not do anything. And then all following experiments are queued to services queue and stuck there.

4 years ago
0 Hi, I Have An Error With Clearml-Agent 1.5.1 When Importing Tensorflow 2.10

Actually was not related to clearml, the higher level error causing this one was (somewhere in the stack trace): RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd -> wrong numpy version

2 years ago
0 Hi, Did Anyone Experiment With Running On The Aws Autoscaler On Spots And Knows Whether There Is Configuration For Retry Policy When Spot Get Evacuated Mid-Job?

Hi there, yes I was able to make it work with some glue code:
Save your model, optimizer, scheduler every epoch Have a separate thread that periodically pulls the instance metadata and check if the instance is marked for stop, in this case, add a custom tag eg. TO_RESUME Have a services that periodically pulls failed experiments from the queue with the tag TO_RESUME, force marking them as stopped instead of failed and reschedule them with as extra-param the last checkpoint

3 years ago
0 Hi, Coming Back With The Venv Caching: With The Following Setting:

ok, so there is no way to cache it and detect when the ref changes?

4 years ago
0 Announcing Clearml 0.17.5 Features

We would be super happy to have the possibility of documenting experiments (new tab in experiments UI) with a markdown editor!

3 years ago
0 Hello, What Is The Default Limit For Global Context ?

Thanks! (Maybe could be added to the docs ?) ๐Ÿ™‚

4 years ago
4 years ago
0 Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

Nevermind, i was able to make it work, but no idea how

3 years ago
0 Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

with 1.1.1 I get
User aborted: stopping task (3)

3 years ago
0 Hi Guys, I Got A Very Unexpected Error Today On In One Of My Agents:

mmmmh I just restarted the experiment and it seems to work now. I am not sure why that happened. From this SO it could be related to size of the repo. Might be a good idea to clone with --depth 1 in the agents?
Or more generally, try to catch this error and retry a few times?

4 years ago
0 Hi Guys, I Got A Very Unexpected Error Today On In One Of My Agents:

Unfortunately this is difficult to reproduce... Neverthless it would be important to me to be robust against it, because if this error happens in a task in the middle of my pipeline, the whole process fails.

This binds to another wider topic I think: How to "skip" tasks if they already run (a mechanism similar to what [ https://luigi.readthedocs.io/en/stable/ ] offers). That would allow to restart the pipeline and skip tasks until the point where the task failed

4 years ago
0 Hey, I Have One Question Regarding The Cleanup_Service Task In The Devops Project: Does It Assume That The Agent In Services Mode Is In The Trains-Server Machine?

Thanks TimelyPenguin76 and AgitatedDove14 ! I would like to delete artifacts/models related to the old archived experiments, but they are stored on s3. Would that be possible?

4 years ago
0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi AgitatedDove14 , sorry somehow this message got lost ๐Ÿ˜„
clearml version is the latest at the time, 1.7.1 Yes, I always see the "model uploaded completed" for such stuck tasks I am using python 3.8.10

2 years ago
0 I Guess One Experiment Is Running Backwards In Time

btw CostlyOstrich36 , I can see in Profile > Version: 1.1.1-135 โ€ข 1.1.1 โ€ข 2.14 . What these numbers correspond to?

3 years ago
0 Does Trains 0.16 Supports Pip >=20.2?

Yes, but a minor one. I would need to do more experiments to understand what is going on with pip skipping some packages but reinstalling others.

4 years ago
0 Hi, On Clearml-Server 1.5.0, In Scalar Graphs, The New Default Value Is “Show Closest Data On Hover”. Would It Be Possible To Make It Automatically Set To “Compare Data On Hover” When Comparing Multiple Experiments?

Iโ€™m not too fond of many user configurations, itโ€™s confusing.

100% agree, nevertheless, how much is too many? Currently, there are only two settings in the user preferences category, so one more wouldnโ€™t hurt?

however, clearml is open source, nothing stops you from adding the code and sending a PR

Iโ€™d be super happy to contribute yes! Nevertheless, I am not sure where to start: clearml-server repo? clearml-web repo?

2 years ago
0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

So the problem comes when I do
my_task.output_uri = " s3://my-bucket , trains in the background checks if it has access to this bucket and it is not able to find/ read the creds

4 years ago
0 Hi There, Maybe This Was Already Asked But I Don'T Remember: Would It Be Possible To Have The Clearml-Agent Switch Between Docker Mode And Virtualenv Mode At Runtime, Depending On The Experiment

Yea so I assume that training my models using docker will be slightly slower so I'd like to avoid it. For the rest using docker is convenient

2 years ago
0 Hey, Would It Possible To Add An Option To Make

awesome ๐ŸŽ‰
Maybe then we can extend task.upload_artifact ?
def upload_artifact(..., wait_for_upload: bool = False): ... if wait_for_upload: self.flush(wait_for_uploads=True)

4 years ago
0 Hey, I Moved My Trains-Server To Another Machine, Zipping The /Opt/Trains/Data Folder As Described In The Docs

Bottom line is: trains-server uses elastichsearch image: http://docker.elastic.co/elasticsearch/elasticsearch:5.6.16 which does not have an unlimited license (only free license that expires after some time). From versions 6.3, elasticsearch provides an unlimited free license. Trains should use >=6.3, WDYT?

4 years ago
Show more results compactanswers