JitteryCoyote63
Moderator
214 Questions, 1021 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

979 × Eureka!
0 Votes
13 Answers
982 Views
Hello, in the following context: controller_task = Task.init(...) # This will clone the parent task, enqueue and wait for finished status data_processing_tas...
4 years ago
0 Votes
1 Answers
1K Views
Hi, I think I found a small bug: 1) Clone an experiment; 2) Enqueue it on a queue with no workers; 3) Delete the queue; 4) Try to dequeue the experiment. The last operation w...
3 years ago
0 Votes
2 Answers
1K Views
Looks like trains-agent 0.16 doesn't support --install-globally documented parameter -> Only available for trains-agent build command. Would it be possible t...
4 years ago
0 Votes
3 Answers
988 Views
Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others
3 years ago
0 Votes
16 Answers
1K Views
Got some errors while running migration script from ES5 to ES7: 2020-08-11 15:21:50,130 Running on: Linux 2020-08-11 15:21:50,227 Docker allocated memory: 16...
4 years ago
0 Votes
8 Answers
1K Views
Hi guys, does a Task update its status to 'Complete' before it has finished uploading its artifacts/metrics in the background?
4 years ago
0 Votes
30 Answers
1K Views
3 years ago
0 Votes
12 Answers
1K Views
2 years ago
0 Votes
1 Answers
933 Views
Small error in doc: https://allegro.ai/docs/references/trains_agent_ref/#daemon The detach parameter is shown in the command as --detached while it is listed...
4 years ago
0 Votes
1 Answers
993 Views
Hi, I encounter the following bug with clearml 0.17.5rc2: When I start a task locally and that task raises cuda out of memory, the command returns but the pr...
3 years ago
0 Votes
11 Answers
975 Views
Hi guys, following up on this https://allegroai-trains.slack.com/archives/CTK20V944/p1599135173096200?thread_ts=1599125260.076600&cid=CTK20V944 : I have a pi...
4 years ago
0 Votes
26 Answers
1K Views
Hi, I attached an IAM role to an ec2 instance to grant access to an s3 bucket. The ec2 instance is running a clearml-agent (v1.1.0). I didn’t specify any key...
aws
3 years ago
0 Votes
4 Answers
992 Views
Hi, what happens exactly when I execute the following command: trains-agent daemon --gpus 0 --queue default & In my code, how to know which GPU to choose insi...
4 years ago
0 Votes
6 Answers
1K Views
Hi, I am using the aws autoscaler and getting the following error while trying to spin up spot instances: 2021-08-16 17:18:48 Spinning new instance type=v100...
3 years ago
0 Votes
2 Answers
635 Views
Hi all, how can I have a global variable used in a pipeline step? I have to define them in each pipeline step, otherwise they are not included in the pipelin...
8 months ago
0 Votes
27 Answers
1K Views
Hi, similar to Task.set_offline(True), is there a way to simulate an execution in an agent? (for testing purposes)
2 years ago
0 Votes
2 Answers
646 Views
Hi there, I have several experiments hanging/stuck in the middle or at the end of the training, with the last message logged being: train INFO: Engine run co...
7 months ago
0 Votes
2 Answers
921 Views
Hey there 🙂 Still on my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...
3 years ago
0 Votes
10 Answers
1K Views
Hi, how can I change the project.default_output_destination? I tried setting it to None but it is not updated
2 years ago
0 Votes
6 Answers
1K Views
Hi, how does agent.enable_git_ask_pass work? I am using the clearml-agent in docker mode and my experiment is stuck at downloading a private dependency: Clo...
one year ago
0 Votes
25 Answers
985 Views
Hi, I have another problem 😅 in one of my agents, one experiment started without torch using GPU. In the logs of the experiment shared below, we can see that...
4 years ago
0 Votes
2 Answers
1K Views
Hi, I have a configuration file that I read and connect to my training tasks. I cannot use config = task.get_parameters_as_dict()["General"]["param"]["nested...
3 years ago
0 Votes
27 Answers
1K Views
4 years ago
0 Votes
13 Answers
989 Views
4 years ago
0 Votes
7 Answers
976 Views
Hi, I am currently using CLEARML_AGENT_GIT_USER and CLEARML_AGENT_GIT_PASS when starting my clearml-agent and I would like to switch to using a single auth t...
2 years ago
0 Votes
1 Answers
1K Views
Hi, how can I easily start a shell script from within an experiment and have its logs (stdin/err) logged in clearml?
2 years ago
0 Votes
9 Answers
1K Views
Hi, I want to upgrade clearml server from 1.1 to 1.2 (self hosted). I have the following setup: /dev/nvme0n1p1 30G 21G 8.9G 70% / <- This is where /opt/clear...
2 years ago
0 Votes
10 Answers
1K Views
Hey, what is the exact difference between agent.package_manager.system_site_packages and trains-agent --install-globally ?
4 years ago
0 Votes
1 Answers
900 Views
Is it possible to shut down the clearml server, upgrade to v1, and restart it while experiments are running? Or is it dancing with the devil? 😄
3 years ago
0 Votes
2 Answers
1K Views
Hi, is it possible to start a clearml-agent (not in docker mode) on a machine with a gpu, but enforce the clearml-agent to not “see” the gpu? So that the exp...
3 years ago
0 Hi, I Would Like To Bring Awareness

and I didn't have this problem before because when cu117 wheels were not available, the agent was trying to get the wheel with the closest cu version and was falling back to 1.11.0+cu115, and this one was working

one year ago
0 Hi, In The Metric Snapshot Section Of The Overview Tab Of A Project Page, Would It Be Possible To:

no it doesn't! 3. They select any point that is an improvement over time

2 years ago
2 years ago
0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

I am not using hydra, I am reading the conf with:
config_dict = read_yaml(conf_yaml_path)
config = OmegaConf.create(task.connect_configuration(config_dict))

2 years ago
0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

But I am not sure it will connect the parameters properly, I will check now

2 years ago
0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

Doing it the other way around works:
` cfg = OmegaConf.create(read_yaml(conf_yaml_path))
config = task.connect(cfg)
type(config)

<class 'omegaconf.dictconfig.DictConfig'> `

2 years ago
0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

but then why do I have to do task.connect_configuration(read_yaml(conf_path))._to_dict() ?
Why not task.connect_configuration(read_yaml(conf_path)) simply?
I mean what is the benefit of returning ProxyDictPostWrite instead of a dict?

2 years ago
0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

Same, it also returns a ProxyDictPostWrite , which is not supported by OmegaConf.create

2 years ago
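
The workaround discussed in the thread above (turning the returned ProxyDictPostWrite into a plain dict before handing it to OmegaConf.create) can be sketched without ClearML or OmegaConf installed. This is only an illustration under assumptions: `TrackedDict` is a hypothetical stand-in for a dict subclass returned by a library, and `to_plain` is an illustrative helper, not part of either library's API.

```python
from collections.abc import Mapping


class TrackedDict(dict):
    """Hypothetical stand-in for a dict subclass returned by a library."""


def to_plain(obj):
    """Recursively copy mappings and lists/tuples into built-in dicts and lists."""
    if isinstance(obj, Mapping):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    return obj  # scalars are returned unchanged


# Nested wrapper objects, as a connected configuration might contain.
wrapped = TrackedDict(model=TrackedDict(lr=0.01, layers=[64, TrackedDict(units=32)]))
plain = to_plain(wrapped)
print(type(wrapped).__name__, "->", type(plain).__name__)
```

The copy is detached from the wrapper, so later writes to `plain` would not be seen by whatever the wrapper was tracking; that trade-off is presumably why the library returns a wrapper in the first place.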
0 Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

I mean, inside a parent, do not show the project [parent] if there is nothing inside

3 years ago
0 Hi, One More Question: When Creating A Task With Task.Init(), We Can Specify The

correct, you could also use

Task.create

that creates a Task but does not do any automagic.

Yes, I didn't use it so far because I didn't know what to expect since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."

4 years ago
0 Hello

Looking forward to seeing the clearml-deploy 🤩 you guys rock 🚀

3 years ago
0 Hi, Similar To Task.Set_Offline(True), Is There A Way To Simulate An Execution In An Agent? (For Testing Purposes)

Because it lives behind a VPN and github workers don’t have access to it

2 years ago
0 Hey There, Does Trains Support

No worries! I asked more to be informed, I don't have a real use-case behind. This means that you guys internally catch the argparser object somehow right? Because you could also simply use sys argv to find the parameters, right?

4 years ago
0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

Some more context: the second experiment finished and now, in the UI, in workers&queues tab, I see randomly
trains-agent-1 | - | - | - |
... (refresh page)
trains-agent-1 | long-experiment | 12h | 72000 |

4 years ago
0 Hey There, I Would Like To Increase The

by replacing the pid with $PID ?

3 years ago
0 Hey There, I Would Like To Increase The

it actually looks like I don’t need such a high number of files opened at the same time

3 years ago
0 Hey There, I Would Like To Increase The

because at some point it introduces too much overhead I guess

3 years ago
0 Hey There, I Would Like To Increase The

now how to adapt to do it from extra_vm_bash_script ?

3 years ago
0 Hey There, I Would Like To Increase The

that works from within the ssh session

3 years ago
0 Hey There, I Would Like To Increase The

mmmh it fails, but if I connect to the instance and execute ulimit -n , I do see
65535
while the tasks I send to this agent fail with:
OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'
and from the task itself, I run:
import subprocess
print(subprocess.check_output("ulimit -n", shell=True))
which gives me in the logs of the task:
b'1024'
So nofile is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work

3 years ago
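
The check in the post above shells out to `ulimit -n`. A minimal sketch of the same check using Python's standard `resource` module (Unix only) is shown below. It also tries to raise the soft limit up to the hard limit for the current process; `raise_soft_nofile` is an illustrative helper name, and whether this would have fixed the problem in the thread is an untested assumption.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target):
    """Try to raise the soft limit towards `target`, never above the hard limit.

    Returns the soft limit that is in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # already unlimited or already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the OS refused the change; keep the old limit
    return target


soft, hard = nofile_limits()
print(f"before: soft={soft} hard={hard}")
new_soft = raise_soft_nofile(65535)
print(f"after: soft={new_soft}")
```

A change made this way applies only to the calling process and the children it starts afterwards, which is consistent with the thread's observation that an ssh session and a task process can report different values.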
0 Hey There, I Would Like To Increase The

I will try adding
sudo sh -c "echo '\n* soft nofile 65535\n* hard nofile 65535' >> /etc/security/limits.conf"
to the extra_vm_bash_script , maybe that’s enough actually

3 years ago
0 Hey There, I Would Like To Increase The

So actually I don’t need to play with this limit, I am OK with the default for now

3 years ago
0 Hi, What Happens Exactly When I Execute The Following Command:

Thanks AgitatedDove14 !
What would be the exact content of NVIDIA_VISIBLE_DEVICES if I run the following command?
trains-agent daemon --gpus 0,1 --queue default &

4 years ago
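
One way to answer the question above empirically is to print the variable from inside the task. The sketch below only reads and splits `NVIDIA_VISIBLE_DEVICES`; it makes no claim about what the agent actually writes there. `visible_gpu_ids` is an illustrative helper name, and entries are kept as strings because the variable may also hold non-numeric values such as `all`, `none`, or GPU UUIDs.

```python
import os


def visible_gpu_ids(environ=None):
    """Split NVIDIA_VISIBLE_DEVICES into a list of entries (empty list if unset)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("NVIDIA_VISIBLE_DEVICES", "")
    # Keep entries as strings: they are not guaranteed to be integer indices.
    return [part.strip() for part in raw.split(",") if part.strip()]


# Inspect whatever the current process was started with.
print("NVIDIA_VISIBLE_DEVICES entries:", visible_gpu_ids())
```

Passing a dict instead of relying on `os.environ` (for example `visible_gpu_ids({"NVIDIA_VISIBLE_DEVICES": "0,1"})`) makes the helper easy to try locally without an agent.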
0 Hi, Although

Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)

3 years ago
0 Hi, How Can I Search An Old Experiment Based On Its Commit Hash?

I checked the commit date anch and went to all experiments, and scrolled until finding the experiment

one year ago
0 Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

Never mind, I was able to make it work, but I have no idea how

3 years ago
0 Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

with 1.1.1 I get
User aborted: stopping task (3)

3 years ago