And I didn't have this problem before, because when cu117 wheels were not available, the agent was trying to get the wheel with the closest cu version and was falling back to 1.11.0+cu115, and this one was working
no it doesn't! 3. They select any point that is an improvement over time
Thanks! 3. I don't know, I never used Highcharts
I am not using hydra, I am reading the conf with:
` config_dict = read_yaml(conf_yaml_path)
config = OmegaConf.create(task.connect_configuration(config_dict)) `
But I am not sure it will connect the parameters properly, I will check now
Doing it the other way around works:
` cfg = OmegaConf.create(read_yaml(conf_yaml_path))
config = task.connect(cfg)
type(config)
<class 'omegaconf.dictconfig.DictConfig'> `
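For reference, a minimal self-contained sketch of that working pattern; `read_yaml` is replaced by plain `yaml.safe_load`, and the project/task names and config path are placeholders:
` # Sketch with placeholder names/paths: load YAML, wrap it in OmegaConf, then connect it.
import yaml
from clearml import Task  # 'trains' on older versions
from omegaconf import OmegaConf

task = Task.init(project_name="demo", task_name="omegaconf-connect")  # placeholder names

with open("conf.yaml") as f:          # placeholder path
    raw_conf = yaml.safe_load(f)

cfg = OmegaConf.create(raw_conf)      # DictConfig
cfg = task.connect(cfg)               # stays a DictConfig, as shown above
print(type(cfg)) `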
but then why do I have to do `task.connect_configuration(read_yaml(conf_path))._to_dict()`?
Why not simply `task.connect_configuration(read_yaml(conf_path))`?
I mean, what is the benefit of returning a `ProxyDictPostWrite` instead of a dict?
Same, it also returns a `ProxyDictPostWrite`, which is not supported by `OmegaConf.create`.
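For what it's worth, a small self-contained sketch of the `._to_dict()` workaround mentioned above (placeholder names/paths, with `yaml.safe_load` standing in for `read_yaml`):
` # Sketch: connect_configuration() returns a dict-like proxy, so convert it to a plain
# dict (via the _to_dict() call used above) before handing it to OmegaConf.create().
import yaml
from clearml import Task  # 'trains' on older versions
from omegaconf import OmegaConf

task = Task.init(project_name="demo", task_name="proxy-to-omegaconf")  # placeholder names
with open("conf.yaml") as f:                      # placeholder path
    raw_conf = yaml.safe_load(f)

proxy = task.connect_configuration(raw_conf)      # ProxyDictPostWrite, not a plain dict
cfg = OmegaConf.create(proxy._to_dict())          # plain dict copy, as in the workaround above `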
I mean, inside a parent, do not show the project [parent] if there is nothing inside
Correct, you could also use `Task.create`, which creates a Task but does not do any automagic.
Yes, I didn't use it so far because I didn't know what to expect since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."
Because it lives behind a VPN and GitHub workers don't have access to it
No worries! I asked more to be informed, I don't have a real use case behind it. This means that you guys internally catch the argparse parser object somehow, right? Because you could also simply use `sys.argv` to find the parameters, right?
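For context, a minimal sketch of the automagic capture being discussed, assuming placeholder argument names: once `Task.init` is called, the arguments parsed via argparse are picked up as the task's hyperparameters without any extra code:
` # Sketch with placeholder args: Task.init() hooks argparse, so parsed args are logged
# automatically; sys.argv would carry the same raw strings, but unparsed and untyped.
import argparse
from clearml import Task  # 'trains' on older versions

task = Task.init(project_name="demo", task_name="argparse-capture")  # placeholder names

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()  # these values show up under the task's hyperparameters `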
Some more context: the second experiment finished and now, in the UI, in the Workers & Queues tab, I randomly see:
`trains-agent-1 | - | - | - | ...`
(refresh page)
`trains-agent-1 | long-experiment | 12h | 72000 |`
Why is it required in the case where boto3 can figure them out itself within the ec2 instance?
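To illustrate the point, a minimal sketch (the bucket access is an assumption, subject to the instance's IAM role): inside an EC2 instance, boto3 can resolve credentials from the instance metadata service, so no explicit keys are needed:
` # Sketch: no explicit credentials; boto3 falls back to the EC2 instance profile.
import boto3

s3 = boto3.client("s3")                          # credentials resolved from instance metadata
buckets = s3.list_buckets()                      # works only if the attached role allows it
print([b["Name"] for b in buckets["Buckets"]]) `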
it actually looks like I don't need such a high number of files opened at the same time, because at some point it introduces too much overhead I guess
mmmh it fails, but if I connect to the instance and execute `ulimit -n`, I do see `65535`, while the tasks I send to this agent fail with:
`OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'`
and from the task itself, I run:
` import subprocess
print(subprocess.check_output("ulimit -n", shell=True)) `
which gives me in the logs of the task: `b'1024'`
So nofile is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work.
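As a side note, the same check can be done in-process without shelling out, using the standard library (a minimal sketch):
` # Sketch: read the open-file limits of the current process directly.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")  # the soft limit is what triggers 'Too many open files' `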
I will try adding
`sudo sh -c "echo '\n* soft nofile 65535\n* hard nofile 65535' >> /etc/security/limits.conf"`
to the `extra_vm_bash_script`, maybe that's enough actually
So actually I don't need to play with this limit, I am OK with the default for now
Thanks AgitatedDove14 !
What would be the exact content of `NVIDIA_VISIBLE_DEVICES` if I run the following command?
`trains-agent daemon --gpus 0,1 --queue default &`
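For what it's worth, a quick way to check from inside a task running on that agent (a sketch; presumably the value would be the comma-separated index list `0,1` in this case):
` # Sketch: print what the task process actually sees for the GPU visibility variable.
import os

print(os.environ.get("NVIDIA_VISIBLE_DEVICES")) `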
Add carriage return flush support using the `sdk.development.worker.console_cr_flush_period` configuration setting (GitHub trains Issue 181)
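If it helps, this is roughly where that setting would sit in the configuration file (trains.conf / clearml.conf); the nesting is just the dotted setting name expanded, and the value below is only a placeholder, check the docs for the actual default and units:
` # Sketch only: section layout assumed from the dotted setting name above.
sdk {
    development {
        worker {
            console_cr_flush_period: 10  # placeholder value
        }
    }
} `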
I checked the commit date and branch, went to all experiments, and scrolled until I found the experiment
Never mind, I was able to make it work, but no idea how
with 1.1.1 I get:
`User aborted: stopping task (3)`