AgitatedDove14 I have a machine with two GPUs and one agent per GPU. I provide the same trains.conf to both agents, so they use the same directory for caching venvs. Could that be problematic?
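For context, a sketch of the trains.conf entries I mean, i.e. the cache locations both agents end up sharing (these are the defaults as far as I understand them):
` agent {
    # both agents resolve these to the same directories
    venvs_dir: ~/.trains/venvs-builds
    vcs_cache {
        enabled: true
        path: ~/.trains/vcs-cache
    }
    pip_download_cache {
        enabled: true
        path: ~/.trains/pip-download-cache
    }
} `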
Is there any logic on the server side that could change the iteration number?
For new projects it works 🙂
Probably something's wrong with the instance. Which AMI did you use? The default one?
The default one doesn't exist / isn't accessible anymore, so I replaced it with the one shown on the NVIDIA Deep Learning AMI marketplace page https://aws.amazon.com/marketplace/pp/B076K31M1S?qid=1610377938050&sr=0-1&ref_=srh_res_product_title that is: ami-04c0416d6bd8e4b1f
(Btw the instance listed in the console has no name, is that normal?)
Ok, now I get ERROR: No matching distribution found for conda==4.9.2 (from -r /tmp/cached-reqscaw2zzji.txt (line 13))
Ok, deleting the installed packages list worked for the first task
Why would that solve the issue? max_spin_up_time_min should be the param defining how long to wait after starting an instance, not polling_interval_time_min, right?
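For reference, this is how I read those two settings (the parameter names are the autoscaler's; the values below are just made-up examples):
` # illustrative values only, not the defaults
hyper_params = {
    "max_spin_up_time_min": 30,      # how long to wait for a freshly started instance's agent to appear
    "polling_interval_time_min": 5,  # how often the autoscaler polls the queues/workers
} `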
btw I monkey patched ignite's global_step_from_engine function to print the iteration, and passed the modified function to ClearMLLogger.attach_output_handler(…, global_step_transform=patched_global_step_from_engine(engine)). It prints the correct iteration number when ClearMLLogger.OutputHandler.__call__ is called:
` def __call__(self, engine: Engine, logger: ClearMLLogger, event_name: Union[str, Events]) -> None:
    if not isinstance(logger, ClearMLLogger):
        ... `
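For completeness, roughly what the patched function looks like (a sketch; engine is whatever ignite Engine the logger is attached to):
` from ignite.engine import Engine

def patched_global_step_from_engine(engine: Engine):
    # same behaviour as ignite's global_step_from_engine, plus a print of the step it returns
    def wrapper(_engine, event_name):
        step = engine.state.get_event_attrib_value(event_name)
        print(f"global_step_transform -> {event_name}: {step}")
        return step
    return wrapper `
and then passed via global_step_transform to attach_output_handler as described above.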
there is no error from this side; I think the AWS autoscaler just waits for the agent to connect, which will never happen since the agent won't start because the user-data script fails
edited the aws_auto_scaler.py, actually I think it’s just a typo, I just need to double the brackets
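In case it helps someone, the reason doubling the brackets works, as far as I understand it: the user-data template in aws_auto_scaler.py is rendered with Python's str.format, so the literal bash braces have to be escaped by doubling them. A tiny sketch:
` # doubled braces survive str.format as the single literal braces bash expects
template = (
    "while sudo fuser /var/{{lib/{{dpkg,apt/lists}},cache/apt/archives}}/lock "
    ">/dev/null 2>&1; do sleep 5; done"
)
print(template.format()) `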
so what worked for me was the following startup userscript:
` #!/bin/bash
sleep 120
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential...
the instances take so much time to start, like 5 mins
the Deep Learning AMI from NVIDIA (Ubuntu 18.04)
The task with ID a445e40b53c5417da1a6489aad616fee is not aborted and is still running
thanks for your help!
Yea, again I am trying to understand what I can do with what I have 😄 I would like to export, as an environment variable, the path of the runtime (venv) where the agent installs packages, so that an app I am using inside the Task can use the Python packages installed by the agent, and I can easily control the packages through ClearML
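To illustrate what I mean (just a workaround sketch, not an agent feature; the variable names are made up): from inside the running Task, the venv the agent built can be located through the interpreter itself and exported for the other app:
` import os
import sys

# sys.prefix points at the venv the agent created for this Task,
# sys.executable at the Python interpreter inside it
os.environ["AGENT_VENV_PREFIX"] = sys.prefix   # hypothetical variable name
os.environ["AGENT_PYTHON"] = sys.executable    # hypothetical variable name
print(sys.prefix, sys.executable) `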
Just found yea, very cool! Thanks!
as for disk space: I have 21 GB available (8 GB used), the /opt/trains/data folder is about 600 MB
Some more context: the second experiment finished and now, in the UI, in the Workers & Queues tab, I randomly see either trains-agent-1 | - | - | - | ... or, after refreshing the page, trains-agent-1 | long-experiment | 12h | 72000 |
It indeed has the old commit, so they match, no problem actually 🙂
Yes, thanks! In my case, I was actually using TrainsSaver from pytorch-ignite with a local path, then I understood looking at the code that under the hood it actually changes the output_uri of the current task; that's why my previous_task.output_uri = "s3://my_bucket" had no effect (it was placed BEFORE the training)
basically:
` from trains import Task

# controller task registers an artifact for the cloned task to consume
task = Task.init("test", "test", "controller")
task.upload_artifact("test-artifact", dict(foo="bar"))

# clone it, point the clone at a different entry point, pass the artifact name, and enqueue
cloned_task = Task.clone(task, name="test", parent=task.task_id)
cloned_task.data.script.entry_point = "test_task_b.py"
cloned_task._update_script(cloned_task.data.script)  # persist the edited script section (private API)
cloned_task.set_parameters(**{"artifact_name": "test-artifact"})
Task.enqueue(cloned_task, queue_name="default") `
Setting it after the training correctly updated the task and I was able to store artifacts remotely
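Roughly what the ordering issue looks like, as a sketch (the task id and bucket are placeholders):
` from trains import Task

previous_task = Task.get_task(task_id="<previous_task_id>")  # placeholder id

# had no effect here: TrainsSaver overrides the current task's output_uri
# under the hood once training starts
previous_task.output_uri = "s3://my_bucket"

# ... training runs, checkpoints handled by TrainsSaver with a local path ...

# works here: setting it AFTER the training updates the task, and the
# artifacts get stored remotely
previous_task.output_uri = "s3://my_bucket" `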
I have two controller tasks running in parallel in the trains-agent services queue