JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

Questions 215
Answers 1023

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi, Is It Possible To Start A Clearml-Agent (Not In Docker Mode) On A Machine With A Gpu, But Enforce The Clearml-Agent To Not “See” The Gpu? So That The Experiments Run By This Agent Fail If They Try To Access A Gpu? Like The

Hi, is it possible to start a clearml-agent (not in docker mode) on a machine with a gpu, but enforce the clearml-agent to not “see” the gpu? So that the exp...

mlops

4 years ago

0 Votes

23 Answers

2K Views

0 Votes 23 Answers 2K Views

Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

Hi, I started a trains-agent (0.15) in services mode (full command: trains-agent daemon --services-mode --detached --queue services --create-queue --docker u...

mlops

5 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hey There

Hey there 🙂 Still my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...

mlops

4 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hi Again, It Seems Like The Aws Autoscaler Is Not Spinning Instances With The Ebs Configuration I Configured. Here Is The Configuration:

Hi again, it seems like the aws autoscaler is not spinning instances with the EBS configuration I configured. Here is the configuration: resource_configurati...

aws mlops

4 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi, Where Can I Find The Logs Of Trains-Agent By Default?

Hi, where can I find the logs of trains-agent by default?

clearml

5 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

Hi, is it possible to pass environment variables to agents created by the AWS AutoScaler service?

clearml

4 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi There, I Recently Updated Clearml Server To 1.7.0, And Found The Following

⚠️ Hi there, I recently updated clearml server to 1.7.0, and found the following critical regression: When I reset an experiment, it is actually deleted 😵 ,...

clearml

2 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi, In The Context Of Multi-Gpu Training, Is

Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others

clearml

3 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hey There, Since Which Version, Clearml Stops Connecting To The Demo Server By Default?

Hey there, since which version, clearml stops connecting to the demo server by default?

clearml

4 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hi Guys, I Got A Very Unexpected Error Today On In One Of My Agents:

Hi guys, I got a very unexpected error today on in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...

clearml

5 years ago

0 Votes

23 Answers

2K Views

0 Votes 23 Answers 2K Views

Hi, I Would Like To Bring Awareness

Hi, I would like to bring awareness on this issue , this impacts my work as I cannot install the older version of torch (1.11.0)

clearml

2 years ago

0 Votes

0 Answers

2K Views

0 Votes 0 Answers 2K Views

Hi, I Encountered A Bug On Clearml-Server 1.0.1: I Tried To Add In A Project Page A Custom Column In +Hyper Parameters > Args > Queue And Got An Error Pop Up With The Following Message:

Hi, I encountered a bug on clearml-server 1.0.1: I tried to add in a project page a custom column in +HYPER PARAMETERS > Args > queue and got an error pop up...

clearml

4 years ago

0 Votes

0 Answers

2K Views

0 Votes 0 Answers 2K Views

Hi All, Would It Be Possible To Make The Aws Autoscaler Log Each Scale In/Out Operation In The Console To Help Debugging/Understanding The Course Of Events?

Hi all, Would it be possible to make the aws autoscaler log each scale in/out operation in the console to help debugging/understanding the course of events?

aws mlops

4 years ago

0 Votes

25 Answers

2K Views

0 Votes 25 Answers 2K Views

Hi, I Have Another Problem

Hi, I have another problem 😅 in one of my agent, one experiment started without torch using GPU. In the logs of the experiment shared below, we can see that...

clearml

5 years ago

0 Votes

16 Answers

2K Views

0 Votes 16 Answers 2K Views

Got Some Errors While Running Migration Script From Es5 To Es7:

Got some errors while running migration script from ES5 to ES7: 2020-08-11 15:21:50,130 Running on: Linux 2020-08-11 15:21:50,227 Docker allocated memory: 16...

clearml

5 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hi There, I Would Like To Report A Bug With The Resizing Of The Columns In The Projects View: It Doesn’T Work As Expected. Please Look At The Behavior Of The Resizing On The Following Screen Recording

Hi there, I would like to report a bug with the resizing of the columns in the projects view: it doesn’t work as expected. Please look at the behavior of the...

clearml

4 years ago

0 Votes

26 Answers

2K Views

0 Votes 26 Answers 2K Views

Hi, I Would Like To Follow-Up In This

Hi, I would like to follow-up in this https://clearml.slack.com/archives/CTK20V944/p1646123127790389 happening on clearml server 1.2.0 (self hosted on a sing...

aws mlops

3 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

The “Manage Queue” Option In The Right Tab On A Queued Experiment Is Broken In V1.0 (It Does Nothing)

The “Manage queue” option in the right tab on a queued experiment is broken in v1.0 (it does nothing)

clearml

4 years ago

0 Votes

11 Answers

2K Views

0 Votes 11 Answers 2K Views

Hi, I Have A Question Regarding The Aws-Autoscaler: Am I Understanding Correctly That:

Hi, I have a question regarding the aws-autoscaler: am I understanding correctly that: max_idle_time_min=5 max_spin_up_time_min=10 polling_interval_time_min=...

mlops

4 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hi Guys, With The New Venv Caching Available In Clearml, I Have The Following Problem: I Force My Pip Requirements To Be:

Hi guys, with the new venv caching available in clearml, I have the following problem: I force my pip requirements to be: torch==1.7.1 pytorch-ignite clearml...

clearml

4 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hi, In The Metric Snapshot Section Of The Overview Tab Of A Project Page, Would It Be Possible To:

Hi, in the Metric Snapshot section of the Overview tab of a project page, would it be possible to: Show running experiments Have the legend clickable, to hid...

clearml

3 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Small Error In Doc:

Small error in doc: https://allegro.ai/docs/references/trains_agent_ref/#daemon The detach parameter is shown in the command as --detached while it is listed...

clearml

5 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Could You Please Explain A Bit More How Trains Adapt The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

Could you please explain a bit more how trains adapt the torch version depending on the installed cuda version? Here is my setup: cuda 102 installed and corr...

clearml

5 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hi, There Is A "Bug" Introduced In The Latest Version Of Clearml-Server: When An Experiment Is In "Full Screen View", In The Console Tab, The Auto Refreshing Of The Console Makes The Console Disappearing For A Short Moment. When The Console Reappears, The

Hi, there is a "bug" introduced in the latest version of clearml-server: when an experiment is in "full screen view", in the console tab, the auto refreshing...

clearml

4 years ago

0 Votes

12 Answers

2K Views

0 Votes 12 Answers 2K Views

Hey, Would It Possible To Add An Option To Make

Hey, would it possible to add an option to make task.upload_artifact() blocking? (Not running in background)

clearml

5 years ago

0 Votes

18 Answers

2K Views

0 Votes 18 Answers 2K Views

Hi Guys, I Had Several Times Now The Following Errors Poping In Agents While Executing A Task:

Hi Guys, I had several times now the following errors poping in agents while executing a task: trains_agent: ERROR: Failed applying git diff: I attached the ...

clearml

4 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Another One: What Is The Difference Between Task.Connect() And Task.Set_Parameter?

Another one: What is the difference between Task.connect() and Task.set_parameter?

clearml

5 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Not Very Important, But Small Suggestion For The Web Ui: Under The Queues Tab, In The Queues Wait Time Graph, Would It Be Possible To Switch From Seconds To Minutes? When Waiting For Aws Instances, Usually It Can Take Up To An Hour, So Having 3.3K Seconds

Not very important, but small suggestion for the web UI: under the QUEUES tab, in the queues wait time graph, would it be possible to switch from seconds to ...

aws

3 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hello There! I Have A Question Regarding The Web Ui, On The Project Page: I Have The Following Use Case: I Need To Add Two Custom Columns, Each Reporting One Metric. Currently, This Shows Me The Best (Min/Max) Values Reached By The Model, But Not Necessar

Hello there! I have a question regarding the Web UI, on the project page: I have the following use case: I need to add two custom columns, each reporting one...

clearml

3 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hi Folks, Is It Possible To Use An Aws P3 Instance (Which As Several Gpus) With One Agent Per Gpu, All Controlled Through Clearml Aws Autoscheduler? So Clearml Aws Autoscheduler Would Know In Advance How Much Agents To Start In The Instances (Can Be An Op

Hi folks, Is it possible to use an aws p3 instance (which as several GPUs) with one agent per GPU, all controlled through ClearML AWS AutoScheduler? So Clear...

aws mlops

4 years ago

Show more results

0 Hi There, I Have A Problem With Pyjwt: I Am Using

so most likely one hard requirement installs the version 2 of pyjwt while setting up the experiment

4 years ago

0 Hi, When I Use Task.Get_Logger().Report_Table, I Go The Ui After The Experiment Finishes And I Download The Table (Under Results > Plots), It Gives Me A Json File. How Can I Use It? It Seems To Follow A Structure Specific To Clearml, How Can I For Example

I am trying to upload an artifact during the execution

4 years ago

0 Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

I mean, inside a parent, do not show the project [parent] if there is nothing inside

3 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

and then call task.connect_configuration probably

3 years ago

0 Hi, It Seems That The

But according the the example, this syntax should be supported right?

5 years ago

0 I'M Getting A Lot Of Errors When Running Cleanup Service

What is this cleanup service? where is it available?

3 years ago

0 Hi, I Just Updated Clearml Server 1.0 Using

I added the pass_hashed and restarted the server, still get the same problem

4 years ago

0 Hi Again, I Am Trying To Make The Aws Autoscaler Work With Ec2 Instances, But It Fails To Setup The Agent In The Machine: The Logs Of The User-Data Script Show That It Fails Updating The Machine (See Below)

so what worked for me was the following startup userscript:
` #!/bin/bash
sleep 120
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential...

4 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

So the problem comes when I do
my_task.output_uri = " s3://my-bucket , trains in the background checks if it has access to this bucket and it is not able to find/ read the creds

5 years ago

0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Hi AgitatedDove14 , sorry somehow this message got lost 😄
clearml version is the latest at the time, 1.7.1 Yes, I always see the "model uploaded completed" for such stuck tasks I am using python 3.8.10

3 years ago

0 Hi, I Just Updated Clearml Server 1.0 Using

This is what I get, when I am connected and when I am logged out (by clearing cache/cookies)

4 years ago

0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

AgitatedDove14 According to the dependency order you shared, the original message of this thread isn't solved: the agent mentionned used output from nvcc (2) before checking the nvidia driver version (1)

4 years ago

0 Hi, Where Can I Find The Server Parameter To Control When The Server Is Unregistering An Agent After Not Receiving Updates? Currently It'S Quite Long (30Mins) And This Prevents The Autoscaler From Launching A New Agent

Yes it would be very valuable to be able to tweak that param, currently it's quite annoying because it's set to 30 mins, so when a worker is killed by the autoscaler, I have to wait 30 mins before the autoscaler spins up a new machine because the autoscaler thinks there is already enough agents available, while in reality the agent is down

2 years ago

0 Hi, I Face A Strange Behavior From The Clearml-Agent: It’S Running In Services Mode, Not In Docker Mode, Cpu Only. I Want To Execute Two Tasks On This Service Agent. One Works, The Other Always Fails After Being Enqueued And Picked By The Agent With The E

but it is set here, right?

4 years ago

0 Hi, I Would Like To Use Pytorch3D==0.5.0 With Torch==1.9.1 On Cuda Version 110, Locally It Works, But The Clearml Agent Fails Setting Up The Environment With The Following Error:

AgitatedDove14 Yes that might work, also the first one (with conda) might work as well, I will give it a try, thanks!

4 years ago

0 Hello, ~3 Months Ago I Created A Trains-Server In A Machine With 30Gb Of Disk Space. Today I Wasn'T Able To Connect To Trains-Server, So I Checked The Server And Found That The Disk Full. I Ran:

They indeed do auto-rotate when you limit the size of the logs

4 years ago

0 Hey Guys, I Am Setting Up A New Machine With Two Rtx 3070 Gpus Where I Created Two Agents (One For Each Gpu). On Both Agents, My Experiments Fail With Error:

Hi AgitatedDove14 , coming by after a few experiments this morning:
Indeed torch 1.3.1 does not support cuda, I tried with 1.7.0 and it worked, BUT trains was not able to pick the right wheel when I updated the torch req from 1.3.1 to 1.7.0: It downloaded wheel for cuda version 101. But in the experiment log, the agent correctly reported the cuda version (111). I then replaced the torch==1.7.0 with the direct https link to the torch wheel for cuda 110, and that worked (I also tried specifyin...

4 years ago

0 Hi, I Would Like To Report Something Else Weird In The Clearml-Agent 1.5.1 Running In Docker Mode: In The Logs, When It Dumps Its Config, It Writes:

Ok yes, I get it, this info is also available at the very beginning of the logs, where the agent logs the full docker run command, this docker_cmd is a shorter version?

2 years ago

0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

thanks for clarifying! Maybe this could be clarified in the agent logs of the experiments with something like the following?
agent.cuda_driver_version = ... agent.cuda_runtime_version = ...

4 years ago

0 Hi There! Is There An Easy Way To Retrieve The Site-Package Directory That Was Created By An Agent From Inside A Task? Eg.

Now I'm curious, what did you end up doing ?

in my repo I maintain a bash script to setup a separate python env. then in my task I spawn a subprocess and I don't pass the env variables, so that the subprocess properly picks up the separate python env

2 years ago

0 Hi, I Just Updated Clearml Server 1.0 Using

The only thing that changed is the new auth.fixed_users.pass_hashed field, that I don’t have in my config file

4 years ago

0 Hi, How Does

Ping CostlyOstrich36 AgitatedDove14 SuccessfulKoala55 Just making sure this wasn't missed 🙂

2 years ago

0 Hi, Together With

Exactly

5 years ago

0 Hey, I Have One Question Regarding The Cleanup_Service Task In The Devops Project: Does It Assume That The Agent In Services Mode Is In The Trains-Server Machine?

Thanks TimelyPenguin76 and AgitatedDove14 ! I would like to delete artifacts/models related to the old archived experiments, but they are stored on s3. Would that be possible?

5 years ago

0 Hi Guys, Any Plan To Integrate The

Both ^^, I already adapted the code for GCP and I was planning to adapt to Azure now

5 years ago

0 Hi, I Would Like To Bring Awareness

This is not the case, I downloaded it and I got a cuda error at runtime

2 years ago

0 Hi Guys, Coming This Time To Share An Idea Of A Killer Feature For Clearml

Hi AnxiousSeal95 , I hope you had nice holidays! Thanks for the update! I discovered h2o when looking for ways to deploy dashboards with apps like streamlit. Most likely I will use either streamlit deployed through clearml or h2o as standalone if ClearML won't support deploying apps (which is totally fine, no offense there 🙂 )

4 years ago

0 Hey There, I Would Like To Increase The

now how to adapt to do it from extra_vm_bash_script ?

4 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

That’s why I said “not really” 😄

3 years ago

0 Hi There

btw task._get_task_property('hyperparams') also gives me ValueError: Task has no hyperparams section defined

5 years ago

Show more results