JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

Questions 215
Answers 1023

0 Votes

18 Answers

2K Views

0 Votes 18 Answers 2K Views

Hi, I Just Updated Clearml Server 1.0 Using

Hi, I just updated clearml server 1.0 using docker-compose down & docker-compose pull & docker-compose up -d , it worked ant it looks amazing! I found two pr...

clearml

4 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

How Can I Filter Out Archived Tasks With Task.Get_Tasks?

How can I filter out archived tasks with Task.get_tasks?

clearml

4 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi There, Congrats For Releasing V1

Hi there, congrats for releasing v1 😄 I observed that with pytorch ignite (4.2.0), the metrics of the validation engines are delayed by one epoch. I am not ...

pytorch

4 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hey There, I See That In The Autoscaler Configuration, The

Hey there, I see that in the autoscaler configuration, the queues param accept dictionaries with values of type list of lists (see eg below.) What does it me...

mlops

4 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi, In The Aws Autoscaler, I Am Getting The Following Warning:

Hi, in the AWS AutoScaler, I am getting the following warning: Warning! exception occurred: APIError: code 400/1004: Worker is not registered: worker=aws:A10...

clearml

4 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi Guys, Since I Am Done With Implementing The Aws Autoscaler, I Would Like To Share Some Pain Points That I Encountered In The Process With The Hope That They Can Be Documented To Help Other Users:

Hi guys, since I am done with implementing the AWS autoscaler, I would like to share some pain points that I encountered in the process with the hope that th...

aws

4 years ago

0 Votes

8 Answers

2K Views

0 Votes 8 Answers 2K Views

Hi Guys, Is A Task Updating Its Status To 'Complete' Before Finishing To Upload Its Artifacts/Metrics In The Background?

Hi guys, is a Task updating its status to 'Complete' before finishing to upload its artifacts/metrics in the background?

clearml

5 years ago

0 Votes

10 Answers

2K Views

0 Votes 10 Answers 2K Views

Hey Guys, I Am Setting Up A New Machine With Two Rtx 3070 Gpus Where I Created Two Agents (One For Each Gpu). On Both Agents, My Experiments Fail With Error:

Hey guys, I am setting up a new machine with two rtx 3070 GPUs where I created two agents (one for each GPU). On both agents, my experiments fail with error:...

pytorch

4 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hey There, Happy New Year To All Of You

Hey there, happy new year to all of you 🍾 I have several tasks that are stuck while training a model with pytorch/ignite, more precisely right after uploadi...

mlops

4 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hi, How Can I Easily Start A Shell Script From Within An Experiment And Have Its Logs (Stdin/Err) Logged In Clearml?

Hi, how can I easily start a shell script from within an experiment and have its logs (stdin/err) logged in clearml?

clearml

3 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hi, Is There A Way To Update The Setup Shell Script Via The Sdk?

Hi, is there a way to update the setup shell script via the SDK?

clearml

2 years ago

0 Votes

27 Answers

2K Views

0 Votes 27 Answers 2K Views

Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

Hi, I have an agent that is running two experiments at the same time: one that was running for a long time (11h) and one that the agent picked up afterwards,...

mlops

5 years ago

0 Votes

12 Answers

2K Views

0 Votes 12 Answers 2K Views

Hey, Often I Want To Compare Scalars Of Two Experiments With The Same Name But With Different Tags. In The Scalars Comparison Tab, I Cannot See Which Experiment Is Which Because I Don’T See The Tags. Usually, I Rename The Experiments So That I Can Identif

Hey, often I want to compare scalars of two experiments with the same name but with different tags. In the SCALARS comparison tab, I cannot see which experim...

clearml

3 years ago

0 Votes

12 Answers

2K Views

0 Votes 12 Answers 2K Views

Hi, Where Can I Find The Server Parameter To Control When The Server Is Unregistering An Agent After Not Receiving Updates? Currently It'S Quite Long (30Mins) And This Prevents The Autoscaler From Launching A New Agent

Hi, where can I find the server parameter to control when the server is unregistering an agent after not receiving updates? Currently it's quite long (30mins...

mlops

2 years ago

0 Votes

19 Answers

2K Views

0 Votes 19 Answers 2K Views

Hi, With Clearml-Agent 1.5.1, I Tried To Run An Experiment Within A Docker With Image Python3:8 And It Failed Executing The Task While Trying To Call Python3.9. I Am Not Sure Why It'S Using Python3.9, Since The Agent.Default_Python Is 3.8 And The Image Is

Hi, with clearml-agent 1.5.1, I tried to run an experiment within a docker with image python3:8 and it failed executing the task while trying to call python3...

clearml

2 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

Hello, I tried the clearml-session CLI to start a jupyter instance on an agent, but an error with the password, here is the full CLI log: $ clearml-session -...

aws remote-ssh

4 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Quick Question: How Can I Clone A Task And Change The Cloned Task Type? I See No Task.Set_Type() Function

Quick question: How can I clone a task and change the cloned task type? I see no Task.set_type() function

clearml

5 years ago

0 Votes

15 Answers

2K Views

0 Votes 15 Answers 2K Views

Hi, I Restarted My Clearml-Server (1.1.0) And The Login Page Always Redirects Me To The Login Page. I Am Using Fixed Users In Config Files. In The Logs Of The Api Server I Can See:

Hi, I restarted my clearml-server (1.1.0) and the login page always redirects me to the login page. I am using fixed users in config files. In the logs of th...

clearml

4 years ago

0 Votes

7 Answers

2K Views

0 Votes 7 Answers 2K Views

Hi, One More Question: When Creating A Task With Task.Init(), We Can Specify The

Hi, one more question: When creating a task with Task.init(), we can specify the task_type . Now when using Task.clone(), we cannot specify the task_type (is...

clearml

5 years ago

0 Votes

17 Answers

2K Views

0 Votes 17 Answers 2K Views

Hi, I Updated To Clearml-Server 1.4.0 And I Am Uncomfortable With The New Table/Detail View, Is There A Way To Disable It And Use The Previous One (On Click -> Open Details)?

Hi, I updated to clearml-server 1.4.0 and I am uncomfortable with the new Table/Detail view, is there a way to disable it and use the previous one (on click ...

clearml

3 years ago

0 Votes

1 Answers

1K Views

0 Votes 1 Answers 1K Views

Quick Question: Why Does Clearml-Server 1.15.0 Api-Server Python Package Require Es 8.12.0 But The Docker-Compose References Es 7.17.18?

Quick question: Why does clearml-server 1.15.0 api-server python package require ES 8.12.0 but the docker-compose references ES 7.17.18?

clearml

one year ago

0 Votes

17 Answers

2K Views

0 Votes 17 Answers 2K Views

Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

Hello, I am trying to retrieve a simple dict artifact uploaded in a previous task with task.upload_artifact("my_dict", dict(foo="bar")) in a second task. I t...

clearml

5 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi Quick Question: Does Task.Connect_Configuration Support Omegaconf Dictconfig Objects? Ie. Can I Do:

Hi quick question: does Task.connect_configuration support OmegaConf DictConfig objects? ie. Can I do: config = train_task.connect_configuration(OmegaConf.lo...

clearml

3 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi, I Am Trying To Update The Aws_Autoscaler To The Latest Version On The Master Branch. I Simply Changed The Commit Id In The Experiment And Run It, This Gave Me The Following Error:

Hi, I am trying to update the aws_autoscaler to the latest version on the master branch. I simply changed the commit id in the experiment and run it, this ga...

clearml

4 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hey Guys, Quick Question: Is There A Tool Function To Know If A Task Id Is Valid? Not Verifying That The Task Itself Exists, Just That The Task Id Is The Correct Format

Hey guys, quick question: is there a tool function to know if a task id is valid? Not verifying that the task itself exists, just that the task id is the cor...

clearml

5 years ago

0 Votes

10 Answers

2K Views

0 Votes 10 Answers 2K Views

Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

Hi, another bug to report with the aws_auto_scaler using 1.1.2: Traceback (most recent call last): File "aws_autoscaler.py", line 297, in main() File "aws_au...

mlops

4 years ago

0 Votes

18 Answers

2K Views

0 Votes 18 Answers 2K Views

Hello There, I Would Like To Do Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Using The Task In Docker). What Signal Should I Listen For In The Task?

Hello there, I would like to do run cleanup code in case the user aborts one task from the dashboard (the agent is not using the task in docker). What signal...

mlops

4 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hey There

Hey there 🙂 Would in the WebUI, on an experiment CONFIGURATION tab, for a specific parameter, would it be possible not show its value as a single string whe...

clearml

3 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hey, I Would Like My Experiment To Call At Some Point A Cli Program Installed As A Dependency Of The Experiment. Here Is What I Do:

Hey, I would like my experiment to call at some point a CLI program installed as a dependency of the experiment. Here is what I do: myTask = Task.init(...) i...

clearml

5 years ago

0 Votes

27 Answers

2K Views

0 Votes 27 Answers 2K Views

Hi, similar to Task.set_offline(True), is there a way to simulate an execution in an agent? (for testing purposes)

clearml

3 years ago

Show more results

0 Hi Guys For The Aws Auto-Scaler I Need To Access Aws Ssm Or Create .Env File Locally When Using The Init Script. Has Anyone Done This?

I would probably leave it to the ClearML team to answer you, I am not using the UI app and for me it worked just well with different regions. Maybe check permissions of the key/secrets?

4 years ago

0 Hello, I Have An Error While Installing Git Dependencies Of Local Package: So Far I Used Task.

but if you do that and the package is already installed it will not install using the git repo, this is an issue with pip

Exactly, that’s my problem: I want to remove it to make sure it is reinstalled (because the version can change)

I think that since the agent installs everything from scratch it should work for you. Wdyt?

With env caching enabled, it won’t reinstall this private dependency, right?

4 years ago

0 Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

I have two controller tasks running in parallel in the trains-agent services queue

5 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

If I don’t start clearml-session , I can easily connect to the agent, so clearml-session is doing something that messes up the ssh config and prevent me from ssh into the agent afterwards

3 years ago

0 Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

Should I try to disable dynamic mapping before doing the reindex operation?

4 years ago

0 Hi There,

Adding back clearml logging with matplotlib.use('agg') , uses more ram but not that suspicious

2 years ago

0 Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

I must say I am confused

4 years ago

0 Trains-Elastic | {"Type": "Server", "Timestamp": "2020-12-07T15:19:11,101Z", "Level": "Error", "Component": "O.E.B.Elasticsearchuncaughtexceptionhandler", "Cluster.Name": "Trains", "Node.Name": "Trains", "Message": "Uncaught Exception In Thread [Main]",

So I created a symlink in /opt/train/data -> /data

4 years ago

0 Hello, I Have A Small Question Regarding Ui: Currently, In The Artifacts Section Of A Task, The

Or even better: would it be possible to have a support for HTML files as artifacts?

5 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

AgitatedDove14 Yes exactly, I tried the fix suggested in the github issue urllib3>=1.25.4 and the ImportError disappeared 🙂

5 years ago

0 Hi Clearml Team Members! Is There Any Progress Made On The Clearml-Serving Repo? I’D Love To Start Using It But I Lack A Straightforward Get Started Example. My Use Case Is The Following:

Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
In my case, I’d maintain a client python package that takes care of the pre/post processing of each request, so that I only send the raw data to the inference service and I post process the raw output of the model returned by the inference service. But I understand why it might be desirable for the users to have these steps happening on the server. What is challenging in this context? Defining how t...

3 years ago

Ok, now I would like to copy from one machine to another via scp, so I copied the whole /opt/trains/data folder, but I got the following errors:

4 years ago

0 Hi Guys, I Had Several Times Now The Following Errors Poping In Agents While Executing A Task:

Could be, but not sure -> from 0.16.2 to 0.16.3

4 years ago

0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

now I can do nvcc --version and I get
Cuda compilation tools, release 10.1, V10.1.243

4 years ago

0 Hi, I Have A Long Running Experiment That Was Running On Aws Instance That Got Killed After ~4 Days With The Following Reason:

Thanks! I will investigate further, I am thinking that the AWS instance might have been stuck for an unknown reason (becoming unhealthy)

3 years ago

0 Hi Again, My Clearml Api-Server Is Having A Memory Leak. Each Time I Restart It, Its Ram Consumption Grows Until Getting Oom, Is Not Killed And Make The Ec2 Instance Crash

Thanks a lot, I will play with that!

4 years ago

0 Hey There, Does Trains Support

No worries! I asked more to be informed, I don't have a real use-case behind. This means that you guys internally catch the argparser object somehow right? Because you could also simply use sys argv to find the parameters, right?

5 years ago

0 Hi, If I Am Starting My Training With The Following Command:

Hi AgitatedDove14 , How should we proceed to fix this bug? Should I open an issue in github? Should I try to make a minimal reproducible example? It’s blocking me atm

3 years ago

0 Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

-> seems to run properly now

5 years ago

0 Hi, I Would Like To Switch From The Elastic-Search Service In The Docker-Compose Of The Clearml-Server To An Externally Managed, Scalable Elastic-Search Cluster. I Have Two Questions:

PS: in the new env, I’v set num_replicas: 0, so I’m only talking about primary shards…

4 years ago

0 Hi, I Would Like To Switch From The Elastic-Search Service In The Docker-Compose Of The Clearml-Server To An Externally Managed, Scalable Elastic-Search Cluster. I Have Two Questions:

ha nice thanks 🙏

4 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

Yea, the config is not appearing in the webUI anymore with this method 😞

3 years ago

0 Hi, I Have A Question Regarding The Aws_Autoscaler: It Usually Takes ~Hours To Get A Gpu Instance Nowadays. I Was Thinking, It Would Be Much More Interesting To Stop The Instances (Clearml-Agents) Instead Of Terminating Them Once They Are Inactive, So Tha

No I agree, it’s probably not worth it

3 years ago

0 Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

clearml 1.1.1 works

4 years ago

0 Hi Guys, With The New Venv Caching Available In Clearml, I Have The Following Problem: I Force My Pip Requirements To Be:

awesome thanks!

4 years ago

0 Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built

ClearML has a task.set_initial_iteration , I used it as such:
checkpoint = torch.load(checkpoint_fp, map_location="cuda:0") Checkpoint.load_objects(to_load=self.to_save, checkpoint=checkpoint) task.set_initial_iteration(engine.state.iteration)But still the same issue, I am not sure whether I use it correctly and if it’s a bug or not, AgitatedDove14 ? (I am using clearml 1.0.4rc1, clearml-agent 1.0.0)

4 years ago

0 Hi There, I Have A Bit Of A Problem With Aws Secrets: I Pass Keys As Env Var To Clearml-Agents To Retrieve Data From A Bucket In Us-East-1 But I Use A Bucket To Store Task Artifacts In A Bucket In Eu-Central-1. So When I Pass Aws Keys As Env Vars, The Tas

Yes that’s correct - the weird thing is that the error shows the right detected region

4 years ago

0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

and the agent says agent.cudnn_version = 0

4 years ago

0 Hi Guys, Following Up On This

And I do that each time I want to create a subtask. This way I am sure to retrieve the task if it already exists

5 years ago

0 Hi Guys, Following Up On This

it should return the task regardless if it is complete or not

5 years ago

Show more results