JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

0 Votes 5 Answers 2K Views

Hi, From Within An Experiment, How Can I Intercept The Signal That The Experiment Was Aborted And Execute A Cleanup Function? I Tried To Intercept Sigint And Sigterm, Unsuccessfully:

Hi, from within an experiment, how can I intercept the signal that the experiment was aborted and execute a cleanup function? I tried to intercept SIGINT and...

clearml

3 years ago

0 Votes 5 Answers 2K Views

Hi, It Seems That The

Hi, It seems that the package_manager.pip_version has been removed from the https://allegro.ai/docs/references/trains_ref/#agent , although still being shown...

clearml

5 years ago

0 Votes 10 Answers 2K Views

Hi, How Can I Change The Project.Default_Output_Destination? I Tried Setting It To None But It Is Not Updated

Hi, how can I change the project.default_output_destination? I tried setting it to None but it is not updated

clearml

3 years ago

0 Votes 1 Answer 1K Views

Is It Possible To Shut Down The Clearml Server, Upgrade To V1, Restart It While Experiments Are Running? Or Is It Dancing With The Devil?

Is it possible to shut down the clearml server, upgrade to v1, restart it while experiments are running? Or is it dancing with the devil? 😄

clearml

4 years ago

0 Votes 8 Answers 2K Views

Hi, Is It Possible To Pass Temporary Iam Role To The Web App Could Access?

Hi, is it possible to pass temporary IAM role to the web app could access?

clearml

3 years ago

0 Votes 2 Answers 2K Views

Hi, Is It Possible To Get An Artifact From A Task And Force Not Using Local Cache? The Task Itself Updated The Artifact In The Meantime And I Cannot Get The Latest Version Of The Artifact. I Saw That

Hi, is it possible to get an artifact from a Task and force not using local cache? The task itself updated the artifact in the meantime and I cannot get the ...

clearml

4 years ago

0 Votes 5 Answers 2K Views

Hi, There Is Small Bug In The Web Ui When Comparing Two Experiments Scalars: If The Two Tasks Have The Same Name, Then Clicking On The “Maximize Graph” Button On One Scalar Series To Get The Bigger View On That Scalar Series, Then The Color Of Both Series

Hi, there is a small bug in the web UI when comparing two experiments' scalars: If the two tasks have the same name, then clicking on the “Maximize graph” butto...

clearml

4 years ago

0 Votes 14 Answers 2K Views

Hi There, I Have A Bit Of A Problem With Aws Secrets: I Pass Keys As Env Var To Clearml-Agents To Retrieve Data From A Bucket In Us-East-1 But I Use A Bucket To Store Task Artifacts In A Bucket In Eu-Central-1. So When I Pass Aws Keys As Env Vars, The Tas

Hi there, I have a bit of a problem with AWS secrets: I pass keys as env var to clearml-agents to retrieve data from a bucket in us-east-1 but I use a bucket...

mlops

4 years ago

0 Votes 7 Answers 2K Views

Hi, I Am Currently Using

Hi, I am currently using CLEARML_AGENT_GIT_USER and CLEARML_AGENT_GIT_PASS when starting my clearml-agent and I would like to switch to using a single auth t...

clearml

3 years ago

0 Votes 2 Answers 2K Views

Are The Env Variables Passed To Trains-Agent Available In Experiments Run By This Trains-Agent?

Are the env variables passed to trains-agent available in experiments run by this trains-agent?

clearml

5 years ago

0 Votes 18 Answers 2K Views

Hi, Kudos For The 0.15 Guys! I Am Having An Issue Related To Git Auth: I Have An Issue With Trains-Agent (0.15): It Does Not Use Git Creds While Trying To Clone A Private Repo:

Hi, kudos for the 0.15 guys! I am having an issue related to git auth: I have an issue with trains-agent (0.15): it does not use git creds while trying to cl...

mlops

5 years ago

0 Votes 4 Answers 2K Views

Hey, I Have One Question Regarding The Cleanup_Service Task In The Devops Project: Does It Assume That The Agent In Services Mode Is In The Trains-Server Machine?

Hey, I have one question regarding the cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server mac...

mlops

5 years ago

0 Votes 11 Answers 2K Views

Hey, I Moved My Trains-Server To Another Machine, Zipping The /Opt/Trains/Data Folder As Described In The Docs

Hey, I moved my trains-server to another machine, zipping the /opt/trains/data folder as described in the docs https://allegro.ai/docs/deploying_trains/train...

mlops

5 years ago

0 Votes 5 Answers 2K Views

Hi, Is It Possible To Disable Some Of The System Metrics Monitored? And Also Downsample The Rate Of Logging?

Hi, is it possible to disable some of the system metrics monitored? and also downsample the rate of logging?

clearml

4 years ago

0 Votes 16 Answers 2K Views

Hi Guys, Coming This Time To Share An Idea Of A Killer Feature For Clearml

Hi guys, coming this time to share an idea of a killer feature for ClearML 🚀 I am pretty sure you guys already heard of https://www.streamlit.io/ , which is...

clearml

4 years ago

0 Votes 4 Answers 2K Views

Hey Again

Hey again 😁 Is it possible to run multiple agents on the same machine? And with some in services mode?

clearml

5 years ago

0 Votes 5 Answers 2K Views

How Can I Do The Following? (Basically, Filtering By Task Type)

How can I do the following? (basically, filtering by task type) Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(type="trainin...

clearml

5 years ago

0 Votes 5 Answers 2K Views

Hi, I Would Like To Use Pytorch3D==0.5.0 With Torch==1.9.1 On Cuda Version 110, Locally It Works, But The Clearml Agent Fails Setting Up The Environment With The Following Error:

Hi, I would like to use pytorch3d==0.5.0 with torch==1.9.1 on cuda version 110, locally it works, but the clearml agent fails setting up the environment with...

mlops

4 years ago

0 Votes 2 Answers 2K Views

Is There An Option To Make Trains-Agent Create Experiment Virtualenvs With

Is there an option to make trains-agent create experiment virtualenvs with --system-site-packages parameter?

clearml

5 years ago

0 Votes 5 Answers 2K Views

Hello, Is It Possible For The Clearml-Agent In Docker Mode To Not Pull A Specific Docker Image, But To Build One From The Experiment Repository Using The Dockerfile And .Dockerignore Of The Experiment Repository?

Hello, is it possible for the clearml-agent in docker mode to not pull a specific docker image, but to build one from the experiment repository using the Doc...

clearml

3 years ago

0 Votes 20 Answers 2K Views

Is It Possible To Run An Agent, Listen To The Services Queue Without Using Docker?

Is it possible to run an agent, listen to the services queue without using docker?

clearml

5 years ago

0 Votes 18 Answers 2K Views

Hi, I Just Updated Clearml Server 1.0 Using

Hi, I just updated to clearml server 1.0 using docker-compose down & docker-compose pull & docker-compose up -d , it worked and it looks amazing! I found two pr...

clearml

4 years ago

0 Votes 2 Answers 2K Views

How Can I Filter Out Archived Tasks With Task.Get_Tasks?

How can I filter out archived tasks with Task.get_tasks?

clearml

4 years ago

0 Votes 30 Answers 2K Views

Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

Hi, I just updated clearml-server to 1.1.0 and got the following error when starting it with docker-compose: clearml-apiserver | [2021-08-02 13:37:09,852] [8...

clearml

4 years ago

0 Votes 2 Answers 2K Views

Hi There, Congrats For Releasing V1

Hi there, congrats for releasing v1 😄 I observed that with pytorch ignite (4.2.0), the metrics of the validation engines are delayed by one epoch. I am not ...

pytorch

4 years ago

0 Votes 3 Answers 2K Views

Hey There, I See That In The Autoscaler Configuration, The

Hey there, I see that in the autoscaler configuration, the queues param accepts dictionaries with values of type list of lists (see e.g. below). What does it me...

mlops

4 years ago

0 Votes 2 Answers 2K Views

Hi, In The Aws Autoscaler, I Am Getting The Following Warning:

Hi, in the AWS AutoScaler, I am getting the following warning: Warning! exception occurred: APIError: code 400/1004: Worker is not registered: worker=aws:A10...

clearml

4 years ago

0 Votes 3 Answers 2K Views

Hi Guys, Since I Am Done With Implementing The Aws Autoscaler, I Would Like To Share Some Pain Points That I Encountered In The Process With The Hope That They Can Be Documented To Help Other Users:

Hi guys, since I am done with implementing the AWS autoscaler, I would like to share some pain points that I encountered in the process with the hope that th...

aws

4 years ago

0 Votes 9 Answers 2K Views

Another Strange Behavior Of The Python Sdk Cli: After Executing Python My_Task.Py, Where My_Task.Py Creates And Send To The Queue An Experiment, The Command Returns But After Some Time Some Messages Are Printed In The Console, Such As

Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates an experiment and sends it to the queue, the command ...

clearml

4 years ago

0 Votes 16 Answers 2K Views

Hello, ~3 Months Ago I Created A Trains-Server In A Machine With 30Gb Of Disk Space. Today I Wasn't Able To Connect To Trains-Server, So I Checked The Server And Found That The Disk Was Full. I Ran:

Hello, ~3 months ago I created a trains-server in a machine with 30gb of disk space. Today I wasn't able to connect to trains-server, so I checked the server...

clearml

4 years ago

0 Hi There

Thanks for your inputs, I will try that! For completion, here is how I retrieve the parameters:
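```python
from trains import Task

task = Task.init("test", "test")
# The ID of the parent task is stored on the current task
parent_task = Task.get_task(task_id=task.parent)
# report_text expects a string, so stringify the parameters dict
task.get_logger().report_text(str(task.get_parameters()))
# Look up the artifact name from the task parameters, then fetch it from the parent
artifact_name = task.get_parameter("General/artifact_name")
artifact = parent_task.artifacts[artifact_name].get()
```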

5 years ago

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

https://github.com/allegroai/clearml-agent.git@f019905720529acbd316bd39b67c5ab0c02fcd55 to be exact

4 years ago

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

the first problem I had, which didn’t give any useful info, was that docker was not installed on the agent machine x)

4 years ago

0 Hi Again, I Am Trying To Make The Aws Autoscaler Work With Ec2 Instances, But It Fails To Setup The Agent In The Machine: The Logs Of The User-Data Script Show That It Fails Updating The Machine (See Below)

the instances take a long time to start, around 5 mins

4 years ago

0 Hi, I Would Like To Follow-Up In This

meaning the RestAPI returns nothing, is that correct

Yes exactly, this is the response from the api server when I try to scroll down on the console to get more logs

3 years ago

0 Hi Everyone, Now I Am Evaluating Clearml. I Have A Question About How To Handle Datasets. Does Clearml Provide Any Function To Manage Datasets? Or Do We Need To Manage Them By Ourselves? In Our Usecase, We Update Datasets Little By Little Over Days Or W

This is no coincidence - any data versioning tool you will find is somehow close to how git works (dvc, etc.) since they aim to solve a similar problem. In the end, datasets are just files.
Where clearml-data stands out imo is the straightforward CLI combined with the Pythonic API that allows you to register/retrieve datasets very easily

4 years ago

0 Hi, Another Bug To Report With The Aws_Auto_Scaler Using 1.1.2:

Never mind, I was able to make it work, but I have no idea how

4 years ago

0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

thanks for your help!

5 years ago

0 Hi, A Small Bug (Not Really A Bug) In The Autoscaler: I Have P3.2Xlarge Instances That Take A Long Time To Shutdown. With

Hi TimelyPenguin76 , I guess it tries to spin them down a second time, hence the double print

4 years ago

0 Could You Please Explain A Bit More How Trains Adapt The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

That was also my feeling! But I thought that spawning the trains-agent from a conda env would isolate me from the cuda drivers on the system

5 years ago

0 Hi, I Am Considering Making Automated Backups Of My Clearml-Server Using Amazon Ebs Snapshots. Should I Be Concerned With The Same Problem Described Here >

I can probably have a python script that checks if there are any tasks running/pending, and if not, runs docker-compose down to stop the clearml-server, then uses boto3 to trigger the creation of a snapshot of the EBS volume, then waits until it is finished, then restarts the clearml-server, wdyt?
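
For illustration, here is a minimal sketch of that idea, not a tested backup procedure. The orchestration function takes the four steps as callables so the ordering can be checked without a server. The helpers below it are assumptions: the `/opt/clearml` compose directory and the volume ID are hypothetical placeholders, and the `task_filter` status values passed to `Task.get_tasks` are assumed to be `in_progress` and `queued`.

```python
import subprocess
from typing import Callable, Optional


def backup_when_idle(
    has_active_tasks: Callable[[], bool],
    stop_server: Callable[[], None],
    create_snapshot: Callable[[], str],
    start_server: Callable[[], None],
) -> Optional[str]:
    """Stop the server, snapshot, restart. Returns None if tasks are active."""
    if has_active_tasks():
        return None
    stop_server()
    try:
        return create_snapshot()
    finally:
        # Always bring the server back up, even if the snapshot failed
        start_server()


def clearml_has_active_tasks() -> bool:
    # Assumption: these status names match the server's task statuses
    from clearml import Task
    return bool(Task.get_tasks(task_filter={"status": ["in_progress", "queued"]}))


def compose(*args: str, compose_dir: str = "/opt/clearml") -> None:
    # compose_dir is a hypothetical location of docker-compose.yml
    subprocess.run(["docker-compose", *args], cwd=compose_dir, check=True)


def ebs_snapshot(volume_id: str) -> str:
    import boto3
    ec2 = boto3.client("ec2")
    snapshot_id = ec2.create_snapshot(
        VolumeId=volume_id, Description="clearml-server backup"
    )["SnapshotId"]
    # Block until AWS reports the snapshot as completed
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id


# Example wiring (hypothetical volume ID):
# backup_when_idle(
#     clearml_has_active_tasks,
#     lambda: compose("down"),
#     lambda: ebs_snapshot("vol-0123456789abcdef0"),
#     lambda: compose("up", "-d"),
# )
```

Note that a task could still start between the check and the shutdown, so the check only narrows the window rather than closing it.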

4 years ago

0 Hi, I Want To Upgrade Clearml Server From 1.1 To 1.2 (Self Hosted). I Have The Following Setup:

--- /data ----------
48.4 GiB [##########] /elastic_7
1.8 GiB [ ] /shared
879.1 MiB [ ] /fileserver
. 163.5 MiB [ ] /clearml_cache
. 38.6 MiB [ ] /mongo
8.0 KiB [ ] /redis

3 years ago

0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

BTW, is there any specific reason for not upgrading to clearml?

I just didn't have time so far 🙂

4 years ago

0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

trains-agent-1: runs an experiment for a long time (>12h). Picks a new experiment on top of the long one running
trains-agent-2: runs only one experiment at a time, normal
trains-agent-3: runs only one experiment at a time, normal
In total: 4 experiments running for 3 agents

5 years ago

0 Hello, I Am Trying To Retrieve A Simple Dict Artifact Uploaded In A Previous Task With

awesome! Unfortunately, calling artifact["foo"].get() gave me:
Could not retrieve a local copy of artifact foo, failed downloading file:///checkpoints/test_task/test_2.fgjeo3b9f5b44ca193a68011c62841bf/artifacts/foo/foo.json
It tries to get it from the local storage, but the json is stored in s3 (it does exist) and I did create both tasks specifying the correct output_uri (to s3)

5 years ago

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

from the ClearML UI

4 years ago

0 Hey Guys, Quick Question: Is There A Tool Function To Know If A Task Id Is Valid? Not Verifying That The Task Itself Exists, Just That The Task Id Is The Correct Format

Thanks SuccessfulKoala55 😁
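
A format-only check like the one asked about could be sketched as follows. This assumes task IDs are 32 lowercase hexadecimal characters (a UUID without dashes). That format is an assumption and is not stated in this thread, and `looks_like_task_id` is a hypothetical helper, not part of the SDK.

```python
import re

# Assumption: a task ID is a dash-less UUID, i.e. 32 lowercase hex characters
_TASK_ID_RE = re.compile(r"[0-9a-f]{32}")


def looks_like_task_id(value) -> bool:
    """Return True if value has the assumed task ID format (no server lookup)."""
    return isinstance(value, str) and _TASK_ID_RE.fullmatch(value) is not None
```

This only validates the shape of the string. It says nothing about whether such a task exists on the server.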

5 years ago

0 Hi, I Face A Strange Behavior From The Clearml-Agent: It’s Running In Services Mode, Not In Docker Mode, Cpu Only. I Want To Execute Two Tasks On This Service Agent. One Works, The Other Always Fails After Being Enqueued And Picked By The Agent With The E

and in the logs:
`
agent.worker_name = worker1
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /...

4 years ago

0 Hi, On Clearml-Server 1.5.0, In Scalar Graphs, The New Default Value Is “Show Closest Data On Hover”. Would It Be Possible To Make It Automatically Set To “Compare Data On Hover” When Comparing Multiple Experiments?

I’m not too fond of many user configurations, it’s confusing.

100% agree, nevertheless, how much is too many? Currently, there are only two settings in the user preferences category, so one more wouldn’t hurt?

however, clearml is open source, nothing stops you from adding the code and sending a PR

I’d be super happy to contribute yes! Nevertheless, I am not sure where to start: clearml-server repo? clearml-web repo?

3 years ago

0 Hello There! I Have A Question Regarding The Web Ui, On The Project Page: I Have The Following Use Case: I Need To Add Two Custom Columns, Each Reporting One Metric. Currently, This Shows Me The Best (Min/Max) Values Reached By The Model, But Not Necessar

In the comparison the problem will be the same, right? If I choose last/min/max values, it won’t tell me the corresponding values for other metrics. I could switch to graphs, group by metric and look manually for the corresponding values, but that quickly becomes cumbersome as the number of experiments compared grows

3 years ago

0 Hey There, Since Which Version, Clearml Stops Connecting To The Demo Server By Default?

super, thanks SuccessfulKoala55 !

4 years ago

0 Hello There, I Would Like To Do Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Using The Task In Docker). What Signal Should I Listen For In The Task?

I am looking for a way to gracefully stop the task (clean up artifacts, shutdown backend service) on the agent

4 years ago

0 Hey, What Is The Exact Difference Between

I tested by installing flask in the default env -> which was installed in the ~/.local/lib/python3.6/site-packages folder. Then I created a venv with flag --system-site-packages . I activated the venv and flask was indeed available
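
The same kind of check can be reproduced from Python with the standard library `venv` module. This is only a sketch of the flag's effect, not how the agent builds its environments. `create_venv_with_system_packages` is a hypothetical helper name, and pip is skipped to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def create_venv_with_system_packages(target: Path) -> str:
    """Create a venv equivalent to `python -m venv --system-site-packages`.

    Returns the contents of the generated pyvenv.cfg for inspection.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(target)
    return (target / "pyvenv.cfg").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The config should record that system site-packages are included
        print(create_venv_with_system_packages(Path(tmp) / "env"))
```

The `include-system-site-packages` entry in `pyvenv.cfg` is what makes packages installed outside the venv importable from inside it.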

5 years ago

0 Hi Guys For The Aws Auto-Scaler I Need To Access Aws Ssm Or Create .Env File Locally When Using The Init Script. Has Anyone Done This?

Could you please share the stacktrace?

4 years ago

0 Hey, I Moved My Trains-Server To Another Machine, Zipping The /Opt/Trains/Data Folder As Described In The Docs

I was able to fix by applying for a license and registering it

5 years ago

Oof now I cannot start the second controller in the services queue on the same second machine, it fails with
` Processing /tmp/build/80754af9/cffi_1605538068321/work
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/cffi_1605538068321/work'
clearml_agent: ERROR: Could not install task requirements!
Command '['/home/machine/.clearml/venvs-builds.1.3/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r'...

4 years ago

0 Looks Like Trains-Agent 0.16

Thanks, I will create an issue. I am fine with both ways :)

5 years ago

0 Hi, Together With

Which commit corresponds to RC version? So far we tested with latest commit on master (9a7850b23d2b0e1f2098ab051de58ce806143fff)

5 years ago

0 Hey There, Is It Possible For A Clearml Pipeline Step To Log A Folder Instead Of Numpy/Pickle Objects? Looking At The Docs,

I also would like to avoid any copy of these artifacts on s3 (to avoid double costs, since some folders might be big)

3 years ago

0 Hi, I Just Updated Clearml-Server To 1.1.0 And Got The Following Error When Starting It With Docker-Compose:

should I try to roll back to clearml-server 1.0.2? I am very anxious now…

4 years ago
