because at some point it introduces too much overhead I guess
AgitatedDove14 This seems to be consistent even if I specify the absolute path to /home/user/trains.conf
Some more context: the second experiment finished, and now in the UI, in the Workers & Queues tab, I randomly see either trains-agent-1 | - | - | - | ... or (after refreshing the page) trains-agent-1 | long-experiment | 12h | 72000 |
Hi CostlyOstrich36, most of the time I want to compare two experiments in the DEBUG SAMPLES tab, so if I click on one sample to enlarge it, I cannot see the others. Also, once I close the panel, the iteration number is not updated
Thanks AgitatedDove14 !
What would be the exact content of NVIDIA_VISIBLE_DEVICES if I run the following command?
trains-agent daemon --gpus 0,1 --queue default &
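One quick way to find out is to print the variable from inside a task that the agent picks up; a minimal sketch, with placeholder project/task names:

```python
import os

from clearml import Task

# Hypothetical task whose only purpose is to inspect the environment the agent sets up
task = Task.init(project_name="debug", task_name="check_visible_devices")

# When the agent daemon is started with --gpus 0,1, this is expected to show
# the GPU indices made visible to the process/container
print(os.environ.get("NVIDIA_VISIBLE_DEVICES"))
```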
After some investigation, I think it could come from the way you catch errors when checking the creds in trains.conf: when I passed the AWS creds using env vars, another error popped up: https://github.com/boto/botocore/issues/2187 , linked to boto3
They are, but this doesn't work - I guess it's because temp IAM accesses have an extra token, that should be passed as well, but there is no such option on the web UI, right?
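For reference, temporary IAM credentials come as a triplet (key, secret, session token), and with boto3 directly the token is passed alongside the key pair; a sketch with placeholder values:

```python
import boto3

# Temporary (STS) credentials require all three values; without the session
# token the requests are rejected even if key and secret are correct.
session = boto3.Session(
    aws_access_key_id="ASIA...",        # placeholder
    aws_secret_access_key="secret",     # placeholder
    aws_session_token="session-token",  # the extra token temp creds require
)
s3 = session.client("s3")
print(s3.list_buckets()["Buckets"])
```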
resource_configurations {
A100 {
instance_type = "p3.2xlarge"
is_spot = false
availability_zone = "us-east-1b"
ami_id = "ami-04c0416d6bd8e4b1f"
ebs_device_name = "/dev/xvda"
ebs_volume_size = 100
ebs_volume_type = "gp3"
}
}
queues {
aws_a100 = [["A100", 15]]
}
extra_trains_conf = """
agent.package_manager.system_site_packages = true
agent.package_manager.pip_version = "==20.2.3"
"""
extra_vm_bash_script = """
sudo apt-get install -y libsm6 libxext6 libx...
Sorry, it's actually task.update_requirements(["."])
AgitatedDove14 So I'll just replace task = clearml.Task.get_task(clearml.config.get_remote_task_id()) with Task.init() and wait for your fix
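In other words, roughly this change; a minimal sketch, assuming the script is executed by an agent (the project/task names only matter for local runs):

```python
from clearml import Task

# Before: fetching the task the agent created explicitly
# task = clearml.Task.get_task(clearml.config.get_remote_task_id())

# After: Task.init() reuses the task the agent is already running,
# and simply creates a new one when the script is run locally
task = Task.init(project_name="my_project", task_name="my_task")
```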
Ok so it seems that the single quote is the reason, using double quotes works
edited the aws_auto_scaler.py, actually I think it's just a typo, I just need to double the brackets
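For context, in a Python str.format template a literal brace has to be doubled, which is presumably what "double the brackets" refers to here; a generic sketch, not the actual autoscaler code:

```python
# A format template that should emit literal braces (e.g. a config section)
# must escape them by doubling:
template = "queues {{ {queue_name} = [] }}"

print(template.format(queue_name="aws_a100"))
# -> queues { aws_a100 = [] }
```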
Oh I see, I think we are now touching a very important point:
I thought that torch wheels already included the CUDA/cuDNN libraries, so you don't need to care about the system CUDA/cuDNN version because eventually only the CUDA/cuDNN libraries extracted from the torch wheels are used. Is this correct? If not, does that mean that one should use conda to install the correct CUDA/cuDNN cudatoolkit?
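One way to see what a given torch build actually bundles is to query it at runtime; a sketch, and the version numbers will of course vary by wheel:

```python
import torch

# CUDA toolkit version the wheel was compiled against (None for CPU-only builds)
print(torch.version.cuda)

# cuDNN version used by this torch build
print(torch.backends.cudnn.version())

# Whether a GPU is actually visible and usable right now
print(torch.cuda.is_available())
```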
and just run the same code I run in production
Sorry, I was actually able to fix it (using 1.1.3), not sure what the problem was
Awesome, thanks WackyRabbit7 , AgitatedDove14 !
Hi SuccessfulKoala55, how can I know if I am logged in in this free-access mode? I assume I am, since on the login page I only see a login field, not a password field
Thanks a lot for the solution SuccessfulKoala55! I'll try that if the solution "delete the old bucket, wait for its name to be available, recreate it with the other AWS account, transfer the data back" fails
CostlyOstrich36 , this also happens with clearml-agent 1.1.1 on an AWS instance…
I see 3 agents in the "Workers" tab
so most likely one hard requirement installs version 2 of pyjwt while setting up the experiment
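If that turns out to be the case, one possible workaround is to pin pyjwt explicitly; a sketch, assuming clearml's Task.add_requirements accepts a package name plus version spec and is called before Task.init:

```python
from clearml import Task

# Ask the agent to install a 1.x pyjwt when it recreates the environment,
# regardless of what the other requirements would pull in.
# Must be called before Task.init() to be picked up.
Task.add_requirements("pyjwt", "<2.0.0")

task = Task.init(project_name="my_project", task_name="my_task")
```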
I am trying to upload an artifact during the execution
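For reference, this is roughly how an artifact is uploaded mid-run; a sketch, with placeholder names and a placeholder artifact object:

```python
from clearml import Task

task = Task.init(project_name="my_project", task_name="my_task")

# Can be called at any point during execution; the upload happens in the
# background unless wait_on_upload=True is passed.
task.upload_artifact(
    name="intermediate_results",
    artifact_object={"epoch": 3, "loss": 0.42},
)
```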
I mean, inside a parent project, do not show the [parent] entry if there is nothing inside it