JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

Answers 1023

0 Hi Again, It Seems Like The Aws Autoscaler Is Not Spinning Instances With The Ebs Configuration I Configured. Here Is The Configuration:

Yes AgitatedDove14 🙂

4 years ago

0 Hi, I Would Like To Report Another Bug Introduced With Clearml-Server 1.2.0: In The Comparison Page Of Two Experiments, On The Scalar Tab, With The Graph Layout, When Clicking On The Eye On One Scalar Group To Hide The Related Graphs, The Later Do Disappe

Interesting - I can reproduce easily

3 years ago

0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

yes

5 years ago

mmmh good point actually, I didn’t think about it

3 years ago

0 I Guess One Experiment Is Running Backwards In Time

Just caught another star 😄

3 years ago

0 Hey There, I Would Like To Increase The

because at some point it introduces too much overhead I guess

4 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

AgitatedDove14 This seems to be consistent even if I specify the absolute path to /home/user/trains.conf

5 years ago

0 Hi There, I Have A Problem With Pyjwt: I Am Using

Hi SuccessfulKoala55 , yes indeed

4 years ago

0 Hi, What Happens Exactly When I Execute The Following Command:

Thanks AgitatedDove14 !
What would be the exact content of NVIDIA_VISIBLE_DEVICES if I run the following command?
trains-agent daemon --gpus 0,1 --queue default &

5 years ago

0 Hi, Kudos For The 0.15 Guys! I Am Having An Issue Related To Git Auth: I Have An Issue With Trains-Agent (0.15): It Does Not Use Git Creds While Trying To Clone A Private Repo:

yes, will do now!

5 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

After some investigation, I think it could come from the way you catch errors when checking the creds in trains.conf: When I passed the aws creds using env vars, another error popped up: https://github.com/boto/botocore/issues/2187 , linked to boto3

5 years ago

0 Hi, Is It Possible To Pass Temporary Iam Role To The Web App Could Access?

They are, but this doesn’t work - I guess it’s because temp IAM accesses have an extra token, that should be passed as well, but there is no such option on the web UI, right?

3 years ago

0 Hi, I Am Getting The Following Errors In The Experiments I Am Currently Running:

from 10 to 11

4 years ago

0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

` resource_configurations {
    A100 {
        instance_type = "p3.2xlarge"
        is_spot = false
        availability_zone = "us-east-1b"
        ami_id = "ami-04c0416d6bd8e4b1f"
        ebs_device_name = "/dev/xvda"
        ebs_volume_size = 100
        ebs_volume_type = "gp3"
    }
}

queues {
    aws_a100 = [["A100", 15]]
}

extra_trains_conf = """
agent.package_manager.system_site_packages = true
agent.package_manager.pip_version = "==20.2.3"
"""

extra_vm_bash_script = """

sudo apt-get install -y libsm6 libxext6 libx...

4 years ago

0 Are The Various Task Types Available In 0.15? I Am Getting

awesome, thank you 👍

5 years ago

0 Hello, I Have An Error While Installing Git Dependencies Of Local Package: So Far I Used Task.

Sorry, it's actually
task.update_requirements(["."])

4 years ago

0 Hi, It Seems That The

Ok so it seems that the single quote is the reason, using double quotes works

5 years ago

0 Hi Again, I Am Trying To Make The Aws Autoscaler Work With Ec2 Instances, But It Fails To Setup The Agent In The Machine: The Logs Of The User-Data Script Show That It Fails Updating The Machine (See Below)

edited the aws_auto_scaler.py, actually I think it’s just a typo, I just need to double the brackets

4 years ago

0 Could You Please Explain A Bit More How Trains Adapt The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

Oh I see, I think we are now touching a very important point:
I thought that torch wheels already included cuda/cudnn libraries, so you don't need to care about the system cuda/cudnn version because eventually only the cuda/cudnn libraries extracted from the torch wheels were used. Is this correct? If not, then does that mean that one should use conda to install the correct cuda/cudnn cudatoolkit?

5 years ago

and just run the same code I run in production

3 years ago

0 Hi, I Recently Updated My Clearml To 1.1.2 And A Code That Was Working Before Now Behaves Completely Differently: I Am Using The Following To Log Debug Samples:

Sorry, I was actually able to fix it (using 1.1.3), not sure what the problem was 😄

4 years ago

0 Hey There, Is There A Way To Access The Trains Configuration Programmatically At Runtime In A Task (The Configuration That Is Dumped By The Agent In The Logs Before Executing A Task)

Awesome, thanks WackyRabbit7 , AgitatedDove14 !

5 years ago

0 Hi, I Just Updated Clearml Server 1.0 Using

Hi SuccessfulKoala55 , How can I know if I am logged in in this free access mode? I assume I am, since on the login page I only see a login field, not a password field

4 years ago

0 Hi,

Awesome, huge thanks to the team!

4 years ago

0 Hey There, I Moved The Clearml S3 Bucket Where I Stored All My Clearml Data From One S3 Bucket To Another And Now I Realized That All The Models/Experiments Logged In The Clearml-Server Still Refer To The Old S3 Bucket. Is There A Way To Update All The Re

Thanks a lot for the solution SuccessfulKoala55 ! I’ll try that if the solution “delete old bucket, wait for its name to be available, recreate it with the other aws account, transfer the data back” fails

4 years ago

0 Hi, Are The Experiments Logs Stored In S3 Or In The Trains-Server? (When Using S3 As Artifact Storage)

yes 🙂

4 years ago

0 Hi, I Have Another Bug To Report For Clearml-Server 1.2 (Self Hosted) In The Console Logs Of An Experiments, I Cannot See The Latest Logs. Eg My Experiment Is Done, But I Can Only See The Logs Of To The Installation Of The Packages. If I Download The Log

CostlyOstrich36 , this also happens with clearml-agent 1.1.1 on an AWS instance…

3 years ago

I see 3 agents in the "Workers" tab

5 years ago

0 Hi There, I Have A Problem With Pyjwt: I Am Using

so most likely one hard requirement installs version 2 of pyjwt while setting up the experiment

4 years ago