It worked like a charm! Awesome, thanks AgitatedDove14!
Very cool! Run two trains-agent daemons, one per GPU on the same machine, with default Nvidia/CUDA Docker
This is close to my use case; I'd just like to run these two daemons without docker. Would that be possible? I should just remove the --docker nvidia/cuda param, right?
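For reference, a rough sketch of what that could look like (queue name and GPU indices are just examples; clearml-agent shown, the older trains-agent takes the same flags):
```
# one agent per GPU, running directly on the host (no --docker flag)
clearml-agent daemon --detached --gpus 0 --queue default
clearml-agent daemon --detached --gpus 1 --queue default
```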
So in my use case each step would create a folder (potentially big) and would store it as an artifact. The last step should "merge" all the previous folders. The idea is to split the work among multiple machines (in parallel). I would like to avoid these potentially big folder artifacts also being stored in the pipeline task, because that one will be running on the services queue on the clearml-server instance, which will definitely not have enough space to handle all of them.
I would also like to avoid any copy of these artifacts on S3 (to avoid double costs, since some folders might be big)
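A minimal sketch of what a single step could look like under these constraints (the shared mount path, project and artifact names are hypothetical, not anything from the thread):
```
from pathlib import Path
from clearml import Task

# output_uri controls where this task's artifacts are uploaded; pointing it at a
# shared mount (hypothetical path) avoids pushing the big folders to S3
task = Task.init(
    project_name="pipelines",
    task_name="step_1",
    output_uri="/mnt/shared/clearml_artifacts",
)

out_dir = Path("step_1_output")
out_dir.mkdir(exist_ok=True)
# ... write the (potentially big) results into out_dir ...

# upload_artifact accepts a folder path; the data goes to output_uri and only
# the artifact reference is attached to the task itself
task.upload_artifact(name="step_1_folder", artifact_object=out_dir)
```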
I hit enter too fast ^^
Installing them globally via
$ pip install numpy opencv torch
will install locally with the warning:
Defaulting to user installation because normal site-packages is not writeable
so the installation ends up in ~/.local/lib/python3.6/site-packages instead of the default location. Will this still be considered as global site-packages and still be included in the experiments' envs? From what I tested it does.
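For reference, a quick way to check which site-packages directories the interpreter actually picks up (purely illustrative):
```
import site
import sys

print(sys.prefix)                  # base interpreter prefix
print(site.getsitepackages())      # the "global" site-packages directories
print(site.getusersitepackages())  # e.g. ~/.local/lib/python3.6/site-packages
print(site.ENABLE_USER_SITE)       # whether the user site dir is enabled
```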
Correct, you could also use Task.create, which creates a Task but does not do any automagic.
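A minimal sketch of the difference (project and task names are just placeholders):
```
from clearml import Task

# Task.init attaches to the current process and enables the automagic
# (framework logging, argparse capture, repo detection, ...)
task = Task.init(project_name="examples", task_name="with_automagic")

# Task.create only registers a new task entry; nothing is bound to the
# current process and no automatic logging happens
manual_task = Task.create(project_name="examples", task_name="no_automagic")
```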
Yes, I haven't used it so far because I didn't know what to expect, since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."
When an experiment on trains-agent-1 is finished, I randomly see no experiment / the long experiment, and when two experiments are running, I randomly see only one of the two experiments.
Ok yes, I get it; this info is also available at the very beginning of the logs, where the agent logs the full docker run command. So this docker_cmd is just a shorter version?
Looking at the source code, it seems like I should call data_processing_task._artifact_manager.flush() to make sure I have the latest version of the artifacts in the task, right?
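If it helps, there is also a public way to do this (a sketch, assuming a recent clearml/trains version; the artifact name and object are made up):
```
# block until this specific artifact is actually uploaded
data_processing_task.upload_artifact(
    name="processed_data",
    artifact_object="processed_data_folder",
    wait_on_upload=True,
)

# or flush all pending reports/uploads explicitly
data_processing_task.flush(wait_for_uploads=True)
```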
Also tried task.get_logger().report_text(str(task.data.hyperparams))
-> AttributeError: 'Task' object has no attribute 'hyperparams'
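For what it's worth, the public accessors look roughly like this (a sketch; the exact sections/grouping depend on the version):
```
from clearml import Task

task = Task.current_task()
# hyperparameters grouped by section (e.g. "Args"), via the public API
# instead of going through task.data
params = task.get_parameters_as_dict()
task.get_logger().report_text(str(params))
```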
The file /tmp/.clearml_agent_out.j7wo7ltp.txt does not exist
Hi CostlyOstrich36 , this weekend I took a look at the diffs with the previous version ( https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0# ) and I saw several changes related to the scrolling/logging:
apiserver/bll/event/log_events_iterator.py
apiserver/bll/event/events_iterator.py
apiserver/config/default/services/_mongo.conf
apiserver/database/model/base.py
apiserver/services/events.py
I suspect that one of these changes might be responsible ...
Any chance this is reproducible ?
Unfortunately not at the moment, I couldn't find a reproducible scenario. If I clone a task that was stuck and start it, it might not get stuck.
How many processes do you see running (i.e. ps -Af | grep python) ?
I will check that the next time one gets blocked
What is the training framework? Is it multiprocess? How are you launching the process itself? Is it a Linux OS? Is it running inside a specific container?
I train with p...
I think the best-case scenario would be for ClearML to maintain a GitHub Action that sets up a dummy clearml-server, so that anyone can use it as a basis for running their tests: they would only have to point their code at the server started by the action and could test it all seamlessly. Wdyt?
wow, if this works that's amazing
Yes, I guess that's fine then - Thanks!
So there will be no concurrent cached files access in the cache dir?
```
Traceback (most recent call last):
  File "devops/train.py", line 73, in <module>
    train(parse_args)
  File "devops/train.py", line 37, in train
    train_task.get_logger().set_default_upload_destination(args.artifacts + '/clearml_debug_images/')
  File "/home/machine/miniconda3/envs/py36/lib/python3.6/site-packages/clearml/logger.py", line 1038, in set_default_upload_destination
    uri = storage.verify_upload(folder_uri=uri)
  File "/home/machine/miniconda3/envs/py36/lib/python3.6/site...
```
I don't think there is an example for this use case in the repo currently, but the code should be fairly simple (below is a rough draft of what it could look like)
```
from clearml import Task
import time

controller_task = Task.init(...)
controller_task.execute_remotely(queue_name="services", clone=False, exit_process=True)

while True:
    periodic_task = Task.clone(template_task_id)
    # Change parameters of {periodic_task} if necessary
    Task.enqueue(periodic_task, queue_name="default")
    time.sleep(TRIGGER_TASK_INTERVAL_SECS)
```
CostlyOstrich36 Were you able to reproduce it? That's rather annoying
SuccessfulKoala55 I tried to set up the clearml-agent on a different machine and now I get a different error message in the logs:
Warning: could not locate requested Python version 3.6, reverting to version 3.6
clearml_agent: ERROR: Python executable with version '3.6' defined in configuration file, key 'agent.default_python', not found in path, tried: ('python3.6', 'python3', 'python')
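In case it helps, the relevant bit of clearml.conf on the agent machine looks roughly like this (a sketch; the values must match a Python interpreter that actually exists on that machine):
```
agent {
    # must resolve to a python executable available in PATH
    default_python: 3.6

    # alternatively, point directly at a specific interpreter binary
    python_binary: "/usr/bin/python3.6"
}
```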
trains==0.16.4
mmmh, there is no closing of the task happening at that point. Note that just before the task.upload_artifact, I call task.logger.report_table("Metric summary", "Metric summary", 0, df_scores), if that matters.
Thanks! Corrected both, now it's building
I actually need to be able to overwrite files, so in my case it makes sense to grant the DeleteObject permission in S3. But for other cases, why not simply catch this error, display a warning to the user, and store internally that delete is not possible?
Yes AnxiousSeal95, a stopped instance means you don't pay for it, only for its storage, as described in https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html . So AgitatedDove14, increasing the IDLE timeout would still make me pay for the instances while they are idle.
Do you get stopped instances instantly when you ask for them?
Well that's a good question, that's what I observed some time ago, but according to their https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/...