Actually I removed the key pair, as you said it wasn't a must in the newer versions
It isn't a must, but if you are using one, it should be in the same region
Hey LovelyHamster1,
This means that for some reason the agent fails to run on the newly created instances, and the instances are then terminated.
The credentials could definitely cause that.
Can you try adding the credentials as they appear in your clearml.conf?
To do so, create new credentials from your profile page in the UI, and add the entire section to the extra_trains_conf
section in the following way:
extra_trains_conf = """
api {
    web_server: "<webserver>"
    api_server: "<apiserver>"
    ...
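For reference, the full section would look roughly like this (all values here are placeholders, copy the actual ones from the credentials dialog in your profile page):
extra_trains_conf = """
api {
    web_server: "<webserver>"
    api_server: "<apiserver>"
    files_server: "<fileserver>"
    credentials {
        "access_key" = "<your-access-key>"
        "secret_key" = "<your-secret-key>"
    }
}
"""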
Hey DeliciousBluewhale87,
It seems like this log is from a task that was pulled by the agent running on the clearml-services pod. Is this the case? Where did you find the above log?
Also - can you please send us the list of all the running pods in the namespace? I want to make sure the other agents are up.
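Something like this should do it (replace the placeholder with the namespace you deployed into):
kubectl get pods -n <your-namespace>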
If it does appear in the UI faster, then it's only a matter of waiting. If you still don't see the instance, I'd suggest you ssh into it and investigate a bit to see what's going on.
So the issue was probably the clearml-agent version.
Please try using clearml-agent==0.17.2rc3 and let us know if this solved the issue.
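i.e.:
pip install clearml-agent==0.17.2rc3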
Can you try setting the base_docker_image of the specific task you are running to nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true ?
To do so, go to the task's execution tab, scroll down, and set the base docker section to the above.
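If it's more convenient, the same thing can be set from code with the SDK's Task.set_base_docker; a minimal sketch (project and task names are placeholders, and you should adjust to the clearml version you have installed):
from clearml import Task

# placeholder project/task names, use your own
task = Task.init(project_name="examples", task_name="my task")
# the agent will use this image (plus the extra docker arguments) when executing the task
task.set_base_docker("nvidia/cuda:10.1-runtime-ubuntu18.04 --env GIT_SSL_NO_VERIFY=true")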
Did you change anything under the agent's values?
In case you didn't, please try editing the agent.clearmlWebHost
and setting it to the value of your webserver (use the same one you used for the agent services).
This might solve your issue.
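In the values.yaml that would look something like this (the address is a placeholder, use your actual webserver URL):
agent:
  clearmlWebHost: "http://<your-webserver-address>:8080"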
Just making sure, you changed both the agent one and the agent-services one?
By the way, are you editing the values directly? Why not use the values file?
Great, let us know how it goes.
Have a great weekend!
I understand, but for some reason you are getting an error about the clearml webserver. Try changing the agent.clearmlWebHost value in the values.yaml file to the same value you filled in manually for the agent-services web host.
Make sure you're testing it on the same computer the autoscaler is running on
security_group_ids = ["<sec_group_id>"]
(note that I had a typo there: it's the id, not the name, don't want to misguide you!)
Hey SubstantialElk6,
You can see the bash script that installs the container here: https://github.com/allegroai/clearml-agent/blob/master/clearml_agent/glue/k8s.py#L61 .
You are correct that it does run apt-get update in order to install some required packages.
You can override this entire list of commands by adding another bash script as a string using the container_bash_script
argument. Make sure you add it to the example script (should be added to the initialization https://github.com/allegr...
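A rough sketch of what such an override could look like, assuming you start from the k8s glue example script (the argument name comes from the message above, but the exact constructor signature may differ between versions, so please verify against the example):
# hypothetical override using the container_bash_script argument
from clearml_agent.glue.k8s import K8sIntegration

my_container_bash_script = """
apt-get update
apt-get install -y git gcc
"""

k8s_integration = K8sIntegration(container_bash_script=my_container_bash_script)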
You can try overriding the following in your values.yaml under the agent
section:
agentVersion: "==0.16.2rc1"
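So the relevant part of the values.yaml would read something like (assuming the standard chart layout, where this key sits under the agent section):
agent:
  agentVersion: "==0.16.2rc1"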
Hey GreasyPenguin14,
The docker-compose.yml and this section specifically were updated.
So first please try again with the new version 🙂
Second - this error seems a bit odd, which version of docker-compose are you using?
You can check this using: docker-compose --version
Hey LovelyHamster1,
Any chance the task you are trying to run has a base docker defined in it?
Hey SubstantialElk6,
Can you show us the top output you get when using the template-yaml instead of overrides-yaml?
Hey ColossalAnt7,
What version of trains-agent are you using?
You can try upgrading to the latest RC version, this issue should be fixed there:
pip install trains-agent==0.16.2rc1
Subnet isn't supported as-is in the autoscaler, but you can add it using extra_configurations
in the following way:
extra_configurations = {'SubnetId': '<subnet-id>'}
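extra_configurations should accept additional key-value pairs the same way; since SubnetId is a boto3 run_instances parameter, other run_instances parameters are worth trying too, for example (EbsOptimized here is just an illustration, verify it fits your setup):
extra_configurations = {'SubnetId': '<subnet-id>', 'EbsOptimized': True}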
What do you mean by ' not taking effect with the k8s glue '?
SubstantialElk6 - As a side-note, since docker is about to be deprecated, sometime in the near future we plan to switch to another runtime. This actually means that the entire docker.sock issue will not be relevant very soon 🙂
That's the agent-services one, can you check the agent's one?
I waited 20 mins, refreshing the logs every 2 mins.
Sounds like more than enough
Probably something's wrong with the instance. Which AMI did you use? The default one?
Searching for this error, it seems it could be many things.
Either wrong credentials or a wrong region (different from the one for your key-pair).
It could also be that your computer clock is wrong (see an example here: https://github.com/mitchellh/vagrant-aws/issues/372#issuecomment-87429450 ).
I suggest you search for it online and see if any of the suggested fixes solve the issue; I think it requires some debugging on your end.