FriendlySquid61
Moderator
0 Questions, 62 Answers
  Active since 10 January 2023
  Last activity 2 years ago

0 Hi Guys, I Would Like To Start Using The Aws Autoscaler Shipped In Trains. I Need To Create A Iam User To Get And I Would Like To Know What Are The Minimal Permissions Required For The Autoscaler To Work?

Hey JitteryCoyote63 !
Can you please let us know which permissions you ended up using for the autoscaler?
Were the above enough?
Thanks!

4 years ago
0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

OK, so first: since your bash script installs many packages, it makes sense that the installation takes a long time (note that the agent only starts running after all installations are done).
For the sake of debugging, I'd suggest removing all the packages (other than the specific trains-agent you're using) and trying again. Add those packages to the task you are trying to run instead, and you should see the instance come up much sooner.

4 years ago
0 Hey Guys, Another Question About Deploying My Own Trains Server. I Have A Trains-Server Deployed On My K8S Cluster Using The Trains Helm Chart (Which Is Awesome). Now I Want To Create A Deployment Running Trains-Agent As Specified In The [Trains-Helm Repo

ColossalAnt7 can you try connecting to one of the trains-agent pods and run trains-agent manually using the following command:
TRAINS_DOCKER_SKIP_GPUS_FLAG=1 TRAINS_AGENT_K8S_HOST_MOUNT=/root/.trains:/root/.trains trains-agent daemon --docker --force-current-version
Then let us know what happens and whether you see the new worker in the UI.

4 years ago
0 Hey Guys, Another Question About Deploying My Own Trains Server. I Have A Trains-Server Deployed On My K8S Cluster Using The Trains Helm Chart (Which Is Awesome). Now I Want To Create A Deployment Running Trains-Agent As Specified In The [Trains-Helm Repo

That's great, from that I understand that the trains-services worker does appear in the UI, is it correct? Did the task run? Did you change the trainsApiHost under agentservices in the values.yaml?

4 years ago
0 Hi All, I Was Wondering If It Is Possible To Set The Aws Autoscaler (And Other Aws Services Such As S3) To Assume The Permissions Of A Specific Iam Role. I Didn'T Find Any Reference To This In The Documentation

Hey LovelyHamster1 ,
If S3 is what you're interested in, then the above will do the trick.
Note that you can attach the IAM using instance profiles. You can read about those here:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html
Once you have an instance profile, you can add it to the autoscaler using the extra_configurations section in the autoscaler.
Under your resource_configurations -> some resource name -> add an ...

4 years ago
0 Hi, I'M Using The Aws Autoscaler To Spin Instances. I'D Like To Use The Clearml Agent On The Created Instances With Docker Containers. However Even If I Set Default_Docker_Image In The Parameters On The Ui To Nvidia/Cuda:11.1.1-Runtime-Ubuntu20.04 The Tas

To check, go to the experiment's page and then to EXECUTION > AGENT CONFIGURATION > BASE DOCKER IMAGE
If it's set to any value, clearing it would solve your problem.

4 years ago
0 I Just Deployed Clearml Into K8 Cluster Using Clearml Helm Package. When I Ran A Job, It Gave This Error In The Clearml Web Server (Attached Below). I Sshed Into The Pod Running The Clearml-Agent. Upon Typing Clearml-Agent Init, I Realised The Clearml.Con

Did you change anything under the agent's value?
In case you didn't - please try editing the agent.clearmlWebHost and set it to the value of your webserver (use the same one you used for the agent services).
This might solve your issue.

4 years ago
0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

Probably something's wrong with the instance. Which AMI did you use? The default one?

4 years ago
0 Hi, We Are Running On Disconnected On Prem With A K8S Glue. When A Pod Is Spawned, We Noted That An Apt-Get Command Is Performed On The Pod. Short Of Changing The Content Of /Etc/Apt/Sources.List In The Images, Is There A Way For Clearml Agent To Override

Hey SubstantialElk6 ,
You can see the bash script that installs the container https://github.com/allegroai/clearml-agent/blob/master/clearml_agent/glue/k8s.py#L61 .
You are correct that it does do apt-get update in order to install some stuff.
You can override this entire list of commands by adding another bash script as a string using the container_bash_script argument. Make sure you add it to the example script (should be added to the initialization https://github.com/allegr...

4 years ago
0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

ok that's odd.
Anyway try setting
extra_configurations = {"SubnetId": "<subnet-id>"}
instead of:
extra_configurations = {'SubnetId': "<subnet-id>"}
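In Python source the two forms above describe the same dict, so the quoting should only matter if the value is read as text at some point. One possible explanation, and this is an assumption rather than something confirmed in this thread, is that the setting gets parsed as JSON, which only accepts double-quoted strings. A minimal sketch of that difference (parses_as_json is a hypothetical helper, not part of the autoscaler):

```python
import json

# The two spellings from the answer above, held as raw text the way a
# configuration field entered in a UI would be (assumption).
double_quoted = '{"SubnetId": "<subnet-id>"}'
single_quoted = "{'SubnetId': \"<subnet-id>\"}"


def parses_as_json(text):
    """Return True if `text` is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print("double quotes:", parses_as_json(double_quoted))
    print("single quotes:", parses_as_json(single_quoted))
```

If the value is instead evaluated as Python, both spellings behave the same and the cause is elsewhere.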

4 years ago
0 I Just Deployed Clearml Into K8 Cluster Using Clearml Helm Package. When I Ran A Job, It Gave This Error In The Clearml Web Server (Attached Below). I Sshed Into The Pod Running The Clearml-Agent. Upon Typing Clearml-Agent Init, I Realised The Clearml.Con

I understand, but for some reason you are getting an error about the clearml webserver. Try changing the value of agent.clearmlWebHost in the values.yaml file to the same value you filled in manually for the agent-services web host.

4 years ago
0 Hi All, I Was Wondering If It Is Possible To Set The Aws Autoscaler (And Other Aws Services Such As S3) To Assume The Permissions Of A Specific Iam Role. I Didn'T Find Any Reference To This In The Documentation

Great.
Note that instead of removing those lines you can override it using the extra_vm_bash_script
For example:
extra_vm_bash_script = """
export CLEARML_API_HOST=<api_server>
export CLEARML_WEB_HOST=<web_server>
export CLEARML_FILES_HOST=<files_server>
"""
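If you have more than a few variables, the same kind of script can be generated rather than typed by hand. This is only a sketch: build_export_script is a hypothetical helper, not part of the autoscaler, and the host values are placeholders you would replace with your own server addresses.

```python
import shlex


def build_export_script(env):
    """Render a dict of environment variables as bash `export` lines.

    Values are passed through shlex.quote so that spaces or shell
    metacharacters in a value cannot break the script.
    """
    lines = [f"export {name}={shlex.quote(value)}" for name, value in env.items()]
    return "\n".join(lines) + "\n"


# Placeholder addresses (assumptions); use your own servers here.
extra_vm_bash_script = build_export_script(
    {
        "CLEARML_API_HOST": "http://localhost:8008",
        "CLEARML_WEB_HOST": "http://localhost:8080",
        "CLEARML_FILES_HOST": "http://localhost:8081",
    }
)

if __name__ == "__main__":
    print(extra_vm_bash_script)
```

The resulting string is what you would paste into (or assign to) extra_vm_bash_script.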

4 years ago
0 Hi, My Devsecops Team Has Raised Some Issues Of Us Deploying Clearml For Use. In Particular, They Are Not Happy With Docker.Sock Configuration As It Would Potentially Expose The Entire Cluster To Unauthorised View. Can We Do Without It?

SubstantialElk6 - As a side-note, since docker is about to be deprecated, sometime in the near future we plan to switch to another runtime. This actually means that the entire docker.sock issue will not be relevant very soon 🙂

4 years ago
0 Question About The Auto Scaling Service Under

Hey WackyRabbit7 ,
Is this the only error you have there?
Can you verify that the credentials in the task look OK and that they didn't disappear as before?
Also, I understand that the Failed parsing task parameter ... warnings no longer appear, correct?

4 years ago
0 Question About The Auto Scaling Service Under

Those are different credentials.
You should have the aws info under:
cloud_credentials_key , cloud_credentials_secret and cloud_credentials_region
The values added to the extra_vm_bash_script are the trains key and secret from your profile page in the UI.
I suggest you use the wizard again to run the task, this will make sure all the data is where it should be.
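To double check that the AWS values did not go missing, something along these lines can help. It is a rough sketch only: missing_cloud_fields is a hypothetical helper, and it assumes you have already copied the task's parameters into a plain dict keyed by the three names above.

```python
# Field names taken from the answer above; everything else is illustrative.
REQUIRED_CLOUD_FIELDS = (
    "cloud_credentials_key",
    "cloud_credentials_secret",
    "cloud_credentials_region",
)


def missing_cloud_fields(params):
    """Return the AWS credential fields that are absent or empty in `params`."""
    return [name for name in REQUIRED_CLOUD_FIELDS if not params.get(name)]


if __name__ == "__main__":
    # Placeholder values (assumptions), with the secret left empty on purpose.
    example = {
        "cloud_credentials_key": "<aws-access-key>",
        "cloud_credentials_secret": "",
        "cloud_credentials_region": "<aws-region>",
    }
    print(missing_cloud_fields(example))
```

An empty list means all three AWS fields are filled in; the trains key and secret are a separate pair and are not checked here.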

4 years ago
0 Hi, My Devsecops Team Has Raised Some Issues Of Us Deploying Clearml For Use. In Particular, They Are Not Happy With Docker.Sock Configuration As It Would Potentially Expose The Entire Cluster To Unauthorised View. Can We Do Without It?

Hey SubstantialElk6 ,
I'm assuming you are referring to our helm charts?
If so, then you can set agent.dockerMode to false ( https://github.com/allegroai/clearml-server-k8s/blob/master/clearml-server-chart/values.yaml#L46 ), and the docker.sock configuration will be turned off. Note that this means your agents will not be running in docker mode 🙂

4 years ago
0 Hey Guys. I Tried Running The Pytorch Mnist Example On A Train-Agent By Running It Locally And Then Resetting The Experiment And Then Enqueue-Ing It To The Default Queue. All Works Well But It Seems The Environment Building Process Gets Stuck On A Manual

Hey ColossalAnt7 ,
What version of trains-agent are you using?
You can try upgrading to the latest RC version, this issue should be fixed there:
pip install trains-agent==0.16.2rc1

4 years ago
0 Agent-Services: Networks: - Backend Container_Name: Trains-Agent-Services Image: Allegroai/Trains-Agent-Services:Latest Restart: Unless-Stopped Privileged: True Environment: Trains_Host_Ip: ${Trains_Host_Ip} Train

Hey GreasyPenguin14 ,
The docker-compose.yml and this section specifically were updated.
So first please try again with the new version 🙂
Second - this error seems a bit odd, which version of docker-compose are you using?
You can check this using: docker-compose --version

4 years ago
0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

As an example you can ssh to it and try running trains-agent manually to see if it's installed and if it fails for some reason.

4 years ago