Hey SuccessfulKoala55. I use my own custom Daemon that in turn runs clearml-agent execute. For some complicated reasons (other correlated processes), I want to be able to fetch and execute only a certain task id, instead of pulling one from the queue.
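(For reference, a minimal sketch of what the daemon ends up calling, assuming the standard clearml-agent CLI; the task id is a placeholder:)

    # Fetch a specific task by id and execute it in place, instead of pulling from a queue.
    # <TASK_ID> is a placeholder; add --docker if the task should run inside its docker image.
    clearml-agent execute --id <TASK_ID>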
Ok, I think that's been very helpful 🙂 I'll experiment a little, now that I know of a health check that must work. I'll write here if I find something! Thanks a lot for the awesome support!
And it's still unhealthy. I am starting to suspect that the autoscaling layer between the ALB and the ClearML server could somehow be causing the problem.
UPDATE: Now the agent-services is working 🙂 I was able to solve it by providing CLEARML_API_HOST: ${CLEARML_API_HOST:- None } in my docker-compose instead of CLEARML_API_HOST: None, where the environment variable CLEARML_API_HOST was set to my public API address. So in other words, the traffic is going through the internet, back to the server (same machine), and now it seems to be working. Thanks @<1593051292383580160:...
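(For anyone hitting the same thing: docker-compose uses the same ${VAR:-default} substitution rules as the shell, so you can sanity-check what will actually be injected before starting the stack. The grep below is just an example; the rendered value depends on what the host environment exports:)

    # Render the compose file with all ${...} substitutions resolved and check the API host.
    docker-compose config | grep CLEARML_API_HOST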
Currently I'm "cheating" and counting a 405 as the success code for the healthcheck.
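(In case it helps anyone, the workaround itself is a one-liner on the ALB target group; the ARN is a placeholder, and treating 405 as healthy is of course a stopgap, not a fix:)

    # Accept 200 as well as 405 as a "healthy" response from the target.
    aws elbv2 modify-target-group \
      --target-group-arn arn:aws:elasticloadbalancing:<region>:<account>:targetgroup/<name>/<id> \
      --matcher '{"HttpCode":"200,405"}'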
@<1523701087100473344:profile|SuccessfulKoala55> but the problem still persists. Any other ideas?
In my environment I have defined CLEARML_API_HOST (hard-coded in docker-compose), CLEARML_WEB_HOST, CLEARML_FILES_HOST, CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, CLEARML_AGENT_GIT_USER and CLEARML_AGENT_GIT_PASS.
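(Roughly what the bootstrap script is supposed to do before bringing the stack up, so docker-compose has something to substitute; all values below are placeholders:)

    # Export the values that docker-compose substitutes into the agent-services environment.
    export CLEARML_API_HOST="https://api.example.com"      # placeholder
    export CLEARML_WEB_HOST="https://app.example.com"      # placeholder
    export CLEARML_FILES_HOST="https://files.example.com"  # placeholder
    export CLEARML_API_ACCESS_KEY="<access-key>"           # placeholder
    export CLEARML_API_SECRET_KEY="<secret-key>"           # placeholder
    export CLEARML_AGENT_GIT_USER="<git-user>"             # placeholder
    export CLEARML_AGENT_GIT_PASS="<git-token>"            # placeholder
    docker-compose up -d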
Ok thanks a lot for the Info! For now (as a simple error handling): is there any way I can tell the ClearML Server that the experiment should be cancelled using the shell?
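(For future readers, one way that should work from the shell, assuming the clearml Python SDK is installed there; the task id is a placeholder:)

    # Mark the experiment as stopped/aborted on the ClearML server.
    python -c "from clearml import Task; Task.get_task(task_id='<TASK_ID>').mark_stopped()"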
UPDATE: setting SHUTDOWN_IF_NO_ACCESS_KEY: 1 allowed me to see the agent-services container, and then a docker inspect clearml-agent-services showed me that the environment variables needed for the agent in the docker-compose.yml were empty. So the problem was in my bootstrap script. Because SHUTDOWN_IF_NO_ACCESS_KEY was set to 0 before, the container would disappear 🙂 Thanks a lot for helping me figure this out!
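(For anyone debugging something similar, the inspect can be narrowed down to just the environment the container was actually started with:)

    # Print only the container's environment variables.
    docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' clearml-agent-services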
Hi @<1523701087100473344:profile|SuccessfulKoala55> Thanks! It seems the container is able to download packages; I attached the full log here 😉
I have this block in my docker compose:
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  deploy:
    restart_policy:
      condition: on-failure
  privileged: true
  environment:
    <....>
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - /opt/clearml/agent:/root/.clearml
  depends_on:
    - apiserver
  entrypoint: >
    bash -c "curl --retry 10 --retr...
Thanks a lot! Yes, I don't see such a worker in the UI. docker ps returns the containers below. I suppose the clearml-apiserver is the relevant one.
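(A quick way to list just the ClearML containers and their state, for reference:)

    # Show name and status of every container whose name contains "clearml".
    docker ps --filter name=clearml --format 'table {{.Names}}\t{{.Status}}'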
Here's my docker-compose, maybe I'm missing something 😄 And thanks again for the support 😉
Hi @<1593051292383580160:profile|SoreSparrow36>, thanks a lot! I ran docker network connect backend clearml-agent-services and got the response: Error response from daemon: endpoint with name clearml-agent-services already exists in network clearml_backend
It was expected, because my docker-compose had the entry
agent-services:
  networks:
    - backend
I can also resolve and curl None from the clearml-agent-services container.
I managed...
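(A similar in-network check, for reference; the internal apiserver address and the debug.ping endpoint are taken from the default ClearML docker-compose, so adjust if your setup differs:)

    # From inside the agent-services container, check that the apiserver resolves and responds.
    docker exec clearml-agent-services curl -sf http://apiserver:8008/debug.ping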
I left the environment variables out to keep things short, but there is one SHUTDOWN_IF_NO_ACCESS_KEY: 1. Maybe some authentication is failing and the container is stopping. Thanks a lot for the help debugging!
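(The quickest way to confirm an authentication failure like that is tailing the container logs and looking for credential errors, e.g.:)

    # Follow the agent-services logs; login/credential errors show up here.
    docker logs -f --tail 100 clearml-agent-services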
JuicyFox94 I think I found the problem. To my absolute shame, the security group of the ALB had no Outbound rules, i.e. no traffic was allowed out of the ALB 🙈 . Now I can access the ClearML Webserver!
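(For completeness, the missing piece was just an egress rule on the ALB's security group; the group id is a placeholder, and allowing all outbound traffic mirrors the default for a new security group, so tighten it as needed:)

    # Allow all outbound traffic from the ALB's security group.
    aws ec2 authorize-security-group-egress \
      --group-id <sg-id> \
      --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'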
This gives me a 200 🙂
These are the settings for the health check now
Yes exactly, like a cron job. Thanks a lot!
And I could access the web server even if the health check was failing. So that was not a problem in the end.