
@<1523701070390366208:profile|CostlyOstrich36> Thanks for the reply!
When spinning up a machine manually with the same base image and running the task without the autoscaler, the issue does not happen; it only occurs on instances that the autoscaler creates.
@<1523701070390366208:profile|CostlyOstrich36>
Same problem here. I migrated my autoscaler workloads from AWS EC2 instances to GCP VMs with the same base images and docker images on top. Running the exact same tasks, the task.connect() step takes around 45 minutes extra when triggered from the GCP VMs, compared to the AWS instances, which pass this step in less than a minute.
Using the managed ClearML server (app.clear.ml).
#MeToo
@<1523701070390366208:profile|CostlyOstrich36> @<1523701087100473344:profile|SuccessfulKoala55>
Is there anything else I can provide you to proceed with understanding the issues?
@<1523701070390366208:profile|CostlyOstrich36>
If I understand correctly, then I believe that's exactly what I did in the logs provided in the previous comment.
I have also run experiments where I run the task manually on the VM, without a clearml-agent listening to the queue, and in those cases the results and connects were also fast.
Of all 3 types of execution I've tested, the only problematic case with the long connect times is the one where I push tasks to ...
@<1523701087100473344:profile|SuccessfulKoala55> I tried it as you asked; it just makes the tasks fail. Apparently 'DEBUG' is not a valid value for the 'CLEARML_API_VERBOSE' field, and only true/false are valid values.
I did find another option which is valid and might be what you meant, though:
"-e=CLEARML_LOG_LEVEL=DEBUG"
I am providing the logs for the new tests with this variable set, but I am pretty sure it makes no difference in the logs, especially not with anythin...
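(In case it helps anyone following along: the same verbosity can also be forced from inside the task script itself instead of via docker "-e" arguments. This is just a minimal sketch, assuming the CLEARML_LOG_LEVEL and CLEARML_API_VERBOSE environment variables are read by the SDK at import/init time, and that the autoscaler propagates nothing else:)

import os

# set before importing clearml so the SDK picks these up (assumption: read at import/init time)
os.environ["CLEARML_LOG_LEVEL"] = "DEBUG"   # SDK console log verbosity
os.environ["CLEARML_API_VERBOSE"] = "true"  # verbose API request logging (boolean values only)

from clearml import Task

task = Task.init(project_name="test", task_name="debug_logging_check")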
@<1523701070390366208:profile|CostlyOstrich36>
We created a very basic and simple task to demonstrate the difference in times between a task running on an instance spun up by the autoscaler VS a manually spun-up instance with clearml-agent.
The task code is as follows:
from clearml import Task
import time
mydict = {"a": 1, "b": 2}
task = Task.init(project_name="test", task_name="test_small_dict")
task.execute_remotely(queue_name="tomer_queue")
# measure the time the function executes
star...
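(The snippet above is cut off; here is a rough reconstruction of the full test, assuming the truncated part simply wraps task.connect(mydict) in time.time() calls. The variable names after the cut are my guess:)

from clearml import Task
import time

mydict = {"a": 1, "b": 2}
task = Task.init(project_name="test", task_name="test_small_dict")
task.execute_remotely(queue_name="tomer_queue")
# measure the time the connect call takes
start = time.time()
task.connect(mydict)
print("task.connect() took {:.2f} seconds".format(time.time() - start))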
@<1523701087100473344:profile|SuccessfulKoala55> I added the -e CLEARML_API_VERBOSE=true to the configuration as you asked, although I am not sure it made any changes to the actual logs.
I'm providing logs of the autoscaler instance, which took ~20.8 seconds for a simple small-dict connect
VS
the clearml-agent listener on a manually created VM, which took ~1.5 seconds