SparklingHedgehong28
Moderator
6 Questions, 31 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

28 × Eureka!
0 Votes
24 Answers
519 Views
Hello! I’m wondering if there is an option to run a termination hook script at the end of the docker job execution (such as https://clear.ml/docs/latest/docs...
one year ago
0 Votes
23 Answers
607 Views
Hello, a question on using the AWS Autoscaler app in the SaaS – can a custom IAM role be used for launching the machines? I’m looking at giving custom permis...
one year ago
0 Votes
2 Answers
643 Views
Hello, is there a way to create another admin user on the Pro plan in the SaaS? Alternatively I can rename and change the e-mail of the current one, but it’s used SS...
one year ago
0 Votes
2 Answers
488 Views
Hello! A question https://clearml.slack.com/archives/CTK20V944/p1656420047506249 . Is it possible to get an untracked file uploaded when remote agent executi...
one year ago
0 Votes
6 Answers
516 Views
Hello, I’m seeing this issue when launching a machine with the app – 2022-07-05 09:55:49,759 - usage_reporter - INFO - Sending usage report for 60 usage seco...
one year ago
0 Votes
3 Answers
548 Views
one year ago
0 Hello, I’M Seeing This Issue When Launching A Machine With The App –

It’s a clone of a previous one, since I’ve failed -> cloned -> changed params -> failed -> clone -> …

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Thanks. I guess there are too many moving parts in the official implementation that need adaptation and wrapping up, such as the use of credentials instead of IAM, since it's designed to work cross-cloud (or cloud-agnostic); hence for us it's easier to reimplement the wheel. 🙃

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Lambdas are designed to be short-lived; I don’t think it’s a good idea to run one in a loop, TBH.

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Thank you, Martin. Probably then a simple Lambda that constantly monitors the workers and sets/unsets the protection flag should work. Though I’d avoid writing timestamp to any kind of state. What if I write the last active state in an instance tag? This could be a solution…
w = get_clearml_workers()
for instance in w:
    if instance['processing_job'] is True:
        instance_tag['last_job_seen'] = current_time()
    else:
        compare_times_and_allow_shutdown_if_idle()
...
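The tag-based idea above could be sketched roughly as follows. This is only an illustration of the decision logic: `decide`, `IDLE_LIMIT` and the ISO-8601 tag format are assumptions made for the example, and reading the worker state from ClearML or writing the tag to the instance is left out entirely.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical idle threshold; pick whatever suits the workload.
IDLE_LIMIT = timedelta(minutes=15)


def decide(worker_busy, last_job_seen, now, idle_limit=IDLE_LIMIT):
    """Return (new_tag_value, allow_shutdown) for one instance.

    worker_busy   -- True if the worker currently has a task
    last_job_seen -- ISO-8601 string stored in the instance tag, or None
    now           -- timezone-aware datetime of this check
    """
    if worker_busy:
        # Refresh the tag; a busy machine must never be shut down.
        return now.isoformat(), False
    if last_job_seen is None:
        # No tag yet: start the idle clock now instead of shutting down.
        return now.isoformat(), False
    idle_for = now - datetime.fromisoformat(last_job_seen)
    # Keep the old tag value and allow shutdown once idle long enough.
    return last_job_seen, idle_for >= idle_limit
```

Because the only state is the tag value passed in and returned, the function itself stays stateless, which is the point of keeping the timestamp on the instance.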

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Hey AgitatedDove14 , thanks for having this discussion 🙂
We are collecting machine/task data from the ClearML API using a Lambda and pushing it to CloudWatch as per-machine datapoints: 1 for a machine doing work, 0 for an idle one. Another Lambda, run on an ASG termination event, compares the incoming machine list with the list of machines from CW which have not been running anything for x minutes and returns the intersection. The ASG then terminates only machines doing nothing during the last per...
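The intersection step described above might look something like this. It is a sketch under assumptions: the function name, the shape of `datapoints` (one 0/1 value per minute, oldest first) and the way candidates arrive are all made up for illustration, and the CloudWatch and ASG calls are omitted.

```python
def instances_safe_to_terminate(candidates, datapoints, idle_minutes):
    """Return the candidates that have been idle for the whole window.

    candidates   -- instance ids the ASG proposes to terminate
    datapoints   -- dict: instance id -> list of 0/1 values, oldest first
    idle_minutes -- positive number of trailing datapoints that must be 0
    """
    idle = {
        instance_id
        for instance_id, points in datapoints.items()
        # Require a full window of data, all of it zero.
        if len(points) >= idle_minutes and not any(points[-idle_minutes:])
    }
    # Preserve the ASG's ordering; unknown instances are treated as busy.
    return [instance_id for instance_id in candidates if instance_id in idle]
```

Treating machines with missing or too-short history as busy is a deliberately conservative choice here, so that a freshly started instance is not terminated before it reports anything.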

one year ago
0 Hello, A Question On Using The Aws Autoscaler App In The Saas – Can A Custom Iam Role Be Used For Launching The Machines? I’M Looking At Giving Custom Permissions To The Machine For Checking Out Secrets / Code / Etc.

ok, I misread. The launch code runs in the SaaS, but it uses credentials to launch machines in our cloud. What stops it then from specifying an IAM role existing in our cloud? Isn’t this just an API call?

one year ago
0 Hello, A Question On Using The Aws Autoscaler App In The Saas – Can A Custom Iam Role Be Used For Launching The Machines? I’M Looking At Giving Custom Permissions To The Machine For Checking Out Secrets / Code / Etc.

Thank you for explaining. As we are implementing a custom auto-scaler for the machines, it looks like the queue length - or a rate of messages falling into the queue - are the best indicators for scaling up or down.
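As a rough illustration of using queue length as the scaling signal, a capacity calculation could look like the sketch below. The function and its parameters are hypothetical, and counting running tasks alongside queued ones is an assumption of this example, not something stated in the thread.

```python
import math


def desired_capacity(queued, running, min_size, max_size, tasks_per_machine=1):
    """Machines needed for queued plus running tasks, clamped to group limits."""
    needed = math.ceil((queued + running) / tasks_per_machine)
    return max(min_size, min(max_size, needed))
```

For example, `desired_capacity(5, 2, 0, 4)` asks for seven machines but is capped at the group maximum of four.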

one year ago
one year ago
one year ago
0 Hello, A Question On Using The Aws Autoscaler App In The Saas – Can A Custom Iam Role Be Used For Launching The Machines? I’M Looking At Giving Custom Permissions To The Machine For Checking Out Secrets / Code / Etc.

Hi SuccessfulKoala55 , just wondering – you mentioned the open-source autoscaler code version, but where is it hosted? I’ve only found https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py , but it looks like code to launch a managed auto-scaler instead 🙂

one year ago
0 Hello, A Question On Using The Aws Autoscaler App In The Saas – Can A Custom Iam Role Be Used For Launching The Machines? I’M Looking At Giving Custom Permissions To The Machine For Checking Out Secrets / Code / Etc.

Hello Jake, sorry for the delay. I found the original auto scaler script and now understand more about how stuff works.
Can you please help me understand how ClearML assigns jobs to queues – will there be more than 1 job per machine at a single point in time? I do understand that if there are 2 agents running, one for each GPU, then a vacant agent will probably take a job from the queue. But what’s the general case here?
Thanks so much and sorry if this was already explained – feel free to point me to...

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

And maybe adding the idle time spent without a job to the API is not such a bad idea 😉

one year ago
one year ago
0 Hello, I’M Seeing This Issue When Launching A Machine With The App –

Nope, this is the ‘autoscaler app’ from the web interface of the SaaS. Nothing self-hosted at the moment.

one year ago
0 Hello! A Question

Hi CostlyOstrich36 Thank you for reaching back and sorry for my embarrassingly long answer here.
We launch the job on a remote executor this way –
From a repository where the code lies we launch a job with clearml-task in this form: ` clearml-task --name projx-1-debug-$(date +%s) --project kbc --folder . --script projx/kbcrun.py --requirements custom-requirements.txt --docker python:3.9.13 --task-type training --queue aws-cpu-docker --args graph_dir=not/needed config_path=examples/tr...

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

It’s like a completion hook when the job terminates (whether in success or failure).
What I’m thinking of: instance scale-in in an ASG doesn’t happen if instance protection is enabled:
  1. Agent fetches the job and starts the container.
  2. Instance protection is enabled with an API call run in extra_docker_shell_script , and the job is launched.
  3. Job finishes.
  4. Instance protection gets disabled in this post-run hook , and the instance may be terminated.
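The protect/unprotect sequence above can be sketched with the Auto Scaling `set_instance_protection` call from boto3. Note that `run_job_protected` is a hypothetical wrapper written for this example; it does not use any ClearML hook, and whether the agent offers a post-run hook is exactly the open question here.

```python
def set_protection(autoscaling_client, group_name, instance_id, protected):
    """Toggle scale-in protection for a single instance.

    autoscaling_client is expected to be boto3.client("autoscaling").
    """
    return autoscaling_client.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        ProtectedFromScaleIn=protected,
    )


def run_job_protected(autoscaling_client, group_name, instance_id, job):
    """Hypothetical wrapper: protect the instance, run job(), then unprotect."""
    set_protection(autoscaling_client, group_name, instance_id, True)
    try:
        return job()
    finally:
        # Runs on success and on failure, like the completion hook above.
        set_protection(autoscaling_client, group_name, instance_id, False)
```

Passing the client in as an argument keeps the sketch free of credentials and region handling, and makes it easy to try with a stub before pointing it at a real group.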

one year ago
0 Hello, A Question On Using The Aws Autoscaler App In The Saas – Can A Custom Iam Role Be Used For Launching The Machines? I’M Looking At Giving Custom Permissions To The Machine For Checking Out Secrets / Code / Etc.

Also, when the AZ spec is left empty (it’s optional), machines fail to start with:
Invalid type for parameter LaunchSpecification.Placement.AvailabilityZone, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
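The message suggests that `AvailabilityZone` is being sent as `None` rather than left out. This is not the app's actual code, just a guess at the kind of guard that would avoid it; `build_launch_spec` is a hypothetical helper, and the key names follow the parameter path in the error.

```python
def build_launch_spec(image_id, instance_type, availability_zone=None):
    """Build a launch specification dict, omitting Placement when no AZ is set."""
    spec = {"ImageId": image_id, "InstanceType": instance_type}
    if availability_zone:
        # Only include Placement when there is a real string to send.
        spec["Placement"] = {"AvailabilityZone": availability_zone}
    return spec
```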

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

So this is not for end-user convenience like sending Slack messages, but rather system-related hooks useful for auto-scaling, internal APIs and such. If this functionality is not available out of the box, we’d need to look into scaling in a different way. We are thinking of:
  1. Scaling in on very low group average CPU/GPU usage. Unreliable, because a machine could be uploading data or doing other low-load work.
  2. Using an https://docs.aws.amazon.com/autoscaling/ec2/userguide/lambda-c...

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Hey AgitatedDove14 , basically https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-instance-protection.html would allow blocking the machine from being scaled-in when there is a scale-in event in the ASG.
The ASG is responsible for spinning up on demand in the ClearML queue, but spinning down is less trivial – we cannot just spin down if the queue is empty (some machine can still be running something important!)

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

This is a re-implementation, I'd say.
Every instance is running an agent in docker mode. One agent = one task for autoscaling purposes.

one year ago
0 Hello, A Question On Using The Aws Autoscaler App In The Saas – Can A Custom Iam Role Be Used For Launching The Machines? I’M Looking At Giving Custom Permissions To The Machine For Checking Out Secrets / Code / Etc.

Thank you for the reply, Alon, and for looking into the issue. I’m not speaking of the app/launcher app itself inheriting IAM permissions by assuming the role – this is absolutely understandable as it’s in your cloud. On the contrary, when launching an AWS auto-scaler app, what stops adding ‘IAM Role’ to the list of parameters of the machine group? I can’t wrap my head around this.

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Sounds okay... Will I need state to calculate the idle time over time, or is there some idle param in the API answer? Because ideally I'd run this in a stateless Lambda.

one year ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Tricky question!

I see this ASG with a TargetTrackingPolicy for both scale-up (if queue size > 0) and scale-down, but scale-down goes (additionally or only) through a custom policy: check whether a specific machine can be shut down. For this we need to make sure there's no job running there. Two ways to do it –

  1. Instance protection set on/off, which is simple.
  2. Compare machines that the ASG wants to shut down with machines having tasks {} retrieved from the API. If task is running, avoid ...
one year ago
0 If I Have 1 Machine With A Gpu, Can I Put A Worker On It With Gpu And Two Workers With

Perhaps you can, if you are able to assign CPU cores to the agent? I’d avoid doing GPU and CPU at the same time, as my GPU load creates a decent CPU load by itself.

one year ago