No, I think I could reproduce with multiple queues
Ok, I got the following error when uploading the table as an artifact: `ValueError('Task object can only be updated if created or in_progress')`
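For context, a minimal sketch of the situation that error message describes (assuming clearml's `Task.upload_artifact`; project/task names are illustrative). The error text itself gives the reason: artifacts can only be attached while the task is still in the created or in_progress state.
```
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact-upload-demo")  # illustrative names

# Works while the task is in the created/in_progress state:
task.upload_artifact("table", artifact_object={"col": [1, 2, 3]})

task.close()
# Once the task has left those states (e.g. completed/closed), the same call
# fails with: ValueError('Task object can only be updated if created or in_progress')
```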
Yes AgitatedDove14 🙂
Hi SmugDolphin23, thanks for the input! Will try now, but that seems hacky: to have it working I have to specify python3.8 in two places:
once in the agent config file (agent.default_python is already python3.8, but it seems to be ignored) + making sure it is available (using the python:3.8 docker image). Is there a way to prevent this redundancy? I.e. if I want to change the Python version, can I control it from a single place?
Not really, because this is difficult to control: I use the AWS autoscaler with an Ubuntu AMI, and when an instance is created, packages are updated, so I don't know which Python version I end up with; also, changing the OS's Python version is not really recommended
Thanks for your input TenseOstrich47, I'm considering using a secret manager now, I guess that's the best option. I can move the secrets wherever I need them to be to make it work 🙂
AMI ami-08e9a0e4210f38cb6, region: eu-west-1a
the NVIDIA Deep Learning AMI (Ubuntu 18.04)
so what worked for me was the following startup user-data script:
```
#!/bin/bash
# Give the instance time to finish its own boot-time package updates
sleep 120
# Before each apt action, wait until no other process holds the apt/dpkg locks
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential...
```
there is no error from this side; I think the AWS autoscaler just waits for the agent to connect, which will never happen since the agent won't start because the user-data script fails
Now it starts; I'll see if this solves the issue
Awesome, thanks WackyRabbit7, AgitatedDove14!
Ok thanks! And for this?
Would it be possible to support such a use case? (I.e. have the clearml-agent set up a different Python version when a task requires it?)
I didn't use ignite callbacks; for future reference:
```
from ignite.engine import Events
from ignite.handlers import EarlyStopping

early_stopping_handler = EarlyStopping(...)  # patience / score_function args elided

def log_patience(engine):
    # report the early-stopping patience counter as a scalar once per epoch
    clearml_logger.report_scalar("patience", "early_stopping", early_stopping_handler.counter, engine.state.epoch)

engine.add_event_handler(Events.EPOCH_COMPLETED, early_stopping_handler)
engine.add_event_handler(Events.EPOCH_COMPLETED, log_patience)
```
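For completeness, a minimal sketch of how the `engine` and `clearml_logger` used above could be created; the project/task names are illustrative, and I'm assuming `clearml_logger` is a plain clearml `Logger` (which is what provides `report_scalar`):
```
from clearml import Task
from ignite.engine import Engine

task = Task.init(project_name="examples", task_name="early-stopping-demo")  # illustrative names
clearml_logger = task.get_logger()  # clearml Logger exposing report_scalar()

def train_step(engine, batch):
    ...  # training logic elided

engine = Engine(train_step)
```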
As you can see: more hard waiting (the initial sleep), and then, before each apt action, a check that no other process holds the apt lock
What I put in the clearml.conf is the following:
```
agent.package_manager.pip_version = "==20.2.3"
agent.package_manager.extra_index_url: ["<extra-index-url>"]
agent.python_binary = python3.8
```
the instances take so much time to start, around 5 minutes
edited the aws_auto_scaler.py, actually I think it's just a typo, I just need to double the brackets
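For context, a minimal sketch of the brace issue, assuming the user-data script is injected into a Python format string (the template and names here are illustrative): the literal `{`/`}` used by bash brace expansion must be doubled so `str.format` doesn't treat them as placeholders.
```
# Hypothetical template resembling how a user-data script could be built with str.format;
# the bash brace expansion {lib/{dpkg,...}} must be written with doubled braces.
template = (
    "#!/bin/bash\n"
    "echo {instance_name}\n"
    "while sudo fuser /var/{{lib/{{dpkg,apt/lists}},cache/apt/archives}}/lock >/dev/null 2>&1; do sleep 5; done\n"
)
print(template.format(instance_name="worker-1"))
# The doubled braces render as literal { } in the final script.
```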
Interestingly, I do see the 100 GB volume in the AWS console:
I managed to do it by using logger.report_scalar, thanks!
Hey FriendlySquid61,
I ended up asking for full control of EC2 so as not to be blocked, so unfortunately I cannot give you a more precise list 🙂
Try to spin up an instance of that type manually in that region to see if it is available
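If it helps, here is a programmatic version of the same check (a sketch using boto3; the zone comes from the messages above, and the instance type queried is hypothetical):
```
import boto3

# List instance types offered in a specific availability zone (eu-west-1a)
ec2 = boto3.client("ec2", region_name="eu-west-1")
resp = ec2.describe_instance_type_offerings(
    LocationType="availability-zone",
    Filters=[{"Name": "location", "Values": ["eu-west-1a"]}],
)
offered = {o["InstanceType"] for o in resp["InstanceTypeOfferings"]}
print("g4dn.xlarge available:", "g4dn.xlarge" in offered)  # hypothetical instance type
```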