SuccessfulKoala55 They do have the right filepath, e.g.: https://***.com:8081/my-project-name/experiment_name.b1fd9df5f4d7488f96d928e9a3ab7ad4/metrics/metric_name/predictions/sample_00000001.png
Also, maybe we are not on the same page: by "clean up", I mean killing a detached subprocess on the machine executing the agent
So I changed ebs_device_name = "/dev/sda1", and now I correctly get the 100GB EBS volume mounted on /. All good 🙂
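For context, a minimal sketch of the kind of resource configuration this refers to, following the ClearML AWS autoscaler example config; the resource name and instance type are placeholders, while the AMI, zone, and device name mirror the ones mentioned elsewhere in this thread:
` resource_configurations = {
    "gpu-queue": {                          # placeholder resource name
        "instance_type": "g4dn.4xlarge",    # assumption: any GPU instance type
        "is_spot": False,
        "availability_zone": "eu-west-1a",
        "ami_id": "ami-08e9a0e4210f38cb6",
        "ebs_device_name": "/dev/sda1",     # must match the AMI's root device so the volume mounts on /
        "ebs_volume_size": 100,
        "ebs_volume_type": "gp3",
    }
} `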
Interesting! Something like that would be cool, yes! I just realized that custom plugins in Mattermost are written in Go; could be a good hackday for me 🙂 to learn Go
Yeah, I just realized that you would also need to specify different subnets, etc… not sure how easy it is 🙂 But it would be very valuable; on-demand GPU instances are so hard to spin up nowadays in AWS
no, I think I could reproduce with multiple queues
Ok, I got the following error when uploading the table as an artifact: ValueError('Task object can only be updated if created or in_progress')
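For reference, a minimal sketch of the upload call in question (project, task, and artifact names are placeholders); the error above indicates the task was no longer in a created/in_progress state when the upload ran:
` import pandas as pd
from clearml import Task

task = Task.init(project_name="my-project", task_name="upload-table")  # placeholder names
df = pd.DataFrame({"sample": [1, 2], "prediction": [0.9, 0.7]})
# upload_artifact must run while the task is still created/in_progress,
# i.e. before task.close() or task.mark_completed()
task.upload_artifact(name="predictions", artifact_object=df) `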
Yes AgitatedDove14 🙂
Ok, and if that's not the case, it will fall back to 3.8, right? Would it be possible to support such a use case (have the clearml-agent set up a different Python version when a task needs it)?
And since I ran the task locally with python3.9, it used that version in the docker container
So either I specify agent.python_binary: python3.8 in the clearml-agent config as you suggested, or I enforce the task locally to run with python3.8 using task.data.script.binary
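For reference, a minimal sketch of the second option (project and task names are placeholders, and this assumes the attribute change is applied before the task is enqueued):
` from clearml import Task

task = Task.init(project_name="my-project", task_name="my-task")  # placeholder names
# pin the interpreter the agent should use when reproducing this task
task.data.script.binary = "python3.8" `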
Hi SmugDolphin23, thanks for the input! Will try now, but that seems hacky: to get it working I have to specify python3.8 twice: once in the agent config file (agent.default_python is already python3.8, but seems to be ignored) + make sure it is available (using the python:3.8 docker image). Is there a way to prevent this redundancy? I.e., if I want to change the Python version, can I control it from a single place?
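For reference, the agent-side half of that setting lives in clearml.conf; a minimal sketch (assuming the agent is restarted to pick it up):
` agent {
    # interpreter the agent uses when building task environments
    python_binary: python3.8
} `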
And so in the UI, in the Workers & Queues tab, I randomly see one of the two experiments for the worker that is running both experiments
AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)
Not really, because this is difficult to control: I use the AWS autoscaler with an Ubuntu AMI, and when an instance is created, packages are updated, so I don't know which Python version I get; plus, changing the Python version of the OS is not really recommended
Thanks for your input TenseOstrich47, I was considering using a secret manager now, I guess that's the best option. I can move the secrets wherever I need them to be to make it work 🙂
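For reference, a minimal sketch of reading a secret with AWS Secrets Manager via boto3 (the secret name is a placeholder):
` import json
import boto3

# fetch credentials at startup instead of baking them into the config
client = boto3.client("secretsmanager", region_name="eu-west-1")
response = client.get_secret_value(SecretId="clearml/agent-credentials")  # placeholder secret name
secrets = json.loads(response["SecretString"]) `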
AMI ami-08e9a0e4210f38cb6, region: eu-west-1a (the deep learning AMI from NVIDIA, Ubuntu 18.04)
So what worked for me was the following startup userscript:
` #!/bin/bash
sleep 120
# wait for any other apt/dpkg process (e.g. unattended-upgrades) to release its locks
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
# the update may grab the locks again, so wait once more before installing
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential... `
There is no error from this side; I think the AWS autoscaler just waits for the agent to connect, which will never happen since the agent won't start because the userdata script fails
Now it starts, I'll see if this solves the issue
Awesome, thanks WackyRabbit7, AgitatedDove14!
Ok thanks! And for this?
Would it be possible to support such a use case (have the clearml-agent set up a different Python version when a task needs it)?
I didn't use Ignite callbacks; for future reference:
` from ignite.engine import Events
from ignite.handlers import EarlyStopping

# engine and clearml_logger come from the surrounding training setup
early_stopping_handler = EarlyStopping(...)
def log_patience(_):
    # report the current early-stopping counter once per epoch
    clearml_logger.report_scalar("patience", "early_stopping", early_stopping_handler.counter, engine.state.epoch)
engine.add_event_handler(Events.EPOCH_COMPLETED, early_stopping_handler)
engine.add_event_handler(Events.EPOCH_COMPLETED, log_patience) `