In short, we clone the repo, build the Docker container, and run the agent inside the container. The reason we do it this way, rather than provide a docker image to the clearml-agent, is twofold:
- We actively develop our custom networks and architectures within a containerised env to make it easy for engineers to have a quick dev cycle for new models (the same repo is cloned and we build the docker container to work inside).
- We use the same repo to serve models on our backend (in a slightly different contain...
Sure, I'll check this out later in the week and get back to you
Hi AgitatedDove14 ,
I noticed that ClearML parses clearml.automation.UniformParameterRange into a configuration space to be used with BOHB. When I've used BOHB previously, I could use UniformFloatHyperparameter from the ConfigSpace package, which allows me to set a parameter in log space; that is, the range is defined by something like numpy.logspace rather than numpy.linspace
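To illustrate the difference between the two spacings (a minimal sketch; the bounds and number of samples are just placeholder values for a learning-rate-style parameter):

```python
import numpy as np

# Four candidate values in [1e-4, 1e-1], spaced two different ways.
lin = np.linspace(1e-4, 1e-1, 4)  # evenly spaced values: most of them cluster near the upper end
log = np.logspace(-4, -1, 4)      # evenly spaced exponents: 1e-4, 1e-3, 1e-2, 1e-1
```

With linear spacing, three of the four samples land above 0.03, whereas log spacing covers each order of magnitude equally, which is usually what you want for learning rates.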
I'll add a more detailed response once it's working
$ curl -X 'POST' '' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "url": "" }'
{"digit": 5}
I can run clearml.OutputModel(task, framework='pytorch') to get the model from a previous task, but how can I get the PyTorch model (torch.nn.Module) from the output model object?
Yep just about to do that. Just annoying to add arg parser etc
Okay just for clarity...
Originally, my NVIDIA drivers were on a version incompatible with the Triton server:
This container was built for NVIDIA Driver Release 510.39 or later, but version 470.103.01 was detected and compatibility mode is UNAVAILABLE.
To fix this I updated the drivers on my base OS, i.e.
sudo apt install nvidia-driver-510 -y
sudo reboot
Then it worked. The docker-compose logs from the clearml-serving-triton container did not make this clear (i.e. by r...
Can you try to go into 'Settings' -> 'Configuration' and verify that you have 'Show Hidden Projects' enabled
Okay great thanks SuccessfulKoala55
the agent is for replicating what you run locally elsewhere, i.e. on a remote GPU machine
Hi yes all sorted ! 🙂
For reference:

```python
import subprocess

for i in ['1', '2']:
    command = ['python', 'hyp_op.py', '--testnum', f'{i}']
    # shell=True removed: combined with a list argument on Linux it would
    # run only 'python' and silently drop the remaining arguments
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
```
For ClearML UI:

```
2021-10-19 14:24:13 ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18
Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:28
Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:38
Error: Can no...
```

```
2021-10-19 14:19:07
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
S...
```
I created 2x instances in eu-west-2 on the AWS console but still no luck
so I don't think it's an access issue
Hi SuccessfulKoala55 who's the best person on the team to speak with?
It doesn't help that the stacktrace isn't very verbose
I was having an issue with the availability zone: I was using 'eu-west-2' instead of 'eu-west-2c'
Error: Can not start new instance, Could not connect to the endpoint URL: ""
I've got it... I just remembered I can call task_id on the cloned task and check the status of that 🙂
Okay thanks for the update 🙂 the account manager got involved and the limit has been approved 🚀
Hi SuccessfulKoala55, thanks, I didn't know it was possible to use it in place of the password. So in the .conf I can just add the git PAT instead of the password?
git_user: ${GITHUB_USER}
git_pass: ${GITHUB_PAT}
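For context, a hedged sketch of where those two keys would sit in clearml.conf (the `agent { ... }` section is from the standard agent configuration; this assumes the environment variables are expanded when the config is read, as the `${...}` syntax above suggests):

```
agent {
    # Illustrative fragment: a GitHub personal access token used in place of a password
    git_user: ${GITHUB_USER}
    git_pass: ${GITHUB_PAT}
}
```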
thank you guys 😄 😄
Nope, you'll just need to install clearml
Spin up an instance using the AWS auto-scaler and use the init script to:
- Get key-value pairs from AWS SSM and write them to a .env file
- Clone the private git repo
- Build the docker image locally, using the .env file during docker-compose
- Enter the container and spin up the clearml-agent
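A rough, untested sketch of what such an init script could look like. The repo URL, SSM parameter path, compose service name (`app`), and queue name are all placeholders, not from the original:

```shell
#!/bin/bash
set -euo pipefail

# 1. Pull key-value pairs from AWS SSM and write them to a .env file
#    (placeholder path; --with-decryption handles SecureString parameters)
aws ssm get-parameters-by-path --path "/myapp/env" --with-decryption \
  --query "Parameters[*].[Name,Value]" --output text \
  | awk -F'\t' '{ n=$1; sub(".*/", "", n); print n "=" $2 }' > .env

# 2. Clone the private repo (credentials assumed to come from the
#    instance role or the values written to .env)
git clone https://github.com/my-org/my-repo.git
cd my-repo

# 3. Build the image locally; pass the .env file to docker compose
docker compose --env-file ../.env build

# 4. Start the container and spin up the agent inside it
docker compose up -d
docker compose exec -d app clearml-agent daemon --queue default
```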