
I am using Hydra to configure my experiments. Specifically, I want to retrieve the OmegaConf data created by Hydra. config = task.get_configuration_objects()
returns a string with those values, but I do not know how to parse it, or whether I can get this data as a nested dict.
Here is the failing part of the log:
` Requirement already satisfied: pip in /root/.clearml/venvs-builds/3.1/lib/python3.10/site-packages (22.2.2)
Collecting Cython
Using cached Cython-0.29.32-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.32
Collecting boto3==1.24.59
Using cached boto3-1.24.59-py3-none-any.whl (132 kB)
ERROR: Could not find a version that satisfies the requ...
Hi, sorry for the delay. rmdatasets == 0.0.1 is the name of the local package that lives in the same repo as the training code. It is listed by name and version instead of by the relative path to the package.
As a workaround, I enabled the setting that forces the use of requirements.txt, and I am using this script to generate it:
` import os
import subprocess
output = subprocess.check_output(["pip", "freeze"]).decode()
with open("requirements.txt", "w") as f:
    for line in output.split("\n"):
        if " @" in line...
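The rest of the script above is cut off, so the following is only a sketch of one way such filtering could work, not the original code. It assumes the goal is to replace `name @ file://...` entries from `pip freeze` with installable links; the function name `rewrite_freeze`, the sample path, and the `example.com` URL are all hypothetical.

```python
def rewrite_freeze(freeze_output, overrides):
    """Return requirement lines, rewriting direct references ("name @ url").

    overrides maps a package name to the requirement line to use instead.
    Direct references without an override are dropped.
    """
    result = []
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if " @ " in line:
            name = line.split(" @ ", 1)[0]
            if name in overrides:
                result.append(overrides[name])
            continue
        result.append(line)
    return result


# Hypothetical pip freeze output and override, for illustration only.
sample = "boto3==1.24.59\nrmdatasets @ file:///workspace/libs/rmdatasets\n"
overrides = {
    "rmdatasets": "rmdatasets @ git+https://example.com/my-repo.git#subdirectory=libs/rmdatasets"
}
lines = rewrite_freeze(sample, overrides)
print("\n".join(lines))
```

Keeping the rewrite in a pure function makes it easy to check against a saved `pip freeze` output before writing requirements.txt.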
Thank you, now I am getting AttributeError: 'DummyModel' object has no attribute 'model_design'
when calling task.update_output_model("best_model.onnx")
. I checked the code and thought it was related to the model not having a config defined, so I tried to set it with task.set_model_config(cfg.model)
but I am still getting the error.
SuccessfulKoala55 , how do I set the agent version when creating the autoscaler?
I am launching through the UI, "XXX workspace / https://app.clear.ml/applications / AWS Autoscaler".
Actually, this error happens when I launch the autoscaler from the Web UI. When I enqueue a task, it launches an EC2 instance whose "Status Check" stays in "Pending" for over 15 minutes. The instance is then terminated by the scaler, which launches another one, in a loop.
I think not, I have not set any ENV variable. Just went to the web UI, added an autoscaler, filled the data in the UI and launched the autoscaler.
By inspecting the scaler task, it is running the following docker image: allegroai/clearml-agent-services-app:app-1.1.1-47
Basically, I am following the steps in this video:
https://www.youtube.com/watch?v=j4XVMAaUt3E
I do not recall the older version; it is from a couple of months ago. The new version is WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
Thank you. After running the script, do I run docker-compose -f /opt/clearml/docker-compose.yml up -d
?
Thank you, it is working now.
I just called the script with:
task.set_base_docker(
    docker_image="nvidia/cuda:11.7.0-runtime-ubuntu22.04",
    # docker_arguments="--privileged -v /dev:/dev",
)
task.execute_remotely(queue_name="default")
Then in the console:
` Exception: Command '['/usr/bin/python3', '-m', 'poetry', 'config', '--local', 'virtualenvs.in-project', 'true']' returned non-zero exit status 1.
Error: Failed configuring Poetry virtualenvs.in-project
failed installing poetry requirements: Comman...
Tried using a custom python version:
` FROM nvidia/cuda:11.7.0-runtime-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev git \
    && rm -rf /var/lib/apt/lists/*
RUN git clone ~/.pyenv
RUN /root/.pyenv/bin/pyenv install 3.10.6
ENV PATH="/root/.pyenv/versions/3.10....
Do I have to have the lock file in the root, or can it be in the working dir?
Thank you, and sorry for the delay. I have been playing around with this setup. Right now I am generating a requirements.txt with git links to the sibling packages, and I had to configure the agent to force the use of the requirements.txt. But I am starting to wonder whether it would be easier to just change sys.path in the scripts that use the sibling libs.
They are in the same repo on git, something like: my-repo train project1 project2 libs lib1 ...
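For the sys.path idea, a minimal sketch might look like the following. It assumes a layout where a `libs` directory sits somewhere above the training script in the same repo; the helper name `add_libs_to_path` is hypothetical, and the directory name should be adjusted to the real layout.

```python
import sys
from pathlib import Path


def add_libs_to_path(script_file, libs_dirname="libs"):
    """Find the nearest ancestor directory containing libs_dirname and
    put that libs directory at the front of sys.path.

    Returns the libs directory as a Path, or None if it was not found.
    """
    for parent in Path(script_file).resolve().parents:
        candidate = parent / libs_dirname
        if candidate.is_dir():
            path = str(candidate)
            # Avoid adding the same entry twice on repeated calls.
            if path not in sys.path:
                sys.path.insert(0, path)
            return candidate
    return None
```

A script would call `add_libs_to_path(__file__)` before importing anything from the sibling libs. Searching upward from the script, rather than hard-coding a relative path, keeps it working regardless of the current working directory.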
Thanks, I thought of doing that, but I was wondering if there was a way to set it in the code.
So it could be launched by the clearml CLI? I can also try that.
I edited the clearml.conf on the agent and set the manager to poetry. Do I need to have poetry installed on the agent beforehand, considering that I am using docker?
I have a task that is already completed, and, in another script, I am trying to load it and analyse the results.
This is being started as a command line script.
Also tried saving the model with:
task.set_model_config(cfg.model)
task.update_output_model("best_model.onnx")
But got the same exception.
I get the same error:
⋊> /d/c/p/c/e/reporting on master ◦ python model_config.py (longoeixo) 17:48:14
ClearML Task: created new task id=xxx
ClearML results page: xxx
` Any model stored from this point onwards,...
It's an S3 bucket, and it is working, since I am able to upload models before this call and also custom artifacts in the same script.
Ubuntu 18.04
Python: 3.9.5
Clearml: 1.0.4
So downgrading to python 3.8 would be a workaround?
Solved by removing default parts.
Now I am getting a strange behavior: I have 2 tasks in the queue, and the autoscaler fires two EC2 instances and then turns them off without running the tasks. Then it fires two new instances again, in a loop.
With the account admin email. The one on which I got the receipt.
Yes, tried with python 3.8, now it works.
AgitatedDove14, here is the full log:
Hi, any update on that?