I am using hydra to configure my experiments. Specifically, I want to retrieve the OmegaConf data created by hydra. config = task.get_configuration_objects() returns a string with those values, but I do not know how to parse it, or whether I can get this data as a nested dict.
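For context, what I would like is something along these lines (a rough sketch; I am assuming the Hydra dump is stored as YAML under the "OmegaConf" configuration name):
from clearml import Task
from omegaconf import OmegaConf

task = Task.get_task(task_id="...")  # the completed task I am analysing

# get_configuration_objects() gives the config sections as strings;
# assuming the Hydra/OmegaConf dump is the YAML under the "OmegaConf" key
configs = task.get_configuration_objects()
cfg = OmegaConf.create(configs["OmegaConf"])  # parse the YAML string back into a config

# plain nested dict, with interpolations resolved
cfg_dict = OmegaConf.to_container(cfg, resolve=True)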
I have a task that is already completed, and, in another script, I am trying to load it and analyse the results.
Thank you, I have defined the AMI manually instead of using the default; now I am getting the following error:
Error: An error occurred (InvalidParameterValue) when calling the RunInstances operation: User data is limited to 16384 bytes
I just called the script with:
task.set_base_docker(
    docker_image="nvidia/cuda:11.7.0-runtime-ubuntu22.04",
    # docker_arguments="--privileged -v /dev:/dev",
)
task.execute_remotely(queue_name="default")
Then in the console:
Exception: Command '['/usr/bin/python3', '-m', 'poetry', 'config', '--local', 'virtualenvs.in-project', 'true']' returned non-zero exit status 1.
Error: Failed configuring Poetry virtualenvs.in-project
failed installing poetry requirements: Comman...
Solved by removing the default parts from clearml.conf.
Now I get a strange behavior in which I have 2 tasks in the queue: the autoscaler fires two EC2 instances and then turns them off without running the tasks, then it fires two new instances again, in a loop.
Thank you, it is working now.
I used the autogenerated clearml.conf, I will try erasing the unnecessary parts.
Do I have to have the lock file in the root, or can it be in the working dir?
With the account admin email. The one in which I got the receipt.
Thank you, sorry for the delay, I have been playing around with this setup. Right now I am generating a requirements.txt with git links to the sibling packages; I had to set the agent to force the use of the requirements.txt. But I am starting to wonder whether it would be easier just changing sys.path in the scripts that use the sibling libs.
Tried using a custom python version:
FROM nvidia/cuda:11.7.0-runtime-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev git \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv
RUN /root/.pyenv/bin/pyenv install 3.10.6
ENV PATH="/root/.pyenv/versions/3.10....
Hi, sorry for the delay. rmdatasets == 0.0.1 is the name of the local package that lives in the same repo as the training code; it gets listed by version instead of by the relative path to the package.
As a workaround, I set the setting to force the use of requirements.txt, and I am using this script to generate it:
import os
import subprocess

output = subprocess.check_output(["pip", "freeze"]).decode()
with open("requirements.txt", "w") as f:
    for line in output.split("\n"):
        if " @" in line...
I edited the clearml.conf on the agent and set the manager to poetry. Do I need to have poetry installed on the agent beforehand, considering that I am using docker?
The only reason is that I can specify the python version to be used and conda will install it. With requirements.txt, the default python version is used.
Thank you for your response. So what is the difference between sync and add? By your description it seems to make no difference whether I add the files via sync or add, since I will have to create a new dataset either way.
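Just to make sure I understand, these are the two variants I am comparing (a sketch; the dataset names and parent version are placeholders):
from clearml import Dataset

# new dataset version based on the previous one
ds = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    parent_datasets=["<previous_version_id>"],
)

# variant 1: explicitly add the new/changed files
ds.add_files(path="data/new_files/")

# variant 2: mirror a local folder (adds new files, removes deleted ones)
ds.sync_folder(local_path="data/")

ds.upload()
ds.finalize()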
AgitatedDove14, here follows the full log:
Thank you. After running the script, do I run docker-compose -f /opt/clearml/docker-compose.yml up -d ?
This is being started as a command line script.
Also tried saving the model with:
task.set_model_config(cfg.model)
task.update_output_model("best_model.onnx")
But got the same exception.
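Would going through an explicit OutputModel sidestep this? Something like (a rough sketch from my reading of the docs, not tested):
from clearml import OutputModel, Task

task = Task.current_task()

# create the output model explicitly and upload the weights file directly,
# instead of going through task.update_output_model()
output_model = OutputModel(task=task, framework="ONNX")
output_model.update_weights(weights_filename="best_model.onnx")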
Actually, this error happens when I launch the autoscaler from the Web UI: when I enqueue a task, it launches an EC2 instance whose "Status Check" stays in "Pending" for over 15 minutes; then the instance is terminated by the scaler, which launches another one, in a loop.
I do not recall the older version, it is from a couple of months ago, but the new version is WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
Follows the failure part of the log:
Requirement already satisfied: pip in /root/.clearml/venvs-builds/3.1/lib/python3.10/site-packages (22.2.2)
Collecting Cython
Using cached Cython-0.29.32-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.32
Collecting boto3==1.24.59
Using cached boto3-1.24.59-py3-none-any.whl (132 kB)
ERROR: Could not find a version that satisfies the requ...
I think not, I have not set any ENV variable. Just went to the web UI, added an autoscaler, filled the data in the UI and launched the autoscaler.
By inspecting the scaler task, it is running the following docker image: allegroai/clearml-agent-services-app:app-1.1.1-47
So I guess I am referring to the auto package detection. I am running the job through the web UI. My actual problem is that I have a private repo in my requirements.txt (listed with the github url) that is not being installed. Also, my environment.yaml uses python 3.8, while 3.9 is being installed.
Hi, AgitatedDove14
How do I set the version to 1.5.1? When I launch the autoscaler, version 1.5.0 is picked by default.
Basically, I am following the steps in this video:
https://www.youtube.com/watch?v=j4XVMAaUt3E
I am launching through the UI, "XXX workspace / https://app.clear.ml/applications / AWS Autoscaler".
Thank you, now I am getting AttributeError: 'DummyModel' object has no attribute 'model_design' when calling task.update_output_model("best_model.onnx"). I checked the code; I thought it was related to the model not having a config defined, and tried to set it with task.set_model_config(cfg.model), but I am still getting the error.
It's an S3 bucket, and it is working, since I am able to upload models before this call and also custom artifacts in the same script.
Ubuntu 18.04
Python: 3.9.5
Clearml: 1.0.4
Yes, but the issue is caused because rmdatasets is installed in the local environment; I needed it installed in order to test the code locally, so it is caught in the package list.
I will probably stop installing the sibling packages and instead add them manually to sys.path, something like the sketch below.
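A sketch of what I mean (the repo layout is assumed, with the rmdatasets package directory sitting next to the entry script):
import sys
from pathlib import Path

# assumed layout: <repo>/train.py and <repo>/rmdatasets/rmdatasets/__init__.py
repo_root = Path(__file__).resolve().parent
sys.path.insert(0, str(repo_root / "rmdatasets"))

import rmdatasets  # noqa: E402  (imported after the sys.path tweak)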