
The only reason is that I can specify the Python version to be used and conda will install it. With requirements.txt, the default Python version will be used.
I edited the clearml.conf on the agent and set the package manager to poetry. Do I need to have poetry installed on the agent beforehand, considering that I am using docker?
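(For reference, a minimal sketch of what that clearml.conf section would look like on the agent; `poetry` is one of the supported `agent.package_manager.type` values, alongside `pip` and `conda`.)
```
# clearml.conf on the agent -- minimal sketch
agent {
    package_manager {
        # supported values include pip, conda, poetry
        type: poetry
    }
}
```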
Thank you, I have defined the AMI manually instead of using the default; now I am getting the following error:
Error: An error occurred (InvalidParameterValue) when calling the RunInstances operation: User data is limited to 16384 bytes
So it could be launched by the ClearML CLI? I can also try that.
Hi, AgitatedDove14
How do I set the version to 1.5.1? When I launch the autoscaler, version 1.5.0 is picked by default.
I do not recall the older version, it is from a couple of months ago, but the new version is WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
Hi, any update on that?
Thank you. After running the script, I run docker-compose -f /opt/clearml/docker-compose.yml up -d
?
Thank you, now I am getting AttributeError: 'DummyModel' object has no attribute 'model_design' when calling task.update_output_model("best_model.onnx"). I checked the code; I thought it was related to the model not having a config defined, so I tried to set one with task.set_model_config(cfg.model), but I am still getting the error.
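(A minimal sketch of the call being described; set_model_config accepts config_text or config_dict, so cfg.model, an OmegaConf node, would presumably need converting to a plain dict first. Whether this resolves the AttributeError is not confirmed here.)
```python
from omegaconf import OmegaConf

# sketch: convert the OmegaConf node to a plain dict
# before handing it to clearml's set_model_config
task.set_model_config(
    config_dict=OmegaConf.to_container(cfg.model, resolve=True)
)
```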
I think not, I have not set any env variable. I just went to the web UI, added an autoscaler, filled in the data in the UI, and launched the autoscaler.
By inspecting the scaler task, it is running the following docker image: allegroai/clearml-agent-services-app:app-1.1.1-47
Solved by removing default parts.
Now I got a strange behavior in which I have 2 tasks in the queue: the autoscaler fires two EC2 instances and then turns them off without running the tasks, then it fires two new instances again, in a loop.
Actually, this error happens when I launch the autoscaler from the Web UI: when I enqueue a task, it launches an EC2 instance whose "Status Check" stays in "Pending" for over 15 minutes, and then the instance is terminated by the scaler, which launches another one, in a loop.
I used the autogenerated clearml.conf, I will try erasing the unnecessary parts.
I am using hydra to configure my experiments. Specifically, I want to retrieve the OmegaConf data created by hydra. config = task.get_configuration_objects() returns a string with those values, but I do not know how to parse it, or whether I can get this data as a nested dict.
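(A minimal sketch of one way to parse it, assuming the hydra config is stored as YAML text under the "OmegaConf" configuration object name; that key name is an assumption, so inspecting the returned dict's keys first is safer.)
```python
from clearml import Task
from omegaconf import OmegaConf

task = Task.get_task(task_id="...")  # the completed task
config = task.get_configuration_objects()  # dict of name -> string

# "OmegaConf" as the key is an assumption; check config.keys() to confirm
cfg = OmegaConf.create(config["OmegaConf"])  # parses the YAML text
cfg_dict = OmegaConf.to_container(cfg, resolve=True)  # plain nested dict
```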
Basically, I am following the steps in this video:
https://www.youtube.com/watch?v=j4XVMAaUt3E
Do I have to have the lock file in the root, or can it be in the working dir?
With the account admin email. The one in which I got the receipt.
Thank you, it is working now.
Thank you, sorry for the delay, I have been playing around with this setup. Right now I am generating a requirements.txt with git links to the sibling packages; I had to set the agent to force the use of the requirements.txt. But I am starting to wonder whether it would be easier just changing sys.path in the scripts that use the sibling libs.
Hi, sorry for the delay. rmdatasets == 0.0.1 is the name of the local package that lives in the same repo as the training code; it gets listed by that name instead of by the relative path to the package.
As a workaround I set the setting to force the use of requirements.txt, and I am using this script to generate it:
```python
import os
import subprocess

output = subprocess.check_output(["pip", "freeze"]).decode()
with open("requirements.txt", "w") as f:
    for line in output.split("\n"):
        if " @" in line...
```
I just called the script with:
```python
task.set_base_docker(
    docker_image="nvidia/cuda:11.7.0-runtime-ubuntu22.04",
    # docker_arguments="--privileged -v /dev:/dev",
)
task.execute_remotely(queue_name="default")
```
Then in the console:
```
Exception: Command '['/usr/bin/python3', '-m', 'poetry', 'config', '--local', 'virtualenvs.in-project', 'true']' returned non-zero exit status 1.
Error: Failed configuring Poetry virtualenvs.in-project
failed installing poetry requirements: Comman...
```
Tried using a custom python version:
```dockerfile
FROM nvidia/cuda:11.7.0-runtime-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev git \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv
RUN /root/.pyenv/bin/pyenv install 3.10.6
ENV PATH="/root/.pyenv/versions/3.10....
```
Hydra params are still not uploaded on 1.0.4
Yes, but the issue is caused because rmdatasets is installed in the local environment; I needed it installed in order to test the code locally, so it is picked up in the package list.
I will probably stop installing the sibling packages and instead add them manually to sys.path, as in the sketch below.
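(A minimal sketch of that sys.path approach, assuming the sibling package directory sits next to the training script inside the same repo; the relative layout shown is an assumption.)
```python
import sys
from pathlib import Path

# assumed repo layout: <repo>/rmdatasets/ next to <repo>/train/train.py
repo_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(repo_root / "rmdatasets"))

import rmdatasets  # now importable without pip-installing it
```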
Here is the failure part of the log:
```
Requirement already satisfied: pip in /root/.clearml/venvs-builds/3.1/lib/python3.10/site-packages (22.2.2)
Collecting Cython
  Using cached Cython-0.29.32-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.32
Collecting boto3==1.24.59
  Using cached boto3-1.24.59-py3-none-any.whl (132 kB)
ERROR: Could not find a version that satisfies the requ...
```
AgitatedDove14, here follows the full log:
Thank you, I set it, but clearml still creates its own environment regardless of my environment.yaml.
Yes, the example works. Like the example, my code basically starts with the following; was that not supposed to work?
```python
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="config", config_name="config")
def main(cfg: DictConfig):
    import os
    import pytorch_lightning as pl
    import torch
    import yaml
    import clearml

    pl.seed_everything(cfg.seed)
    task = clearml.Task.init(
        project_name=cfg.project_name,
        task_name=cfg.task_name,
    )
```
I have a task that is already completed, and, in another script, I am trying to load it and analyse the results.
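(A minimal sketch of loading a completed task from another script; the project and task names are placeholders.)
```python
from clearml import Task

# placeholders: use the completed task's actual project/name (or its id)
task = Task.get_task(project_name="my_project", task_name="my_task")

scalars = task.get_reported_scalars()   # dict of graph -> series -> x/y values
params = task.get_parameters_as_dict()  # the task's hyperparameters
print(task.get_status())                # e.g. "completed"
```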