Reputation
Badges 1
44 × Eureka!Tried using a custom python version:
` FROM nvidia/cuda:11.7.0-runtime-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y
build-essential libssl-dev zlib1g-dev libbz2-dev
libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev
xz-utils tk-dev libffi-dev liblzma-dev git
&& rm -rf /var/lib/apt/lists/*
RUN git clone ~/.pyenv
RUN /root/.pyenv/bin/pyenv install 3.10.6
ENV PATH="/root/.pyenv/versions/3.10....
Hi, AgitatedDove14
How do I set the version to 1.5.1,? When I launch the autoscaler the version 1.5.0 is picked by default.
Actually, this error happens when a launch the autoscaler from the Web UI, when I enqueue a task, it launches an EC2 instance which "Status Check" stays in "Pending" for over 15 minutes and then the instance is terminated by the scaler that launches another one in a loop.
I am launching through the UI, "XXX workspace / https://app.clear.ml/applications / AWS Autoscaler".
Thank you, it is working now.
Solved by removing default parts.
Now I got a strange behavior in which I have 2 tasks on queue, the autoscaler fires two EC2 instances and then turn them off without running the tasks, then It fires two new instances again in a loop.
Thank you, sorry for the daly, I have been playing around with this setup, right now I am generating a requirements.txt with git links to the sibling packages, I had to set on the agent to force the use of the reuqirements.txt. But I am starting to wonder whether It would be easier just changing sys,path on the scripts that use the sibling libs.
Thank you, now I am getting AttributeError: 'DummyModel' object has no attribute 'model_design' when calling task.update_output_model("best_model.onnx") . I checked the could I thought that it was related to the model not having a config defined, tried to set it with task.set_model_config(cfg.model) but still getting the error.
Basically, I am following the steps in this video:
https://www.youtube.com/watch?v=j4XVMAaUt3E
Yes the example works. As the example, in my code I am basically starting with doing, was not that supposed to work?
` @hydra.main(config_path="config", config_name="config")
def main(cfg: DictConfig):
import os
import pytorch_lightning as pl
import torch
import yaml
import clearml
pl.seed_everything(cfg.seed)
task = clearml.Task.init(
project_name=cfg.project_name,
task_name=cfg.task_name,
) `
Thank you for your response, so what is the difference between sync and add? By your description it seems to make no difference whether I added the files via sync or add, since I will have to create a new dataset either way.
Thanks, I thought on doing that, but I was wondering if there was a way to set it into the code.
Thank you, I was importing everything inside the function so the the hydra autocomplete would run faster.
SuccessfulKoala55 , how do I set the agent version when creating the autoscaler?
I edited the clearml.conf on the agent and set the manager to poetry, do I need to have poetry installed on the agent beforehand, considering that I am using docker?
So downgrading to python 3.8 would be a workaround?
AgitatedDove14 , here follows the full log:
Hi, any update on that?
I have a task that is already completed, and, in other script, I am trying to load it and analyse the results.
Let's see if I got how it works on the CLI.
So if I execute:clearml-data create --name <improved_dataset> --parents <existing_dataset_id>
Where the parent dataset was updated with sync,
I just need to run:clearml-data upload --id <created_dataset_id>
And the delta will be automatically uploaded to the new dataset?
I do not recall the older version, It is from a couple of months ago, but the new version is WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16
Hi, sorry for the delay. rmdatasets == 0.0.1 is the name of the local package that lives in the same repo as the training code. Instead of picking the relative path to the package.
As as work around I set the setting to force the use of requirements.txt and I am using this script to generate it:
` import os
import subprocess
output = subprocess.check_output(["pip", "freeze"]).decode()
with open("requirements.txt", "w") as f:
for line in output.split("\n"):
if " @" in line...
Hydra params are still not upload on 1.0.4
I used the autogenerated clearml.conf, I will try erasing the unnecessary parts.
I just called the script with:
task.set_base_docker(
docker_image="nvidia/cuda:11.7.0-runtime-ubuntu22.04",
# docker_arguments="--privileged -v /dev:/dev",
)
task.execute_remotely(queue_name="default")
Then in the console:
` Exception: Command '['/usr/bin/python3', '-m', 'poetry', 'config', '--local', 'virtualenvs.in-project', 'true']' returned non-zero exit status 1.
Error: Failed configuring Poetry virtualenvs.in-project
failed installing poetry requirements: Comman...
Thank you, After running the script, I run docker-compose -f /opt/clearml/docker-compose.yml up -d ?
Thank you, I set it, but clearml still creates its own evironment regardless of my environment.yaml.
Yes, tried with python 3.8, now it works.
So I guess I am referring to the auto package detection. I am running the job though the web ui. My actual problem is that I have a private repo on my requirements.txt (listed with the github url) that is not being installed. Also, my environment.yaml uses python 3.8, while 3.9 is being installed.