
Do I have to have the lock file in the root, or can it be in the working dir?
Hi, AgitatedDove14
How do I set the version to 1.5.1? When I launch the autoscaler, version 1.5.0 is picked by default.
Thanks, I thought on doing that, but I was wondering if there was a way to set it into the code.
Hi, any update on that?
Basically, I am following the steps in this video:
https://www.youtube.com/watch?v=j4XVMAaUt3E
Thank you, I set it, but ClearML still creates its own environment regardless of my environment.yaml.
So it could be launched via the clearml CLI? I can also try that.
AgitatedDove14, here is the full log:
So I guess I am referring to the auto package detection. I am running the job through the web UI. My actual problem is that I have a private repo in my requirements.txt
(listed with the GitHub URL) that is not being installed. Also, my environment.yaml
uses Python 3.8, while 3.9 is being installed.
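In case it helps, this is the style of entry I mean — a hypothetical sketch with placeholder package and repo names, using pip's VCS syntax:

```
# requirements.txt
numpy>=1.21
# private dependency pulled straight from GitHub (placeholder URL):
my-private-lib @ git+https://github.com/my-org/my-private-lib.git@main
```

The private line is the one the agent fails to install.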
I tried using a custom Python version:
` FROM nvidia/cuda:11.7.0-runtime-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev git \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv
RUN /root/.pyenv/bin/pyenv install 3.10.6
ENV PATH="/root/.pyenv/versions/3.10....
Thank you, it is working now.
I do not recall the older version (it is from a couple of months ago), but the new version is WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16.
Actually, this error happens when I launch the autoscaler from the Web UI. When I enqueue a task, it launches an EC2 instance whose "Status Check" stays "Pending" for over 15 minutes; then the instance is terminated by the scaler, which launches another one in a loop.
The failure part of the log follows:
` Requirement already satisfied: pip in /root/.clearml/venvs-builds/3.1/lib/python3.10/site-packages (22.2.2)
Collecting Cython
Using cached Cython-0.29.32-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.32
Collecting boto3==1.24.59
Using cached boto3-1.24.59-py3-none-any.whl (132 kB)
ERROR: Could not find a version that satisfies the requ...
Yes, but the issue is caused because rmdatasets is installed in the local environment; I needed it installed in order to test the code locally, so it gets picked up in the package list.
I will probably stop installing the sibling packages and add them manually to sys.path.
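A minimal sketch of the manual sys.path approach I mean, assuming the sibling packages live in a libs/ folder at the repo root (the layout and names are placeholders):

```python
import sys
from pathlib import Path

# Hypothetical layout: sibling packages under libs/ next to the
# training code; adjust to the actual repo tree.
libs_dir = Path.cwd() / "libs"

# Prepending instead of pip-installing keeps the sibling package out of
# the auto-detected requirements, while `import lib1` still resolves.
if str(libs_dir) not in sys.path:
    sys.path.insert(0, str(libs_dir))
```

This way nothing about the sibling packages ends up in the environment ClearML captures.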
Yes, the example works. Like the example, my code basically starts with the following; wasn't that supposed to work?
` import hydra
from omegaconf import DictConfig

@hydra.main(config_path="config", config_name="config")
def main(cfg: DictConfig):
    # heavy imports kept inside main so hydra's tab completion stays fast
    import os
    import pytorch_lightning as pl
    import torch
    import yaml
    import clearml

    pl.seed_everything(cfg.seed)
    task = clearml.Task.init(
        project_name=cfg.project_name,
        task_name=cfg.task_name,
    ) `
Hydra params are still not uploaded on 1.0.4.
Thank you. After running the script, I run docker-compose -f /opt/clearml/docker-compose.yml up -d?
I am using hydra to configure my experiments. Specifically, I want to retrieve the OmegaConf data created by hydra. config = task.get_configuration_objects()
returns a string with those values, but I do not know how to parse it, or whether I can get this data as a nested dict.
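For what it's worth, the string appears to be plain YAML, so something along these lines should recover a nested dict — a sketch with a made-up config; in the real case the string would come from task.get_configuration_objects():

```python
import yaml

# Stand-in for the string returned by task.get_configuration_objects();
# the real keys depend on the hydra config.
config_str = """
seed: 42
model:
  name: resnet18
  lr: 0.001
"""

# yaml.safe_load turns the YAML text into a plain nested dict.
cfg = yaml.safe_load(config_str)
print(cfg["model"]["name"])
```

If you would rather keep it as an OmegaConf object, OmegaConf.create(config_str) should accept the same YAML text.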
The only reason is that I can specify the Python version to be used and conda will install it. With requirements.txt, the default Python version is used.
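i.e. a minimal environment.yaml along these lines (name and pins are placeholders):

```
name: my-env
channels:
  - defaults
dependencies:
  - python=3.8   # conda installs this exact interpreter
  - pip
```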
With the account admin email. The one in which I got the receipt.
I edited the clearml.conf on the agent and set the manager to poetry; do I need to have poetry installed on the agent beforehand, considering that I am using Docker?
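Concretely, the edit I made was along these lines (a sketch of the package_manager section in clearml.conf):

```
agent {
    package_manager {
        # switched from the default "pip" to poetry
        type: poetry
    }
}
```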
Thank you, now I am getting AttributeError: 'DummyModel' object has no attribute 'model_design'
when calling task.update_output_model("best_model.onnx")
. I checked the code; I thought it was related to the model not having a config defined, and tried to set it with task.set_model_config(cfg.model)
, but I am still getting the error.
I have a task that is already completed, and, in another script, I am trying to load it and analyse the results.
They are in the same git repo, something like:
my-repo
  train
    project1
    project2
  libs
    lib1
    ...
Solved by removing default parts.
Now I get strange behavior: I have 2 tasks in the queue, the autoscaler fires two EC2 instances and then turns them off without running the tasks, then it fires two new instances again, in a loop.
I am launching through the UI, "XXX workspace / https://app.clear.ml/applications / AWS Autoscaler".
I used the autogenerated clearml.conf; I will try erasing the unnecessary parts.
Thank you, I was importing everything inside the function so that the hydra autocomplete would run faster.