I used the autogenerated clearml.conf, I will try erasing the unnecessary parts.
Thank you, it is working now.
Yes, the example works. Like the example, my code basically starts with the following; wasn't that supposed to work?
` import os
import hydra
import torch
import yaml
import pytorch_lightning as pl
import clearml
from omegaconf import DictConfig

@hydra.main(config_path="config", config_name="config")
def main(cfg: DictConfig):
    pl.seed_everything(cfg.seed)
    task = clearml.Task.init(
        project_name=cfg.project_name,
        task_name=cfg.task_name,
    ) `
SuccessfulKoala55 , how do I set the agent version when creating the autoscaler?
So it could be launched by the ClearML CLI? I can also try that.
AgitatedDove14 , here follows the full log:
It's an S3 bucket, and it is working: I am able to upload models before this call, and also custom artifacts in the same script.
Ubuntu 18.04
Python: 3.9.5
Clearml: 1.0.4
Thank you, I set it, but ClearML still creates its own environment regardless of my environment.yaml.
Actually, this error happens when I launch the autoscaler from the Web UI. When I enqueue a task, it launches an EC2 instance whose "Status Check" stays "Pending" for over 15 minutes; the instance is then terminated by the scaler, which launches another one, in a loop.
I am launching through the UI, "XXX workspace / https://app.clear.ml/applications / AWS Autoscaler".
I think not, I have not set any ENV variable. Just went to the web UI, added an autoscaler, filled the data in the UI and launched the autoscaler.
By inspecting the scaler task, it is running the following docker image: allegroai/clearml-agent-services-app:app-1.1.1-47
Thanks, I thought of doing that, but I was wondering if there was a way to set it in the code.
The only reason is that I can specify the python version to be used and conda will install it. On requirements.txt, the default python version will be used.
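For reference, pinning the interpreter in conda is a single line in the environment file. This is only a minimal sketch; the environment name and package versions are illustrative:

```yaml
# environment.yaml — the python pin below is what conda honors
name: my-env
dependencies:
  - python=3.8
  - pip
  - pip:
      - clearml==1.0.4
```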
This is being started as a command line script.
Also tried saving the model with:
` task.set_model_config(cfg.model)
task.update_output_model("best_model.onnx") `
But got the same exception.
Tried using a custom python version:
` FROM nvidia/cuda:11.7.0-runtime-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
    xz-utils tk-dev libffi-dev liblzma-dev git \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv
RUN /root/.pyenv/bin/pyenv install 3.10.6
ENV PATH="/root/.pyenv/versions/3.10....
I run some tests, I think I got it now.
After creating the new dataset, it is necessary to run `sync` again, but now only the new files are uploaded. And when running `get`, the files from the parent dataset will be available as links.
Let's see if I got how it works on the CLI.
So if I execute: `clearml-data create --name <improved_dataset> --parents <existing_dataset_id>`
Where the parent dataset was updated with sync,
I just need to run: `clearml-data upload --id <created_dataset_id>`
And the delta will be automatically uploaded to the new dataset?
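Conceptually, the delta a versioned dataset uploads is "files whose content hash differs from the parent version". This is only an illustrative stdlib sketch of that idea, not ClearML's actual implementation:

```python
import hashlib

def file_hashes(files):
    # files: mapping of {path: bytes content}
    return {p: hashlib.sha256(c).hexdigest() for p, c in files.items()}

def delta(parent, new):
    """Return paths whose content is new or changed relative to the parent snapshot."""
    parent_h = file_hashes(parent)
    new_h = file_hashes(new)
    return [p for p, h in new_h.items() if parent_h.get(p) != h]

parent = {"a.txt": b"one", "b.txt": b"two"}
child = {"a.txt": b"one", "b.txt": b"TWO", "c.txt": b"three"}
print(delta(parent, child))  # ['b.txt', 'c.txt']
```

Unchanged files resolve to the parent's copies, which matches the "available as links" behavior described above.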
Thank you for your response. So what is the difference between `sync` and `add`? By your description it seems to make no difference whether I added the files via `sync` or `add`, since I will have to create a new dataset either way.
Do I have to have the lock file in the root, or can it be in the working dir?
Hydra params are still not uploaded on 1.0.4.
Hi, AgitatedDove14
How do I set the version to 1.5.1? When I launch the autoscaler, version 1.5.0 is picked by default.
So I guess I am referring to the auto package detection. I am running the job through the web UI. My actual problem is that I have a private repo in my requirements.txt
(listed with the GitHub URL) that is not being installed. Also, my environment.yaml
uses Python 3.8, while 3.9 is being installed.
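For what it's worth, pip does accept git URLs directly in requirements.txt; the repository name below is a placeholder:

```
# requirements.txt — private package via git (URL is hypothetical)
git+https://github.com/myorg/private-lib.git@main#egg=private-lib
```

Note the agent still needs git credentials configured (e.g. `agent.git_user` / `agent.git_pass` in clearml.conf) to be able to clone a private repo.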
Solved by removing default parts.
Now I got a strange behavior: I have 2 tasks in the queue, the autoscaler fires two EC2 instances and then turns them off without running the tasks, then it fires two new instances again, in a loop.
I edited the clearml.conf on the agent and set the manager to poetry, do I need to have poetry installed on the agent beforehand, considering that I am using docker?
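The relevant clearml.conf fragment would look roughly like this (a sketch of the agent section; check your agent version's docs for the exact keys):

```
agent {
    package_manager {
        # use poetry instead of pip/conda to resolve the environment
        type: poetry
    }
}
```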
Yes, tried with python 3.8, now it works.
Basically, I am following the steps in this video:
https://www.youtube.com/watch?v=j4XVMAaUt3E
I do not recall the older version; it is from a couple of months ago. The new version is WebApp: 1.2.0-153 • Server: 1.2.0-153 • API: 2.16.
With the account admin email. The one in which I got the receipt.
They are in the same git repo, something like:
` my-repo
  train
    project1
    project2
  libs
    lib1
    ... `
Thank you, I have defined the AMI manually instead of using the default, now I am getting the following error:
Error: An error occurred (InvalidParameterValue) when calling the RunInstances operation: User data is limited to 16384 bytes
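Since EC2 rejects user data over 16384 bytes, a quick local size check of the generated init script can catch this before launch (the filename and helper are hypothetical):

```python
EC2_USER_DATA_LIMIT = 16384  # bytes; hard AWS limit for RunInstances user data

def user_data_fits(script: str) -> bool:
    # AWS counts raw (decoded) bytes, so measure UTF-8 size, not character count
    return len(script.encode("utf-8")) <= EC2_USER_DATA_LIMIT

print(user_data_fits("#!/bin/bash\necho hello\n"))  # True
print(user_data_fits("x" * 20000))                  # False
```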