Hi UnsightlyShark53, just a quick FYI, you can also log the entire config file config.json; it will be stored as the model configuration, and you can see it in the input/output models under the Artifacts tab.
See example here, you can pass either the path to the configuration file, or the dictionary itself after you load the json, whatever is more convenient :)
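To make the two options concrete, here is a minimal sketch (the config file content is made up for illustration; the actual ClearML call would be something like `task.connect_configuration(...)`, which the message above says accepts either form):

```python
import json
from pathlib import Path

# Hypothetical config file, created here just so the example is self-contained
Path("config.json").write_text(json.dumps({"lr": 0.001, "batch_size": 32}))

# Option 1: pass the file path directly, e.g. task.connect_configuration("config.json")
config_path = "config.json"

# Option 2: load the json yourself and pass the dictionary instead
with open(config_path) as f:
    config = json.load(f)

print(config)
```

Either way the full configuration ends up stored with the task.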
if in the "installed packages" I have all the packages installed from the requirements.txt then I guess I can clone it and use "installed packages"
After the agent finished installing the "requirements.txt" it will put back the entire "pip freeze" into the "installed packages", this means that later we will be able to fully reproduce the working environment, even if packages change (which will eventually happen as we cannot expect everyone to constantly freeze versions)
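Roughly, what the agent stores back is the equivalent of a pip freeze: every installed package pinned to its exact version. A stdlib-only sketch of that capture (not the agent's actual implementation):

```python
from importlib.metadata import distributions

# Enumerate every installed distribution and pin it to its exact version,
# which is essentially what "pip freeze" produces and what makes the
# environment reproducible later, even after package versions move on.
frozen = sorted({f"{d.metadata['Name']}=={d.version}" for d in distributions()})
print(frozen[:5])
```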
My problem...
Hi SubstantialElk6 I'll start at the end, you can run your code directly on the remote GPU machine 🙂
See clearml-task documentation, on how to create a task from existing code and launch it
https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md
That said, the idea is that you add the Task.init call when you are writing/coding the code itself, then later when you want to run it remotely you already have everything defined in the UI.
Make sense ?
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs)
Are there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?
While if I just download the right packages from the requirements.txt then I don't need to think about that
I see your point, the only question is how come these packages are not automatically detected?
GiganticTurtle0 is there any git redundancy on your network ? maybe you could configure a fallback server ?
Hi JuicyDog96
The easiest way at the moment (apologies for the still-missing RestAPI documentation, it is coming :) ) is actually the code itself (full docstring docs):
https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
You can access it all with an easy Pythonic interface, for example:
`from trains.backend_api.session.client import APIClient
client = APIClient()
tasks = client.tasks.get_all()`
This seems to be more complicated than it looks (a ui/backend combination); not that we are not working on it, just that it might take some time as it passes control to the backend (which by design does not touch external storage points).
Maybe we should create an S3 cleanup service, listing buckets and removing if the Task ID does not exist any longer. wdyt?
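A minimal sketch of the cleanup logic behind that idea (the bucket contents and live Task IDs are stubbed here; a real service would list the bucket via the storage SDK and fetch live tasks via the APIClient):

```python
# Hypothetical object keys in the artifacts bucket, prefixed by Task ID
# (names and IDs are made up for illustration).
bucket_objects = [
    "a1b2c3/model.pkl",
    "d4e5f6/dataset.csv",
    "a1b2c3/plots.png",
]

# Task IDs that still exist on the server (stubbed).
existing_task_ids = {"a1b2c3"}

def stale_objects(objects, live_ids):
    """Return keys whose Task ID prefix no longer exists on the server."""
    return [key for key in objects if key.split("/", 1)[0] not in live_ids]

print(stale_objects(bucket_objects, existing_task_ids))
```

The service would then delete the stale keys, leaving artifacts of live tasks untouched.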
Thanks @<1523703472304689152:profile|UpsetTurkey67>
I'm pretty sure it has!
Let me check how we can merge it into the clearml-agent, sounds good?
Hi ReassuredOwl55
a dialogue box that opens with a “deleting” bar swishing across, but then it just hangs, and becomes completely unresponsive
I believe this issue was fixed in the latest server version, seems like you are running 1.7 but the latest is 1.9.2. May I suggest an upgrade ?
im not running in docker mode though
hmmm that might be the first issue. It cannot skip venv creation; it can however use a pre-existing venv (but it will change it every time it installs a missing package)
so setting CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 in non-docker mode has no effect
So if everything works you should see "my_package" package in the "installed packages"
the assumption is that if you do `pip install my_package`, it will have "pandas" listed as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
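As a hypothetical illustration (package names and versions made up), the "installed packages" section would then contain only the top-level packages:

```
my_package==1.2.0
clearml==1.9.1
```

When the agent later runs pip install on this list, pip resolves and pulls the transitive dependencies (such as pandas) on its own.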
Make sense ?
Hi JealousParrot68
This is the same as:
https://clearml.slack.com/archives/CTK20V944/p1627819701055200
and,
https://github.com/allegroai/clearml/issues/411
There is something odd happening in the files-server as it replaces the header (i.e. guessing the content of the stream) and this breaks the download (what happens is the clients automatically ungzip the csv).
We are working on a hotfix to the issue (BTW: if you are using object-storage / shared folders, this will not happen)
And having a pdf is easier/better than sharing a link to the results page ?
ElegantKangaroo44 good question, that depends on where we store the score of the model itself. You can obviously parse the file name task.models['output'][-1].url and retrieve the score from it. You can also store it in the model name task.models['output'][-1].name , or put it as a general-purpose blob of text in what is currently model.config_text (for convenience you can have the model parse json-like text and use model.config_dict )
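A small sketch of that last option, using plain json to mirror the text-vs-dict relationship (the score value here is made up; this stands in for model.config_text / model.config_dict, not the actual ClearML calls):

```python
import json

# Store the score as json-like text in the model configuration
# (this string plays the role of model.config_text).
config_text = json.dumps({"score": 0.87, "metric": "accuracy"})

# Later, parse it back into a dict (the role of model.config_dict).
config_parsed = json.loads(config_text)
print(config_parsed["score"])
```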
Hi SubstantialElk6
I can't see that it was removed, could you send the full log?
Did you experience any drop in performance using forkserver?
No, seems to be working properly for me.
If yes, did you test the variant suggested in the pytorch issue? If yes, did it solve the speed issue?
I haven't tested it, that said it seems like a generic optimization of the DataLoader
Hi MistakenDragonfly51
Hello everyone! First, thanks a lot to everyone that made ClearML possible,
❤
To your questions 🙂
Long story short: no, unless you really want to compile the dockers, and I can't see the real upside here. Yes, add the volume mount /opt/clearml.conf:/root/clearml.conf here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L154
and configure your host's " /opt/clearml.conf " with ...
JitteryCoyote63 hacky but sure 🙂
` from trains.config import config_obj
print(config_obj) `
Hi DilapidatedDucks58 just making sure, is the link the pytorch nightly artifactory? Or is it a direct link to the package? Reason for asking, I was not aware they have a proper artifactory... When the task runs, the trains agent will update the "installed packages" with all the packages it actually used. Could you verify you have the correct version?
Regarding the extra files, you are correct, the docker container is reset every run, so they will get lost. What are those files for? Could you add ...
it does not include the "internal.repo" as a package dependency, so it crashes.
understood
And for the time being we have not used the decorators,
So how are you building the pipeline component ?
LOL love that approach.
Basically here is what I'm thinking,
`from clearml import Task, InputModel, OutputModel

task = Task.init(...)

# run this part once
if task.running_locally():
    my_auxiliary_stuff = OutputModel()
    my_auxiliary_stuff.system_tags = ["DATA"]
    my_auxiliary_stuff.update_weights_package(weights_path="/path/to/additional/files")
    input_my_auxiliary = InputModel(model_id=my_auxiliary_stuff.id)
    task.connect(input_my_auxiliary, "my_auxiliary")
task.execute_remotely()
my_a...`
Hi, what is host?
The IP of the machine running the ClearML server
Thanks, new doc site is scheduled for next week, it will also be on github, so pr-ing fixes will be a breeze :)
Hi @<1729309131241689088:profile|MistyFly99>
notice that the files server needs to have an "address" that can be accessed from the browser; data is stored in a federated manner. This means your browser accesses the files server directly, not through the API server. I'm assuming the address is not valid?
How can i find queue name
You can generate as many as you like, the default one is called "default", but you can add new queues in the UI (go to the Workers & Queues page, then Queues, and click "+ New Queue")
Is Task.current_task() creating a task?
Hmm it should not, it should return a Task instance if one was already created.
That said, I remember there was a bug (not sure if it was in a released version or an RC) that caused it to create a new Task if there isn't an existing one. Could that be the case ?
Thanks JitteryCoyote63 !
Any chance you want to open a github issue with the exact details, or fix it with a PR?
(I just want to make sure we fix it as soon as we can 🙂 )
potential sources of slow down in the training code
Is there one?