So, what I am referring to is the ability of a system to allow some rigor and robustness in tracking experiments, and also to enforce some early thinking about how things might be deployed, early on in the development process, whilst not being overly prescriptive and cumbersome
I cannot agree more!!
VivaciousPenguin66 We are working on trying to better understand how to solve this very delicate balancing act and offer some sort of "JIRA" for ML.
If this is okay with you, once product pe...
BattyLion34
Maybe something inside the task is different?!
Could you run these lines and send me the result:
from clearml import Task
print(Task.get_task(task_id='failing task id').export_task())
print(Task.get_task(task_id='working task id').export_task())
SweetGiraffe8
That might be it, could you test with the Demo server?
GreasyPenguin14 yes there is 🙂
https://github.com/allegroai/clearml/issues/209
Set environment variable CLEARML_NO_DEFAULT_SERVER=1
MinuteGiraffe30 if you are running the following command while your current directory is where your code is, what are you getting?
$ git ls-remote --get-url origin
And you have the exact same folder structure / content, and servers A/B give a different set of experiments?
(is serverB empty, meaning no experiments at all?)
But I think this error has only appeared since I upgraded to version 1.1.4rc0
Hmm let me check something
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
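As a quick sketch of the idea (the queue name here is just an illustration, assuming a clearml-agent is already listening on that queue):

from clearml import Task

# clone an existing experiment and push it into a queue;
# whichever agent (i.e. resource) is listening on that queue will pick it up
template = Task.get_task(project_name='examples', task_name='my experiment')
cloned = Task.clone(source_task=template)
Task.enqueue(cloned, queue_name='gpu-queue')  # 'gpu-queue' is a placeholder name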
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
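If it helps, here is a rough sketch of what reacting to new data landing could look like with the TriggerScheduler (the task id, queue and project names are placeholders, and I might be slightly off on the exact argument names):

from clearml.automation import TriggerScheduler

# poll the server every few minutes for new events
trigger = TriggerScheduler(pooling_frequency_minutes=5)
# when a new dataset version appears in the project, enqueue the batch-inference task
trigger.add_dataset_trigger(
    name='batch-inference-on-new-data',          # placeholder trigger name
    schedule_task_id='<inference task id>',       # placeholder task id
    schedule_queue='services',                    # placeholder queue
    trigger_project='my dataset project',         # placeholder project
)
trigger.start()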
If the manual execution (i.e. pycharm) was working it should have stored it on the Pipeline Task.
Any recommendation or working combinations of AMIs?
I would take the deep learning AMIs from NVIDIA on AWS, I think they work on both CPU and GPU machines.
In terms of docker images: python images for CPU and the nvidia runtime images for GPU
https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86db7f6b1b286d2c01646d599352f6ddd98...
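And just to make it concrete, per Task you can point the agent at a specific image, something like this (if I'm remembering the call correctly; the image names are only examples):

from clearml import Task

task = Task.init(project_name='examples', task_name='training')
# CPU machine -> plain python image, GPU machine -> nvidia runtime image (example names)
task.set_base_docker(docker_image='python:3.11.2-bullseye')
# task.set_base_docker(docker_image='nvidia/cuda:11.8.0-runtime-ubuntu22.04')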
preinstalled in the environment (e.g. nvidia docker). These packages may not be available via pip, so the run will fail.
Okay, that's the part that I'm missing: how come in the first run the packages existed and in the cloned Task they are missing? I'm assuming the agents are configured basically the same (i.e. docker mode with the same network access). What did I miss here?
MuddyRobin9 are you sure it was able to spin up the EC2 instance? Which clearml autoscaler version are you running?
This is very odd, can you also post the file names here? Maybe an odd character is causing it?
Can you also test it with the latest clearml version (1.8.0)?
FierceRabbit20 it seems the Pipeline Task that was created is missing the "installed requirements" section. How are you creating the actual pipeline Task? Is this from code?
EnviousPanda91 notice that when passing these arguments to clearml-agent you are actually passing default args. If you want an additional argument to always be used, set extra_docker_arguments
here:
https://github.com/allegroai/clearml-agent/blob/9eee213683252cd0bd19aae3f9b2c65939d75ac3/docs/clearml.conf#L170
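i.e. something along these lines in your clearml.conf (the actual arguments/mount are just an example):

agent {
    # these arguments are appended to every docker run the agent launches
    extra_docker_arguments: ["--ipc=host", "-v", "/host/data:/data"]
}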
Hi UpsetTurkey67
"General/my_parameter_name" so that only this part of the configuration will be updated?
I'm assuming this is a hyperparameter, not a configuration object (i.e. task.connect, not task.connect_configuration). If this is the case then yes 🙂
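i.e. something like this (the parameter name and values are just for illustration):

from clearml import Task

task = Task.init(project_name='examples', task_name='hparam example')
# connect() stores this under the "General" section -> "General/my_parameter_name"
params = task.connect({'my_parameter_name': 0.1}, name='General')

# later, e.g. on a cloned Task, you can update just that single value:
cloned = Task.clone(source_task=task)
cloned.set_parameter('General/my_parameter_name', 0.5)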
What happened in the server configuration that all of a sudden you have zero ports open?
set the following:
CLEARML_AGENT_DISABLE_SSH_MOUNT=1 clearml-agent daemon ...
The issue is that it will automatically mount the host's .ssh into the container, so that if you are using SSH to clone git you have credentials. In your case it also mounts the SSH configuration, hence the failure to log in.
I will make sure we add it to the configuration file, so it is more visible
I want each remote task to execute one instance of the hydra multirun, but I suspect the remote will try to run the full multirun by itself
if config.clearml.remote and task.running_locally():
    task.execute_remotely(
        queue_name=config.clearml.queue_name,
        clone=True,
        exit_process=False
    )
    return
I think this ensures the local execution actually triggers the remote one, so it should be as you expect, no?
RoughTiger69
Move the files locally (i.e. based on the example, move folder 'b' into folder 'a'). Create a new version with two parents ('a' and 'b'), then sync the local root folder ('a' in your case). Only the meta-data should change (because the referenced files are already in one of the datasets). wdyt?
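Roughly something like this (the project name and local path are placeholders):

from clearml import Dataset

# 'a' and 'b' are the existing dataset versions; locally, folder b was moved into folder a
ds_a = Dataset.get(dataset_project='my project', dataset_name='a')
ds_b = Dataset.get(dataset_project='my project', dataset_name='b')

merged = Dataset.create(
    dataset_project='my project',
    dataset_name='a',
    parent_datasets=[ds_a.id, ds_b.id],
)
merged.sync_folder(local_path='/path/to/a')  # only meta-data should change here
merged.upload()
merged.finalize()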
Now I'm curious what's the workaround ?
regarding the actual artifact access, this is the usual Task.artifacts access; see example here:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
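In short it boils down to something like this (the task id and artifact name are placeholders):

from clearml import Task

task = Task.get_task(task_id='<task id>')
# download (and cache) the stored artifact file
local_path = task.artifacts['my_artifact'].get_local_copy()
# or, for supported types, get the deserialized object directly
obj = task.artifacts['my_artifact'].get()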
I think what you need is to create an OutputModel, then call update_weights when you have a better model; this will also allow you to tag the model object. Would that help? Or would it make sense to use Task.models and count on the auto logging?
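Something along these lines (the weights file name and tag are just examples):

from clearml import Task, OutputModel

task = Task.init(project_name='examples', task_name='train')
output_model = OutputModel(task=task, framework='PyTorch')

# whenever a better checkpoint is found, update the weights and tag the model object
output_model.update_weights(weights_filename='best_model.pt')
output_model.tags = ['best']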
Okay I found it, this is due to the fact that the newer versions send the events/images in a subprocess (it used to be a thread).
The creation of the object is done in the main process, updating the file index (in a round-robin manner), but the check itself happens in the subprocess, which is not "aware" of the used indexes (i.e. it is always 0, hence when exceeding the history size, it skips it)
SmallBluewhale13 the final path is automatically generated, you only need to specify the bucket itself. By default it will be your "files_server"
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/docs/clearml.conf#L10
You can either change the configuration (which will make sure all uploaded artifacts will always be there, including debug images etc.)
You can specify where you want the artifacts and debug images to be uploaded by setting:
https://allegro....
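i.e. something like this in clearml.conf (assuming it is the default_output_uri setting the link points to; the bucket is just an example):

sdk {
    development {
        # models / artifacts are uploaded here by default (s3://, gs://, azure:// or a file server)
        default_output_uri: "s3://my-bucket/clearml"
    }
}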
It does not upload; the default behavior is to log the artifact (so you know where you stored it, without enforcing unnecessary uploads)
If you were to change:
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example')
to:
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example', output_uri=" ")
It would also upload the model
Yes, docker was not installed on the machine
Okay, makes sense. We should definitely check that you have docker before starting the daemon 😉
Ok, it would be nice to have a --user-folder-mounted that does the linking automatically
It might be misleading if you are running on a k8s cluster, where one cannot just -v mount a volume...
What do you think?