Okay, so the way it works is that it moves all the logging to a background process. But if you have a lot of data, actually pushing the data between Python processes is not very efficient. This line basically tells it to just use a background thread (instead of a background process) for sending the data to the server.
The idea behind using a background process in the first place is to better support PyTorch workers that spin up a lot of subprocesses, and we do not want to add a thread per process and in...
Actually that is less interesting, as it is quite straightforward
a task of queue B if the next task is of type A it will have to wait,
It seems you imply there are two types of Tasks and they need to be executed one after the other?
ERROR: torch-1.12.0+cu102-cp38-cp38-linux_x86_64.whl is not a supported wheel on this platform
TartBear70 could it be you are running on a new Mac M1/M2?
Also quick question, any chance you can test with the latest RC?
pip3 install clearml-agent==1.3.1rc6
VirtuousFish83 is the exit(1) called from the main process or a subprocess? Are you running it with an agent?
Task.connect is "automagic", i.e. it sends the values to the server when in manual mode, and pulls them from the server when running in agent mode,
set_parameter is one-way only and should be used to set an external Task's parameters.
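A tiny sketch of the difference (project/task names and the parameter key are only placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")

# Task.connect is two-way ("automagic"): in a manual run it pushes the dict
# to the server; when executed by an agent it pulls the values back from it.
params = {"learning_rate": 0.001, "batch_size": 32}
task.connect(params)

# set_parameter is one-way only: explicitly write a parameter,
# e.g. on an external Task fetched with Task.get_task
external_task = Task.get_task(task_id="<external-task-id>")
external_task.set_parameter("General/learning_rate", 0.01)
```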
BTW: if you could implement _AzureBlobServiceStorageDriver
with the new Azure package, that would be great:
Basically update this class:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L1620
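Not a working implementation, just a rough sketch of the direction, assuming the new SDK is azure-storage-blob (v12+); the method names here are illustrative and the real driver interface in helper.py has more methods:
```
from azure.storage.blob import BlobServiceClient

class _AzureBlobServiceStorageDriver(object):
    def __init__(self, account_url, credential):
        # v12 SDK client replacing the old BlockBlobService-based code
        self._client = BlobServiceClient(account_url=account_url, credential=credential)

    def upload_object(self, local_path, container_name, blob_name):
        blob = self._client.get_blob_client(container=container_name, blob=blob_name)
        with open(local_path, "rb") as f:
            blob.upload_blob(f, overwrite=True)

    def download_object(self, container_name, blob_name, local_path):
        blob = self._client.get_blob_client(container=container_name, blob=blob_name)
        with open(local_path, "wb") as f:
            f.write(blob.download_blob().readall())
```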
That is awesome!
If you feel like writing a bit about the use-case and how you solved it, I think AnxiousSeal95 will be more than happy to publish something like that 🙂
Is task.parent something that could help?
Exactly 🙂 something like:
# my step is running here
the_pipeline_task = Task.get_task(task_id=task.parent)
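A slightly fuller sketch of the same idea, assuming it runs inside a pipeline step (so the step's own Task has the pipeline controller as its parent):
```
from clearml import Task

# my step is running here
task = Task.current_task()
the_pipeline_task = Task.get_task(task_id=task.parent)
```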
This should have worked with the latest clearml RC.
And you verified it is not working?
I think there is a bug in the UI that causes series with "." to use only the first part of the series name for the color selection. This means "epsilon 0" and "epsilon 0.1" will always get the same color, and this will explain why it works on other graphs
BroadMole98 Awesome, can't wait for your findings 🙂
I guess I got confused since the color choices in
One of the most beloved features we added 🙂
Hi @<1533257411639382016:profile|RobustRat47>
sorry for the delay,
Hi, when we try and sign up a user with GitHub.
wait, where are you getting this link?
That makes sense to me, what do you think about the following:
```
from clearml import PipelineDecorator

class AbstractPipeline(object):
    def __init__(self):
        pass

    @PipelineDecorator.pipeline(...)
    def run(self, run_arg):
        data = self.step1(run_arg)
        final_model = self.step2(data)
        self.upload_model(final_model)

    @PipelineDecorator.component(...)
    def step1(self, arg_a):
        # do something
        return value

    @PipelineDecorator.component(...)
    def step2(self, arg_b):
        # do ...
```
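For completeness, a hypothetical way such a class-based pipeline could be kicked off (the run_arg value is just a placeholder):
```
if __name__ == "__main__":
    pipe = AbstractPipeline()
    pipe.run(run_arg="my-dataset-id")
```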
Basically the links to the file server are saved in both mongo and elastic, so as long as these are host:ip based, at least in theory it should work
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console?
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
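If it helps while debugging, a small sketch (just an assumption on my side that forcing a flush will narrow it down) using Task.flush to block until the queued events are sent:
```
from clearml import Task

task = Task.current_task()
# block until pending metric/artifact events are sent to the backend
task.flush(wait_for_uploads=True)
```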
We might need to change the default base docker image, but I remember it was there... Let me check again
I want pipeline / task dispatch to be reported and monitored outside of clearml. For example, I might want to log the dispatch event in some non-clearml system and then monitor the health of the pipeline and alert if it is pending for too long.
Hmm interesting, so like a callback?!
I'm thinking a callback that is executed after the Pipeline is sent, but once the callback is done, the pipeline process exits?
Does that make sense ?
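A rough sketch of the kind of callback I have in mind, assuming the PipelineController step callbacks (the external-reporting part is hypothetical, names are placeholders):
```
from clearml import PipelineController

def report_dispatch(pipeline, node, parameters):
    # hypothetical hook: log the dispatch event to a non-ClearML system
    print(f"step '{node.name}' is about to be dispatched with {parameters}")
    return True  # returning False would skip the step

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="training template",
    pre_execute_callback=report_dispatch,
)
pipe.start(queue="default")
```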
I might want to dispatch other jobs from within the same p...
I remember there were some issues with it ...
I hope not 😞 Anyhow the only things that do matter are the auto_connect arguments (meaning if you want to disable some, you should pass them when calling Task.init)
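For example (project/task names here are only placeholders), disabling some of the automatic connections when calling Task.init:
```
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="no auto connect",
    auto_connect_arg_parser=False,                  # do not capture argparse arguments
    auto_connect_frameworks={"matplotlib": False},  # keep the rest, disable matplotlib logging
)
```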
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_38e9acc8d56441999e806815abddee82.clearml'
Let me check this issue, it seems like the locking mechanism should have figured that there is no lock...
BTW: the new pipeline decorator interface example is here:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
With k8s glue going, want to finally look at clearml-session and how people are using it.
If used with the k8s glue, you will have to run the glue with --ports-mode; then the clearml-session will know how to connect to the container itself, since at runtime the container will register the gateway + port for the clearml-session client to connect to
hmm that is odd.
Can you send the full log ?
Not yet 😞
It should not be complex to implement,
The actual AWS auto-scaler class implements just two functions:
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L104
def spin_down_worker(self, instance_id):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L...
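So a custom scaler is roughly a subclass overriding those two methods; a minimal sketch (the cloud calls are hypothetical placeholders, and the real class also needs its settings/configuration objects):
```
from clearml.automation.auto_scaler import AutoScaler

class MyCloudAutoScaler(AutoScaler):
    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        # start an instance for `resource` and launch a clearml-agent on it,
        # listening on `queue_name` and named with `worker_id_prefix`
        my_cloud_start_instance(resource, worker_id_prefix, queue_name)  # hypothetical helper

    def spin_down_worker(self, instance_id):
        # terminate the instance started by spin_up_worker
        my_cloud_terminate_instance(instance_id)  # hypothetical helper
```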
think this is because of the version of xgboost that serving installs. How can I control these?
That might be
I absolutely need to pin the packages (incl main DS packages) I use.
you can basically change CLEARML_EXTRA_PYTHON_PACKAGES
https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L100
for example:
export CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.2.3 numpy==1.2.3"
so for example, if there was an idle GPU and Q3 takes it, and then a task comes to Q2 (for which we specified 3 GPUs) but Q3 has already taken some of these GPUs, what will happen?
This is a standard "race": the first one to come will "grab" the GPU and the other will wait for it.
I'm pretty sure enterprise edition has preemption support, but this is not currently part of the open source version (btw: also the dynamic GPU allocation, I think, is part of the enterprise tier, in the opensource ...