quick update, still trying to reproduce ...
Hi BoredHedgehog47, I'm assuming the nginx on the k8s ingress is refusing the upload to the files server
JuicyFox94 wdyt?
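If that's the case, a common fix is to raise the ingress body-size limit. A sketch, assuming the standard nginx ingress controller (not verified against your setup):
metadata:
  annotations:
    # "0" disables the client body size check (the default 1m rejects large uploads)
    nginx.ingress.kubernetes.io/proxy-body-size: "0"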
To avoid this, downgrade to clearml==1.9.1 for now.
I will make sure this is solved in clearml==1.9.3 & clearml-session==0.5.0 quickly
What about calling Task.init without the agent?
Nice SubstantialElk6 !
BTW: you can configure your clearml client to store the changes against the latest pushed commit (instead of the default, which is the latest local commit)
see store_code_diff_from_remote: in clearml.conf:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/docs/clearml.conf#L150
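For reference, a minimal sketch of the relevant clearml.conf entry (assuming the default layout, where development sits under the sdk section):
sdk {
  development {
    # store the uncommitted diff against the latest pushed commit
    # instead of the latest local commit
    store_code_diff_from_remote: true
  }
}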
setting max_workers to 1 prevents the error (but, I assume, it may come at the cost of slower sequential uploads).
This seems like a question for GCS; maybe we should open an issue there, since their backend does the rate limiting
My main concern now is that this may happen within a pipeline leading to unreliable data handling.
I'm assuming the pipeline code will have max_workers, but maybe we could have a configuration value so that we can set it across all workers, wdyt?
If
...
Hi PanickyMoth78
Yes, I think you are correct, this looks like GCS throttling your connection. You can control the number of concurrent uploads with max_workers=1
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/datasets/dataset.py#L604
Let me know if it works
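For example, a minimal sketch (the parameter comes from the linked signature; project/dataset names and paths are placeholders):
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("data/")
# a single worker uploads chunks sequentially, which should stay under the GCS rate limit
ds.upload(max_workers=1)
ds.finalize()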
Hi CrookedAlligator14
Hi, I just started using clearml, and it is amazing!
Thank you! 😍
When I enqueue the task, the venv is set up and starts to install all the packages from the requirements.txt file, but at the end I get the following in the console:
Can you try with the latest agent? We improved the support for pytorch (they now have a proper pypi compatible repo), can you see if that solves it?
pip3 install clearml-agent==1.5.0rc0
SubstantialElk6
Hmm do you have torch in the "installed packages" section of the Task ?
(This is what the agent uses to set up the environment inside the docker, running as a pod)
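If it is missing there, one hedged way to force it in (add_requirements must be called before Task.init; project/task names are placeholders):
from clearml import Task

# make sure torch ends up in the Task's "installed packages" section
Task.add_requirements("torch")
task = Task.init(project_name="examples", task_name="train")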
MoodyCentipede68 from your log
clearml-serving-triton | E0620 03:08:27.822945 41 model_repository_manager.cc:1234] failed to load 'test_model_lstm2' version 1: Invalid argument: unexpected inference output 'dense', allowed outputs are: time_distributed
This seems to be the main issue: Triton failing to load the model.
Does that make sense to you? how did you configure the endpoint model?
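If you registered it with the clearml-serving CLI, the output name has to match what the model actually exposes; a sketch, assuming the triton engine, with placeholder sizes/types:
clearml-serving --id <service_id> model add \
  --engine triton --endpoint "test_model_lstm2" \
  --input-name "input" --input-type float32 --input-size 1 128 \
  --output-name "time_distributed" --output-type float32 --output-size -1 10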
SourSwallow36 it is possible.
Assuming you are not logging metrics by the same name, it should work.
try:
Task.init('examples', 'training', continue_last_task='<previous_task_id_here>')
Hi @<1539055479878062080:profile|FranticLobster21>
Like this?
https://github.com/allegroai/clearml/blob/4ebe714165cfdacdcc48b8cf6cc5bddb3c15a89f[…]ation/hyper-parameter-optimization/hyper_parameter_optimizer.py
Sorry my bad, you are looking for:
None
I am using importlib and this is probably why everything's weird.
Yes, that would explain a lot 🙂
No worries, glad to hear it worked out
BTW: I think an easy fix could be:
if running_remotely():
    pipeline.start()
else:
    pipeline.create_draft()
I see.
You can get the offline folder programmatically, then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):
task.get_offline_mode_folder()
You can also soft link the offline folder (if you are working on a linux machine):
ln -s myoffline_folder ~/.trains/cache/offline
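A minimal sketch of the full offline round-trip (paths and names are placeholders):
from clearml import Task

Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
print(task.get_offline_mode_folder())  # this is the folder you can copy or soft link
task.close()

# later, on a machine that can reach the server:
Task.import_offline_session(session_folder_zip="/path/to/offline_folder_or_zip")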
StraightDog31 how did you get these ?
It seems like it is coming from matplotlib, no?
The downstream stages are rankN scripts; they are waiting for the IP address of the first stage.
Is this like a multi-node training, rather than a pipeline ?
it looks like nvidia is going to come up with a UI for TAO too
Interesting, any reference we could look at ?
Hi GleamingGrasshopper63
How well can the ML Ops component handle job queuing on a multi-GPU server
This is fully supported 🙂
You can think of queues as a way to simplify resources for users (you can do more than that, but let's start simple)
Basically you can create a queue per type of GPU, for example a list of queues could be: on_prem_1gpu, on_prem_2gpus, ..., ec2_t4, ec2_v100
Then, when you spin up the agents, you attach each agent to the "correct" queue for its machine type.
Int...
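For instance, a sketch of spinning two agents on one machine (queue names as above; GPU indices are placeholders):
clearml-agent daemon --queue on_prem_1gpu --gpus 0 --detached
clearml-agent daemon --queue on_prem_2gpus --gpus 1,2 --detached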
Hi SplendidToad10
In order to run a pipeline you first have to create the steps (i.e. Tasks).
This is usually done by running the code once (basically, running any code with a Task.init call will create a Task for that specific code, including the environment definition the Agent needs to reproduce it).
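i.e. something like (project/task names are placeholders):
from clearml import Task

# running this once registers the script as a Task the pipeline can later clone
task = Task.init(project_name="pipeline-steps", task_name="step1")
# ... the actual step code ...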
Hi DeliciousBluewhale87
This sounds like a great workflow to implement.
I guess my first question is how do you imagine the manager/director interacting with the system? What will they be shown, to allow them to approve/decline the model promotion ?
Correct 🙂
I'm assuming the Task object is not your current task, but a different one?
Just making sure, the original code was executed on python 3?
BTW:
This is very odd: "~/.clearml/venvs-builds.3/3.6/bin/python" thinks it is using "python 3.6", but it is linked with python 2.7 ...
No idea how that could happen
HugeArcticwolf77 actually it is more than that, you can now embed the graphs in the markdown: when you hover over any plot/graph/image you now have a new button that copies the embed text, so that you can paste it directly into your markdown editor (internal or external)
More documentation and screenshots are coming after the holiday; in the meantime you can check:
https://clear.ml/docs/latest/docs/webapp/webapp_reports
https://clear.ml/docs/latest/assets/images/webapp_report-695dddd2ec8064938bf8...
BTW: make sure to install the agent with the system python packages and not inside any venv.
I have one agent running on the machine. I also have only one task running. This only happens to us when we use pipelines
@<1724960468822396928:profile|CumbersomeSealion22> notice that when you are launching a pipeline you are actually running two Tasks: one is the "pipeline" itself (i.e. the logic) and the other is the component in the pipeline (i.e. the step)
If you have one agent, I'm assuming what happens is the pipeline itself (the one that you launch on your machine)...
Hi LazyFish41
Could it be some permission issue on /home/quetalasj/.clearml/cache/ ?
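One quick hedged check, assuming a linux box:
# check who owns the cache folder
ls -ld /home/quetalasj/.clearml/cache/
# if it is owned by root (e.g. created by a sudo run), reclaim it:
sudo chown -R "$USER":"$USER" /home/quetalasj/.clearml/cache/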