AgitatedDove14

SpicyCrab51 you can change the task to complete, it is just a state change nothing will actually change other than the status. Task.get_task(pass_dataset_id_here).mark_complete()

2 years ago

0 Hello! Something With A Very Long Running

Hi SpicyCrab51 ,
Hmm, how exactly is the Dataset opened?
If the Dataset object is alive for 30h it will keep the dataset alive, why isn't it being closed ?

2 years ago

0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Hi VivaciousWalrus21 I tested the sample code, and the gap was evident in Tensorboard as well. This is not clearml generating this jump this is internal (like the auto de/serialization and continue of the code base)

2 years ago

0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Expected behaviour is that it reads last iteration correctly. At least it is stated in docs so.

This is exactly what should happen, are you saying that for some reason it fails?

2 years ago

0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Oh sorry, from the docstring, this will work:
` :param bool continue_last_task: Continue the execution of a previously executed Task (experiment)

.. note::
    When continuing the executing of a previously executed Task,
    all previous artifacts / models/ logs are intact.
    New logs will continue iteration/step based on the previous-execution maximum iteration value.
    For example:
    The last train/loss scalar reported was iteration 100, the next report will b...

2 years ago

0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Hi VivaciousWalrus21

After restarting training huge gaps appear in iteration axis (see the screenshot).

The Task.init actually tries to understand what was the last reported interation and continue from that iteration, I'm assuming that what happens is that your code does that also, which creates a "double shift" that you see as the jump. I think the next version will try to be "smarter" about it, and detect this double gap.
In the meantime, you can do:
` task = Task.init(...)...

2 years ago

0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

VivaciousWalrus21 I took a look at your example from the github issue:
https://github.com/allegroai/clearml/issues/762#issuecomment-1237353476
It seems to do exactly what you expect. and stores its own last iteration as part of the checkpoint. When running the example with continue_last_task=int(0) you get exactly what you expect
(Do notice that TB visualizes these graphs in a very odd way, and it took me a few clicks to verify it...)

2 years ago

0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

My pleasure 🙂

2 years ago

0 Hi! Is There A Way To Run A Task Without Reporting To The Server? For Example If I Want To Debug A Script By Running It Locally Without It Appearing On The Server

I’ll check if I could wrap the code in something that calls the Task.delete if debugging

Whatever you think works best for you, I was genuinely curious 🙂
To me (personally) it is helpful to have a log even while debugging (comparing to previous runs etc, trying to see what went wrong even on a console output level). When I'm done I just search for everything I worked on select all, and archive them. Then a cleanup service in the background clears all the archived Tasks once they ar...

3 years ago

0 Hi All! Question Around Resource Management Using

We actually plan to create different queues for different types of workloads, we are a bit seeing what the actual usage is to define what type of workloads make sense for us.

That sounds like a great path to take, it will make it very clear fro users on what they will be getting and why they should use specific queue.

As for the memory, yes the reasoning is clear, the main thing we'll have to see is hot define the limits, because we have nodes with quite different resources availab...

2 years ago

0 Disclaimer, Not Exactly A Clearml Question. Just Wasn'T Getting A Response When Asked In The Other Channel. Anyway Here It Goes. This Is Not Tool Specific. More Of A General Mlops Question. Given You Have A Classification Model That You Plan To Do Ct On

Linking to my answer here:
https://clearml.slack.com/archives/C028LNAAR5H/p1636989891013200?thread_ts=1636963279.012500&cid=C028LNAAR5H

2 years ago

0 Hi All! Question Around Resource Management Using

Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.Actually I am as well, this is Kubernets doing the resource scheduling and actually Kubernetes decided it is okay to run two pods on the Same GPU, which is cool, but I was not aware Nvidia already added this feature (I know it was in beta for a long time)
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see thety added dynamic slicing and Memory Proteciton:
Notice you can control ...

2 years ago

0 Hi Guys, How Does Allegro Keep Track Of The Requirements (I'M Running The Scripts On A Remote Train-Agent With

SmugOx94 could you please open a GitHub issue with this request, otherwise we might forget 🙂
We might also get some feedback from other users

3 years ago

0 Hi All! Question Around Resource Management Using

Oh that makes sense, This depends on how you setup the clearml k8s glue, (becuase the resource allocation is done by k8s) a good hack to limit the number of containers per GPU is to set a RAM limitation per pod, then k8s will know to limit the number of pods on the same GPU machine,
wdty?

2 years ago

0 Hi All! Question Around Resource Management Using

DefeatedMoth52 how many agents do you have running on the same GPU ?

2 years ago

0 I'M Running Clearml Server Locally Using The Docker-Compose Method As Mentioned

With pleasure 😊

2 years ago

0 Hi, Is There Any Document About Migration Clearml-Server. Currently, I Have Clearml-Server Running On Servera But I Want To Move All Data (Including Artifacts, Task, Dataset) From Servera To Serverb.

And you have the exact same folder structure / content, and server A/B give a different set of experiments ?
(is serverB empty, meaning no experiments at all?)

2 years ago

pipeline, can I control the tags that the tasks a pipeline creates?

add_pipeline_tags

adds tags from pipeline to the tasks I suppose? But I also need to clear existing tags in those created tasks

add_pipeline_tags will add the unique ID of the pipeline execution, if you want to add specific tags you can use the task_overrides and provide:
pipe.add_step(..., task_overrides={'tags': ['my', 'tags']})

3 years ago

0 Hey, I Have A Problem With The Following Task:

JitteryCoyote63 in the UI what's the value of "config" ? Is it empty, it a string?
Also, could you check if removing the 'type=str' from the add_argument changes the behavior?

4 years ago

0 Hi All, I Have A Question Regarding Multi-Node Training Using The Clearml-Agent. What Is The Recommended Setup In This Case? Say I Have 3 Nodes With 3 Agents Running On Them. How Do I Make Sure They All Run The Same Job?

available agent, i.e. not running anything else.
I mean how long would instance 1 wait until instance 2 of the experiment is up and running?
In other words what happens of all the nodes/agents are working and we still "need" additional instance.
This is basically like "pre-allocating" the nodes, only they wait in real-time until the additional node joins them.
Agent A pulls the 3 node Task, the Task clones itself (Task B) and enqueues on "very high priory queue" Task A wait until Task B is ru...

3 years ago

0 Hey, I Have A Problem With The Following Task:

The cloning is done in another task, which has the argv parameters I want the cloned task to inherit from

JitteryCoyote63 What do you mean by that?

Hmmm, make sure the task doing the cloning is using 0.16.1 and above , because with .16 we added sections and the compatibility is between the version. Meaning if you have tasks generated with trains .16 you need trains .16 to clone them from code (so you could properly control the arguments)

4 years ago

0 Hi, I Have A Problem With "Dataset" Module. I Create Dataset And Uploaded Few Files:

👍 no worries

2 years ago

0 Is There An Efficient Way To Query All Unique Models (Ie Excluding Versions) In A Project?

What do you mean? every Model has a unique ID, what do you consider a version?

3 years ago

Nice!

2 years ago

0 Hi, Guys! I’M Trying To Connect Clearml To My Task And Getting Strange Error: After

Hmm let me rerun (offline mode right ?)

3 years ago

VictoriousPenguin97 I'm assuming the exact same server version ?

2 years ago

Show more results