Thanks @<1547028074090991616:profile|ShaggySwan64> !!
Passing to the backend guys to take a look
BTW: how did it get there?
PungentLouse55, make sure you fix the objective metric and the args:
Add the "General/" prefix to the list of arguments to optimize, and change the objective metric from "Accuracy" to "epoch_accuracy".
Just to make sure, the first two steps are working?
Maybe it has to do with the fact that the "training" step specifies a docker image; could you try removing it and check?
BTW: a few pointers (there's a short sketch below):
The return_values argument is used to specify multiple returned objects stored individually, not the type of the object. If there is a single object, there is no need to specify it.
The parents argument is optional; the pipeline component optimizes execution based on inputs. For example, in your code, all pipeline comp...
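Something like this, as an illustrative sketch assuming PipelineDecorator-style components (the step names and bodies are made up):
```
from clearml.automation.controller import PipelineDecorator

# two returned objects -> return_values names them so each is stored individually
@PipelineDecorator.component(return_values=['train_data', 'val_data'])
def prepare_data(source):
    train_data, val_data = source[:-10], source[-10:]
    return train_data, val_data

# a single returned object -> no need to specify return_values at all
@PipelineDecorator.component()
def train(train_data, val_data):
    model = {'n_train': len(train_data), 'n_val': len(val_data)}
    return model

# no explicit parents needed: train() consumes prepare_data()'s outputs,
# so the execution order is inferred from the inputs
@PipelineDecorator.pipeline(name='example pipeline', project='examples', version='0.1')
def pipeline_logic():
    train_data, val_data = prepare_data(list(range(100)))
    return train(train_data, val_data)

if __name__ == '__main__':
    # run everything in the local process for a quick test
    PipelineDecorator.run_locally()
    pipeline_logic()
```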
Wait, how do I reproduce it on the community server? Maybe it has something to do with the number of columns? Or whether it is already wider than the screen? What's your browser / OS?
looks like a great idea, I'll make sure to pass it along and that someone reply 🙂
Could you expand on the use case of #18? How would you use it? What problem would it be solving?
and then?
The thing is, programmatically this is not easy to do as an API, because in the end the "function" (i.e. LCI) never returns; it connects over SSH and stays connected.
But you can query the Task it creates: the project is known, the user is known, and it is of a special type/tag.
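Something along these lines as a sketch; the project name, tag and status filter are assumptions on my side, so replace them with whatever the interactive session actually uses on your server:
```
from clearml import Task

sessions = Task.get_tasks(
    project_name='DevOps',  # assumed project for interactive sessions
    task_filter={'tags': ['interactive'], 'status': ['in_progress']},  # assumed tag/status
)
for t in sessions:
    print(t.id, t.name, t.status)
```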
Hi JitteryCoyote63 ,
When you shut down the task (manually with close() or when the process finishes) it waits for the uploads...
Why do you need to specifically wait for all the artifact uploads? (Currently you can stop the artifacts upload thread and wait for all the artifacts, but that seems like a bad hack.)
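If the goal is simply to block until the pending uploads are done before moving on, something like this should be close (a sketch; whether flush covers every async artifact upload can depend on the SDK version):
```
from trains import Task

task = Task.init(project_name='examples', task_name='artifact upload')
task.upload_artifact('results', artifact_object={'score': 0.9})

# block here until pending uploads are flushed, instead of only at close() / process exit
task.flush(wait_for_uploads=True)
```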
but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
How so?
ShallowCat10 try something similar to this one; do notice that it might take a while to get all the task objects, so I would start with a single one 🙂
```
from trains import Task

# fetching all tasks in the project can take a while, so start with one
tasks = Task.get_tasks(project_name='my_project')
for task in tasks:
    scalars = task.get_reported_scalars()
    # 'title' / 'original_series' stand for the graph title and series as reported
    for x, y in zip(scalars['title']['original_series']['x'], scalars['title']['original_series']['y']):
        # re-report each point under a new series; x holds the original iteration
        task.get_logger().report_scalar(title='title', series='new_series', value=y, iteration=int(x))
```
is it possible to change an existing model's URL?
Edit the DBs ... That's basically the only way 😞
MysteriousBee56 I see...
So yes, you can: with the APIClient you have full RESTful access to the backend.
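For example, something along these lines with the APIClient (the uri field on models.edit is from memory, so double-check against the REST API reference; the model ID and the new URL are placeholders):
```
from trains.backend_api.session.client import APIClient

client = APIClient()
# point an existing model entry at a new storage location
client.models.edit(model='<model_id>', uri='s3://my-bucket/path/to/model.pkl')
```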
I think there was a similar discussion https://allegroai-trains.slack.com/archives/CTK20V944/p1593524144116300
HandsomeCrow5 how did you end up solving it? I think you had a similar use case?!
trains-agent doesn't run the clone, it is pip...
basically calling "pip install git+https://..."
Not sure you can pass extra arguments
Also, this is not a setup problem, otherwise it would have been consistently failing ... this actually looks like a network issue.
The only thing I can think of is retrying the install if we get a network error (not sure what the exit code of pip is, though; maybe 9?)
but DS, in order for models to be uploaded, you still have to set output_uri=True in the Task.init()
No, if you set the default_output_uri, there is no need to pass output_uri=True
in the Task.init()
🙂
It is basically setting it for you, make sense?
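To make it concrete, a sketch of the two options (project, task and bucket names are placeholders; the config file is clearml.conf, or trains.conf on older versions):
```
# Option A: per task, in code
from clearml import Task
task = Task.init(project_name='examples', task_name='train', output_uri=True)

# Option B: once, in the config file, so it is effectively set for you on every Task.init
# sdk {
#     development {
#         default_output_uri: "s3://my-bucket/models"
#     }
# }
```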
I understand, but how do you launch the clearml-agent itself?
clearml-agent daemon --detached --queue default --docker
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init
A new experiment is created on the server; it automatically stores the git repo link, commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (which is always the case even if it is actually on the same machine).
The trains-agent will create a new virtual-environmen...
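A rough sketch of that flow (project, task and queue names are placeholders):
```
from trains import Task

# initial local run: registers the experiment, including git repo, commit ID
# and uncommitted changes, on the server
task = Task.init(project_name='examples', task_name='my training run')

# later, the experiment can be cloned and pushed to a queue; an agent watching
# that queue rebuilds the environment (virtualenv, packages, code) and runs it
cloned = Task.clone(source_task=task.id)
Task.enqueue(cloned, queue_name='default')
```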
Thank you!
One thing I noticed is that it's not able to find the branch name on >=1.0.6x, while on 1.0.5 it can
That might be it! let me check the code again...
Is it possible in ClearML to somehow allocate resources so that maybe after running a number of Alice's tasks, Bob's tasks get processed (like maybe in a round-robin fashion)?
Hi DeliciousBluewhale87
A few options here:
1. Set the agent up with high / low priority queues. Make sure Alice pushes into low priority (aka HPO), then Bob can push into high priority when he needs to. This makes a lot of sense when you have automation processes spinning many experiments (see the command sketch below).
2. Expanding on (1), you could set differe...
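As an illustration of option 1: a single agent can listen on several queues and will always pull from the first non-empty queue in the order given, e.g. (queue names are placeholders)
clearml-agent daemon --queue high_priority low_priority --detached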
I see, good point. It does look like mostly boilerplate code; not sure where it actually runs the python command, but I'm sure it is there (python.ts, but I could not locate who is actually using it)
DeliciousBluewhale87 Yes I think so, do notice that you might end up with a maximum of 12 pods.
You can also do the following with max 10 nodes (notice --queue can always get a list of queues; it will pull based on the order of the queues):
python k8s_glue_example.py --queue high_priority_q low_priority_q --ports-mode --num-of-services 10
Does Task.connect send each element of the dictionary as a separate API request? Has anyone else encountered this issue?
Hi SuperiorPanda77
The task.connect ends up as a single call, with all the data being sent on a single request.
That said, maybe connecting a dict is not the best solution for a thousand-key dictionary ...
Maybe an artifact, or connect_configuration, would be better suited?
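For example, either of these keeps it to a single object instead of thousands of individual parameters (names and sizes below are placeholders; the name argument of connect_configuration assumes a reasonably recent SDK):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='big config')
big_dict = {f'key_{i}': i for i in range(10000)}

# stored as one configuration object (shows up in the Configuration section)
task.connect_configuration(configuration=big_dict, name='big_config')

# or stored as an artifact attached to the task
task.upload_artifact('big_config', artifact_object=big_dict)
```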
wdyt?
Hi PungentLouse55
Hope you are not tired of me
Lol 🙂 No worries
I am using trains 0.16.1
Are you referring to the trains-server version or the Python package? (They are not the same and can be of totally different versions.)
Hi HealthyStarfish45
Funny, just today I had a similar discussion on Slurm:
https://allegroai-trains.slack.com/archives/CTK20V944/p1603794531453000
Anyhow, when you say "[scale up agents]", are you referring to a machine constantly running an agent pulling jobs from the queue, where the machine itself (aka the resource) is managed as a Slurm job?