JitteryCoyote63 so now everything works as expected ?
Will they get ordered ascending or descending?
Good point, I'll check the docs... but I think they do not specify
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
From the code it seems the order is not guaranteed.
You can however pass '-last_update' to order_by, which will give you the latest updated first:
```
task_filter = {
    'page_size': 2,
    'page': 0,
    'order_by': ['last_metrics.{}.{}'.format(title, series), '-last_update']
}
Task.get_tasks(...
```
I'm helping train my friend on ClearML to assist with his astrophysics research
If that's the case, what you can do is use the agent inside your sbatch script (fully open source). This means the sbatch script becomes `clearml-agent execute --id <task_id_here>`. This will set up the environment, monitor the job, and still allow you to launch it from slurm, wdyt?
OutrageousSheep60 so this should work, no?
```
ds.upload(output_url='gs://<BUCKET>/', compression=0, chunk_size=100000000000)
```
Notice the chunk size is the maximum size (in bytes) per chunk, so it should basically be very large
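For context, a minimal sketch of the full flow around that call; the dataset name, project, and local folder below are placeholders:
```python
from clearml import Dataset

# create a new dataset version (names and paths are just examples)
ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(path="/data/my_local_folder")

# upload with no compression and a very large chunk size, as suggested above
ds.upload(output_url="gs://<BUCKET>/", compression=0, chunk_size=100000000000)
ds.finalize()
```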
this issue when trying to set up on our remote machines
You mean setting up the trains-server on a remote machine?
that might be it.
Is the web UI working properly ?
What ports are you using?
okay so the error should have been:
trains_agent: ERROR: Connection Error: it seems api_server is misconfigured. Is this the TRAINS API server http://<IP>:8008 ?
Not https nor 8010 ?!
curl seems okay, but this is odd https://<IP>:8010
it should be http://<IP>:8008
Could you change and test?
(meaning change the trains.conf and run `trains-agent list`)
the only port configurations that will work are 8080 / 8008 / 8081
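For reference, a minimal sketch of the relevant api section in trains.conf, assuming the default ports (replace <IP> with your server address):
```
api {
    web_server: http://<IP>:8080
    api_server: http://<IP>:8008
    files_server: http://<IP>:8081
}
```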
Thank you MuddyCrab47 !
Regarding model versioning:
All models are logged automatically by trains (no need to specify it, as long as you are using one of the automagically connected frameworks: PyTorch/Keras/TF/SKlearn)
You can see how it looks on the demoapp:
https://demoapp.trains.allegro.ai/projects/5371015f43f043b1b4ad7203c1ff4a95/models
Regarding Dataset management, we have a simple workflow demonstrated below, basically using artifacts as dataset storage, with very easy int...
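Roughly, the artifacts-as-dataset idea looks like this (using the trains package of that era; the same calls exist in clearml). Project, task, and artifact names are placeholders:
```python
from trains import Task

# "dataset" task: upload a local folder as an artifact
dataset_task = Task.init(project_name="datasets", task_name="raw data v1")
dataset_task.upload_artifact(name="dataset", artifact_object="/data/raw_folder")
dataset_task.close()

# training task: fetch the artifact and get a local copy of the folder
task = Task.init(project_name="examples", task_name="train")
data_task = Task.get_task(project_name="datasets", task_name="raw data v1")
local_folder = data_task.artifacts["dataset"].get_local_copy()
```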
JitteryCoyote63 no you should not (unless you already have the Task.init call in your code). clearml-data adds the Task.init call at the beginning of the code in the entry point.
This means you should be able to call Task.current_task() and get back the object.
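A minimal sketch of what that looks like from inside the entry point; nothing here is specific to your code:
```python
from clearml import Task

# when the Task.init call was injected for us, current_task() returns that Task
task = Task.current_task()
if task is not None:
    print(task.id, task.name)
```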
What do you have under the "uncommitted changes" on the Task that was created?
UnevenDolphin73 `clearml.config.get_remote_task_id()` will return the Task ID, not the Task object. In order to get automagic to work, one h...
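If all you need is the object for that ID (without the automagic hooks), something like this should work:
```python
from clearml import Task
from clearml.config import get_remote_task_id

# turn the remote task ID into a Task object
task = Task.get_task(task_id=get_remote_task_id())
```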
Hi JoyousElephant80
Another possibility would be to run a process somewhere that periodically polls ClearML Server for tasks that have recently finished
this is the easiest way to implement what you are after, and have full control over the logic itself.
Basically you inherit from the Monitor class
And implement the callback function:
https://github.com/allegroa...
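Very roughly, and based on the monitoring example in the repo, the subclass would look something like this; the polling period and the printout are just placeholders:
```python
from clearml.automation.monitor import Monitor

class FinishedTaskAlert(Monitor):
    # callback invoked for every newly detected task matching the monitor's filters
    def process_task(self, task):
        print("Recently finished task:", task.id, task.name)

monitor = FinishedTaskAlert()
monitor.monitor(pool_period=60.0)  # poll the server every 60 seconds
```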
I believe a process is still running in the background. Is it expected? (v0.17.4)
Yes it is expected.
Basically it reports that the resource monitoring did not detect any "iterations"/"steps" reporting, so instead of reporting resources based on iterations it reports based on time. Make sense ?
Task status change to "completed" is set after all artifacts upload is completed.
JitteryCoyote63 that seems like the correct behavior for your scenario
JitteryCoyote63 are you suggesting it happens ?
(obviously it should not 🙂 )
Hi CooperativeFox72 trains 0.16 is out, did it solve this issue? (btw: you can upgrade trains to 0.16 without upgrading the trains-server)
containing the Extension module
Not sure I follow, what is the Extension module? What were you running manually that is not just `pip install /opt/keras-hannd`?
Hi @<1603198134261911552:profile|ColossalReindeer77>
Hello! does anyone know how to do HPO when your parameters are in a Hydra config?
Basically hydra parameters are overridden with "Hydra/param"
(this is equivalent to the "override" option of hydra in CLI)
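For illustration, a sketch of how those "Hydra/..." names plug into the optimizer; the base task ID, parameter paths, and metric names are all placeholders:
```python
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange, RandomSearch,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # the Task that uses Hydra
    hyper_parameters=[
        # "Hydra/..." prefixed names override the matching Hydra config entries
        UniformParameterRange("Hydra/optimizer.lr", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("Hydra/trainer.batch_size", values=[16, 32, 64]),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```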
JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think it's on a branch already by SuccessfulKoala55 ) to fix the bug ElegantKangaroo44 found. should be RC next week
nice @<1724960458047229952:profile|EnergeticKoala33> !
The issue was that the agent was trying to start the docker but had no credentials to do that, your solution is exactly what was needed to be done
Hi VexedCat68
(sorry I just saw the message)
I wanted to ask, how to run pipeline steps conditionally? E.g if step returns a specific value, exit the pipeline or run another step instead of the sequential step
So to do so you can do:
```
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # if we want to skip this node (and subtree of this node) we return False
    ...
    # we decided to skip so we return False
    return False

pipe.add_step(name='...
```
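Putting it together, a sketch of how the callback attaches to a step; the project, task, and parameter names below are made up for illustration:
```python
from clearml import PipelineController

def skip_if_low_score(a_pipeline, a_node, current_param_override):
    # hypothetical condition: skip this step when an upstream "score" is too low
    if float(current_param_override.get("General/score", 0)) < 0.5:
        return False  # False skips this node and its subtree
    return True

pipe = PipelineController(name="conditional_pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    parents=["evaluate"],
    parameter_override={"General/score": "${evaluate.parameters.General/score}"},
    pre_execute_callback=skip_if_low_score,
)
pipe.start(queue="services")
```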
DAG which gets scheduled at a given interval and
Yes, exactly what will be part of the next iteration of the controller/service
an example achieving what I propose would be greatly helpful
Would this help?
```
from trains.automation import TrainsJob

job = TrainsJob(base_task_id='step1_task_id_here')
job.launch(queue_name='default')
job.wait()

job2 = TrainsJob(base_task_id='step2_task_id_here')
job2.launch(queue_name='default')
job2.wait()
```
Okay, this is more complicated but possible.
The idea is to write a glue layer (service) that pulls from the (i.e. UI) queue,
sets up the slurm job,
and puts it in a pending queue (so you know the job is waiting in the slurm scheduler).
There is a template here:
https://github.com/allegroai/trains-agent/blob/master/trains_agent/glue/k8s.py
I would love to help and set up a slurm glue in a similar manner
what do you think?
It's stored on the Task, you can see it under the Execution tab in the UI