AgitatedDove14

49 Questions, 8124 Answers

Active since 10 January 2023

Last activity one year ago

Reputation

Badges 1

25 × Eureka!

Answers 8124

0 What Is Being Stored Exactly In

Ohh... I would not delete them then ... 😞
Maybe kind of heuristics (files created a week ago can be deleted?!)

3 years ago

0 Is Trains Adaptable For Federated Learning Scenarios?

Hi LazyLeopard18 ,
So long story short, yes it does.
Longer version, to really accomplish full federated learning with control over data at "compute points" you need some data abstraction layer. Without data abstraction layer, federated learning is just averaging derivatives from different location, this can be easily done with any distributed learning framework, such as horovod pr pytorch distributed or TF distributed.
If what you are after is, can I launch multiple experiments with the sam...

5 years ago

0 Hi All! Is There Any Simple Way To Use

Hi @<1556450111259676672:profile|PlainSeaurchin97>

Is there any simple way to use

argparse

to pass a clearml task name?

need to call

args = task.connect(args)

.

noooo 🙂 there is no need to do that, the arguments are automatically detected
see for yourself

args = parse_args()
task = Task.init(task_name=args.task_name)

2 years ago

0 Hi Folks. I'Ve Installed Clearml On K8S Cluster Using Helm Chart 7.11.0, If It Matters. When I Trying To Create "App Credentials" From Workspace Settings And Then Past Them To Clearml-Init - I Got The Error:

Hi @<1729309120315527168:profile|ShallowLion60>
How did you create those credentials ?

one year ago

0 Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:

so g.processes is None?

5 years ago

0 Question About The Configuration Format - I'D Like To Parse It Within My Python Code So I'Ll Be Able To Access Things Like

Sorry my bad:
config_obj['sdk']['stuff']['here'] = value

4 years ago

0 Hi Everyone, I'M Using Clearml-Serving With Triton And Have A Couple Of Questions Regarding Model Management:

Hi @<1690896098534625280:profile|NarrowWoodpecker99>

Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,

yes it does.

Are there configuration options available that allow us to control this behavior?

I'm assuming your're thinking dynamic loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial and in reality to be fast enough the model has to leave in RAM then moved to GPU (...

one year ago

0 Encountered An Odd Bug. Upon Attempting To Write Images To Clearml (3D Projected, Matplotlib),

t seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:

2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads

And there's no sign of updates on the dashboard

Hmm that could point to an issue uploading the last images (which are larger than regular scalars) could you try flushing and waiting ?
i.e.
task.flush() sleep(45)

4 years ago

0 What Happens To File That Are Downloaded To A Remote_Execution Via Storagemanager? Are They Removed At The End Of The Run, Or Does It Continuously Increases Disk Space?

UnevenDolphin73 I have a suspicion we have a few terms mixed:
hyperparameters :
These are essentially key/value.
when you call Task. connect (dict_with_params), clearml will flatten the dict and you end up with key/value
configuration objects :
These are actually blobs of text, the UI will show as is
When you call my_local_file=Task. connect_configuration (name, "path/to/config/file")
The entire Content of the config file is stored on the Task object itself.

Back to the use case, instead ...

3 years ago

0 Is There Any Simple Way To Orchestrate A Batch To Train A Model With Different Features (In Order To Do Feature Selection, For Example) Through A Single .Py File? I Saw The Following Example

using only a subset of the features

ShallowGoldfish8 if you have some parameter that controls it (i.e. select different features) then you can launch it with two sets f parameters.
Am I missing something?
for example:
` my_features_select = {"type": "set_a"}
Task.current_task().connect(my_features_select)

if my_features_select["type"] == "set_a":

do something

else

do something else `wdyt?

3 years ago

0 Getting A Super Weird Error. Everything Works Fine On Local, When Trying To Run On Remote, Getting This Error Failing To Apply The Git Diff

WackyRabbit7 I'll make sure it is fixed

4 years ago

0 Thank You For Your Help So Far. I Have A Question About Trains Authentication And Privacy When Deploying On K8S. I Want Integrate Building A Trains-Server Into Our Iac. Now That I Got A Server To Work With An Agent Deployment Im Thinking About Authorizati

ColossalAnt7 I would do the following:
Configure trains-server user/pass, mounting the API server configuration file as pointed in the trains-server documentation (intermediate temporary step) Start by providing the ML guys with a VPN access that allows them to access directly the trains-server api/web/file pos (caveat is the IP/sub-domain needs to be solved) Configure a ConfigMap to do the routing/ingest (this solves the IP/Sub-Domain issue) and allow the VPN to access the single entrypoint...

4 years ago

0 Hello, I Have A Problem With Task.Set_Initial_Iteration(0) In Google Colab. After Continuing The Experiment, Gaps Appear On My Graph, But If You Use Colab. I Tried It On My Computer And Everything Is Normal There.

Okay I think I know what's going on (there is a race that for some reason on CoLab acts differently).
As a quick hack you can do the following:
Task._report_subprocess_enabled = False task = Task.init(...) task.set_initial_iteration(0)

3 years ago

0 Hello, When I Create A Task On A New Server I Use, The Task Fails To Auto Detect The Working Directory And The Repository, As In The Attached Image. Consequently, I Cannot Run The Task In Clearml Agent, Getting "

Also I would suggest using Task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely

3 years ago

0 Hi, I Tried To Setup Clearml Serving And Ran The Example Given

Can you post here the docker-compose.yml you are spinning? Maybe it is the wring one?
Step 4 here:
https://github.com/thepycoder/asteroid_example#deployment-phase

3 years ago

0 I Would Like To Understand The Limitations Of

current task fetches the good Task

Assuming you fork the process than the gloabl instance" is passed to the subprocess. Assuming the sub-process was spawned (e.g. POpen) then an environement variable with the Task's unique ID is passed. then when you call the "Task.current_task" it "knows" the Task was already created and it will fetch the state from the clearml-server and create a new Task object for you to work with.
BTW: please use the latest RC (we fixed an issue with exactly this...

4 years ago

0 Pytorch-Lightning-Bols.Loggers.Trainslogger

YummyWhale40 you mean like continue training?
https://github.com/allegroai/trains/issues/160

5 years ago

0 Hi, Is It Possible To Migrate A Dataset From A Self Hosted Clearml Solution To The Clearml Hosted Solution?

I don't know how I would be able to get the description and name?

Good point, how about doing that in code, then you have all the information and you can store it in jsons / pickle next to the data folder?
wdyt?

2 years ago

🎊

3 years ago

0 Hi Guys, I Managed To Set Up A Kubernetes Cluster And Install Trains Into It. While Testing My Set-Up I Run The Test_Reporting.Py Example

Hi WickedGoat98

"Failed uploading to //:8081/files_server:"

Seems like the problem. what do you have defined as files_server in the trains.conf

4 years ago

0 Hello, I Have An Error While Installing Git Dependencies Of Local Package: So Far I Used Task.

Try:
task.update_requirements('\n'.join([".", ]))

4 years ago

0 Hi Everyone! I Have A Short Question That You Can For Sure Help Me With. Is There A Way To Avoid Each Task To Create A New Environment? I'D Like To Specify Which Env To Use. I Tried With

ERROR: Could not install packages due to an EnvironmentError: 
[Errno 28] No space left on device

BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker out of space on the Main disk '/var/` where it stores all the images and temp file systems
This will cause you code to fail as any runtime change to the container file system will raise this out of disk space error

3 years ago

0 Hey, Can Anyone Please Explain To Me How The /Tmp/.Clearml_Agent.Something.Cfg File Is Generated Which Next Is Used In Docker? Because This File Is Slightly Different From Mine For Example In Mine /Home/Asa/Clearml.Conf I Set System_Site_Packages = False

Hi ResponsiveCamel97
The agent generates a new configuration file to be mounted into the docker, with all the new folders as they will be seen inside the docker itself. One of the changes is the system_site_packages as inside the docker we want the new venv to inherit everything from the docker system installed packages.
Make sense ?

4 years ago

0 Gals, Guys &

5 years ago

0 Hello Everone, I Have Hosted Clearml Server And Trained A Yolov8 Model To Test My Installations. The Model Was Trained Successfully And I Tried To Optimize The Hyderparameters By Using The Sample Code From Clearml But Im Getting Some Error In Doing So An

I assume issue: None
Yeah this is odd I noticed as well. Let me ask the guys to take a look

one year ago

0 Does Clearml Have A Testing Api? I'M Setting Up Stack To Enque Work With Clearml. Is There A Way I Can Simulate Queue And Worker Execution?

Hi @<1535069219354316800:profile|PerplexedRaccoon19>
What do you mean by simulate?
You can manually setup and run a Task if you need,
'clearml-agent execute --id task_id' add --docker for docker mode.
This will setup the env and run the task

2 years ago

0 I Have Used Aws S3 And Minio As Storage For Clearml Artifacts. But Has Anyone Used Nexus As A Storage ?

Quick update Nexus supports direct http upload, which means that as CostlyOstrich36 mentioned, just pointing to the Nexus http upload endpoint would work:
output_uri="http://<nexus>:<port>/repository/something/"See docs:
https://support.sonatype.com/hc/en-us/articles/115006744008-How-can-I-programmatically-upload-files-into-Nexus-3-

4 years ago

0 Hi, Expanding On

DeliciousBluewhale87 my apologies you are correct 😞
We should probably add support for that, do you feel like adding a GitHub issue, so we do not forget?

4 years ago

0 Hello Everyone! I'M Currently Trying To Set Up A Pipeline, And Am A Bit Confused At A Few Things. Some Questions I Have:

When I'm setting up my Pipeline, I can't go "here are some brand new tasks, please run them",

I think this is the main point. Can you create those Tasks via Task.create and get what you want? If so, then sure you can do that:
` def create_step_task(a_node):
task = Task.create(...)
return task

pipe.add_step(
name="stage_process",
parents=["stage_data"],
base_task_factor=create_step_task
) `wdyt?

As for the node, this confusing bit is that this is text from the docs...

2 years ago

0 Hello. I Have A Very Basic Question. I'M Still Exploring Clearml To See If It Fits Our Needs. I Have Taken A Look At The Webui, And I Am Confused About What Constitutes A Project. It Seems That A Project Is Composed By A Series Of Experiments And Models,

ShinyWhale52 any time 🙂
Feel free to followup with more questions

4 years ago

Show more results