https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L86
you can just pass the instance of the OptunaOptimizer you created, and continue the study
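For context, the standard setup from the linked example looks roughly like this (just a sketch; the base task ID, parameter range, metric names and queue are placeholders), with `optimizer_class` selecting the Optuna strategy:
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange
from clearml.automation.optuna import OptimizerOptuna

# The controller task that owns the optimization
task = Task.init(
    project_name="Hyper-Parameter Optimization",
    task_name="Automatic HPO",
    task_type=Task.TaskTypes.optimization,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # placeholder: the template experiment to clone
    hyper_parameters=[
        UniformIntegerParameterRange("General/layer_1", min_value=128, max_value=512, step_size=128),
    ],
    objective_metric_title="validation",   # placeholder metric
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,       # Optuna-based search strategy
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
    total_max_jobs=10,
)

optimizer.start()
optimizer.wait()
optimizer.stop()
```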
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
I'm assuming you need multiple "file-server" instances running on the "clearml-server", with a load-balancer of some sort...
What is the specific use case, updating a file on an existing dataset and creating a new version?
It is available of course, but I think you have to have clearml-server 1.9+
Which version are you running ?
My bad, I see I worded my question wrong,
LOL no worries 🙂
Any chance you have some "debug" leftover in the Pipeline code:
https://github.com/allegroai/clearml/blob/7016138c849a4f8d0b4d296b319e0b23a1b7bd9e/examples/pipeline/pipeline_from_decorator.py#L113
Maybe we should show a warning when it is being called, or ignore it when running via an agent ...
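(To be concrete, the kind of "debug" leftover meant here is along these lines; a sketch only, the exact call at the linked line may differ:)
```python
from clearml.automation.controller import PipelineDecorator

# Debug helpers like these force the pipeline (and all its steps) to run in the
# local process instead of being enqueued for agents - easy to forget to remove.
PipelineDecorator.run_locally()
# or: PipelineDecorator.debug_pipeline()
```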
is it also possible to somehow propagate ssh keys to the agent pod? Not sure how to approach that
I would use the k8s secret manager to do that (there is a way to mount secret files into a pod; mounting SSH keys that way is relatively standard)
I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)
My thinking is the issue might be in the env file we are passing to conda; I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send you a specific wheel with more debug prints to check (basically it will print the conda env YAML it is using)?
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
When you have a bit of experience with it, please suggest a path forward; it would be great to integrate
When I give my Minio to the output_uri argument, it uploads at 500 KB/sec as before.
But it worked well when using StorageManager and uploading to the minio directly, is that correct?
.. I give my Minio to output_uri argument
How long did it take to run the demo code I posted?
(The one you mentioned took 0.16s to run locally)
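For reference, the two upload paths being compared look roughly like this (just a sketch; the MinIO endpoint, bucket, and file paths are placeholders, and the s3 credentials are still picked up from clearml.conf):
```python
from clearml import Task, StorageManager

# Path 1: let the Task upload models/artifacts through output_uri (reported as slow)
task = Task.init(
    project_name="examples",
    task_name="minio upload test",
    output_uri="s3://my-minio-host:9000/clearml-bucket",  # placeholder endpoint/bucket
)

# Path 2: upload a file directly with StorageManager (reported as fast)
StorageManager.upload_file(
    local_file="/path/to/large_artifact.bin",
    remote_url="s3://my-minio-host:9000/clearml-bucket/large_artifact.bin",
)
```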
Also in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
Where the Task ID is one of the failed executions (just reset it beforehand)
Guys FYI:
params = task.get_parameters_as_dict()
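i.e. something like this (assuming the usual Task.init flow; parameter values typically come back as strings):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")
task.connect({"lr": 0.001, "batch_size": 32})  # stored under the "General" section by default

params = task.get_parameters_as_dict()
# Nested dict keyed by section, e.g. {"General": {"lr": "0.001", "batch_size": "32"}}
print(params)
```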
Hi CourageousDove78
Not the cleanest, but you can basically pass everything here:
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html#post--tasks.get_all
Reasoning is that it is passed almost as is to the server for the actual query.
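So from Python, something along these lines should work (a sketch; the filter fields are taken from the tasks.get_all reference above):
```python
from clearml import Task

# task_filter is forwarded (almost) as-is to the tasks.get_all endpoint
tasks = Task.get_tasks(
    project_name="examples",
    task_filter={
        "status": ["completed"],
        "order_by": ["-last_update"],
    },
)
for t in tasks:
    print(t.id, t.name)
```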
Could you test if this is working:
https://github.com/allegroai/clearml/blob/master/examples/reporting/matplotlib_manual_reporting.py
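(The core of that example is roughly the following; a sketch, project/task names are placeholders:)
```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib manual reporting")

fig = plt.figure()
plt.plot([1, 2, 3], [10, 20, 15])

# Explicitly report the figure instead of relying on the automatic matplotlib binding
task.get_logger().report_matplotlib_figure(
    title="Manual Reporting", series="Just a plot", iteration=0, figure=fig
)
```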
JitteryCoyote63 What did you have in mind?
Hi BroadMole98
What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?
Long story short, yes you are correct. kubeflow, and snakemake for that matter, are all about DAGs where each node runs a docker (bash) container for you. The missing portions (for both) are:
How do I cr...
I don't see any requests
This points to a configuration issue, specifically maybe it is directed at a different server?!
I think we were able to fix it, let me check if it was pushed 🙂
It's in my local conda environment though.
Meaning, is this a wheel installed manually in conda, or is it a folder inside the conda environment?
SteadySeagull18 btw: in post-callback the node.job will be completed
because it is called after the Task is completed
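Roughly like this (a sketch; step/project names are placeholders):
```python
from clearml.automation import PipelineController

def post_step_callback(pipeline, node):
    # By the time this runs, the step's Task has finished, so node.job is already completed
    print(f"step '{node.name}' finished, job status: {node.job.status()}")

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0")
pipe.add_step(
    name="stage_one",
    base_task_project="examples",        # placeholder
    base_task_name="step 1 base task",   # placeholder
    post_execute_callback=post_step_callback,
)
pipe.start()
```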
Hi DilapidatedDucks58 ,
Just making sure: do all 8 workers have different worker IDs? (you can see the 8 in the Workers page in the UI)
Also, are they running in docker or venv mode?
ColossalDeer61 btw, it turns out the docker-compose services docker was ill-configured on GitHub 😞 I suggest you get the latest copy of it:
curl -o docker-compose.yml
What's the trains-server version ?
You can see it if you go to the profile page
Please hit Ctrl-F5 to refresh the entire page, and see if it is still empty....