No worries, the cudatoolkit is not part of it. "trains-agent" will create a new clean venv for every experiment, and by default it will not inherit the system packages.
So basically I think you are "stuck" with the cuda drivers you have on the system
Hi MistakenDragonfly51
I'm trying to set `default_output_uri` in
This should be set either on your client side, or on the worker machine (running the clearml-agent).
Make sense?
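If it helps, here is a minimal sketch of setting it from the client side (the storage URI is just a placeholder); the same value can also be set on the worker in clearml.conf under sdk.development.default_output_uri:
```python
from clearml import Task

# Passing output_uri here overrides sdk.development.default_output_uri from clearml.conf
task = Task.init(
    project_name="examples",                      # placeholder
    task_name="upload-artifacts",                 # placeholder
    output_uri="s3://my-bucket/clearml-models",   # placeholder storage URI
)
```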
That should not be complicated to implement. Basically you could run 'clearml-task execute --id taskid' as the SageMaker cmd. Can you manually launch it on SageMaker?
ConfusedPig65 could you send the full log (console) of this execution?
JitteryCoyote63 Hmmm in theory, yes.
In practice you need to change this line:
https://github.com/allegroai/clearml/blob/fbbae0b8bc933fbbb9811faeabb9b6d9a0ea8d97/clearml/automation/aws_auto_scaler.py#L78
```
python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker} --gpus 0 --detached
python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker} --gpus 1 --detached
python -m clearml_agent --config-file '/root/clearml.conf' d...
```
Hi GleamingGrasshopper63
How well can the ML Ops component handle job queuing on a multi-GPU server
This is fully supported 🙂
You can think of queues as a way to simplify resources for users (you can do more than that, but let's start simple)
Basically you can create a queue per type of GPU, for example a list of queues could be: on_prem_1gpu, on_prem_2gpus, ..., ec2_t4, ec2_v100
Then when you spin up the agents, you attach each agent to the "correct" queue for its machine type.
Int...
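For example, a rough sketch of sending a cloned experiment to one of those queues from Python (the project/task names are placeholders):
```python
from clearml import Task

# Clone a template experiment and enqueue it on the queue matching the resources it needs
template = Task.get_task(project_name="examples", task_name="train-model")  # placeholders
cloned = Task.clone(source_task=template, name="train-model (2 GPUs)")
Task.enqueue(task=cloned, queue_name="on_prem_2gpus")
```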
Hi DilapidatedDucks58 ,
I'm not aware of anything of this nature, but I'd like to get a bit more information so we could check it.
Could you send the web-server logs? Either from the docker or from the browser itself.
🙏 thank you so much @<1556450111259676672:profile|PlainSeaurchin97> !!!
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message on a reply to avoid polluting the channel?
Yes, that is appreciated 🙂
Basically logs in the thread of the initial message.
To fix this I had to spin the agent using the --cpu-only flag (--docker --cpu-only)
Yes, if you do not specify --cpu-only it will default to trying to access the GPUs
Nice!
I think this all ties into the non-standard git repo definition. I cannot find any other reason for it. Is it actually stuck for 5 min at the end of the process, waiting for the repo detection?
Hmm I suspect the 'set_initial_iteration' does not change/store the state on the Task, so when it is launched, the value is not overwritten. Could you maybe open a GitHub issue on it?
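For reference, this is roughly how I'd expect it to be used (project/task names are placeholders); if the offset is lost when the Task is relaunched, that would match the state not being stored on the Task:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="resume-training")  # placeholders
task.set_initial_iteration(1000)  # continue reporting from iteration 1000
```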
OddAlligator72 let's separate the two issues:
1. Continue reporting from a previous iteration
2. Retrieving a previously stored checkpoint
Now for the details:
Are you referring to a scenario where you execute your code manually (i.e. without the trains-agent) ?
PompousParrot44
You can always manually store/load models, example: https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/examples/reporting/model_config.py#L35
Sure, you can patch any framework with something similar to what we do in xgboost, any such PR will be greatly appreciated! https://github.com/allegroai/trains/blob/master/trains/binding/frameworks/xgboost_bind.py
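If it helps, a rough sketch of the manual store/load flow, loosely based on those examples (assuming the clearml package; the file name and model id are placeholders):
```python
from clearml import Task, InputModel, OutputModel

task = Task.init(project_name="examples", task_name="manual-model-io")  # placeholders

# Store: register a locally saved weights file on the Task
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="model.pt")  # placeholder file

# Load: fetch a previously stored model by its id and get a local copy of the weights
input_model = InputModel(model_id="<model_id>")  # placeholder id
local_weights_path = input_model.get_weights()
```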
What's the "working directory" ?
What's the trains-agent version?
(yes this should have worked, as long as the package "test" is there)
now it stopped working locally as well
At least this is consistent 🙂
How so? Is the "main" Task still running?
Still figuring out what is the best orchestration tool which can run this end-2-end.
DeliciousBluewhale87 / PleasantGiraffe85 based on the scenario above, what is the missing step that you need to cover? Is it the UI presenting the entire workflow? Or maybe a start trigger that can be configured?
but the debug samples and monitored performance metric show a different count
Hmm, could you expand on what you are getting, and what you are expecting to get?
models been trained stored ...
MongoDB will store URL links; the upload itself is controlled via the "output_uri" argument to the Task.
If None is provided, Trains logs the locally stored model (i.e. a link to where you stored your model); if you provide one, Trains will automatically upload the model (into a new subfolder) and store the link to that subfolder.
- how can I enable the tensorboard and have the graphs been stored in trains?
Basically if you call Task.init all your...
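For example, a minimal sketch of the TensorBoard flow (assuming torch.utils.tensorboard is installed; project/task names are placeholders): once Task.init is called, scalars written by the writer are picked up automatically and stored on the Task:
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb-logging")  # placeholders

writer = SummaryWriter(log_dir="./runs")
for step in range(100):
    # Scalars reported to TensorBoard are auto-captured and appear under the Task's scalars
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.close()
```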
Yeah I think using voxel for forensics makes sense. What's your use case?
but I don't see any change...where is the link to the file removed from
In the metadata section, check the artifacts "state" object
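If it helps, a rough sketch of inspecting that object from Python (the task id is a placeholder):
```python
from clearml import Task

t = Task.get_task(task_id="<dataset_task_id>")  # placeholder id
state_artifact = t.artifacts.get("state")
if state_artifact is not None:
    print(state_artifact.url)    # where the artifact is stored
    print(state_artifact.get())  # download and deserialize its content
```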
How are these two datasets different?
Like comparing two experiments :)
Hmm is this similar to this one https://allegroai-trains.slack.com/archives/CTK20V944/p1597845996171600?thread_ts=1597845996.171600&cid=CTK20V944
ReassuredTiger98 I think it is using moviepy for the encoding... No?
BTW: 0.14.3 solved the issue you are referring to, so you can import trains before / parsing the args without an issue. Regarding passing project/name as parameters, a few thoughts: (1) you can always rename / move projects from the UI (2) If you are running it with trains-agent there is no meaning to these arguments, as by definition the Task was already created... Maybe we should give an option to exclude a few arguments from argparser, I think this topic came up a few times... What d...
or point to the self-signed certificate:
export REQUESTS_CA_BUNDLE=/path/to/your/certificate.pem
Hi RoundMosquito25
The main problem here is that there is no way to know, before running the Task, how much memory it would need... And without that parameter, maximizing GPU utilization is quite challenging. wdyt?
Also, finally the columns will be movable and resizable. I can't wait for the next version ;)