Hi CurvedDolphin95
I would first check the free space on the instance (it might be that git is reporting an inaccurate error, and it's free space, not permissions, that is causing the clone to fail).
I would also check your GitHub account; notice that they now only support user/api-key (and not user/pass), which means you need to create an api-key and add it as your password in the clearml.conf.
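For example, a sketch of the relevant clearml.conf section (assuming the clone is done by a clearml-agent; the username and token values are placeholders you replace with your own):
agent {
    git_user: "your-github-username"          # placeholder
    git_pass: "your-personal-access-token"    # the API token goes here instead of a password
}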
Any chance that for some reason some of the Tasks are running as a different user? Or not using docker?
Hi UnevenDolphin73
In theory it "might" work. I have to admit that personally I'm not a fan of what Amazon did to Mongo, i.e. forking their code base and selling it as a service, just bad open-source practice
(The main issue might be API calls that might not fully match)
wdyt?
Hi ScantChimpanzee51
btw: this seems like an S3 internal error
https://github.com/boto/s3transfer/issues/197
Out of curiosity, if Task flush worked, when did you get the error, at the end of the process ?
So without the flush I got the error apparently at the very end of the script -
Yes... it's a Python thing: background threads might get killed in random order, so when something needs a background thread that has already died you get this error, which basically means you need to do the work in the calling thread.
This actually explains why calling Flush solved the issue.
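Something like this at the end of the script should be enough (a minimal sketch; the project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="my-experiment")  # placeholder names
# ... training / logging code ...
# make sure all background reporting is sent before the interpreter starts tearing threads down
task.flush(wait_for_uploads=True)
task.close()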
Nice!
Hi DilapidatedDucks58
eg, we want max validation accuracy and all other metric values for the corresponding epoch
Is this the equivalent of nested sort ?
Wouldn't you get the requested behavior if you add all metric columns but sort based on the "accuracy" column ?
Okay, I think I lost you...
DilapidatedDucks58 you mean detect at which "iteration" the max value was reported, and then extract all the other metrics for that iteration ?
NastySeahorse61 it might be that the frequency it tests the metric storage is only once a day (or maybe half a day), let me see if I can ask around
(just making sure you can still log in to the platform?)
JuicyDog96 Yes please!
Let me check what's the status with the docs repository, and I'll get back to you soon
Awesome! Any way to hear the talk w/o registering for the whole conference?
CloudySwallow27 Anyway, we will make sure we upload the talk to the ClearML YouTube channel after the talk
So to conclude: it has to be executed manually first, then with trains agent?
Yes. That said, as you mentioned, you can always edit the "installed packages" manually once; from that point you are basically cloning the experiment, including the "installed packages", so it should work if the original worked.
Make sense ?
Hi RotundHedgehog76
Notice that the "queued" is on the state of the Task, as well as the tag
We tried to enqueue the stopped task at the particular queue and we added the particular tag
What do you mean by specific queue? This will trigger on any Queued Task with the 'particular-tag'?
This depends on how you spun up the server; basically, as long as you configure the clients (i.e. Python clients) correctly, there is no issue.
But the auto-generated configuration might be off (in the UI, when you create credentials, it tells clearml-init where the server is and the ports).
I would actually recommend subdomains if this is possible
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#sub-domain-configuration
wdyt?
For example, for some of our models we create PDF reports that we save in a folder on the NFS disk
Oh, why not as artifacts? At least you will be able to access them from the web UI, and avoid NFS credential hell
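Something along these lines (a sketch; the project/task names and file path are placeholders):
from clearml import Task

task = Task.init(project_name="reports", task_name="model-report")  # placeholder names
# upload the generated PDF as an artifact so it is viewable/downloadable from the web UI
task.upload_artifact(name="pdf_report", artifact_object="/path/to/report.pdf")  # placeholder path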
Regarding clearml datasets:
https://www.youtube.com/watch?v=S2pz9jn26uI
DilapidatedDucks58 I see ...
This might be more complicated than one would imagine. A simple solution might be to store a snapshot of the values every time we reach a new maximum; a quick hack might be to add it as text on one of the Task's parameters or properties (which we can later add to the table as a custom column).
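A rough sketch of that hack (the training loop, metric names and project/task names here are made up):
from clearml import Task

task = Task.init(project_name="examples", task_name="best-epoch-snapshot")  # placeholder names
best_acc = 0.0
for epoch in range(100):
    acc, loss = train_one_epoch()  # hypothetical training step returning the epoch metrics
    if acc > best_acc:
        best_acc = acc
        # snapshot all metrics at the new maximum; user properties can later be added as custom columns
        task.set_user_properties(best_epoch=str(epoch), best_accuracy=str(acc), best_loss=str(loss))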
wdyt?
That should work
BTW, you might play around with "clearml-agent execute --id <task_id_here>"
This will basically clone the code, create a venv with the python packages, apply uncommitted changes and run the actual code. This could be a replacement for your bash script. (Notice it means that you need to clone the Task in the UI, then you can change parameters, then run the agent manually in SLURM and it will take the params from the UI.)
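Roughly, the programmatic equivalent of that clone-then-execute flow (project/task/parameter names are placeholders; the clone can also be done from the UI as described above):
from clearml import Task

template = Task.get_task(project_name="examples", task_name="my-experiment")  # placeholder names
cloned = Task.clone(source_task=template, name="my-experiment (clone)")
cloned.set_parameter("General/learning_rate", 0.001)  # placeholder parameter
print(cloned.id)
# then on the SLURM node run:  clearml-agent execute --id <the printed task id>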
I guess it's on me to check whether this slowdown is negligible or not
Usually performance is negligible, especially with GPU
But if you really want the best:
Add --security-opt seccomp=unconfined to the extra_docker_arguments
See details:
https://betterprogramming.pub/faster-python-in-docker-d1a71a9b9917
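In the agent's clearml.conf that would look something like this (a sketch):
agent {
    extra_docker_arguments: ["--security-opt", "seccomp=unconfined"]
}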
Yes I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but nowhere do we save anything to GCS. The only usage is in the code which reads from BigQuery
Are you certain you have no artifacts on GS?
Are you saying that if GOOGLE_APPLICATION_CREDENTIALS is set and clearml.conf contains no "project" section, it crashed when starting?
Is this consistent on the same file? can you provide a code snippet to reproduce (or understand the flow) ?
Could it be two machines are accessing the same cache folder ?
ScaryKoala63
When it fails, what's the number of files you have in /home/developer/.clearml/cache/storage_manager/global/ ?
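For example, a quick way to count them:
import os

cache_dir = "/home/developer/.clearml/cache/storage_manager/global/"
print(len(os.listdir(cache_dir)))  # compare against the configured cache entry limit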
Are you suggesting the conf file did not set the default size? It sounds like a bug, can you verify?
And when you retrieve just this file, is it working?
(Maybe for some reason the file is corrupted?)
Hi ScaryKoala63
Sure, add the following to your clearml.conf:
sdk.storage.cache.default_cache_manager_size = 400
I think you are correct, it seems like for some reason you hit the cache limit, and a previous entry was deleted
PanickyMoth78
Is it limited to ... accounts?
unfortunately, yes, but I'm sure sales will be able to hook you up ...
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs)
Is there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?