
Am I doing something differently from you?
As for experimenting, I'd say (and this community can be my witness 🙂 ) that managing your own experiments isn't a great idea. First, you have to maintain the infra (whatever it is, a tool you wrote yourself, or an Excel sheet), which isn't fun and consumes time. From what I've heard, it usually takes at least 50% more time than what you initially estimate. And since there are so many tools out there that do it for free, the only reason I can imagine for doing it on your own would be if y...
As for Kedro, first I'll say it's a great tool! If you use it and love it, keep on keeping on 🙂 I think these guys did a great job! While ClearML doesn't have all the features of Kedro (and obviously Kedro doesn't have all of ClearML's either; they are 2 different tools with 2 different goals), we do have the pipelining feature. The UI is still in the works and Kedro's does look better! But if you use the ClearML agent, you can probably build better automations. That said, if I had to "advise", I'd say start with ClearML pipe...
As for git, I'm no git expert, but having your own git server is doable. I can't tell you what it means in terms of how it works in your organization though, as everyone has their own limitations and rules. And as I said, you can use SVN, but the connection between it and ClearML won't be as good as with git.
We'll check this. I assume we either don't catch the error somehow, or the process doesn't indicate that it died with a failure.
Hi JumpyPig73 , I reproduced the OOM issue, but for me it's failing. Are you handling the error in Python somehow so the script exits gracefully? Otherwise it looks like a regular Python exception...
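To illustrate what I mean by handling the error so the script exits gracefully, here's a minimal, hypothetical sketch (nothing ClearML-specific; the function names are made up). If the MemoryError is caught and the script returns normally, the process ends with exit code 0 and the run looks like a clean completion; if the exception propagates, the interpreter dies with a non-zero exit code and the task is marked failed:

```python
def allocate_too_much():
    # Stand-in for an allocation that would really exhaust memory.
    raise MemoryError("simulated out-of-memory")


def run(handle_error: bool) -> int:
    """Return the exit code the process would end with."""
    try:
        allocate_too_much()
    except MemoryError:
        if handle_error:
            print("caught OOM, shutting down gracefully")
            return 0  # graceful exit -> task appears to complete
        raise  # uncaught -> regular Python exception, non-zero exit
    return 0
```

So `run(handle_error=True)` looks like a successful run from the outside, while `run(handle_error=False)` crashes the process like a regular OOM.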
Yeah, it might be the cause... I had a script with OOM and it crashed regularly 🙂
Hi CourageousKoala93 , I'm not 100% sure I understand what graph mode is; I see it's a legacy option, maybe from TF1? If you can post a small snippet so I can try it on my side, that would be helpful!
Yes, definitely. As I said, if you like Kedro, continue using it. Both tools live happily side by side.
Hi GentleSwallow91 let me try and answer your questions 😄
The serving service controller is basically the main Task that controls the serving functionality itself. AFAIK:
clearml-serving-alertmanager - a container that runs the Alertmanager by Prometheus ( https://prometheus.io/docs/alerting/latest/alertmanager/ )
clearml-serving-inference - the container that runs the inference code
clearml-serving-statistics - I believe that it runs software that reports to the prometheus reporting ...
Hi SubstantialElk6 , for monitoring and production labelling, what we found is that there's no "one size fits all", so we tried designing ClearML to be easy to integrate with. In the enterprise solution we do have a labeling solution, but it's not meant for production labeling; it's more for R&D label fixes. We have customers that integrated 3rd-party annotation services with ClearML.
As for DAG workflows, I saw someone who integrated ClearML with Luigi, but I couldn't find the post anywhere! 😄
You can use:
task = Task.get_task(task_id='ID')
task.artifacts['name'].get_local_copy()
ImmensePenguin78 we also have a new example for this!
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
If these indices tend to grow large, I think it would be cool if there was a flag that would periodically remove them. Probably a lot of users aren't aware that these take up so much space.
Hi IcyJellyfish61 , while spinning up and down EKS is not supported (albeit very cool 😄 ) we have an autoscaler in the applications section that does exactly what you need, spin up and down EC2 instances according to demand 🙂
If you're using http://app.clear.ml as your server, you can find it at https://app.clear.ml/applications .
Unfortunately, it is unavailable for the open-source server and is only available to paid tiers.
Hey There Jamie! I'm Erez from the ClearML team and I'd be happy to touch on some points that you mentioned.
First and foremost, I agree with the first answer that was given to you on Reddit. There's no "right" tool. Most tools are right for the right people, and if a tool is too much of a burden, then maybe it isn't right!
Second, I have to say the use of SVN is a "bit" of a hassle. The MLOps space HEAVILY leans towards git. We interface with git, and so does every other tool I know of. That ...
I think you should call dataset.finalize()
FiercePenguin76 Thanks! That's great input! If you're around tomorrow, feel free to ask us questions in our community talk! We'd be happy to discuss 😄
The upload method (which has an SDK counterpart) allows you to specify where to upload the dataset to.
We can't officially confirm nor deny this but yes :sleuth_or_spy:
To add to Natan's answer, you can run anything on the services docker, depending on the HW. We don't recommend training with it, as the server's machine might get overloaded. What you can do is simple stuff like cleanup or any other routines 🙂
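As a concrete example of the kind of lightweight routine that fits there, here's a generic, hypothetical cleanup sketch (nothing ClearML-specific; the function name, paths, and age threshold are all made up for illustration):

```python
import time
from pathlib import Path


def remove_stale_files(root: str, max_age_days: float) -> list[str]:
    """Delete files under `root` older than `max_age_days` and return the removed paths."""
    cutoff = time.time() - max_age_days * 86400  # seconds in a day
    removed = []
    for path in Path(root).rglob("*"):
        # Only delete regular files whose last modification is older than the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed
```

Something like this could run periodically on the services queue to keep a scratch directory from filling up, without putting any real load on the server's machine.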
Once you integrate ClearML, it'll automatically report resource utilization (GPU / CPU / memory / network / disk IO).
Is anything else missing?
Thanks! 😄 As I've mentioned above, these features were chosen because of user feedback, so keep it up, and thanks again!