
SubstantialElk6 this is a three-parter:
1. Getting workers on your cluster. Again, because of the rebrand I would go to the repo itself for the docs: https://github.com/allegroai/clearml-agent#kubernetes-integration-optional
2. Integrating any code with ClearML (2 lines of code, see the sketch after this list)
3. Executing that from the web UI
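For item 2, a minimal sketch (project/task names here are just placeholders):
```python
from clearml import Task

# These two lines are the whole integration; everything else is captured automatically
task = Task.init(project_name="examples", task_name="my experiment")
```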
If you need any help with any of the three, the community is here for you 🙂
I'm all for more technical tutorials for doing that... all of this fits the ClearML methodology
Hi, this really depends on what your organisation agrees is within MLOps control and what isn't. I think this blogpost is a must read:
https://laszlo.substack.com/p/mlops-vs-devops
and here is a list with an almost infinite amount of MLOps content:
https://github.com/visenger/awesome-mlops
Also, are you familiar with the wonderful MLOPS.community? The meetup and podcasts are magnificent (also look for me in their slack)
https://mlops.community/
Well, in general there is no one answer. I can talk about it for days. In ClearML the question is really a non-issue, since if you build a pipeline from notebooks on your dev machine in R&D it is automatically converted to Python scripts inside containers. Where shall we begin? Maybe you describe your typical workload and intended deployment with latency constraints?
Hi SoggyFrog26 welcome to ClearML!
The way you found definitely works very well, especially since you can use it to change the input model from the UI in case you use the task as a template for orchestrated inference.
Note that you can also wrap metadata around the model, such as the labels it was trained on and the network structure; you can also use a model package to ... well, package whatever you need with the model. If you want a concrete example I think we need a little more detail here on the fr...
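If it helps, here is a rough sketch of what that metadata wrapping can look like (the project name, label map and weights file are just placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="train with model metadata")

# Attach a label enumeration and a weights file to the model entry (placeholder values)
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_labels({"background": 0, "cat": 1, "dog": 2})
output_model.update_weights(weights_filename="model.pt")
```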
The log storage can be configured if you spin up your own clearml-server, but it won't have a repository structure. And it shouldn't, btw. If you need a secondary backup of everything, it is possible to set something up as well.
Hi Torben, that's great to hear! Which of the new features seems the most helpful for your own use case? Btw, to your question the answer is yes, but I'm not exactly sure what's the most Pythonic way of doing that. AgitatedDove14 ?
Sorry for being late to the party WearyLeopard29. If you want to see get_mutable_copy() in the wild, you can check the last cell of this notebook:
https://github.com/abiller/events/blob/webinars/videos/the_clear_show/S02/E05/dataset_edit_00.ipynb
Or skip to 3:30 in this video:
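And if you just want the call itself, it looks roughly like this (dataset name and target folder are placeholders; in the current SDK the method is called get_mutable_local_copy):
```python
from clearml import Dataset

# Fetch the dataset and get a locally editable (mutable) copy of its files
dataset = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
local_path = dataset.get_mutable_local_copy(target_folder="./my_dataset_copy")
print(local_path)
```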
Which parser are you using? argparse should be logged automatically.
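For example, something along these lines is picked up without any extra reporting code (the argument names here are made up):
```python
import argparse
from clearml import Task

# Once Task.init is called, argparse arguments are logged automatically to the UI
task = Task.init(project_name="examples", task_name="argparse demo")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()
```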
As I wrote before, these are more geared towards unstructured data, and since this is a community channel I will feel more comfortable if you continue your conversation with the enterprise rep. If you wish to take this thread to a more private channel I'm more than willing.
EnviousStarfish54 that is the intention, it is cached. But you might need to manage your cache settings if you have many of those, since there is a sane default for the cache size. Hope this helps.
EnviousStarfish54 I recognize this table 🙂 I'm glad you are already talking with the right person. I hope you will get all your questions answered.
SubstantialBaldeagle49
Hopefully you can reuse the same code you used to render the images until now, just not inside a training loop. I would recommend against integrating with trains, but you can query the trains-server from any app; just make sure you serve it with the appropriate trains.conf and manage the security 🙂 You can even manage the visualization server from within trains using trains-agent. Open source is so much fun!
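Not a full recipe, but querying from a separate app could look roughly like this (project name and filter are placeholders, and trains.conf has to point at your server):
```python
from trains import Task

# Pull completed experiments from the trains-server and read their latest scalars
tasks = Task.get_tasks(project_name="my_project", task_filter={"status": ["completed"]})
for t in tasks:
    print(t.name, t.get_last_scalar_metrics())
```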
All should work, again - is it much slower than without trains?
Hi TrickySheep9 , ClearML Evangelist here, this question is the one I live for 🙂 Are you specifically asking "how do people usually do it with ClearML" or really the "general" answer?
Hooray for the new channel! You are all invited.
This looks like a genuine git fetch issue. Trains would have problems figuring out the diff if git cannot find the base commit...
Do you have submodules in the repo? Did the DS push their commits?
Hi Elron, I think the easiest way is to print the results of !nvidia-smi, or use the framework interface to get these values and log them as a ClearML artifact. For example:
https://pytorch.org/docs/stable/cuda.html
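Something like this sketch (the artifact name and fields are arbitrary):
```python
import torch
from clearml import Task

task = Task.init(project_name="examples", task_name="gpu report")

if torch.cuda.is_available():
    gpu_info = {
        "device_name": torch.cuda.get_device_name(0),
        "memory_allocated_bytes": torch.cuda.memory_allocated(0),
        "memory_reserved_bytes": torch.cuda.memory_reserved(0),
    }
    # Store the snapshot on the task as an artifact
    task.upload_artifact(name="gpu_info", artifact_object=gpu_info)
```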
Hi SubstantialBaldeagle49 ,
Certainly, if you upload all the training images or even all the test images it will have a huge bandwidth/storage cost (I believe bandwidth does not matter if, e.g., you are using S3 from EC2). If you need to store all the detection results (for example, for QA or regression testing), you can always save the detections JSON as an artifact and view them later in your dev environment when you need them. The best option would be to only upload "control" images and "interesting" im...
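As a sketch, storing the detections instead of the images could be as simple as this (the JSON structure is just an illustration):
```python
import json
from clearml import Task

task = Task.init(project_name="examples", task_name="detection run")

# Whatever structure your detector already produces
detections = {"image_001.jpg": [{"label": "car", "bbox": [10, 20, 50, 80], "score": 0.91}]}

with open("detections.json", "w") as f:
    json.dump(detections, f)

# Upload the JSON file instead of the images themselves
task.upload_artifact(name="detections", artifact_object="detections.json")
```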
EnviousStarfish54 let's refine the discussion - are you looking at structured data (tables etc.) or unstructured (audio, images, etc.)?
There are an astounding number of such channels actually. It probably depends on your style. Would you like me to recommend some?
Of course we can always create a channel here as well... One more can't hurt 🙂
I would say that this is the opposite of the ClearML vision... Repos are for code, the ClearML server is for logs, stats and metadata. It can also be used for artifacts if you don't have dedicated artifact storage (depending on deployment etc.)
Do you mind explaining your viewpoint?
That's interesting, how would you select experiments to be viewed by the dashboard?
wait, I thought this is without upload
That's how it is supposed to work 🙂 Let us know if it does not.
MiniatureCrocodile39 let's get that fixed 💪 Could you post the link here?
Hi, I think this came up when we discussed the joblib integration, right? We have a model registry, ranging from auto-spec to manual reporting. E.g. https://allegro.ai/clearml/docs/docs/examples/frameworks/pytorch/manual_model_upload.html
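The manual-reporting end of that range is roughly this (file name and framework are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model upload")

# Register an existing weights file (e.g. a joblib/pickle dump) with the model registry
output_model = OutputModel(task=task, name="my_model", framework="ScikitLearn")
output_model.update_weights(weights_filename="model.pkl")
```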
But hey, UnevenDolphin73, nice idea, maybe we should have clearml-around that can report who is using which GPU 🙂
What a turn of events 🙂 So let's summarize again:
Upkeep script: for each task, find out if there are several models created by it with the same name. If so, either:
1. write some log so that DevOps can erase the files, or
2. DESTRUCTIVELY delete all the models from the trains-server that are in DRAFT mode, except the last one.
If you want something that could work in either case, then maybe the second option is better.
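Untested, but a sketch of the second (destructive) option could look something like this, using the auto-generated APIClient - please double check the field names before running it:
```python
from trains.backend_api.session.client import APIClient

client = APIClient()

# All draft (not yet published) models, grouped by (creating task, model name)
models = client.models.get_all(ready=False)
groups = {}
for m in models:
    groups.setdefault((m.task, m.name), []).append(m)

for (task_id, name), group in groups.items():
    if len(group) < 2:
        continue
    # Keep the most recently created draft, DESTRUCTIVELY delete the rest
    group.sort(key=lambda m: m.created)
    for stale in group[:-1]:
        client.models.delete(model=stale.id, force=True)
```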