Wait, I thought this is without upload?
isn't it better if it just updates the timestamp on the old models?
with upload I would strongly recommend against doing this
I will dig around to see how all of this could be accomplished.
Right now I see it done in two ways:
- a function that you must remember to call each time, which would do the upkeep
- a script that you can run once in a while to do the cleanup (a rough sketch of this option is below)
Which one would you prefer that I pursue?
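To make the second option concrete, here is a minimal sketch of what such a cleanup script could look like. It only finds duplicate model names and prints them, stopping short of deleting anything, and it assumes `Model.query_models(project_name=...)` is available in your clearml version (please double-check against the SDK docs, the project name is a placeholder):
```python
# Hypothetical cleanup sketch: scan a project for output models that share
# the same name and list the duplicates so they can be cleaned up later.
from collections import defaultdict

from clearml import Model


def find_duplicate_models(project_name):
    # assumes Model.query_models(project_name=...) exists in your SDK version
    models = Model.query_models(project_name=project_name)

    by_name = defaultdict(list)
    for m in models:
        by_name[m.name].append(m)

    # anything with more than one entry is a candidate for cleanup
    return {name: group for name, group in by_name.items() if len(group) > 1}


if __name__ == "__main__":
    for name, dupes in find_duplicate_models("my_project").items():
        print(f"{name}: {len(dupes)} copies -> {[m.id for m in dupes]}")
```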
Shh AgitatedDove14 you're dating yourself 🙂
so what I am describing is exactly this - once you try to create an output model from the same task, if the name already exists - do not create a new model, just update the timestamp on the old one
Cool. I found Logger.tensorboard_single_series_per_graph()
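For anyone else who lands here, my understanding is that it is a class-level switch you flip once, before the TensorBoard events start flowing; something along these lines (the `single_series` argument name is from memory, please check the docstring):
```python
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="tb single series")

# Ask ClearML to put each TensorBoard series on its own graph
# instead of grouping series together on one plot.
Logger.tensorboard_single_series_per_graph(single_series=True)
```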
EnviousStarfish54 I recognize this table 🙂 I'm glad you are already talking with the right person. I hope you will get all your questions answered.
Thanks DeliciousBluewhale87 , setting up with k8s is definitely a good suggestion!
That's always nice to hear. Remember that many of these improvements came from the community and you can always submit a feature request on our github repo https://github.com/allegroai/clearml/issues
Aren't the two lines enough for you? BTW why lightning and not ignite?
Okay, so it sounds like two bugs stacked together? I wonder if this is GitLab specific. Could you provide a list of steps to reproduce? 🙂
Hi, this really depends on what your organisation agrees is within MLOps control and what isn't. I think this blogpost is a must read:
https://laszlo.substack.com/p/mlops-vs-devops
and here is a list with a seemingly infinite amount of MLOps content:
https://github.com/visenger/awesome-mlops
Also, are you familiar with the wonderful MLOPS.community? The meetup and podcasts are magnificent (also look for me in their slack)
https://mlops.community/
Looks like it is still running, DeliciousSeaanemone40. Are you suggesting it is slower than usual? There are some messages there that I've never seen before.
Not that it's very relevant, since it can be seen from your task 🙂 but it would be interesting to find out if Trains made something much slower, and if so - how.
Hi, I think this came up when we discussed the joblib integration right? We have a model registry, ranging from auto spec to manual reporting. E.g. https://allegro.ai/clearml/docs/docs/examples/frameworks/pytorch/manual_model_upload.html
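The manual end of that range is basically the OutputModel interface; a rough sketch (file name and project/task names here are just placeholders) would be:
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model upload")

# Register a model against the current task and upload the weights file manually
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="model.pt")  # placeholder path
```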
Difficult without a reproducer, but I'll try: how did you get the logger? Maybe you forgot the parentheses at task.get_logger()?
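i.e. the difference between these two lines (the second just grabs the bound method instead of a Logger object, which would explain the error):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="logger check")

logger = task.get_logger()    # correct: returns a Logger instance
# logger = task.get_logger    # bug: this is the method object, calls on it will fail

logger.report_scalar(title="loss", series="train", value=0.1, iteration=0)
```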
BattyLion34 this is up to the discretion of the meetup organizers. In any case, I am going to use the same demos to create several of my stuffed animal videos (we can also upload the same videos without the stuffed animals if there is demand for that).
Hi RobustHippopotamus53, I think this is just the place to ask this, we are all ClearML users here 🙂 Let me ask you this - did you merge and also push? When I forget to push after merging a PR, I think this is the same error message I get.
And should it work across your workspace, i.e. does it not matter if the task id changed? Just always keep a single model with the same filename? I'm really worried this could break a lot of the reproducible/repeatable flow of your process.
Well, in general there is no one answer; I can talk about it for days. In ClearML the question is really a non-issue, since if you build a pipeline from notebooks on your dev machine in R&D, it is automatically converted to Python scripts inside containers. Where shall we begin? Maybe you describe your typical workload and intended deployment with latency constraints?
Hi! Looks like all the processes are calling torch.save so it's probably reflecting what Lightning did behind the curtain. Definitely not a feature though. Do you mind reporting this to our github repo? Also, are you also getting duplicate experiments?
What's your code situation? Is it open enough to allow you to create an issue for this on our GitHub?
From what I remember, the bins in TB are wider. And the tapering off around zero cannot be real, since this happens in super sparse models. Overall, if you are sure, then this is a nice issue to open on GitHub.
Honestly, it looks like the TensorBoard representation is the wrong one. Only one way to find out - you need to plot the histogram on your own 🙂
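Something along these lines should settle it (the bin count is arbitrary, the data is a stand-in for your layer weights, and the report_histogram arguments are a sketch - check the Logger docs for the exact signature):
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="manual histogram")
logger = task.get_logger()

weights = np.random.randn(10000)            # stand-in for your layer weights
counts, edges = np.histogram(weights, bins=50)

# Report the raw counts so they can be compared against the TensorBoard plot
logger.report_histogram(
    title="layer weights",
    series="manual",
    values=counts,
    iteration=0,
    xlabels=[f"{e:.2f}" for e in edges[:-1]],
)
```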
Nbdev is "neat" but it's ultimately another framework that you have to enforce.
Re: maturity models - you will find no love for them here 🙂 mainly because they don't drive research to production.
Your described setup can easily be outshined by a ClearML deployment, but SageMaker instances are cheaper. If you have a limited number of model architectures, you can get the added benefit of tracking your S3 models with ClearML with very few code changes. As for deployment - that's anoth...