Nbdev is "neat" but it's ultimately another framework that you have to enforce.
Re: maturity models - you will find no love for them here, mainly because they don't drive research to production
Your described setup can easily be outshined by a ClearML deployment, but SageMaker instances are cheaper. If you have a limited number of model architectures you can get the added benefit of tracking your S3 models with ClearML with very few code changes. As for deployment - that's anoth...
Hi! Looks like all the processes are calling torch.save, so it's probably reflecting what Lightning did behind the curtain. Definitely not a feature though. Do you mind reporting this to our github repo? Also, are you getting duplicate experiments?
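One common cause of every process writing a checkpoint is that each DDP worker runs the same code path. A minimal sketch of guarding the side effect (the helper name is made up, and using the RANK env var is an assumption about how the launcher identifies workers, not Lightning's or ClearML's actual internals):

```python
import os

def call_on_rank_zero(fn, *args, **kwargs):
    # In a distributed job every worker executes the same script; guard
    # side effects (e.g. torch.save) so only the rank-0 process runs them.
    # RANK is commonly set by torch.distributed launchers; defaults to 0
    # when running single-process.
    if int(os.environ.get("RANK", "0")) == 0:
        return fn(*args, **kwargs)
    return None
```

With a guard like this, `call_on_rank_zero(torch.save, state, path)` would save exactly once per job instead of once per worker.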
I would say that this is the opposite of the ClearML vision... Repos are for code, the ClearML server is for logs, stats and metadata. It can also be used for artifacts if you don't have dedicated artifact storage (depending on deployment etc.)
Do you mind explaining your viewpoint?
That's interesting, how would you select experiments to be viewed by the dashboard?
The colors are controlled by the front-end so not programmatically, but it is possible and (in my opinion) convenient to do so manually via clicking the appropriate position on the legend. Does this meet your expectations?
Totally within ClearML :the_horns: :the_horns:
Btw the reason they initialize with barely discernible colors lies in the hash function that encodes the string content as a color. I.e., this is actually a feature
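To illustrate the idea, here is a sketch of hashing a series name to a deterministic RGB color. This is purely illustrative; the real mapping lives in the ClearML front-end and will differ:

```python
import hashlib

def name_to_rgb(name: str):
    # Hash the string and take the first three digest bytes as R, G, B.
    # Deterministic: the same series name always gets the same color.
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]
```

Because the mapping is a pure function of the string, two series with similar names can land on arbitrarily similar (or barely discernible) colors.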
then your devops can delete the data and then delete the models pointing to that data
what a turn of events, so let's summarize again:
upkeep script - for each task:
1. find out if there are several models created by it with the same name
2. if so, write a log so that devops can erase the files
3. DESTRUCTIVELY delete all the models from the trains-server that are in DRAFT mode, except the last one
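The selection step of that upkeep script can be sketched in plain Python. The record shape (dicts with `name`, `status`, `created`) is hypothetical; a real script would fetch model records and perform the deletion through the trains SDK:

```python
def draft_models_to_delete(models):
    """Given the model records of one task, pick every DRAFT model
    except the most recent one per name (those are the deletion
    candidates; actual deletion is left to devops / the SDK)."""
    by_name = {}
    # group drafts by name, oldest first
    for m in sorted(models, key=lambda m: m["created"]):
        if m["status"] == "draft":
            by_name.setdefault(m["name"], []).append(m)
    doomed = []
    for group in by_name.values():
        doomed.extend(group[:-1])  # keep only the newest draft per name
    return doomed
```

Logging the returned records before deleting gives devops the audit trail mentioned above.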
I'm specifically interested in the model-first queries you would like to do (as experiment-first queries are fully featured, we want to understand what's the best way to carry that over to models)
wait, I thought this is without upload
with upload I would strongly recommend against doing this
Hi BattyLion34 , could you clarify a little? If I understand correctly, you wish to use a code repository to store artifacts and ClearML logs?
The short answer is "definitely yes", but to get maximum usage you will probably want to set up priority queues
Welcome! The machines are the ones you install and run the trains-agent daemon on, and creating the queues can be done via the trains-agent CLI or the webapp UI
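As a rough example, launching an agent that serves two queues in priority order might look like this (the queue names are made up; check `trains-agent daemon --help` on your version for the exact flags):

```shell
# run on each worker machine; the agent polls 'high_priority' first
# and falls back to 'default' when it is empty
trains-agent daemon --queue high_priority default
```

Listing several queues like this is what gives you the priority-queue behavior mentioned above: the first queue is always drained before the next is consulted.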
OddAlligator72 can you link to the wandb docs? Looks like you want a custom entry point, I'm thinking "maybe" but probably the answer is that we do it a little differently here.
Hi, this really depends on what your organisation agrees is within MLOps control and what isn't. I think this blogpost is a must read:
https://laszlo.substack.com/p/mlops-vs-devops
and here is a seemingly infinite list of MLOps content:
https://github.com/visenger/awesome-mlops
Also, are you familiar with the wonderful MLOPS.community? The meetup and podcasts are magnificent (also look for me in their slack)
https://mlops.community/
Sorry for being late to the party WearyLeopard29 , if you want to see get_mutable_copy() in the wild you can check the last cell of this notebook:
https://github.com/abiller/events/blob/webinars/videos/the_clear_show/S02/E05/dataset_edit_00.ipynb
Or skip to 3:30 in this video:
WackyRabbit7 It is conceptually different than actually training, etc.
The services agent is usually one without a GPU; it runs several tasks, each in its own container, for example the autoscaler and the orchestrators for our hyperparameter optimization and/or pipelines. I think it even runs (by default?) on the same hardware as the trains-server.
Also, if I'm not mistaken some people are using it (planning to?) to push models to production.
I wonder if anyone else can share their view since this is a relati...
Hi SubstantialElk6 , have a look at Task.execute_remotely, it's made especially for that. For instance, in the recent webinar I used pytorch-cpu on my laptop with task.execute_remotely, and the agent automatically installs the GPU version. Example https://github.com/abiller/events/blob/webinars/webinars/flower_detection_rnd/A1_dataset_input.py
This looks like a genuine git fetch issue. Trains would have problems figuring out the diff if git cannot find the base commit...
Do you have submodules in the repo? Did the DS push his/her commits?
EnviousStarfish54 first of all, thanks for taking the time to explore our enterprise offering.
- Indeed Trains is completely standalone. The enterprise offering adds the necessary infrastructure for end-to-end integration etc. with a huge emphasis on computer vision related R&D.
- The data versioning is actually more than just data versioning because it adds an additional abstraction over the "dataset" concept, well this is something that the marketing guys should talk about... unless you ...
same name == same path, assuming no upload is taking place? *just making sure
Hmm, is this anything -m would solve? https://docs.docker.com/config/containers/resource_constraints/
or is it a segfault inside the container because `ulimit -s unlimited` isn't set?
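To make both knobs concrete, they can be set when launching the container. The image name is a placeholder, and `--ulimit stack=-1` assumes your Docker version accepts -1 for unlimited (as it does for e.g. memlock):

```shell
# --memory is the -m cap from the resource-constraints docs;
# --ulimit stack=-1 is the container-side equivalent of 'ulimit -s unlimited'
docker run --memory=8g --ulimit stack=-1 my-training-image
```

Alternatively, running `ulimit -s unlimited` in the container entrypoint before the training script achieves the same stack-limit change without touching the run command.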
So basically export a webapp view as csv?
the script runs and tries to register 4 models; each one is found at exactly the same path, but the size/timestamp is different. It will then update the old 4 models with the new details and erase all the other fields
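A sketch of that re-registration behavior in plain Python, matching on path. The record shape is hypothetical; the real flow would go through the server API:

```python
def upsert_models(existing, registered):
    """Models are matched by path; a match has all its old fields erased
    and is overwritten with the new details (size/timestamp etc.).
    Unmatched existing models are left alone."""
    by_path = {m["path"]: m for m in existing}
    for new in registered:
        old = by_path.get(new["path"])
        if old is not None:
            old.clear()       # erase all the other fields
            old.update(new)   # take the freshly registered details
    return existing
```

The key point the sketch captures: identity is the path alone, so a changed size or timestamp triggers an update rather than a new model entry.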
These are excellent questions. While we are working towards including more of our users' stack within the ClearML solution, there is still time until we unveil "the clearml approach" to these. From what I've seen within our community, deployment can be anything from a simple launch of a docker image built with 'clearml-agent build' to automated training pipelines.
Re triggering - this is why we have clearml-task
Difficult without a reproducer, but I'll try: How did you get the logger? Maybe you forgot parentheses at task.get_logger() ?
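The missing-parentheses symptom is easy to reproduce with any class; this is a generic stand-in, not the actual ClearML Task API:

```python
class FakeTask:
    # stand-in for a Task-like object, just to show the bug
    def get_logger(self):
        return "a-logger"

task = FakeTask()
broken = task.get_logger     # forgot (): this is a bound method, not a logger
working = task.get_logger()  # with (): the actual logger object
```

Calling report methods on `broken` would then fail with attribute errors, since it is a method object rather than the logger.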
Aha, so my flower detector example is not the best one to start with... My suggested route would be framework, then experiment tracking, and then orchestration. If you wish to "cut corners" you could try our hyperparam blogpost https://link.medium.com/uGA6DePqmeb