Is the app/UI/backend customizable? Any tutorials for that?

Posted 4 years ago

Answers 13


Hi CleanWhale17, at least for the moment the code, although open ( https://github.com/allegroai/trains-web ), has no external theme/customization interface.
That said, we do have some thoughts on it. What did you have in mind?

  
  
Posted 4 years ago

I work on vision AI, so I would need integration with my existing data pipeline (including the annotation tools: LabelMe, VGG, etc.) and also features like an email alert for a finished job (I'm not sure if that's already there).

The other doubt I have:
How does it compare to Apache Airflow or DVC for data management (if I'm not going for the paid version)?

  
  
Posted 4 years ago

- Automated Data Source Integration
- Data Pooling and Web Interface for Manual Annotation of Images (Seg. / Classif.)
- Storage of Annotation output files (versioned JSON)
- Online-Training Support (for Dataset Shifts)
- Data Pre-processing (filter/augment)
- Dataset visualization (stats of the Dataset)
- Experiment Management (which is why I liked TRAINS)
- Jupyter Integration (for Test Management)
- Training Progress Visualization (TensorBoard-like)
- Inferencing and Visualization of Results
- Reproducibility of Training Results

Thanks for sharing the case-study link. Please let me know whether each of the above requirements is already in TRAINS, planned, or can be covered by integrating an external tool.

  
  
Posted 4 years ago

Online training:
Re-training the model to update its weights for any new dataset introduced after the previous deployment. Based on a certain threshold, we can decide when to re-train the model.

It's mainly applicable to scenarios that involve streaming/sequential datasets that become available over time, e.g. facial recognition, or retail use cases for new fashion segments.
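(To make the threshold idea concrete, here is a rough sketch with an illustrative mean-shift drift score; the metric and the 0.15 threshold are placeholders of mine, not anything TRAINS provides.)

import numpy as np

DRIFT_THRESHOLD = 0.15  # placeholder; tune per use case

def drift_score(reference, new_batch):
    # Largest per-feature mean shift, measured in units of the reference std.
    ref_mean = reference.mean(axis=0)
    ref_std = reference.std(axis=0) + 1e-8
    return float((np.abs(new_batch.mean(axis=0) - ref_mean) / ref_std).max())

def should_retrain(reference, new_batch):
    return drift_score(reference, new_batch) > DRIFT_THRESHOLD

# Example: a 0.5-std mean shift in the incoming batch triggers re-training.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(1000, 8))
incoming = rng.normal(0.5, 1.0, size=(200, 8))
print(should_retrain(reference, incoming))  # True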

  
  
Posted 4 years ago

Glad to hear it.
As I'm a full-stack developer at core, I'd be looking to extend the TRAINS frontend and backend APIs to suit my need for on-prem data storage integration, plus lots of other customization: job scheduler (cron), dataset augmentation, a custom annotation tool, etc.

Can you guide me to a tutorial that teaches how to customize the backend/frontend, with an example?

  
  
Posted 4 years ago

I would recommend reading this blog post; it should give you a glimpse of what can be built 🙂
https://medium.com/pytorch/how-trigo-built-a-scalable-ai-development-deployment-pipeline-for-frictionless-retail-b583d25d0dd

  
  
Posted 4 years ago

Hi CleanWhale17, let me see if I can address them all.

An email alert for a finished job (I'm not sure if that's already there).

Slack integration will be public by the end of the weekend 🙂
It is fully customizable/extendable; I'll be happy to help.
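Until then, a finished-job alert is simple to build on the open API. A minimal sketch, assuming polling with Task.get_tasks and a local SMTP relay (the addresses, project name, and interval are placeholders):

import smtplib
import time
from email.message import EmailMessage

from trains import Task

POLL_SECONDS = 60   # placeholder polling interval
alerted = set()     # task ids we already sent an alert for

def send_email(subject, body):
    # Assumes a local SMTP relay; swap in your own mail setup.
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "trains-monitor@example.com"  # placeholder
    msg["To"] = "team@example.com"              # placeholder
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

while True:
    # Ask the trains server for completed tasks in the project.
    for task in Task.get_tasks(project_name="VisionAI",
                               task_filter={"status": ["completed"]}):
        if task.id not in alerted:
            alerted.add(task.id)
            send_email("Job finished: %s" % task.name,
                       "Task %s completed." % task.id)
    time.sleep(POLL_SECONDS)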

DVC

Full dataset tracking is supported using artifacts and the ability to integrate with any central storage (shared folders / S3 / GS / Azure, etc.).
From my experience it is easier to work with artifacts from data-processing Tasks, as Trains offers full caching and flexible storage options; I always have the feeling a "git-like" commit/pull for datasets is the wrong approach. That said, nothing will stop you from integrating DVC into your pipeline.
If you are doing computer-vision-based DL, annotations usually mean JSON files plus pointers to the actual image files. Then it makes a lot of sense to keep the annotations in a single JSON file, versioned by a data-processing Task. From the training Task, pull the JSON (caching is supported), and from the JSON access the actual image files, either via direct file sharing or using the Trains StorageManager, which does all the heavy lifting for you and can pull data from S3/GS/Azure etc. with caching built in.
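To make the flow concrete, a minimal sketch (the project/task names and S3 path are placeholders):

from trains import Task
from trains.storage import StorageManager

# Data-processing task: store the annotation JSON as a versioned artifact.
task = Task.init(project_name="VisionAI", task_name="annotation snapshot")
task.upload_artifact("annotations", artifact_object="annotations.json")

# Training task (a separate script): pull the JSON, with local caching.
data_task = Task.get_task(project_name="VisionAI",
                          task_name="annotation snapshot")
annotations_path = data_task.artifacts["annotations"].get_local_copy()

# Fetch the image files the JSON points to; S3/GS/Azure/local paths all
# work, and downloads are cached.
image_path = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/images/img_0001.jpg")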

Apache AirFlow

If you have a K8s cluster and you want production-grade orchestration, by all means consider Airflow or Kubeflow. That said, for R&D and constantly changing repositories/requirements, Trains offers the ability to reuse containers (so that you do not end up with a container per experiment and then thousands of unused containers) and also the ability to build a fully standalone container from any experiment (i.e. package an experiment/Task in a container for later use with any orchestration solution).
One last thing: K8s is great for managing resources, not so much for scheduling.
You can use trains-agent as a bare-metal agent to run containers on any machine (set up with pip install, it is that easy), or you can integrate it with K8s; there are a few examples and documentation on the Nvidia NGC cloud (we are the leading supported platform for managing experiments on Nvidia K8s clusters).
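For example, assuming a trains version that ships Task.execute_remotely, a script can hand itself over to an agent with one call (the project, task, and queue names are placeholders):

from trains import Task

task = Task.init(project_name="VisionAI", task_name="train model")

# Stop local execution here and re-launch this exact script on the first
# machine running a trains-agent that listens on the "default" queue.
task.execute_remotely(queue_name="default", exit_process=True)

# From this point on, the code runs on the agent's machine/container.
# ... training code ...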

  
  
Posted 4 years ago

As I'm a full-stack developer at core, I'd be looking to extend the TRAINS frontend and backend APIs to suit my need for on-prem data storage integration, plus lots of other customization: job scheduler (cron), dataset augmentation, a custom annotation tool, etc.

That is awesome! Feel free to post a specific question here, and I'll try to direct you to the right place 🙂

Can you guide me to a tutorial that teaches how to customize the backend/frontend, with an example?

You mean like pipelines / automation etc.?
If that is the case, take a look at the examples folder:
https://github.com/allegroai/trains/tree/master/examples
Mostly the automation and services subfolders.
Also have a look at the trains-agent examples (including the AWS autoscaler example, which will soon be rewritten so it is easier to extend):
https://github.com/allegroai/trains-agent/tree/master/examples
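The heart of most of those automation examples is the clone-and-enqueue pattern; a minimal sketch (the template task id, parameter name, and queue are placeholders):

from trains import Task

TEMPLATE_TASK_ID = "<task id copied from the UI>"  # placeholder

# Clone a finished experiment, override a hyper-parameter, and enqueue the
# copy so the next available trains-agent executes it.
cloned = Task.clone(source_task=TEMPLATE_TASK_ID, name="lr 0.01 rerun")
cloned.set_parameter("learning_rate", 0.01)  # illustrative parameter name
Task.enqueue(cloned, queue_name="default")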

  
  
Posted 4 years ago

CleanWhale17, nice... 🙂
So the answer is that Trains supports the pipeline/automation part, but lacks the dataset integration (that is basically up to you to manage, with either artifacts or any other method).
Allegro Enterprise lets you rerun the code on a new version of the dataset from the UI (or automation) without changing a single line of code 🙂

  
  
Posted 4 years ago

Thanks for the detailed comparison. I'll have to look more into these tools to come to a conclusion based on my needs.

Here's what I'm looking at:
An automated ML Pipeline

  
  
Posted 4 years ago

CleanWhale17, per your request :)

- An automated ML Pipeline 👍
- Automated Data Source Integration 👍
- Data Pooling and Web Interface for Manual Annotation of Images (Seg. / Classif.) [Allegro Enterprise, or users integrate with open source]
- Storage of Annotation output files (versioned JSON) 👍
- Online-Training Support (for Dataset Shifts) [not sure what you mean]
- Data Pre-processing (filter/augment) [Allegro Enterprise, or users integrate with open source]
- Dataset visualization (stats of the Dataset) [Allegro Enterprise, or users integrate with open source]
- Experiment Management (which is why I liked TRAINS) 👍
- Jupyter Integration (for Test Management) 👍
- Training Progress Visualization (TensorBoard-like) 👍
- Inferencing and Visualization of Results 👍
- Reproducibility of Training Results 👍

  
  
Posted 4 years ago

Thanks for the reference, Martin. I'll soon be starting with TRAINS and will be in touch on the progress.

  
  
Posted 4 years ago

CleanWhale17 what is "Online-Training Support (for Dataset Shifts)"?

  
  
Posted 4 years ago