Answered
Is There An Enterprise Version Of Trains? If Yes, What Are The Additional Features?

Is there an enterprise version of Trains? If yes, what are the additional features?
The website shows Allegro's enterprise version, which does not seem to be related to Trains in particular.

  
  
Posted 3 years ago

Answers 18


EnviousStarfish54 first of all, thanks for taking the time to explore our enterprise offering.

  1. Indeed Trains is completely standalone. The enterprise offering adds the necessary infrastructure for end-to-end integration etc. with a huge emphasis on computer vision related R&D.
  2. The data versioning is actually more than just data versioning, because it adds an additional abstraction over the "dataset" concept. That is something the marketing guys should talk about, unless you want to hear more about how I view it, in which case just DM me here or on Twitter: https://twitter.com/LSTMeow
  
  
Posted 3 years ago

For the most common workflow, I may have some CSV files, which may be updated from time to time.

  
  
Posted 3 years ago

Also, while we are at it, EnviousStarfish54, can I just make sure: you meant this page, right?
https://allegro.ai/enterprise/

  
  
Posted 3 years ago

AgitatedDove14
Is the data versioning completely different from the Trains artifact/storage solution, or is it an enhanced feature?

  
  
Posted 3 years ago

GrumpyPenguin23 yes, those features seem to be related to other infrastructure, not to Trains (ML experiment management).

  
  
Posted 3 years ago

I am interested in machine learning experiment management tools.

I understand Trains already handles a lot of things on the model side, i.e. hyperparameters, logging, metrics, and comparing two experiments.

I also want it to help with reproducibility. To achieve that, I need code, data, and configuration all tracked.

For code and configuration I am happy with the current Trains solution, but I am not sure about the data versioning.

So if you have more details about the dataset versioning with the enterprise offer, I am interested to know more.

  
  
Posted 3 years ago

EnviousStarfish54 let's refine the discussion: are you looking at structured data (tables etc.) or unstructured data (audio, images etc.)?

  
  
Posted 3 years ago

I need to check something for you EnviousStarfish54, I think one of our upcoming versions should have something to "write home about" in that regard.

  
  
Posted 3 years ago

Do you know what the "dataset management" is for the open-source version?

  
  
Posted 3 years ago

EnviousStarfish54 that is the intention, it is cached. But you might need to manage your cache settings if you have many of those, since there is an initial sane setting for the cache size. Hope this helps.
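The caching behaviour described here can be pictured with a small sketch. This is not Trains' actual cache implementation, only an illustration of the general idea under simple assumptions: the hypothetical `cached_copy` helper keys each file by a hash of its URL and calls a caller-supplied `fetch` function only when no local copy exists yet. It does not enforce any cache size limit.

```python
import hashlib
from pathlib import Path


def cached_copy(url, cache_dir, fetch):
    """Return a local path for `url`, calling `fetch(url, destination)` only on a cache miss.

    Hypothetical helper for illustration; `fetch` is any callable that
    writes the remote content to the given destination path.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Derive a stable file name from the URL so repeated requests map to the same entry.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    destination = cache_dir / key

    if not destination.exists():
        # Download to a temporary name first, so an interrupted transfer
        # never leaves a half-written file that looks like a valid entry.
        partial = destination.with_name(destination.name + ".part")
        fetch(url, partial)
        partial.replace(destination)

    return destination
```

With a scheme like this, a large dataset is transferred once and reused afterwards, as long as the entry stays in the cache directory.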

  
  
Posted 3 years ago

Ok, then maybe it can still be used as a data versioning solution, except that I have to manually track the task IDs (of the tasks that generate the artifacts) for versioning myself.
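One way to do that manual tracking, sketched with the standard library only: keep a small JSON file that maps a dataset name and version label to the ID of the task that produced the artifact. The file layout and the `register_version` / `lookup_task_id` helpers are hypothetical, not part of Trains.

```python
import json
from pathlib import Path


def register_version(registry_path, dataset_name, version, task_id):
    """Record which task produced a given dataset version (hypothetical helper)."""
    path = Path(registry_path)
    registry = json.loads(path.read_text()) if path.exists() else {}
    versions = registry.setdefault(dataset_name, {})

    # Refuse to silently repoint an existing version at a different task.
    if version in versions and versions[version] != task_id:
        raise ValueError(
            f"{dataset_name} {version} is already mapped to {versions[version]}"
        )

    versions[version] = task_id
    path.write_text(json.dumps(registry, indent=2, sort_keys=True))


def lookup_task_id(registry_path, dataset_name, version):
    """Return the task ID recorded for a dataset version."""
    registry = json.loads(Path(registry_path).read_text())
    return registry[dataset_name][version]
```

Committing this file next to the code would keep the code revision and the dataset version it expects in the same place.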

  
  
Posted 3 years ago

I wonder what extra features are offered in the enterprise solution, though.

  
  
Posted 3 years ago

As I wrote before, these are more geared towards unstructured data. As this is a community channel, I would feel more comfortable if you continued your conversation with the enterprise rep. If you wish to take this thread to a more private channel, I'm more than willing.

  
  
Posted 3 years ago

EnviousStarfish54 data versioning in the open-source version leverages the artifact, storage, and caching capabilities of Trains.
A simple workflow:

  1. Upload data
    https://github.com/allegroai/events/blob/master/odsc20-east/generic/dataset_artifact.py
  2. Preprocessing data
    https://github.com/allegroai/events/blob/master/odsc20-east/generic/process_dataset.py
  3. Using data
    https://github.com/allegroai/events/blob/master/odsc20-east/scikit-learn/sklearn_jupyter.ipynb
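The upload and consume steps of this workflow can be sketched roughly as follows. This is a minimal sketch, assuming the `trains` package is installed and configured against a Trains server; the project name, task name, and artifact name are placeholders, and the two wrapper functions are hypothetical rather than taken from the linked examples.

```python
def upload_dataset(file_path, project_name="examples", task_name="dataset artifact"):
    """Attach a local file as an artifact to a new Trains task and return the task ID."""
    # Imported lazily so this module can be loaded without a configured server.
    from trains import Task

    task = Task.init(project_name=project_name, task_name=task_name)
    # The artifact name "dataset" is a placeholder chosen for this sketch.
    task.upload_artifact("dataset", artifact_object=file_path)
    return task.id


def fetch_dataset(task_id, artifact_name="dataset"):
    """Return a local path to the artifact stored on the task with the given ID."""
    from trains import Task

    source_task = Task.get_task(task_id=task_id)
    # get_local_copy() downloads the artifact if needed and returns the local path.
    return source_task.artifacts[artifact_name].get_local_copy()
```

A preprocessing task would call `fetch_dataset` with the ID returned by `upload_dataset`, transform the data, and upload the result as a new artifact on its own task, so each task ID identifies one version of the data.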
  
  
Posted 3 years ago

EnviousStarfish54 I recognize this table 😉 I'm glad you are already talking with the right person. I hope you will get all your questions answered.

  
  
Posted 3 years ago

Hi EnviousStarfish54
The Enterprise edition extends Trains functionality.
It adds security, scale, and full data management (data management and versioning being the key difference).
You can get it as a SaaS solution or on-prem.
If you need more information, you can leave contact details on the website, I'm sure sales will be happy to help :)

  
  
Posted 3 years ago

Potentially both, but let's just say structured data first, like CSV, pickle (which may not be a table and could be any Python object), feather, parquet, and other common data formats.

  
  
Posted 3 years ago

For the open-source version, if I use an artifact and already have a local copy of the file, does it know to skip downloading it, or will it always replace the file? My dataset is large (~100 GB), so I cannot afford to re-download it every time.

  
  
Posted 3 years ago