
e.g. some files on a shared drive; then someone silently updates the files, all the experiments become invalid, and no one knows when that happened.
Cool, versioning the diffs is useful. It also depends on what kind of data. For example, for tabular data a database might be a natural choice, though integrating it and keeping track of the metadata could be tricky. Images, on the other hand, are probably better suited to blob storage or a per-file basis.
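A minimal sketch of one way to catch those silent changes: fingerprint the data files and store the hashes with the experiment. hashlib and pathlib are standard library; attaching the result via task.connect() is just my assumption about how you'd wire it into Trains, and the path is hypothetical:
```
import hashlib
from pathlib import Path

from trains import Task

def dataset_fingerprint(data_dir):
    """Hash every file so silent edits on the shared drive become visible."""
    return {
        str(p.relative_to(data_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(data_dir).rglob("*"))
        if p.is_file()
    }

task = Task.init(project_name="demo", task_name="fingerprint-example")
# Stored with the experiment; re-running after a silent edit shows a diff.
task.connect(dataset_fingerprint("/mnt/shared/dataset"))  # hypothetical path
```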
It will pop up a window like this, and the program only continues when I close this window.
It seems that if I don't use plt.show() the plot won't show up in Allegro. Is this a must?
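For the blocking window above, a workaround sketch: switch to a non-interactive backend so plt.show() returns immediately. This assumes Trains hooks plt.show() and that the hook still fires under the Agg backend, both worth verifying on your setup:
```
import matplotlib
matplotlib.use("Agg")  # non-interactive: no GUI window pops up
import matplotlib.pyplot as plt

from trains import Task

task = Task.init(project_name="demo", task_name="plot-capture")

plt.plot([0, 1, 2], [0, 1, 4])
plt.title("loss")
# With a GUI backend this blocks until the window is closed; with Agg
# it returns immediately, and the patched show() should still hand the
# figure to the Trains logger.
plt.show()
```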
Ok, sorry, this is my mistake. It's actually inside a loop, so this makes sense.
Hi, just to be clear, the self-hosted option is still available, right? I need to know because we have spent some effort integrating Trains internally and expect to continue development for a while.
GrumpyPenguin23 yes, those features seem to relate to other infrastructure, not Trains (ML experiment management).
I wonder what extra features are offered in the enterprise solution, though.
I only found the task.get_last_metrics() API, but I need the entire metric array instead.
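A sketch of one way to pull the full history rather than just the last value, assuming your SDK version exposes Task.get_reported_scalars() (recent ClearML releases do; older Trains builds may not):
```
from trains import Task

task = Task.get_task(task_id="<your-task-id>")

# Expected shape: {title: {series: {"x": [iterations], "y": [values]}}}
scalars = task.get_reported_scalars()
for title, series_map in scalars.items():
    for series, points in series_map.items():
        print(title, series, points["y"])  # the entire metric array
```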
I am interested in machine learning experiment management tools.
I understand Trains already handles a lot of things on the model side, e.g. hyperparameters, logging, metrics, comparing two experiments.
I also want it to help with reproducibility. To achieve that, I need code/data/configuration all tracked.
For code and configuration I am happy with the current Trains solution, but I am not sure about data versioning.
So if you have more details about the dataset versioning in the enterprise offering...
Potentially both, but let's just say structured data first: CSV, pickle (maybe not a table, could be any Python object), feather, parquet, the common data formats.
It would be nice if there were an "export" function to just export the all/selected experiments table view.
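Until something like that exists, a workaround sketch through the SDK; Task.get_tasks() is a real call, while the project name and the columns I dump are just placeholders:
```
import csv

from trains import Task

# Pull every experiment in a project and dump a summary table.
tasks = Task.get_tasks(project_name="my-project")  # hypothetical project

with open("experiments.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "status"])
    for t in tasks:
        writer.writerow([t.id, t.name, t.get_status()])
```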
As a workaround, I wrote a function that recursively casts my config dictionary values to strings when needed.
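Roughly this shape (a sketch; adjust which types you keep as-is versus stringify):
```
def stringify(obj):
    """Recursively cast non-primitive config values to strings."""
    if isinstance(obj, dict):
        return {k: stringify(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [stringify(v) for v in obj]
    if isinstance(obj, (str, int, float, bool)) or obj is None:
        return obj
    return str(obj)  # anything exotic becomes its string repr
```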
https://github.com/quantumblacklabs/kedro-examples/blob/master/kedro-tutorial/conf/base/catalog.yml
I am actually using Kedro (a pipeline library); you can check out the YAML config here. There are a lot of cases where I need to insert a new argument or dataset in between.
Using the configuration directly is actually worse than using a dictionary for hyperparameters. It does the diff line by line (notice the right experiment).
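To illustrate what I mean by using a dictionary: flatten the nested config and connect it as hyperparameters, so the comparison is per key instead of per line. task.connect() is the real API; the flattening helper and the sample config are mine:
```
from trains import Task

def flatten(cfg, prefix=""):
    """Flatten nested config into {"a.b.c": value} so diffs are per key."""
    flat = {}
    for k, v in cfg.items():
        key = prefix + k
        if isinstance(v, dict):
            flat.update(flatten(v, prefix=key + "."))
        else:
            flat[key] = v
    return flat

task = Task.init(project_name="demo", task_name="config-diff")
config = {"model": {"lr": 0.01, "layers": 4}, "data": {"batch": 32}}
task.connect(flatten(config))  # compared key by key in the UI
```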
Do you know what the "dataset management" is in the open-source version?
CumbersomeCormorant74 Thanks for the reply. Let me clarify: I mean reordering the columns (dragging a column).
If I drag the default columns and hit F5, the order is preserved. However, if I add a metric column as the first column and hit F5, the order is not preserved.
TimelyPenguin76 It works fine. I may need to check on my side; I just noticed the problem was caused by the @funcy.log_durations decorator. It may change the function signature and cause some issue. I haven't had time to look into it yet, but the example works fine.
AgitatedDove14 I believe you mean plt.savefig? I used this function to save my charts, but they do not show up either.
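If the automatic hook misses savefig, explicitly handing the figure to the logger is a possible fallback. report_matplotlib_figure exists on ClearML's Logger; whether your Trains version has it is worth checking:
```
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

from trains import Task

task = Task.init(project_name="demo", task_name="explicit-report")

fig = plt.figure()
plt.plot([0, 1, 2], [1, 2, 3])
fig.savefig("chart.png")  # still saved locally as before

# Explicit reporting, in case the savefig hook doesn't fire.
task.get_logger().report_matplotlib_figure(
    title="my chart", series="run", iteration=0, figure=fig
)
```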
Great discussion; I agree with you both. We are not using clearml-data, so I am a bit curious how a "published experiment" locks everything (including the input? I assume someone could still just go into the S3 bucket and delete a file without ClearML noticing).
From my experience, absolute reproducibility is code + data + parameters + execution sequence. For example, random seeds or parallelism can cause different results and can be tricky to deal with sometimes. We did bu...
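On the random-seed point, a minimal seed-everything sketch; the torch block only runs if torch is installed, and the determinism flags trade speed and still don't cover every parallel op:
```
import os
import random

import numpy as np

def seed_everything(seed=42):
    """Pin the common RNGs; parallelism can still break determinism."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Deterministic kernels where available; slower, not exhaustive.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

seed_everything(42)
```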
And the plotting area is completely empty; only some chart titles show up on the left.
But somewhere along the way, the request actually removes the header.
This makes the plotting fail.
I couldn't report it on the demo server, since this involves internal stuff...