Yes EnviousStarfish54, the comparison is line by line, and each experiment is compared only to the left one. As with any multi-way comparison, you have to set a baseline, which here is always the left column. Note that you can reorder the columns and the comparison will update.
BTW: if you make the right column the baseline (i.e. move it to the left), you will get what you probably expected.
I use YAML configs for data and model. Each of them is a nested YAML (possibly more than 2 levels deep), so it won't be a flexible solution and I need to manually flatten the dictionary
Yes, you are correct, the recommended option would be to store it with task.connect_configuration
Its goal is to store these types of configuration files/objects.
You can also store the YAML file itself directly: just pass a Path object instead of a dict/string
I use YAML configs for data and model. Each of them is a nested YAML (possibly more than 2 levels deep), so it won't be a flexible solution and I need to manually flatten the dictionary
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/value pairs in a section called "new section"
diff line by line is probably not useful for my data config
You could request a better configuration diff feature 🙂 Feel free to add to GitHub
But this also means I have to load all the configuration into a dictionary first.
Yes 😞
EnviousStarfish54 generally speaking, the hyperparameters are flat key/value pairs. You can have as many sections as you like, but inside each section there are only key/value pairs. If you pass a nested dict, it will be stored as path/to/key:value (as you witnessed).
If you need to store a more complicated configuration dict (nesting, lists, etc.), use connect_configuration; it will convert your dict to text (in HOCON format) and store that.
In both cases you can edit the configuration, and then when running with the trains-agent, the code will get the values from the trains-server (instead of the values set in code). This is the "connect" idea.
Make sense?
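To make the path/to/key:value layout concrete, here is a minimal sketch of flattening a nested dict by hand. This is not the library's own code; the `flatten` helper, the `/` separator default, and the sample `config` are assumptions made up for illustration.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "", sep: str = "/") -> Dict[str, Any]:
    """Flatten a nested dict into {"path/to/key": value} pairs (illustrative helper)."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        # Build the full path of this key, e.g. "model/optimizer/lr"
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts
            flat.update(flatten(value, path, sep))
        else:
            # Leaves (including lists and empty dicts) are stored as-is
            flat[path] = value
    return flat


# Hypothetical nested config, as it might look after loading a YAML file
config = {"model": {"optimizer": {"lr": 0.01, "name": "adam"}, "layers": 3}}
flat_config = flatten(config)
print(flat_config)
```

For the sample config this prints `{'model/optimizer/lr': 0.01, 'model/optimizer/name': 'adam', 'model/layers': 3}`, i.e. one flat key per leaf value.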
I am not sure what the difference is between logging with "configuration" and "hyperparameters". For now I am only using it for logging; I guess hyperparameters have a special meaning if I want to use "trains" for some other features.
Thanks for your help. I will stick with task.connect() first. I have submitted a GitHub issue, thanks again AgitatedDove14
Using configuration directly is actually worse than using a dictionary for hyperparameters. It does the diff line by line (notice the right experiment)
https://github.com/quantumblacklabs/kedro-examples/blob/master/kedro-tutorial/conf/base/catalog.yml
I am actually using Kedro (a pipeline library); you can check out the YAML config here. There will be a lot of cases where I need to insert a new argument or dataset in between
If this is a simple two-level nesting, you can use the section name:
task.connect(param['data'], name='data')
task.connect(param['model'], name='model')
Would that help?
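For more than two sections, the same idea can be written as a loop. A minimal sketch, assuming `task` is the object returned by `Task.init` and that every top-level value in `params` is a flat dict. `connect_sections` and `RecordingTask` are hypothetical names; `RecordingTask` is only a stand-in so the snippet runs without a server.

```python
from typing import Any, Dict, Optional


def connect_sections(task: Any, params: Dict[str, Dict[str, Any]]) -> None:
    """Connect each top-level key of `params` as its own named section."""
    for section_name, section_values in params.items():
        # Same call as used in this thread: task.connect(a_flat_dict, name=...)
        task.connect(section_values, name=section_name)


class RecordingTask:
    """Hypothetical stand-in that only records calls; replace with a real Task."""

    def __init__(self) -> None:
        self.sections: Dict[Optional[str], Dict[str, Any]] = {}

    def connect(self, mutable: Dict[str, Any], name: Optional[str] = None) -> Dict[str, Any]:
        self.sections[name] = mutable
        return mutable


# Hypothetical two-level config with one section per top-level key
params = {"data": {"batch_size": 32}, "model": {"lr": 0.01}}
task = RecordingTask()
connect_sections(task, params)
print(sorted(task.sections))
```

With a real task, each top-level key would then show up as its own section in the UI, which is what the comparison view works on.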
The comparison reflects the way the data is stored in the configuration context. That means section name and key/value (which is what the code above does)
In this case, I would rather use task.connect(); a line-by-line diff is probably not useful for my data config. As shown in the example, shifting 1 line would make all the remaining lines show up as different.
But this also means I have to load all the configuration into a dictionary first.
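The "shifting 1 line" problem can be reproduced in a few lines. This is a rough sketch, assuming the text view compares lines position by position as described in this thread; it is not how the UI is actually implemented, and `positional_diff`, `key_diff`, and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def positional_diff(left: List[str], right: List[str]) -> List[int]:
    """Indices where the lines at the same position differ."""
    length = max(len(left), len(right))
    # Pad the shorter side so every position can be compared
    padded_left = left + [""] * (length - len(left))
    padded_right = right + [""] * (length - len(right))
    return [i for i in range(length) if padded_left[i] != padded_right[i]]


def key_diff(left: Dict[str, Any], right: Dict[str, Any]) -> List[str]:
    """Keys that are missing on one side or have different values."""
    missing = object()  # sentinel, so a stored None is not treated as "absent"
    return sorted(
        key
        for key in left.keys() | right.keys()
        if left.get(key, missing) != right.get(key, missing)
    )


# Hypothetical configs: `changed` has one entry inserted in the middle
base = {"a": 1, "c": 3, "d": 4}
changed = {"a": 1, "b": 2, "c": 3, "d": 4}

base_lines = [f"{k}: {v}" for k, v in base.items()]
changed_lines = [f"{k}: {v}" for k, v in changed.items()]

print(positional_diff(base_lines, changed_lines))
print(key_diff(base, changed))
```

The positional comparison flags every line after the insertion (`[1, 2, 3]`), while the key-based comparison reports only the new key (`['b']`). That is the trade-off being discussed: key/value sections compare cleanly, but the config has to be loaded into a dictionary first.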
I tried passing the dictionary but the output is not ideal. I would want to have some nested dict like the "execution" > "Source" layout.
As the number of parameters can be large, having some hierarchy in the UI would make comparison much easier