AFAICS it's quite trivial implementation at the moment, and would otherwise require parsing the text file to find some references, right?
Yes, but the main issue is the parsing: it needs to follow a specific standard. We use HOCON because it is easy to read and edit (JSON is basically a subset of HOCON)
the original pyhocon does support include statements as you mentioned -
Correct, my thinking was to expand them into "@configuration_section.key" or something of that nature
I don't think there's a PR issue for that yet, at least I haven't created one.
I could have a look at this and maybe make a PR.
Not sure what the recommended flow would be, though 🤔
Right and then for text (file path) use some regex or similar for extraction, and for dictionary simply parse the values?
BTW AgitatedDove14 following this discussion I ended up doing the regex way myself to sync these, so our code has something like the following. We abuse the object description here to store the desired file path.
config_path = task.connect_configuration(configuration=config_path, name=config_fname)
included_files = find_included_files_in_source(config_path)
while included_files:
    file_to_include = included_files.pop()
    sub_config = task.connect_configuration(
        configuration=file_to_include,
        name=file_to_include.name,
        description=file_to_include.as_posix()
    )
    included_files |= find_included_files_in_source(sub_config)
    file_to_include.parent.mkdir(parents=True, exist_ok=True)
    sub_config.rename(file_to_include.as_posix())
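For readers following along: `find_included_files_in_source` is not shown in the thread, so here is a minimal sketch of what such a regex-based helper might look like. It assumes the directive is written as `key: !include relative/path.yaml` with an unquoted path, and that the helper returns a set of `pathlib.Path` objects (which is what the loop above appears to expect). The pattern and the comment handling are assumptions, not the actual implementation.

```python
import re
from pathlib import Path

# Assumed directive syntax: `key: !include relative/path.yaml` (unquoted path).
INCLUDE_PATTERN = re.compile(r"!include\s+([^\s#]+)")


def find_included_files_in_source(config_path):
    """Return the set of paths referenced by `!include` in a single config file."""
    text = Path(config_path).read_text()
    found = set()
    for line in text.splitlines():
        # Skip full-line comments so commented-out includes are not followed.
        if line.lstrip().startswith("#"):
            continue
        for match in INCLUDE_PATTERN.finditer(line):
            found.add(Path(match.group(1)))
    return found
```

The paths are returned exactly as written in the file, so they are relative to whatever directory the loader treats as its base. Quoted paths and includes inside multi-line strings are not handled by this sketch.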
And last but not least, for dictionary for example, it would be really cool if one could do:
my_config = task.connect_configuration(my_config, name=name)
my_other_config = task.connect_configuration(my_other_config, name=other_name)
my_other_config['bar'] = my_config  # Creates the link automatically between the dictionaries
We abuse the object description here to store the desired file path.
LOL, yep that would work, I'm assuming you have some infrastructure library that does this hack for you, but really cool way around it 🙂
And last but not least, for dictionary for example, it would be really cool if one could do:
Hmm, what you will end up with now is the following behaviour: my_other_config['bar'] will hold a copy of my_config. If you clone the Task and change "my_config" it will have no effect, because the assignment my_other_config['bar'] = my_config is ignored when running remotely.
But if you want to be able to change my_other_config you need to do: task.connect_configuration(my_other_config, name=other_name), which will put the configuration into the my_other_config dict, but will allow you to change it as you wish.
my_other_config['bar'] = my_config # Creates the link automatically between the dictionaries
The difficulty here is to create the "link" between them, but it is possible, and would actually be Very cool, I'm totally with you
Hi UnevenDolphin73
I cannot initialize a task before loading the file, but the docs for connect_configuration
Yes, that's basically the problem. You have to decide where the main driver is.
If you are executing the code "manually" (i.e. not via the agent) then there is no problem: you have the local file and you can use it to load the "project name" etc., then you just call Task.connect_configuration to log the content.
If you are running the same code via the agent, then by definition you are controlling the project and Task name from ClearML, not from the configuration file (remember that you are creating the Task before you are running it). In that case your code will gracefully fail to load the conf file before calling Task.init, but will find it after the connect_configuration call, something like:
` try:
    # open conf file and read it
    conf = read_file("my_local_file.json")
except Exception:
    conf = {}
task = Task.init(project_name=conf.get("project_name"), ...)
# this will always work: if running locally it will return the same as the local configuration,
# and if running remotely it will return a path to a local file containing the exact content of the original conf file.
my_always_valid_conf_file = task.connect_configuration("my_local_file.json")
# reload configuration, by now we have everything no matter what
conf = read_file(my_always_valid_conf_file) `
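Note that `read_file` is not defined in the snippet above. A minimal sketch, assuming the configuration is plain JSON (as the `my_local_file.json` name suggests), could be the following. `read_file_or_empty` is a hypothetical helper name that wraps the "fall back to an empty dict" step, and it catches only the errors expected from a missing or malformed file rather than everything.

```python
import json


def read_file(path):
    """Parse a JSON configuration file into a dict (assumes JSON content)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def read_file_or_empty(path):
    """Return the parsed file, or {} when it is missing or not valid JSON.

    Hypothetical helper: mirrors the try/except fallback used before Task.init.
    """
    try:
        return read_file(path)
    except (OSError, json.JSONDecodeError):
        return {}
```

With this, the first step becomes `conf = read_file_or_empty("my_local_file.json")`, and the second read (after `connect_configuration`) can use `read_file` directly, since by then the file is expected to exist.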
AFAICS it's quite trivial implementation at the moment, and would otherwise require parsing the text file to find some references, right?
https://github.com/allegroai/clearml/blob/18c7dc70cefdd4ad739be3799bb3d284883f28b2/clearml/task.py#L1592
And task = Task.init(project_name=conf.get("project_name"), ...) is basically a no-op in remote execution so it does not matter if conf is empty, right?
Now, the original pyhocon does support include statements as you mentioned - https://github.com/chimpler/pyhocon
One must then ask, of course, what to do if e.g. a text refers to a dictionary configuration object? 🤔
Because by definition the Task already exists
We have a more complicated case but I'll work around it 😄
Follow up though - can configuration objects refer to one-another internally in ClearML?
can configuration objects refer to one-another internally in ClearML?
Interesting, please explain?
In our case, we have a custom YAML instruction !include, i.e.
Hmm interesting, in theory this might work since configuration encoding (when passing dicts), is handled with HOCON which does support referencing.
That said currently it is not aware of "remote configurations" only ENV variables and local files.
It would be cool to add; do we have a GitHub issue for that? (Would you like to see if you can PR such a thing?)
I think I may have brought this up multiple times in different ways :D
When dealing with long and complicated configurations (whether config objects, yaml, or otherwise), it's often useful to break them down into relevant chunks (think hydra, maybe).
In our case, we have a custom YAML instruction !include, i.e.
` # foo.yaml
bar: baz

# bar.yaml
obj: !include foo.yaml
maybe_another_obj: !include foo.yaml `
Say I upload each of these yamls as a configuration object (as with the above). Once I try to load bar.yaml remotely it will crash, since foo.yaml is missing (and is instead a clearml configuration object).
Does that make sense?
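One possible workaround for the missing-file crash, sketched here purely as an illustration and not something proposed in the thread, is to flatten the includes into a single self-contained text before uploading it as one configuration object. The sketch assumes each `!include` sits on its own `key: !include path` line, that included files contain a YAML mapping, and that paths are relative to the including file. `inline_includes` is a hypothetical name.

```python
import re
from pathlib import Path

# Assumed form: `<indent><key>: !include <path>` alone on one line.
INCLUDE_LINE = re.compile(
    r"^(?P<indent>\s*)(?P<key>[^\s#][^:]*):\s*!include\s+(?P<path>\S+)\s*$"
)


def inline_includes(path, _parents=()):
    """Return the file's text with every `!include` replaced by the nested content."""
    path = Path(path)
    if path in _parents:
        raise ValueError(f"circular !include involving {path}")
    lines = []
    for line in path.read_text().splitlines():
        match = INCLUDE_LINE.match(line)
        if match is None:
            lines.append(line)
            continue
        # Resolve the included file relative to the including file, recursively.
        child_text = inline_includes(path.parent / match["path"], _parents + (path,))
        lines.append(f"{match['indent']}{match['key']}:")
        child_indent = match["indent"] + "  "
        # Indent the child content one level below the key; keep blank lines blank.
        lines.extend(
            child_indent + child if child.strip() else child
            for child in child_text.splitlines()
        )
    return "\n".join(lines)
```

The trade-off is that the flattened object no longer reflects the file layout, so editing `foo.yaml` in the UI would mean editing every inlined copy. The approach of uploading each file separately and restoring it to its original path keeps that structure intact.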