Answered

What Would Be The Best Way To Approach This Flow?

- We have a configuration file that defines, e.g., the project name to use in ClearML, alongside other experiment-specific settings.
- We'd like this configuration file to be logged and available as a configuration object.

Because of these two points, I cannot initialize a Task before loading the file, but the docs for connect_configuration say: "This method should be called before reading the configuration file."
Currently we use set_configuration_object and get_configuration_object separately for these reasons. Is this the expected approach?

  				
Posted 2 years ago by UnevenDolphin73


Answers 18

Because by definition the Task already exists

  				
Posted 2 years ago by AgitatedDove14

I don't think there's an issue or PR for that yet; at least, I haven't created one.

I could have a look at this and maybe make a PR.
Not sure what the recommended flow would look like, though 🤔

  				
Posted 2 years ago by UnevenDolphin73

AFAICS it's quite a trivial implementation at the moment, and supporting this would otherwise require parsing the text file to find references, right?
https://github.com/allegroai/clearml/blob/18c7dc70cefdd4ad739be3799bb3d284883f28b2/clearml/task.py#L1592

  				
Posted 2 years ago by UnevenDolphin73

Correct indeed 👌

  				
Posted 2 years ago by AgitatedDove14

> In our case, we have a custom YAML instruction `!include`, i.e.

Hmm, interesting. In theory this might work, since configuration encoding (when passing dicts) is handled with HOCON, which does support referencing.
That said, it is currently not aware of "remote configurations", only environment variables and local files.
It would be cool to add. Do we have a GitHub issue on that? (Would you like to see if you can PR such a thing?)

  				
Posted 2 years ago by AgitatedDove14

> AFAICS it's quite trivial implementation at the moment, and would otherwise require parsing the text file to find some references, right?

Yes, but the main issue is the parsing; it needs to follow a specific standard. We use HOCON because it is great to read and edit (basically, JSON is a subset of HOCON).

> the original pyhocon does support include statements as you mentioned -

Correct, my thinking was to expand them into "@configuration_section.key" or something of that nature.
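A minimal sketch of what expanding such references could look like, in plain Python rather than HOCON. It assumes a hypothetical syntax where a string value of the form `@section.key.subkey` points into another named configuration dict; `expand_refs` and `lookup` are illustrative names, not ClearML APIs, and only dict nesting (no list indices) is supported.

```python
import re
from typing import Any, Mapping

# Matches a whole-string reference such as "@data.paths.train" (hypothetical syntax):
# group 1 is the section name, group 2 the dotted key inside that section.
REF_PATTERN = re.compile(r"^@([A-Za-z_][\w-]*)\.([\w.-]+)$")


def lookup(sections: Mapping[str, Any], section: str, dotted_key: str) -> Any:
    """Walk a dotted key (e.g. "paths.train") inside one named section."""
    node: Any = sections[section]
    for part in dotted_key.split("."):
        node = node[part]
    return node


def expand_refs(obj: Any, sections: Mapping[str, Any], _seen: frozenset = frozenset()) -> Any:
    """Recursively replace "@section.key" strings with the values they point to."""
    if isinstance(obj, dict):
        return {k: expand_refs(v, sections, _seen) for k, v in obj.items()}
    if isinstance(obj, list):
        return [expand_refs(v, sections, _seen) for v in obj]
    if isinstance(obj, str):
        match = REF_PATTERN.match(obj)
        if match:
            if obj in _seen:
                raise ValueError(f"circular reference: {obj}")
            target = lookup(sections, match.group(1), match.group(2))
            # The referenced value may itself contain references, so expand it too
            return expand_refs(target, sections, _seen | {obj})
    return obj
```

Strings that merely contain an `@` (such as an e-mail address) are left alone, because the pattern must match the whole value.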

  				
Posted 2 years ago by AgitatedDove14

Say I upload each of these YAML files as a configuration object (as above). Once I try to load bar.yaml remotely, it will crash, since foo.yaml is missing (it is instead a ClearML configuration object).
Does that make sense?
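One way around the crash, sketched under assumptions: before loading bar.yaml remotely, write every stored configuration text back to the relative path it originally had, so `!include foo.yaml` finds a real file again. Here `stored` is a plain dict standing in for the configuration objects fetched from ClearML (fetching them is not shown), and `materialize_configs` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Mapping


def materialize_configs(stored: Mapping[str, str], root: Path) -> List[Path]:
    """Write stored configuration texts back to their original relative paths under `root`.

    `stored` maps each original relative path (e.g. "foo.yaml") to its text content.
    """
    root = root.resolve()
    written: List[Path] = []
    for rel_path, text in stored.items():
        target = (root / rel_path).resolve()
        # Refuse paths such as "../x.yaml" that would land outside `root`
        if not target.is_relative_to(root):
            raise ValueError(f"{rel_path!r} escapes {root}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
        written.append(target)
    return written
```

After this runs, the usual YAML loader can open bar.yaml exactly as it does locally. `Path.is_relative_to` needs Python 3.9 or newer.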

  				
Posted 2 years ago by UnevenDolphin73

Right, and then for text (file paths) use a regex or similar for extraction, and for dictionaries simply parse the values?
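A minimal sketch of that split, assuming the `!include <path>` convention from this thread is also used inside dictionary values: raw text is scanned with a regex, while dicts and lists are walked and their string values scanned the same way. `find_references` is a hypothetical name.

```python
import re
from typing import Any, Set

# Assumed include syntax: "!include <path>", stopping at whitespace or a YAML comment
INCLUDE_RE = re.compile(r"!include\s+([^\s#]+)")


def find_references(config: Any) -> Set[str]:
    """Collect include targets from a text configuration or a nested dict/list."""
    if isinstance(config, str):
        # Text configuration (or a single string value): scan with the regex
        return set(INCLUDE_RE.findall(config))
    refs: Set[str] = set()
    if isinstance(config, dict):
        for value in config.values():
            refs |= find_references(value)
    elif isinstance(config, list):
        for item in config:
            refs |= find_references(item)
    # Numbers, booleans, None and other scalars contain no references
    return refs
```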

  				
Posted 2 years ago by UnevenDolphin73

And `task = Task.init(project_name=conf.get("project_name"), ...)` is basically a no-op in remote execution, so it does not matter if `conf` is empty, right?

  				
Posted 2 years ago by UnevenDolphin73

I think I may have brought this up multiple times in different ways :D
When dealing with long and complicated configurations (whether config objects, YAML, or otherwise), it's often useful to break them down into relevant chunks (think Hydra, maybe).

In our case, we have a custom YAML instruction `!include`, i.e.

```yaml
# foo.yaml
bar: baz
```

```yaml
# bar.yaml
obj: !include foo.yaml
maybe_another_obj: !include foo.yaml
```

  				
Posted 2 years ago by UnevenDolphin73

We have a more complicated case but I'll work around it 😄

Follow-up, though: can configuration objects refer to one another internally in ClearML?

  				
Posted 2 years ago by UnevenDolphin73

> can configuration objects refer to one-another internally in ClearML?

Interesting, please explain?

  				
Posted 2 years ago by AgitatedDove14

Hi UnevenDolphin73

> I cannot initialize a task before loading the file, but the docs for connect_configuration

Yes, that's basically the problem: you have to decide where the main driver is.
If you are executing the code "manually" (i.e. not via the agent), then there is no problem. Obviously you have the local file, and you can use it to load the project name etc., then you just call Task.connect_configuration to log the content.
If you are running the same code via the agent, then by definition you are controlling the project and Task name from ClearML, not from the configuration file (remember that the Task is created before it is run). In that case your code will gracefully fail to load the conf file before calling Task.init, but will find it after the connect_configuration call, something like:
```python
import json

from clearml import Task


def read_file(path):
    # hypothetical helper: assumes the configuration file is JSON
    with open(path) as f:
        return json.load(f)


try:
    # open the conf file and read it (fails when running remotely, where the file is absent)
    conf = read_file("my_local_file.json")
except Exception:
    conf = {}

task = Task.init(project_name=conf.get("project_name"), ...)  # remaining arguments elided

# This will always work: if running locally it returns the same as the local configuration,
# and if running remotely it returns a path to a local file containing the exact content of the original conf file.
my_always_valid_conf_file = task.connect_configuration("my_local_file.json")

# reload the configuration; by now we have everything no matter what
conf = read_file(my_always_valid_conf_file)
```
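To make the two branches concrete without a ClearML server, here is a small simulation. `StubTask` is a hypothetical stand-in that only mimics the return value described above (the same path when running locally, a path to a local copy of the stored content when running remotely); it is not the real `Task` class, and `read_file` again assumes a JSON file.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class StubTask:
    """Hypothetical stand-in for a ClearML Task, used only to illustrate the flow."""

    def __init__(self, stored_text: Optional[str] = None):
        # stored_text=None simulates a local run; a string simulates the server-side copy
        self.stored_text = stored_text

    def connect_configuration(self, path: str) -> str:
        if self.stored_text is None:
            return path  # local run: the original file is used as-is
        # remote run: write the stored content to a temporary local file and return its path
        tmp = tempfile.NamedTemporaryFile("w", suffix=Path(path).suffix, delete=False)
        with tmp:
            tmp.write(self.stored_text)
        return tmp.name


def read_file(path) -> dict:
    with open(path) as f:
        return json.load(f)


def load_config(path: str, task: StubTask) -> dict:
    try:
        conf = read_file(path)  # may fail remotely: the file is not there
    except (OSError, ValueError):
        conf = {}
    # Task.init(project_name=conf.get("project_name"), ...) would be called here
    valid_path = task.connect_configuration(path)
    return read_file(valid_path)  # succeeds in both branches
```

The early `conf` is only needed for `Task.init`; the value that matters is the one re-read after `connect_configuration`.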

  				
Posted 2 years ago by AgitatedDove14

And last but not least, for dictionaries, for example, it would be really cool if one could do:

```python
my_config = task.connect_configuration(my_config, name=name)
my_other_config = task.connect_configuration(my_other_config, name=other_name)
my_other_config['bar'] = my_config  # Creates the link automatically between the dictionaries
```

  				
Posted 2 years ago by UnevenDolphin73

Now, the original pyhocon does support include statements as you mentioned - https://github.com/chimpler/pyhocon

  				
Posted 2 years ago by UnevenDolphin73

One must then ask, of course, what to do if e.g. a text refers to a dictionary configuration object? 🤔
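One possible answer, sketched under the assumption that the including file is YAML: render a dictionary configuration object as JSON, which YAML loaders generally accept, and write that out as the file the text config includes. Text objects pass through unchanged. `as_includable_text` is a hypothetical helper, not part of ClearML.

```python
import json
from typing import Any, Mapping, Union


def as_includable_text(stored: Union[str, Mapping[str, Any]]) -> str:
    """Render a stored configuration object so that a text config can `!include` it."""
    if isinstance(stored, str):
        return stored  # already text: include it verbatim
    # Dictionary object: serialize deterministically as JSON text
    return json.dumps(stored, indent=2, sort_keys=True) + "\n"
```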

  				
Posted 2 years ago by UnevenDolphin73

BTW AgitatedDove14, following this discussion I ended up going the regex way myself to sync these, so our code has something like the following. We abuse the object description here to store the desired file path.

```python
# config_path, config_fname and find_included_files_in_source come from our own code
config_path = task.connect_configuration(configuration=config_path, name=config_fname)
included_files = find_included_files_in_source(config_path)
while included_files:
    file_to_include = included_files.pop()
    # connect each included file as its own configuration object; the description keeps its path
    sub_config = task.connect_configuration(
        configuration=file_to_include,
        name=file_to_include.name,
        description=file_to_include.as_posix(),
    )
    included_files |= find_included_files_in_source(sub_config)
    # move the returned file to the path the including config expects
    file_to_include.parent.mkdir(parents=True, exist_ok=True)
    sub_config.rename(file_to_include.as_posix())
```
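The helper `find_included_files_in_source` is not shown in the post. A plausible version, assuming the `!include <path>` syntax from earlier in the thread and paths interpreted relative to the working directory, could look like this; it returns a set of `Path` objects, which fits the `|=`, `.name`, `.parent` and `.as_posix()` uses in the loop.

```python
import re
from pathlib import Path
from typing import Set, Union

# Assumed include syntax, as in the `!include foo.yaml` example in this thread
INCLUDE_RE = re.compile(r"!include\s+([^\s#]+)")


def find_included_files_in_source(config_path: Union[str, Path]) -> Set[Path]:
    """Hypothetical version of the helper: the set of files a config file includes.

    Paths are returned exactly as written, i.e. relative to the current working
    directory (an assumption; adjust if yours are relative to the including file).
    """
    text = Path(config_path).read_text()
    return {Path(target) for target in INCLUDE_RE.findall(text)}
```

With a helper like this, the loop above would connect a file twice if it is included from two different files, and would never terminate on an include cycle; keeping a set of already-connected paths and skipping them avoids both.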

  				
Posted 2 years ago by UnevenDolphin73

> We abuse the object description here to store the desired file path.

LOL, yep, that would work. I'm assuming you have some infrastructure library that does this hack for you, but it's a really cool way around it 🙂

> And last but not least, for dictionary for example, it would be really cool if one could do:

Hmm, what you will end up with now is the following behaviour:
`my_other_config['bar']` will hold a copy of `my_config`. If you clone the Task and change "my_config", it will have no effect, because the assignment `my_other_config['bar'] = my_config` is ignored when running remotely.
But if you want to be able to change `my_other_config`, you need to do `task.connect_configuration(my_other_config, name=other_name)`, which will put the configuration into the `my_other_config` dict, but will allow you to change it as you wish.

> my_other_config['bar'] = my_config # Creates the link automatically between the dictionaries

The difficulty here is to create the "link" between them, but it is possible, and would actually be very cool. I'm totally with you.
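One way such a "link" could be represented, sketched in plain Python: when serializing, a nested value that is the very same object as another named configuration is stored as a reference marker instead of a copy, and the marker is resolved against the current configurations when loading. The `$ref` marker, `serialize` and `deserialize` are all hypothetical; this is not how ClearML stores configurations.

```python
import json
from typing import Any, Dict

REF_KEY = "$ref"  # hypothetical marker meaning "this value is another named configuration"


def serialize(config: Dict[str, Any], named: Dict[str, Dict[str, Any]]) -> str:
    """Serialize `config`, storing nested values that *are* named configs as references."""
    by_id = {id(obj): name for name, obj in named.items()}

    def encode(value: Any) -> Any:
        if id(value) in by_id:  # same object, not merely equal: keep a link, not a copy
            return {REF_KEY: by_id[id(value)]}
        if isinstance(value, dict):
            return {k: encode(v) for k, v in value.items()}
        if isinstance(value, list):
            return [encode(v) for v in value]
        return value

    # Encode the children only, so a config never becomes a reference to itself
    return json.dumps({k: encode(v) for k, v in config.items()})


def deserialize(text: str, named: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Rebuild a configuration, resolving reference markers against the current configs."""

    def decode(value: Any) -> Any:
        if isinstance(value, dict):
            if set(value) == {REF_KEY}:
                return named[value[REF_KEY]]
            return {k: decode(v) for k, v in value.items()}
        if isinstance(value, list):
            return [decode(v) for v in value]
        return value

    return decode(json.loads(text))
```

Because the reference is resolved at load time, editing `my_config` on a cloned Task would flow into `my_other_config['bar']`, whereas an explicit copy such as `dict(my_config)` is still stored inline.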

  				
Posted 2 years ago by AgitatedDove14


1K Views
