Hi, I Have A Few Questions Regards To


Great discussion, I agree with you both. For me, we are not using clearml-data, so I am a bit curious how a "published experiment" locks everything (including inputs? I assume someone could still just go into the S3 bucket and delete the files without ClearML noticing).

From my experience, absolute reproducibility is code + data + parameters + execution sequence. For example, random seeds or parallelism can cause different results and can be tricky to deal with sometimes. We did build an internal system to ensure reproducibility: ClearML is the experiment tracking component, and we integrate it with Kedro for pipelines + parameters + data, so everything is tracked automatically.
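The point about random seeds can be illustrated with a minimal Python sketch (the `run_experiment` function and its seed values are made up for illustration, not part of our internal system):

```python
import random

def run_experiment(seed):
    # Pin the RNG seed so repeated runs produce identical results.
    # In a real setup you would also pin numpy/torch/framework seeds.
    random.seed(seed)
    return [random.random() for _ in range(3)]

# Two runs with the same seed are bit-identical; different seeds diverge.
assert run_experiment(42) == run_experiment(42)
assert run_experiment(42) != run_experiment(7)
```

Without pinning (or with nondeterministic parallelism), re-running the same code + data + parameters can still produce different numbers, which is why we track the execution side too.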

I have been thinking of replacing the data tracking component; our solution works fine, but it is not the most efficient one. With GBs of artifacts generated in every experiment, we have an increasing need to do housekeeping regularly, so I am studying the best way to do so. "Tag" and "publish experiment" are what we are considering.
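The tag + publish housekeeping policy we are considering could be sketched roughly like this (the `deletable` function, the `keep` tag, and the plain-dict task records are all hypothetical stand-ins, not the ClearML API):

```python
def deletable(task):
    """Decide whether a task's artifacts may be cleaned up.

    Hypothetical policy: never touch published experiments or anything
    tagged 'keep'; everything else is fair game for housekeeping.
    `task` is a plain dict standing in for a tracking-server record.
    """
    if task["status"] == "published":
        return False
    if "keep" in task.get("tags", []):
        return False
    return True

tasks = [
    {"id": "a1", "status": "published", "tags": []},
    {"id": "b2", "status": "completed", "tags": ["keep"]},
    {"id": "c3", "status": "completed", "tags": []},
]
stale = [t["id"] for t in tasks if deletable(t)]
# Only c3 qualifies for cleanup under this policy.
```

The real cleanup job would fetch experiments from the tracking server and delete the corresponding artifacts from storage, but the decision logic stays this simple: published or explicitly tagged experiments are protected.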

  
  
Posted 3 years ago