Hi @<1523701087100473344:profile|SuccessfulKoala55> ,
thanks for the pointers.
I didn't know that the plot data is stored in elasticsearch. Good to know. It relates to the rest of my questions in that I want to understand where everything is saved, all the parts of my experiments. The plots are actually the most important part, since I have direct access to the artifacts I save (like, say, models) but not to the plot data which helps me compare and rank experiments. I mention tensorboard because that's what's producing the traces. I'm still not sure if clearml is actually storing plot data inside elasticsearch or simply linking to the tensorboard's tfevent files.
I still have no idea what the correct way to set up access to the blob storage is. Again, writing from the SDK is fine; retrieving from the WebUI is not. As shown in the first screenshot, the values ClearML wrote into the first two Web App Cloud Access fields, "bucket" and "key", are both "azure", which can't be right. My question is: which real-world Azure concepts do these two fields correspond to, so I can make a better guess at the correct values (or simply a working example would be amazing).
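My best guess is that they map onto the same pieces as the SDK-side Azure section of clearml.conf, which, as a rough sketch with placeholder values (not my real credentials), looks something like this:

```
sdk {
    azure.storage {
        containers: [
            {
                account_name: "myaccount"       # Azure storage account name (placeholder)
                account_key: "myaccountkey"     # Azure storage account key (placeholder)
                container_name: "mycontainer"   # blob container used for uploads (placeholder)
            }
        ]
    }
}
```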
Regarding the files_server configuration, would you be so kind as to give an example? I couldn't find anything. The way I managed to get artifact upload working was by passing output_uri to Task.init.
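Concretely, this is roughly the pattern that worked for me (the project/task names and the Azure URI below are placeholders, not my real setup):

```python
from clearml import Task

# Placeholders: swap in your real project/task names and storage account/container
task = Task.init(
    project_name="my_project",
    task_name="my_experiment",
    output_uri="azure://<account>.blob.core.windows.net/<container>",
)

# Artifacts uploaded through the task then land in the blob container above
task.upload_artifact(name="stats", artifact_object={"accuracy": 0.9})
```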
What I have in my config right now for api.files_server is:
```
api {
    files_server:
}
```
Imagine my confusion 😭
Thanks again for your advice.
Sorry to ping you @<1523701087100473344:profile|SuccessfulKoala55>, can you offer any ideas on the two questions from my reply (about the correct Web App Cloud Access values and the correct way to specify a blob storage in the clearml.conf file)? Thanks 🙏
This is what the links to the artifacts look like (the part I blurred out is the last part of the secret, which is working fine, since the task was able to upload those correctly to storage, I can check that):
Hi @<1523701987055505408:profile|WittyOwl57> ,
For files_server, this controls the upload of debug images, and it would be either the fileserver address (like you have now, I assume) or some object storage address.
For setting an object storage for models and artifacts, you would need to set up the default_output_uri field in the clearml.conf file.
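As a rough sketch (the addresses here are placeholders, adjust them to your own server and storage):

```
api {
    # where the SDK uploads debug images / debug samples
    files_server: "https://files.example.com"
}
sdk {
    development {
        # default destination for models and artifacts
        default_output_uri: "azure://<account>.blob.core.windows.net/<container>"
    }
}
```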
Regarding azure setup in the WebApp, @<1523701070390366208:profile|CostlyOstrich36> do you have some real-world examples?
Hi @<1523701987055505408:profile|WittyOwl57> ,
is the plot data stored in mongo, or does mongo just store some links?
Plot data is stored in the Elasticsearch database. I am not sure how this is related to the rest of your questions, as they pertain to images 🙂
If I, say, copy the clearml data directory from the existing machine to a different location, and copy the tensorboard data to the same absolute path, will that work?
That should work. I'm not sure why tensorboard is mentioned here, but if you're talking about the fileserver storage folder, then it would work.
In the future I want to avoid this problem (of having to move experiment tracking, merge experiments from different machines, etc.). What is the best practice for that? I would try to store everything (including actual experiment data, e.g. tensorboard, logs, etc.) to blob storage, but I don't think that is possible.
Scaling should not be an issue. Using blob storage for uploaded artifacts and images is part of the system's design (by using the files_server and output_uri configuration options on the client side). Anything else can be handled by scaling up the databases, which, as you mentioned, should not be an issue.