Hi ExasperatedCrocodile76
As for the alias: yes, you don't need to specify one. We recently changed things so that if you do add an alias, the dataset alias and ID are automatically logged in the experiment manager. So the message is more of a print that says: "hey, you might want to try this!"
Second, this is my own version of the example, which is no longer maintained. The official version is here: https://github.com/allegroai/clearml-blogs/tree/master/urbansounds8k
We updated the link in the video description as soon as the repo was moved, but it seems you were one of the early viewers 😄
Third, could you try running the example from the new URL/repository? Some fixes have gone in since launch. If it still doesn't work, let me know and I'll run the example myself to try to reproduce it 🙂
And it seems you are completely right: log_dataset_statistics is not even called in preprocessing.py
🤔 That seems like a bad oversight on our end. If you add it yourself, does it work then? Add it on line 140, just after dataset_task.flush
The line itself is: self.log_dataset_statistics(dataset_task)
Also change line 84 to: values=self.metadata['label'],
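For anyone wondering what a class-distribution plot call roughly looks like, here is a minimal sketch. It assumes the labels are available as an iterable (for example self.metadata['label']) and uses ClearML's Logger.report_histogram; the helper name log_class_distribution and the title strings are hypothetical, not the actual code in preprocessing.py. The logger is passed in as an argument so the helper can be tried with a stand-in object.

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple


def log_class_distribution(
    logger: Any, labels: Iterable[str], iteration: int = 0
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: report how many samples each class has.

    `logger` is expected to behave like a ClearML Logger, e.g. the object
    returned by dataset_task.get_logger().
    """
    counts = Counter(labels)
    # Sort class names so the x-axis order is stable between runs
    classes = sorted(counts)
    values = [counts[c] for c in classes]
    logger.report_histogram(
        title="Class distribution",
        series="Class distribution",
        values=values,
        iteration=iteration,
        xlabels=classes,
        yaxis="Amount of samples",
    )
    return classes, values
```

In the example script this would be called along the lines of log_class_distribution(dataset_task.get_logger(), self.metadata['label']). Counting first keeps the bars labelled by class name rather than passing raw labels to the histogram.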
You found it, so if you like, you can open a PR with the fix so you get credit for it 🙂 If not, I'm happy to do it!
This ^
If you're not getting any errors, it should work just fine 🙂
In https://github.com/thepycoder/urbansounds8k/blob/main/preprocessing.py I'm seeing dataset_task.get_logger().report_image, dataset_task.get_logger().report_table, dataset_task.get_logger().report_histogram and dataset_task.get_logger().report_media, which are all manual logging calls. That's probably why the author didn't rely on any automatic logging.
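As an illustration of that manual style, here is a sketch that uploads a few audio clips per class with Logger.report_media. The helper name, the (path, label) input format and the per-class limit are assumptions made for illustration; the logger can be any object with ClearML's report_media method, such as dataset_task.get_logger().

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple


def log_audio_samples(
    logger: Any, samples: Iterable[Tuple[str, str]], per_class: int = 2
) -> Dict[str, int]:
    """Hypothetical helper: upload up to `per_class` audio files for each label.

    `samples` yields (local_file_path, label) pairs; `logger` is expected to
    behave like a ClearML Logger.
    """
    logged: Dict[str, int] = defaultdict(int)
    for path, label in samples:
        if logged[label] >= per_class:
            continue  # already have enough examples for this class
        logger.report_media(
            title="Audio samples",
            series=label,               # one series per class
            iteration=logged[label],    # 0, 1, ... within each class
            local_path=path,
        )
        logged[label] += 1
    return dict(logged)
```

Because nothing here is captured automatically, every plot or sample that shows up in the dashboard corresponds to an explicit call like this one.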
ExasperatedCrocodile76 I have run the example, and even with my fix it is not ideal. The sample is relatively old now, so I'll revise it ASAP. Thank you for noticing it!
Hello ExasperatedCrab78! I've submitted the PR. I hope it's sufficient and clear. Have a nice day, and thanks for the help! :)
Thanks for the reply. I was trying out this feature on a dummy example, using the following command:
dataset = Dataset.get(dataset_project="palmer penguins", dataset_name="raw palmer penguins", alias="my_test_alias_name", overridable=True)
That was the only time I called get(), but I still got the message telling me to specify an alias. I can try to do a bit of debugging to see why it gets called multiple times.
Am I getting it right that the alias is the dataset ID, which can be found in the ClearML dashboard?
Not really. It's the alias plus the dataset ID that you'll find in the ClearML dashboard 🙂 I'm attaching a screenshot of what that looks like in both the code and the dashboard.
I'd like to see it used in a clear example, the way it was intended to be used, before giving my opinion on it, if that makes sense.
Wow, that was fast. Thanks a lot for your prompt response! Will check it out now :D
Oh, thanks! 🙂 I understand now. Please let me know about the other problem.
The point of the alias is better visibility in the Experiment Manager. Check the screenshots above to see what it looks like in the UI. Essentially, setting an alias makes sure that the task fetching the dataset automatically logs the ID it gets from Dataset.get(). That way, when you later look back at your experiment, you can also see which dataset it fetched with .get() at the time.
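A minimal sketch of fetching a dataset with an alias, using the Dataset.get parameters from the command quoted in this thread (dataset_project, dataset_name, alias) and Dataset.get_local_copy() to get the files. The wrapper function fetch_dataset_path and its `get` argument are hypothetical: they exist only so the snippet can be tried without a ClearML server. By default it imports clearml and uses the real Dataset.get.

```python
from typing import Any, Callable, Optional


def fetch_dataset_path(
    project: str,
    name: str,
    alias: str,
    get: Optional[Callable[..., Any]] = None,
) -> str:
    """Hypothetical wrapper: fetch a dataset by project/name and return a local path.

    Passing `alias` is what lets the current task log the alias together with
    the resolved dataset ID, so the experiment records which dataset it used.
    """
    if get is None:
        # Imported lazily so the function can be exercised with a stand-in
        from clearml import Dataset
        get = Dataset.get
    dataset = get(dataset_project=project, dataset_name=name, alias=alias)
    # Download (or reuse a cached) read-only copy of the dataset files
    return dataset.get_local_copy()
```

Inside a task, calling fetch_dataset_path("palmer penguins", "raw palmer penguins", "my_test_alias_name") should then show the alias and the dataset ID it resolved to on the experiment, which is the visibility described above.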
ExuberantBat52 If you still get the log message, where did you specify the alias? Because if you did specify one, we might have a bug on our hands 🧐
So what's the point of the alias? It's not very clear. Even after specifying an alias, I am still getting the following message: Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server
Also, please note that the dataset UI has changed since the video was uploaded: datasets are now found under the dataset tab on the left instead of in the experiment manager 🙂
Hey ExasperatedCrocodile76 and ExuberantBat52
Thanks for your contributions, I've updated the example here: https://github.com/allegroai/clearml-blogs/tree/master/urbansounds8k
Thank you so much, ExasperatedCrocodile76, I'll check it tomorrow 🙂
Hi ExasperatedCrab78,
Am I getting it right that the alias is the dataset ID, which can be found in the ClearML dashboard?
I also tried the new URL (GitHub) you sent. I can run all the scripts successfully, but I don't see any results from log_dataset_statistics. Maybe I'm wrong, but in https://github.com/allegroai/clearml-blogs/blob/master/urbansounds8k/preprocessing.py this function is not even called. Could you help me with that, please? I just want to replicate ClearML dashboard -> preprocessing -> results -> plots. How do I call it correctly? Thanks.
ExuberantBat52 The dataset alias issue that gives you multiple prompts is still open, I think, but it's on our devs' backlog 😄
Anyway, no histogram or table is reported in the dashboard, whereas in the tutorial video at https://www.youtube.com/watch?v=quSGXvuK1IM there is.
I knew that, I was just happy that we have an updated example 😁
This update was just to modernize the example itself 🙂
Nice find! I'll pass it on to the relevant devs, and we'll fix that right up 🙂 Do you have any feedback on the functionality specifically? I.e., would you use the alias given what you know now, or would you, e.g., name it differently?
Please do! If you find any more issues (due to my shitty code or otherwise 😄), let me know and I'll fix 'em!
Hi ExasperatedCrocodile76, I think you are correct. This is simply an info printout.
ExasperatedCrab78 do you know how this could be?
Right, so I figured out why it was being called multiple times. Every time a dataset is serialized, it calls the _serialize() function inside the clearml/datasets/dataset.py file, and _serialize() calls self.get(parent_dataset_id), which is the same get() method. This means the user will always see the log message, even if they are not "getting" a dataset: any time a user creates, uploads, or finalizes a dataset, they will see the message above.
Hope that helps, ExasperatedCrab78
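To make the mechanism easier to follow, here is a toy model of the behaviour described above. It is not ClearML's code: ToyDataset, its methods, and the `_internal` and `quiet_parent_lookup` flags are all hypothetical. It shows how an internal _serialize() that reuses the public get() triggers the alias message even though the user never fetched a dataset, plus one possible way around it (an internal call path that skips the hint).

```python
from typing import List, Optional

# Message text as quoted earlier in this thread
ALIAS_MESSAGE = (
    "Dataset.get() did not specify alias. Dataset information "
    "will not be automatically logged in ClearML Server"
)


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not the ClearML implementation."""

    messages: List[str] = []  # collects "printed" info messages for inspection

    def __init__(self, dataset_id: str, parent_id: Optional[str] = None):
        self.id = dataset_id
        self.parent_id = parent_id

    @classmethod
    def get(
        cls, dataset_id: str, alias: Optional[str] = None, _internal: bool = False
    ) -> "ToyDataset":
        # The public entry point emits the hint when no alias is given,
        # unless the call comes from library-internal code.
        if alias is None and not _internal:
            cls.messages.append(ALIAS_MESSAGE)
        return cls(dataset_id)

    def _serialize(self, quiet_parent_lookup: bool = False) -> dict:
        # Reported pattern: serialization looks up the parent through the same
        # get() that users call. quiet_parent_lookup=True models a possible fix.
        if self.parent_id is not None:
            self.get(self.parent_id, _internal=quiet_parent_lookup)
        return {"id": self.id, "parent": self.parent_id}
```

In the toy, ToyDataset("child", parent_id="parent")._serialize() records one alias message although only serialization ran, which matches the symptom reported in this thread; passing quiet_parent_lookup=True records none, while a direct user call without an alias still gets the hint.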