Answered
Hello everyone! I tried to reproduce your tutorial: https://github.com/thepycoder/urbansounds8k . Basically, I still get the same issue: 2022-11-21 09:57:31,573 - clearml - INFO - Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server. If I understood the documentation correctly, I do not have to specify an alias when I have dataset_project and dataset_name defined.
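For context, here is a minimal stand-in sketch of the behavior being reported. This is NOT the real clearml implementation, just a hypothetical helper that mimics when the info message appears; the project/name strings are illustrative:

```python
# Stand-in sketch (not real clearml code) mimicking when Dataset.get()
# emits the "did not specify alias" info message.
def get_dataset(dataset_project=None, dataset_name=None, alias=None):
    messages = []
    if alias is None:
        messages.append(
            "Dataset.get() did not specify alias. Dataset information "
            "will not be automatically logged in ClearML Server."
        )
    return {"project": dataset_project, "name": dataset_name,
            "alias": alias, "messages": messages}

# Project and name alone still trigger the message; an alias silences it.
without_alias = get_dataset(dataset_project="urbansounds8k",
                            dataset_name="raw data")
with_alias = get_dataset(dataset_project="urbansounds8k",
                         dataset_name="raw data", alias="raw_v1")
```

In other words, the message fires purely on the missing alias argument, regardless of whether dataset_project and dataset_name are set.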

  
  
Posted one year ago

Answers 25


Also, please note that since the video has been uploaded, the dataset UI has changed. So now a dataset will be found under the dataset tab on the left instead of in the experiment manager 🙂

  
  
Posted one year ago

Hello ExasperatedCrab78 ! I opened the PR. I hope it is sufficient and clear. Have a nice day, and thanks for the help! :)

  
  
Posted one year ago

And it seems you are fully correct: log_dataset_statistics is not even called in preprocessing.py 🤔 That seems like a bad oversight on our end. If you add it yourself, does it work then? Add it on line 140, just after dataset_task.flush ; the line itself is: self.log_dataset_statistics(dataset_task) . Also change line 84 to values=self.metadata['label']
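To make the shape of that fix concrete, here is a hedged sketch. PreprocessorSketch and FakeTask are stand-ins for the real classes in the repo's preprocessing.py, and the line numbers in the comments refer to that file, not to this sketch:

```python
# Stand-in for the clearml Task object; only flush() is modeled here.
class FakeTask:
    def __init__(self):
        self.flushed = False

    def flush(self):
        self.flushed = True

# Stand-in for the preprocessor class in the repo's preprocessing.py.
class PreprocessorSketch:
    def __init__(self, metadata):
        self.metadata = metadata        # e.g. {"label": [0, 1, 1, 0]}
        self.stats_logged = False

    def log_dataset_statistics(self, dataset_task):
        # Around line 84 of the real file, the histogram call should
        # use values=self.metadata['label'] as suggested above.
        self.stats_logged = True

    def finalize(self, dataset_task):
        dataset_task.flush()                       # existing line ~140
        self.log_dataset_statistics(dataset_task)  # the missing call

task = FakeTask()
prep = PreprocessorSketch({"label": [0, 1, 1, 0]})
prep.finalize(task)
```

The point is simply that the statistics call has to be wired in right after the flush, otherwise it never runs.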
You found it, so if you want you can open a PR and fix it so you get credit for it 🙂 If not, I can do it if you want!

  
  
Posted one year ago

Thank you so much ExasperatedCrocodile76 , I'll check it tomorrow 🙂

  
  
Posted one year ago

Right, so I figured out why it was calling it multiple times. Every time a dataset is serialized, it calls the _serialize() function inside the clearml/datasets/dataset.py file, and _serialize() calls self.get(parent_dataset_id) , which is the same get() method. This means the user will always be prompted with the log message, even if they are not "getting" a dataset. So any time a user creates, uploads, or finalizes a dataset, they will be prompted with the message above.
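A tiny stand-in illustration of that call chain (again, not the real clearml/datasets/dataset.py, just the re-entrancy pattern described above):

```python
# Stand-in: _serialize() re-enters the public get(), so the alias
# message fires even when the user never called get() themselves.
warnings = []

class DatasetSketch:
    @classmethod
    def get(cls, dataset_id=None, alias=None):
        if alias is None:
            warnings.append("Dataset.get() did not specify alias. ...")
        return cls()

    def _serialize(self, parent_dataset_id=None):
        # Internal bookkeeping reuses the same public get() method.
        DatasetSketch.get(dataset_id=parent_dataset_id)

ds = DatasetSketch()
ds._serialize(parent_dataset_id="some-parent-id")  # e.g. during finalize
```

After the _serialize() call the warning list has one entry, even though no user code ever called get().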

Hope that helped ExasperatedCrab78

  
  
Posted one year ago

I would like to see it used in a clear example as it was intended to be used before giving my opinion on it, if that makes sense

  
  
Posted one year ago

Wow, that was fast. Thanks a lot for your prompt response! Will check it out now :D

  
  
Posted one year ago

I knew that, I was just happy that we have an updated example 😁

  
  
Posted one year ago

Oh thanks ! 🙂 I understand now. Please let me know about the other problem.

  
  
Posted one year ago

Please do, if you find any more issues (due to my shitty code or otherwise 😄 ) let me know and I'll fix 'em!

  
  
Posted one year ago

ExasperatedCrocodile76 I have run the example, and even with my fix it is not ideal. The sample is relatively old now, so I'll revise it ASAP. Thank you for noticing it!

  
  
Posted one year ago

ExuberantBat52 The dataset alias thing giving you multiple prompts is still an issue I think, but it's on the backlog of our devs 😄

  
  
Posted one year ago

Thanks for the reply. I was trying out this feature on a dummy example. I used the following command:
dataset = Dataset.get( dataset_project="palmer penguins", dataset_name="raw palmer penguins", alias="my_test_alias_name", overridable=True)
That was the only time I called the get() command, and I still got the message that I should specify the alias. I can try to do a bit of debugging to see why it gets called multiple times.

  
  
Posted one year ago

This update was just to modernize the example itself 🙂

  
  
Posted one year ago

The point of the alias is better visibility in the Experiment Manager. Check the screenshots above for what it looks like in the UI. Essentially, setting an alias makes sure the task that is getting the dataset automatically logs the ID it retrieves via Dataset.get() . That way, if you later look back at your experiment, you can also see which dataset was retrieved back then.
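As a hedged sketch of what the alias buys you: the running task records an alias-to-dataset-ID mapping so you can see later which dataset was used. The bookkeeping below is a stand-in (the real mapping is stored by clearml itself), and the dataset ID is made up:

```python
# Stand-in for the per-task configuration that clearml maintains.
task_config = {}

def get_with_alias(dataset_id, alias):
    # Record which dataset ID this alias resolved to at run time,
    # so the experiment stays traceable later.
    task_config.setdefault("datasets", {})[alias] = dataset_id
    return dataset_id

get_with_alias("e1f2a3b4", "my_test_alias_name")
```

So the alias is a human-readable key, and the ID it maps to is what gets pinned in the experiment.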

ExuberantBat52 When you still get the log messages, where did you specify the alias? Because if that's true, we might have a bug on our hands 🧐

  
  
Posted one year ago

This ^
If you're not getting any errors, it should work just fine 🙂

In https://github.com/thepycoder/urbansounds8k/blob/main/preprocessing.py I'm seeing dataset_task.get_logger().report_image , dataset_task.get_logger().report_table , dataset_task.get_logger().report_histogram and dataset_task.get_logger().report_media , which are all manual logging calls. That's probably why the author didn't use any automatic logging.
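A hedged sketch of that manual-logging pattern. FakeLogger stands in for dataset_task.get_logger() ; the real clearml reporter methods take more parameters than shown here:

```python
# Stand-in for the clearml logger object; records what was reported.
class FakeLogger:
    def __init__(self):
        self.reports = []

    def report_histogram(self, title, series, values):
        self.reports.append(("histogram", title))

    def report_table(self, title, series, table_plot):
        self.reports.append(("table", title))

logger = FakeLogger()
# Each statistic is reported explicitly rather than relying on
# automatic logging, matching the report_* calls in preprocessing.py.
logger.report_histogram("class distribution", "labels", [10, 7, 3])
logger.report_table("metadata preview", "head", [["file", "label"]])
```

With explicit report_* calls like these, nothing shows up in the dashboard unless the author remembers to call them, which is exactly the failure mode discussed in this thread.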

  
  
Posted one year ago

Anyway, no histogram or table is reported in the dashboard, even though it is in the tutorial video at https://www.youtube.com/watch?v=quSGXvuK1IM .

  
  
Posted one year ago

Hi ExasperatedCrocodile76

In terms of the alias: yes, you don't need to specify one. Recently we changed things so that if you do add an alias, the dataset alias and ID will automatically be added in the experiment manager. So it's more of a print that says: "hey, you might want to try this!"

Second, this is my own version of the example that's no longer maintained. The official version is here: https://github.com/allegroai/clearml-blogs/tree/master/urbansounds8k
We changed the link in that video description as soon as it was moved, but it seems you were one of the early viewers 😄

Third, could you try to run the example from this new url/repository? There were some fixes in there since launch. If it still doesn't work, let me know and I'll run the example myself to try and reproduce 🙂

  
  
Posted one year ago

Hi ExasperatedCrab78 ,
am I getting it right that alias = dataset id which can be found in the clearml dashboard?

I also tried the changed URL (GitHub) you sent. I can successfully run all the scripts, but I do not see any results from log_dataset_statistics . Maybe I am wrong, but in https://github.com/allegroai/clearml-blogs/blob/master/urbansounds8k/preprocessing.py this function is not even called. Could you help me with that, please? I just want to replicate: ClearML dashboard -> preprocessing -> results -> plots. How do I call it correctly? Thanks.

  
  
Posted one year ago

Hey ExasperatedCrocodile76 and ExuberantBat52
Thanks for your contributions, I've updated the example here: https://github.com/allegroai/clearml-blogs/tree/master/urbansounds8k

  
  
Posted one year ago

ExasperatedCrab78 do you know how this could be?

  
  
Posted one year ago

Hi ExasperatedCrocodile76 , I think you are correct. This is simply an info print out.

  
  
Posted one year ago

So what's the point of the alias? It's not very clear. Even after specifying an alias I am still getting the following message: Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server

  
  
Posted one year ago

am I getting it right that alias = dataset id which can be found in the clearml dashboard?

Not really. It's the alias plus the dataset ID that will be found in the ClearML dashboard 🙂 I'm attaching a screenshot of what that looks like in both the code and the dashboard.

  
  
Posted one year ago

Nice find! I'll pass it on to the relevant devs, and we'll fix that right up 🙂 Is there any feedback you have on the functionality specifically? That is, would you use alias given what you know now, or would you e.g. name it differently?

  
  
Posted one year ago
717 Views
25 Answers