Answered
Hello everyone! I tried to reproduce your tutorial

Hello everyone! I tried to reproduce your tutorial: https://github.com/thepycoder/urbansounds8k . Basically, I still get the same issue: 2022-11-21 09:57:31,573 - clearml - INFO - Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server. If I understood the documentation correctly, I do not have to specify an alias when I have dataset_project and dataset_name defined.

  
  
Posted 2 years ago

Answers 25


Hi ExasperatedCrocodile76, I think you are correct. This is simply an informational printout.

  
  
Posted 2 years ago

This ^
If you're not getting any errors, it should work just fine 🙂

In https://github.com/thepycoder/urbansounds8k/blob/main/preprocessing.py I'm seeing dataset_task.get_logger().report_image , dataset_task.get_logger().report_table , dataset_task.get_logger().report_histogram and dataset_task.get_logger().report_media , which are all manual logging calls. That's probably why the author didn't rely on any automatic logging (quick sketch of that kind of manual reporting below).
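
For reference, this is roughly what those manual reporting calls look like. It's a minimal made-up sketch, not the tutorial's actual code; project/task names, data and file paths are placeholders:

import numpy as np
import pandas as pd
from clearml import Task

dataset_task = Task.init(project_name="examples", task_name="manual logging sketch")
logger = dataset_task.get_logger()

# Table: e.g. a sample of the dataset metadata
logger.report_table(
    "Dataset Statistics", "metadata sample", iteration=0,
    table_plot=pd.DataFrame({"label": ["dog_bark", "siren"], "count": [10, 12]}),
)

# Histogram: e.g. the class distribution
logger.report_histogram(
    "Class distribution", "all data", iteration=0,
    values=[10, 12], xlabels=["dog_bark", "siren"],
)

# Image: a numpy array (or a local file path) shows up under Debug Samples
logger.report_image(
    "Debug Samples", "spectrogram", iteration=0,
    image=np.random.rand(64, 64),
)

# Media: e.g. an audio file (the path here is just a placeholder)
logger.report_media(
    "Debug Samples", "audio", iteration=0, local_path="sample.wav",
)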

  
  
Posted 2 years ago

Anyway, no histogram or table is reported in the dashboard, even though in the tutorial video at https://www.youtube.com/watch?v=quSGXvuK1IM it is.

  
  
Posted 2 years ago

ExasperatedCrab78 do you know how this could be?

  
  
Posted 2 years ago

Hi ExasperatedCrocodile76

In terms of the alias, yes, you don't need to specify one. Recently we changed things so that if you do add an alias, the dataset alias and ID will automatically be added in the experiment manager. So it's more of a print that says: "hey, you might want to try this!"
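
For illustration, a minimal sketch of the difference (project/dataset names are placeholders, not from the tutorial):

from clearml import Dataset

# Without an alias: works fine, just prints the info message about the missing alias
ds = Dataset.get(dataset_project="urbansounds8k", dataset_name="preprocessed")

# With an alias: the dataset ID is additionally logged on the current task,
# visible in the experiment manager under that alias
ds = Dataset.get(
    dataset_project="urbansounds8k",
    dataset_name="preprocessed",
    alias="urbansounds_preprocessed",
)
print(ds.id)  # the same dataset ID you see in the ClearML UI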

Second, this is my own version of the example that's no longer maintained. The official version is here: https://github.com/allegroai/clearml-blogs/tree/master/urbansounds8k
We changed the link in that video description as soon as it was moved, but it seems you were one of the early viewers 😄

Third, could you try to run the example from this new url/repository? There were some fixes in there since launch. If it still doesn't work, let me know and I'll run the example myself to try and reproduce 🙂

  
  
Posted 2 years ago

Hi ExasperatedCrab78 ,
am I getting it right that alias = dataset id which can be found in the clearml dashboard?

I also tried the changed URL (GitHub) you sent. I can successfully run all the scripts, but I do not see any results from log_dataset_statistics . Maybe I am wrong, but in https://github.com/allegroai/clearml-blogs/blob/master/urbansounds8k/preprocessing.py this function is not even called. Could you help me with that, please? I just want to replicate: ClearML dashboard -> preprocessing -> results -> plots. How do I call it correctly? Thanks.

  
  
Posted 2 years ago

am I getting it right that alias = dataset id which can be found in the clearml dashboard?

Not really. It's the alias plus the dataset ID that will show up in the ClearML dashboard 🙂 I'm attaching a screenshot of what that looks like in both the code and the dashboard.

  
  
Posted 2 years ago

Oh thanks ! 🙂 I understand now. Please let me know about the other problem.

  
  
Posted 2 years ago

And it seems you are fully correct: log_dataset_statistics is not even called in preprocessing.py 🤔 That seems like a bad oversight on our end. If you add it yourself, does it work then? Add it on line 140, just after dataset_task.flush ; the line itself is self.log_dataset_statistics(dataset_task) . Also change line 84 to values=self.metadata['label'], (rough sketch of the placement below).
You found it, so if you want you can open a PR and fix it so you get credit for it 🙂 If not, I can do it if you want!
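
Rough sketch of where the two changes land; apart from the values=self.metadata['label'] argument and the call right after dataset_task.flush , everything below is a stand-in, not the real preprocessing.py code:

class PreprocessorSketch:
    def __init__(self):
        # stand-in for the real metadata table loaded by the script
        self.metadata = {"label": ["dog_bark", "siren", "dog_bark"]}

    def log_dataset_statistics(self, dataset_task):
        # line ~84 fix: feed the label column into the report, i.e.
        #   values=self.metadata['label'],
        print("would report statistics for:", self.metadata["label"])

    def preprocess(self, dataset_task):
        # ... build and upload the dataset ...
        dataset_task.flush()                        # existing line ~140
        self.log_dataset_statistics(dataset_task)   # the missing call, added right after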

  
  
Posted 2 years ago

Also, please note that since the video was uploaded, the dataset UI has changed. A dataset will now be found under the dataset tab on the left instead of in the experiment manager 🙂

  
  
Posted 2 years ago

ExasperatedCrocodile76 I have run the example, and even with my fix it is not ideal. The sample is relatively old now, so I'll revise it ASAP. Thank you for noticing it!

  
  
Posted 2 years ago

Hello ExasperatedCrab78! I opened the PR. I hope it is sufficient and clear. Have a nice day and thanks for the help! :)

  
  
Posted 2 years ago

Thank you so much ExasperatedCrocodile76 , I'll check it tomorrow 🙂

  
  
Posted 2 years ago

So what's the point of the alias? It's not very clear. Even after specifying an alias, I am still getting the following message: Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server

  
  
Posted 2 years ago

The point of the alias is better visibility in the experiment manager. Check the screenshots above for what it looks like in the UI. Essentially, setting an alias makes sure the task that is getting the dataset automatically logs the ID it gets through Dataset.get() . That way, if you later look back at your experiment, you can also see which dataset was retrieved with .get() back then.

ExuberantBat52 When you still get the log messages, where did you specify the alias? Because if that's the case, we might have a bug on our hands 🧐

  
  
Posted 2 years ago

Thanks for the reply. I was trying out this feature on a dummy example. I used the following command
dataset = Dataset.get(
    dataset_project="palmer penguins",
    dataset_name="raw palmer penguins",
    alias="my_test_alias_name",
    overridable=True,
)
That was the only time I called the get() command. I still got the message that I should specify the alias. I can try and do a bit of debugging to see why it gets called multiple times.

  
  
Posted 2 years ago

Right, so I figured out why it was calling it multiple times. Every time a dataset is serialized, the _serialize() method inside clearml/datasets/dataset.py is called, and _serialize() calls self.get(parent_dataset_id) , which is the same get() method. This means the user will always see that log message, even if they are not "getting" a dataset themselves. So any time a user creates, uploads, or finalizes a dataset, they will see the message above (simplified sketch of the call chain below).
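
To illustrate, a simplified paraphrase of that call chain; this is not the actual ClearML source, just the shape of the problem:

class DatasetSketch:
    @classmethod
    def get(cls, dataset_id=None, alias=None, **kwargs):
        if alias is None:
            print("Dataset.get() did not specify alias. Dataset information "
                  "will not be automatically logged in ClearML Server.")
        return cls()

    def _serialize(self):
        # serialization re-fetches the parent dataset through the same get(),
        # and never passes an alias, hence the message even when the user
        # never called get() themselves
        self.get(dataset_id="parent-dataset-id")

    def finalize(self):
        self._serialize()   # create/upload/finalize all end up serializing


DatasetSketch().finalize()   # prints the alias info message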

Hope that helped ExasperatedCrab78

  
  
Posted 2 years ago

Nice find! I'll pass it on to the relevant devs; we'll fix that right up 🙂 Is there any feedback you have on the functionality itself? I.e., would you use alias given what you know now, or would you, e.g., name it differently?

  
  
Posted 2 years ago

I would like to see it used in a clear example as it was intended to be used before giving my opinion on it, if that makes sense

  
  
Posted one year ago

Hey ExasperatedCrocodile76 and ExuberantBat52
Thanks for your contributions, I've updated the example here: https://github.com/allegroai/clearml-blogs/tree/master/urbansounds8k

  
  
Posted one year ago

Wow, that was fast. Thanks a lot for your prompt response! Will check it out now :D

  
  
Posted one year ago

Please do, if you find any more issues (due to my shitty code or otherwise 😄 ) let me know and I'll fix 'em!

  
  
Posted one year ago

ExuberantBat52 The dataset alias thing giving you multiple prompts is still an issue I think, but it's on the backlog of our devs 😄

  
  
Posted one year ago

This update was just to modernize the example itself 🙂

  
  
Posted one year ago

I knew that, I was just happy that we have an updated example 😁

  
  
Posted one year ago