KindChimpanzee37, this time I was away for a week 🙂. I do not think I made the mistake you suggested. At the top of the script I wrote `project_name = 'RL/Urbansounds'`
and then later
`
self.original_dataset = Dataset.get(dataset_project=project_name, dataset_name='UrbanSounds example')
# This will return the pandas dataframe we added in the previous task
self.metadata = Task.get_task(task_id=self.original_dataset.id).artifacts['metadata'].get()
`
VivaciousBadger56 hope you had a great time while away :)
That looks correct indeed. Do you mind checking for me if the dataset actually contains the correct metadata?
Go to the datasets section, select the one you need and on the right click on more information. It should send you to the experiment manager view. Then, under artifacts, do you see a key in the list named metadata? Can you post a screenshot?
Hi VivaciousBadger56 , can you add the full error here?
Maybe ExasperatedCrab78 might have an idea.
KindChimpanzee37, I ensured that the dataset_name is the same in get_data.py and preprocessing.py, and that seemed to help. Then I got the error RuntimeError: No audio I/O backend is available. Because of that, I installed PySoundFile with pip; that helped.
Weirdly enough, the old id error then came back. So I re-ran get_data.py and then preprocessing.py - this time the id error was gone again, but instead I got raise TypeError("Invalid file: {0!r}".format(self.name)) TypeError: Invalid file:
Following https://github.com/librosa/librosa/issues/1236#issuecomment-691113323 I gathered that I needed a newer version, and based on other posts I installed soundfile.
Anyway, I re-ran get_data.py and then preprocessing.py again, and now it seems to work.
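In case it helps anyone else, a quick check along these lines (a sketch; assuming a torchaudio version that still exposes list_audio_backends / set_audio_backend) confirms that the backend is picked up after installing soundfile:
`
import torchaudio

# After installing soundfile with pip, the soundfile backend should show up here
print(torchaudio.list_audio_backends())

# Selecting it explicitly raises an error if it is still missing
torchaudio.set_audio_backend("soundfile")
`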
Takeaway:
Maybe soundfile should be added or updated in the respective script that installs the venv?
Further question:
Why do I need to run get_data.py every time right before I run preprocessing.py? That is very inefficient during debugging. What am I doing wrong here?
When I run the example this way, everything seems to be working.
CostlyOstrich36 sure:
`
[..]\urbansounds8k\venv\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
ClearML Task: overwriting (reusing) task id=[..]
2022-09-14 14:40:16,484 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:

Traceback (most recent call last):
  File "[..]\urbansounds8k\preprocessing.py", line 145, in <module>
    datasetbuilder = DataSetBuilder()
  File "[..]\urbansounds8k\preprocessing.py", line 68, in __init__
    self.metadata = Task.get_task(task_id=self.original_dataset.id).artifacts['metadata'].get()
KeyError: 'metadata'

Process finished with exit code 1
`
KindChimpanzee37: First I went to the dataset and clicked on "Task information ->" in the bottom right corner of the "VERSION INFO". I suppose that is the same as what you meant by "right click on more information"? I did not find any option to "right click on more information". The "Task information ->" leads me to a view in the experiment manager. I posted the two screenshots.
PS: It is weird to me that the data manager leads me to the experiment manager, specifically to an experiment that is not available in the experiment manager when I just click on it in the main menu. That is not intuitive, at least for me right now, having little experience with ClearML.
VivaciousBadger56 Thanks for your patience, I was away for a week 🙂 Can you check that you properly changed the project name in the line above the one you posted?
In the example, by default, the project name is "ClearML Examples/Urbansounds". But it should give you an error when first running the get_data.py script, because you can't actually modify that project (by design). You need to change it to one of your own choice. You might have done that in get_data.py but forgot to do so in preprocessing.py. If this is the case, just do a project-wide search/replace for ClearML examples/Urbansounds and replace it with whatever name you want 🙂
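In other words, after the search/replace, both scripts should start from the same project name of your own choosing; roughly like this (the value below is just the one from your earlier snippet, any project of your own works):
`
# Same line at the top of get_data.py and preprocessing.py
project_name = 'RL/Urbansounds'  # your own project, not the read-only example project
`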
VivaciousBadger56 Thank you for the screenshots! I appreciate the effort. You indeed clicked on the right link, I was on mobile so had to instruct from memory 🙂
First of all: every 'object' in the ClearML ecosystem is a task. Experiments are tasks, and so are dataset versions and even pipelines! Each task can be viewed using the experiment manager UI; that's just how the backend is structured. Of course we keep experiments and data separate by giving them separate tabs and different UIs, but the underlying building blocks are the same. It means you can easily access metadata in both datasets and experiments using the same API 🙂
That said, for some reason that doesn't seem to be working in this example. In the second screenshot on the left, we can clearly see that the 'metadata' is correctly stored in the dataset task. So the next thing I can think of is to check and make sure the system is trying to pull the right task.
Can you make sure that the self.original_dataset.id in this snippet:
`
self.original_dataset = Dataset.get(dataset_project=project_name, dataset_name='UrbanSounds example')
# This will return the pandas dataframe we added in the previous task
self.metadata = Task.get_task(task_id=self.original_dataset.id).artifacts['metadata'].get()
`
is the same ID as the one in the UI? (Check nr 1 in the screenshot.) You can add a simple `print(self.original_dataset.id)` just to make sure.
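Something along these lines (purely illustrative, using the same names as your snippet; I'm assuming the Dataset object exposes .name in your clearml version) would show the ID and the artifact keys in one go:
`
print(self.original_dataset.id)    # compare with the ID shown in the UI (nr 1)
print(self.original_dataset.name)  # should be exactly 'UrbanSounds example'
print(list(Task.get_task(task_id=self.original_dataset.id).artifacts.keys()))  # should include 'metadata'
`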
I'm thinking this because in the snippet you ask ClearML, via Dataset.get(dataset_project=project_name, dataset_name='UrbanSounds example'), to get the dataset with the name UrbanSounds example, but in the screenshot you seem to have added something else to the dataset name, which you blacked out (nr 2 on the screenshot). That would mean that ClearML will not get this dataset and so cannot find the metadata.
Can you check this?
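If it helps, something along these lines (a sketch; I'm assuming Dataset.list_datasets with a dataset_project argument is available in your clearml version) would list every dataset in the project, so you can see exactly which name and ID Dataset.get resolves to:
`
from clearml import Dataset

# Print the ID and name of every dataset version registered under the project
for d in Dataset.list_datasets(dataset_project=project_name):
    print(d['id'], d['name'])
`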