Hi ClearMLers, I'm trying to create a dataset with tagged batches of data. I first create an empty dataset with dataset_name = 'name_dataset', and then create another tagged dataset with the first batch and with parent_datasets=['name_dataset']. It's

Answered

Hi ClearMLers, I'm trying to create a dataset with tagged batches of data. I first create an empty dataset with dataset_name = 'name_dataset', and then create another tagged dataset with the first batch and with parent_datasets=['name_dataset']. It's not working for me that way. Any suggestions? Can anybody send me an example?

  				
Posted one year ago by TeenyShells80


Answers 9

Hi @TeenyShells80, the parent_datasets argument should be a list of dataset IDs or clearml.Dataset objects, not dataset names. Maybe that is the issue.
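A minimal sketch of what that could look like, assuming the goal is one tagged dataset version per batch, each inheriting from the previous one. The helper name `create_tagged_batches` and the `batches` mapping (tag to local path) are hypothetical. `dataset_cls` is meant to be `clearml.Dataset`; it is passed in as an argument only so the sketch can be tried without a server. It also assumes a parent has to be finalized before another dataset can inherit from it.

```python
def create_tagged_batches(dataset_cls, name, project, batches):
    """Create an empty base dataset, then one tagged child version per batch.

    dataset_cls: expected to be clearml.Dataset (injected for easy testing).
    batches: dict mapping a tag string to a local file or folder path
             (hypothetical layout, adapt to your own data).
    Returns the list of dataset IDs, base dataset first.
    """
    parent = dataset_cls.create(dataset_name=name, dataset_project=project)
    # Assumption: a dataset must be finalized before it can be used as a parent.
    parent.finalize()
    ids = [parent.id]

    for tag, path in batches.items():
        child = dataset_cls.create(
            dataset_name=name,
            dataset_project=project,
            dataset_tags=[tag],
            parent_datasets=[parent.id],  # a dataset ID, not a dataset name
        )
        child.add_files(path=path)
        child.upload()
        child.finalize()
        ids.append(child.id)
        parent = child  # the next batch builds on this version

    return ids
```

With a configured ClearML environment this would be called roughly as `create_tagged_batches(Dataset, "general_ner_es", "general_ner", {"batch_1": "df.pkl"})` after `from clearml import Dataset`. If only the name is known, `Dataset.get(dataset_project=..., dataset_name=...)` returns an object whose `.id` can go into parent_datasets.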

  				
Posted one year ago by SmugDolphin23

Hi @TeenyShells80, can you please elaborate on the process? Exactly what steps did you take, and which CLI commands did you run? Also, what happens when you say it's not working? Are there console logs? Please add some information 🙂

  				
Posted one year ago by CostlyOstrich36

Traceback (most recent call last):
  File "/root/ehread-playgrounds/bbiescas/NER-ES/clearml_pipelines/./step_1_clearml_dataset.py", line 38, in <module>
    dataset = Dataset.create(dataset_name="general_ner_es",
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/datasets/dataset.py", line 1248, in create
    parent_datasets = [cls.get(dataset_id=p) if not isinstance(p, Dataset) else p for p in (parent_datasets or [])]
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/datasets/dataset.py", line 1248, in <listcomp>
    parent_datasets = [cls.get(dataset_id=p) if not isinstance(p, Dataset) else p for p in (parent_datasets or [])]
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/datasets/dataset.py", line 1779, in get
    instance = get_instance(dataset_id)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/datasets/dataset.py", line 1678, in get_instance
    task = Task.get_task(task_id=dataset_id_)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/task.py", line 989, in get_task
    return cls.__get_task(
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/task.py", line 4331, in __get_task
    return cls(private=cls.__create_protection, task_id=task_id, log_to_backend=False)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/task.py", line 209, in __init__
    super(Task, self).__init__(**kwargs)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 161, in __init__
    super(Task, self).__init__(id=task_id, session=session, log=log)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/backend_interface/base.py", line 152, in __init__
    self.id = self.normalize_id(id)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/backend_interface/base.py", line 187, in normalize_id
    return id.strip() if id else None
AttributeError: 'list' object has no attribute 'strip'
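The last frame calls `id.strip()` on a list, which suggests that one entry of parent_datasets was itself a list instead of a string ID or a Dataset object. A small sketch of a pre-check that could be run before Dataset.create to catch this earlier; `check_parent_ids` is a hypothetical helper, not part of ClearML.

```python
def check_parent_ids(parent_datasets):
    """Return the entries as a list, rejecting nested containers.

    Each entry is expected to be a dataset ID string or a Dataset object.
    A nested list, tuple, set, or dict raises TypeError with its position.
    """
    entries = list(parent_datasets or [])
    for index, entry in enumerate(entries):
        if isinstance(entry, (list, tuple, set, dict)):
            raise TypeError(
                "parent_datasets[%d] is a %s; expected a dataset ID string "
                "or a Dataset object" % (index, type(entry).__name__)
            )
    return entries
```

This only checks the shape of the argument. It cannot tell a dataset name from a dataset ID, since both are plain strings.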

  				
Posted one year ago by TeenyShells80

Thanks a lot, Eugen, that was the issue.

  				
Posted one year ago by TeenyShells80

This is the error I get:

  				
Posted one year ago by TeenyShells80

Please add it as a code snippet.

  				
Posted one year ago by CostlyOstrich36

I'm working in an interactive session.

  				
Posted one year ago by TeenyShells80

batches = [batch_1, batch_2, batch_3]

if __name__ == '__main__':
    print("Create the dataset in ClearML")
    dataset = Dataset.create(dataset_name="general_ner_es",
                             dataset_project='general_ner',
                             output_uri=' None ')

    for batch in batches:
        df = pandasDF_from_annotations(bucket_name, batch_1)
        df.to_pickle('df.pkl')
        print("Add files to the dataset from " + str(batch))
        dataset = Dataset.create(dataset_name="general_ner_es",
                                 dataset_project='general_ner',
                                 output_uri='s3://es-ehrd-production-s3-ml-development/clearml/datasets/labelstudio/general_ner_es',
                                 dataset_tags=[str(batch)],
                                 parent_datasets=["general_ner_es"])

  				
Posted one year ago by TeenyShells80

Can you please add a code snippet that reproduces this for you?

  				
Posted one year ago by CostlyOstrich36
