Answered
Hi All, It Seems That After the Sync Command, Finalize Is Not Working: Please Let Me Know If I Am Missing Anything.

Hi All,
It seems that after the sync command, finalize is not working:

Please let me know if I am missing anything.
op = dso.sync_folder(local_path="./data", verbose=True)
dso.finalize()

I always get this error:
raise ValueError("Cannot finalize dataset, status '{}' is not valid".format(status))
ValueError: Cannot finalize dataset, status 'completed' is not valid

I was referring to this video: None
Please see the attached image:
[image]

Posted 3 months ago

Answers 4


Hi @<1720249421582569472:profile|NonchalantSeaanemone34>, can you please provide the full log of the run? Also, do you have a full snippet that reproduces this behaviour?

Posted 3 months ago

Hi @<1523701070390366208:profile|CostlyOstrich36>
Here is the full code:

import os
import sys, shutil
import clearml
from clearml import Task, Dataset, Logger
from clearml import PipelineDecorator, PipelineController

project_name = "Titanic Project"
dataset_name = "titanic_data"

# Look up the id of the existing dataset version to use as the parent
datasets = Dataset.list_datasets()
for dataset in datasets:
    if dataset["project"] == project_name and dataset["name"] == dataset_name:
        parent_datasets_id = dataset["id"]

print(parent_datasets_id)

dso = Dataset.create(
    dataset_project=project_name,
    dataset_name=dataset_name,
    parent_datasets=[parent_datasets_id],
)

dso = Dataset.get(
    dataset_project=project_name,
    dataset_name=dataset_name,
    only_completed=True,
    only_published=False,
    alias='latest',
)

# Start from a clean local copy of the dataset files
if os.path.exists("./data"):
    shutil.rmtree("./data")

local_path = dso.get_mutable_local_copy("./data")

print(local_path)

# Append one new row to the csv
with open("./data/titanic.csv", "a+") as fh:
    fh.write('\n885,0,3,"Sutehaasll, Mr. Henry Jr.",male,45,0,0,SOTON/OQ 392076,7.05,,S\n')

op = dso.sync_folder(local_path="./data", verbose=True)
print(op)
dso.finalize(auto_upload=True, verbose=True)

Posted 3 months ago

Hi @<1523701205467926528:profile|AgitatedDove14>
Thanks a lot for pointing out the "dso" variable 🙂. I did not realize I was using the same name for both the create and the get dataset calls.
It was my mistake; I changed the variable and everything works as expected.

Thank you again.

Posted 3 months ago

@<1720249421582569472:profile|NonchalantSeaanemone34>

dso = Dataset.create(
        dataset_project=project_name,
        dataset_name=dataset_name,
        parent_datasets=[parent_datasets_id],
)
dso = Dataset.get(
        dataset_project=project_name,
        dataset_name=dataset_name,
        only_completed=True,
        only_published=False,
        alias='latest',
)

Why are you creating a dataset and then getting a dataset into the same object?
It seems you are trying to upload to the existing dataset and not the newly created one; notice that in both cases dso is the variable name, so the Dataset.get call overwrites the draft dataset returned by Dataset.create, and finalize() is then called on the already-completed parent.
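
For illustration, a minimal sketch of the intended flow using distinct variable names (new_ds and parent_ds are illustrative names, not from the original code):

from clearml import Dataset

project_name = "Titanic Project"
dataset_name = "titanic_data"

# Get the existing, completed dataset version (the parent)
parent_ds = Dataset.get(
    dataset_project=project_name,
    dataset_name=dataset_name,
    only_completed=True,
)

# Create a NEW draft dataset as a child of the parent, kept in its own variable
new_ds = Dataset.create(
    dataset_project=project_name,
    dataset_name=dataset_name,
    parent_datasets=[parent_ds.id],
)

# Modify a mutable local copy of the parent's files
local_path = parent_ds.get_mutable_local_copy("./data")
# ... edit files under local_path ...

# Sync the changes into the new draft and finalize it;
# finalize() only succeeds while the dataset is still a draft (in progress)
new_ds.sync_folder(local_path="./data", verbose=True)
new_ds.finalize(auto_upload=True, verbose=True)

This way Dataset.get never overwrites the draft returned by Dataset.create, and finalize() runs against the in-progress dataset rather than the already-completed parent.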

Posted 3 months ago