Answered
Is there an elegant way of accessing a specific file entry from a dataset without using IO operations to locate the file from the cache folder?

Is there an elegant way of accessing a specific file entry from a dataset without using IO operations to locate the file from the cache folder? The file is intended to be used to create a dataframe. At the moment I'm using the code below. The problem is that as files are added to the dataset, the index of the target file changes and the logic has to be adjusted. Perhaps this is possible by logging a dataframe artifact and using the storage manager to retrieve the artifact?

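import os

from clearml import Dataset

# Download (or reuse the cached) local copy of the dataset.
dataset = Dataset.get(
    dataset_project="Project",
    dataset_name="Dataset name",
    alias="something",
).get_local_copy()

# Pick the target file by position; this index shifts as files are added.
file_for_df = os.path.join(dataset, os.listdir(dataset)[2])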
  
  
Posted 9 months ago

Answers 6


Thanks, @CornyHedgehog13. I considered this. Is the chunk order deterministic? As in, can I rely on chunk [0] always referring to the same file object if additional files are added?

  
  
Posted 9 months ago

I found this. It works as long as the initially uploaded data files (e.g., Excel, .sav, .spss) are converted to CSV files first.

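from clearml import Task
# Fetch the preprocessing task and download its 'data' artifact to a local path.
preprocess_task = Task.get_task(task_id='xxx123')
local_csv = preprocess_task.artifacts['data'].get_local_copy()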
  
  
Posted 9 months ago

It is deterministic. When you call Dataset.get(), ClearML downloads the file state.json, where you can see all relative file paths and their chunk numbers.
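
Since the relative file paths are recorded and stable, one option is to pick the target file by its relative path instead of by its position in os.listdir(). Below is a minimal sketch of that idea. The helper name find_dataset_file and the file name in the usage note are hypothetical, and the sketch assumes the folder passed in is the one returned by get_local_copy(). It does not call ClearML itself.

```python
from pathlib import Path


def find_dataset_file(dataset_root, relative_path):
    """Return the path of one file inside a local dataset copy.

    Assumption: `dataset_root` is the folder returned by
    Dataset.get(...).get_local_copy(), and `relative_path` is the
    file's path relative to that folder.
    """
    candidate = Path(dataset_root) / relative_path
    if not candidate.is_file():
        # Fail loudly instead of silently picking a different file.
        raise FileNotFoundError(
            f"{relative_path!r} not found under {dataset_root}"
        )
    return candidate
```

With the question's code, this would replace the index-based line with something like file_for_df = find_dataset_file(dataset, "target.csv"), where "target.csv" stands in for the real file name. The lookup no longer depends on how many files the dataset contains.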

  
  
Posted 9 months ago

You can look up the number of the chunk that contains your file and download only that chunk.

  
  
Posted 9 months ago

There is no natural way to expose single files in Datasets. However, it looks like you found an appropriate workaround 🙂

  
  
Posted 9 months ago

Thanks!

  
  
Posted 9 months ago