Answered
Hello, In The Following Context:

Hello, in the following context:
`controller_task = Task.init(...)`

This will clone the parent task, enqueue it, and wait for it to reach a finished status:

`data_processing_task = schedule_task(parent=controller_task, wait=True, ...)`

Now I retrieve the data processed by data_processing_task:

`data_processing_task.artifacts["data_processed"].get()`

This gives me the error `KeyError: 'data_processed'`. So I guess data_processing_task didn't have the chance to refresh the available artifacts in its internal state. Several questions:
How can I make sure it refreshes them?
Shouldn't `task.artifacts[]` always try to fetch from the server, to make sure it has the latest state of the artifacts for a task?

  
  
Posted 4 years ago

Answers 13


My bad, I wrote "refresh" and then edited it to the correct "reload" 😞

  
  
Posted 4 years ago

Hi JitteryCoyote63
If you want to refresh the task object, call task.reload(). It will also refresh the artifacts.
The reason we don't always do this when the .artifacts object is accessed is speed: a server call would be slow compared to dict access, and we assume users expect it to behave like a dict.
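To illustrate the trade-off described above, here is a toy sketch of a dict-like artifact view that only talks to the server on an explicit reload. It is not ClearML's implementation: the class name `CachedArtifacts` and the `fetch_from_server` callable are hypothetical stand-ins invented for this example.

```python
class CachedArtifacts:
    """Toy stand-in for a task's artifact mapping (NOT ClearML's implementation)."""

    def __init__(self, fetch_from_server):
        # fetch_from_server is a hypothetical callable returning {name: artifact}
        self._fetch = fetch_from_server
        # Take one snapshot at construction time
        self._cache = dict(fetch_from_server())

    def __getitem__(self, name):
        # Plain dict lookup: fast, but only as fresh as the last reload()
        return self._cache[name]

    def reload(self):
        # Explicit round trip to the server to pick up newly registered artifacts
        self._cache = dict(self._fetch())
```

With this design, an artifact registered after the snapshot raises KeyError until reload() is called, which mirrors the behavior reported in the question.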

  
  
Posted 4 years ago

Downloading the artifacts is done only when actually calling get()/get_local_copy()

Yes, I rather meant: reproduce this behavior even for getting metadata on the artifacts 🙂

  
  
Posted 4 years ago

That said, you might have accessed the artifacts before any of them were registered

I called task.wait_for_status() to make sure the task is done

  
  
Posted 4 years ago

PS. I just noticed that this function is not documented. I'll make sure it appears in the doc-string.

  
  
Posted 4 years ago

Metadata might be expensive: it's a REST API call, and we have found users storing hundreds of artifacts, with preview entries ...

  
  
Posted 4 years ago

I called task.wait_for_status() to make sure the task is done

This is the issue. I will make sure wait_for_status() calls reload at the end, so when the function returns you have the updated object.

  
  
Posted 4 years ago

task.wait_for_status()
task.reload()
task.artifacts["output"].get()
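The call order above can be wrapped in a small helper. This is a minimal sketch that assumes `task` is a ClearML Task object obtained elsewhere; the helper name `wait_and_get_artifact` is hypothetical, and only the calls mentioned in this thread are used.

```python
def wait_and_get_artifact(task, name):
    """Wait for the task to finish, refresh it, then fetch one artifact."""
    # Block until the task is done
    task.wait_for_status()
    # Refresh the local task object, which also refreshes its artifacts
    task.reload()
    # Dict-style lookup on the refreshed artifacts; get() retrieves the content
    return task.artifacts[name].get()
```

For example, `wait_and_get_artifact(data_processing_task, "data_processed")` would replace the three separate calls.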

  
  
Posted 4 years ago

That said, you might have accessed the artifacts before any of them were registered

  
  
Posted 4 years ago

This is the issue. I will make sure wait_for_status() calls reload at the end, so when the function returns you have the updated object.

That sounds awesome! It will definitely fix my problem 🙂

In the meantime, I now do:
task.wait_for_status()
task._artifacts_manager.flush()
task.artifacts["output"].get()
But I still get KeyError: 'output' ... Is that normal? Will it work if I replace the second line with task.refresh()?

  
  
Posted 4 years ago

Thanks AgitatedDove14 !
Could we add this task.refresh() to the docs? It might be helpful for other users as well 🙂
OK! Maybe there is a middle ground: for artifacts already registered, simply return the entry, and for artifacts that don't exist, contact the server to retrieve them.

  
  
Posted 4 years ago

For artifacts already registered, simply return the entry, and for artifacts that don't exist, contact the server to retrieve them

This is the current state.
Downloading the artifacts is done only when actually calling get()/get_local_copy()
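The "return cached entries, contact the server only on a miss" idea can be sketched with Python's `dict.__missing__` hook. This is a toy illustration under stated assumptions, not ClearML's code: `LazyArtifacts` and `fetch_from_server` are hypothetical names invented for the example.

```python
class LazyArtifacts(dict):
    """Toy mapping that contacts the server only for unknown names."""

    def __init__(self, fetch_from_server):
        super().__init__()
        # fetch_from_server is a hypothetical callable returning {name: artifact}
        self._fetch = fetch_from_server

    def __missing__(self, name):
        # dict.__getitem__ calls this only when the name is not cached yet
        self.update(self._fetch())
        if name not in self:
            # Still unknown after asking the server
            raise KeyError(name)
        return dict.__getitem__(self, name)
```

Known names cost a plain dict lookup, while each unknown name costs one server round trip, including names that do not exist at all.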

  
  
Posted 4 years ago

Looking at the source code, it seems like I should call:
data_processing_task._artifact_manager.flush() to make sure I have the latest version of the artifacts in the task, right?

  
  
Posted 4 years ago