This clones the parent task, enqueues it, and waits for it to reach a finished status:

data_processing_task = schedule_task(parent=controller_task, wait=True, ...)

Next, I retrieve the data produced by data_processing_task:

data_processing_task.artifacts["data_processed"].get()

This raises KeyError: 'data_processed', so I guess data_processing_task never got the chance to refresh its internal list of available artifacts. Two questions: what should I do to make sure it refreshes? And shouldn't task.artifacts[] always fetch from the server, so that it always reflects the latest state of the task's artifacts?

Posted 4 years ago by JitteryCoyote63
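To make the failure concrete, here is a minimal sketch that needs no ClearML server. It assumes the task object keeps a local snapshot of its artifact list, taken when the object is loaded; `FakeServer` and `CachedTask` are hypothetical stand-ins, not ClearML classes. An artifact registered on the server after the snapshot is invisible until the object is reloaded.

```python
class FakeServer:
    """Hypothetical backend: task_id -> {artifact name: value}."""

    def __init__(self):
        self.artifacts = {}


class CachedTask:
    """Hypothetical task object that caches its artifacts in a plain dict."""

    def __init__(self, server, task_id):
        self._server = server
        self._task_id = task_id
        self.reload()

    def reload(self):
        # One round-trip: copy the server-side state into a local dict.
        self.artifacts = dict(self._server.artifacts.get(self._task_id, {}))


server = FakeServer()
task = CachedTask(server, "data_processing")  # snapshot taken: no artifacts yet
server.artifacts["data_processing"] = {"data_processed": [1, 2, 3]}  # uploaded later

try:
    task.artifacts["data_processed"]
except KeyError as err:
    print("stale snapshot:", err)  # stale snapshot: 'data_processed'

task.reload()
print(task.artifacts["data_processed"])  # [1, 2, 3]
```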

Answers 13

PS. I just noticed that this function isn't documented. I'll make sure it gets a docstring.

Posted 4 years ago by AgitatedDove14

Fetching the metadata can be expensive: it's a REST API call, and we have seen users log hundreds of artifacts, with preview entries ...

Posted 4 years ago by AgitatedDove14

> For artifacts already registered, simply return the entry, and for artifacts that don't exist yet, contact the server to retrieve them

This is the current state.
Downloading the artifacts is done only when actually calling get()/get_local_copy().

Posted 4 years ago by AgitatedDove14
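A minimal sketch of that split, assuming an artifact entry carries only cheap metadata (here just a URL) and the actual transfer happens inside get(). `LazyArtifact`, `fake_download` and the `s3://bucket/output.pkl` URL are hypothetical; this is not ClearML's `Artifact` class.

```python
class LazyArtifact:
    """Hypothetical artifact entry: cheap metadata now, payload only on get()."""

    def __init__(self, name, url, download):
        self.name = name
        self.url = url             # metadata: where the payload lives
        self._download = download  # callable that actually fetches the payload

    def get(self):
        # The transfer happens here, not when the entry is created or listed.
        return self._download(self.url)


downloads = []


def fake_download(url):
    downloads.append(url)
    return f"payload from {url}"


artifacts = {"output": LazyArtifact("output", "s3://bucket/output.pkl", fake_download)}
print(len(downloads))             # 0
print(artifacts["output"].get())  # payload from s3://bucket/output.pkl
print(len(downloads))             # 1
```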

Looking at the source code, it seems I should call:

data_processing_task._artifacts_manager.flush()

to make sure the task has the latest version of its artifacts, right?

Posted 4 years ago by JitteryCoyote63

> I called task.wait_for_status() to make sure the task is done

That's the issue. I'll make sure wait_for_status() calls reload() at the end, so when the function returns you have an up-to-date object.

Posted 4 years ago by AgitatedDove14
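A sketch of the behavior proposed here: poll a cheap status query until a final status is reached, then reload() once so the returned object also carries fresh artifacts. `FakeRemote`, `PollingTask` and the status strings are hypothetical; ClearML's actual wait_for_status() implementation may differ.

```python
import time


class FakeRemote:
    """Hypothetical remote task: completes and registers 'output' after a few polls."""

    def __init__(self, polls_until_done=3):
        self._polls_left = polls_until_done
        self.status = "in_progress"
        self.artifacts = {}

    def poll(self):
        self._polls_left -= 1
        if self._polls_left <= 0:
            self.status = "completed"
            self.artifacts["output"] = "processed data"
        return self.status


class PollingTask:
    def __init__(self, remote):
        self._remote = remote
        self.reload()

    def reload(self):
        # Full refresh: status and artifacts.
        self.status = self._remote.status
        self.artifacts = dict(self._remote.artifacts)

    def wait_for_status(self, status=("completed", "stopped"), check_interval_sec=0.0):
        # Cheap status-only polling; artifacts are not touched in the loop.
        while self._remote.poll() not in status:
            time.sleep(check_interval_sec)
        # Reload once at the end so the caller gets an up-to-date object.
        self.reload()


task = PollingTask(FakeRemote(polls_until_done=3))
print("output" in task.artifacts)  # False
task.wait_for_status()
print(task.artifacts["output"])    # processed data
```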

My bad: I first wrote "refresh" and then edited it to the correct "reload" 😞

Posted 4 years ago by AgitatedDove14

Thanks AgitatedDove14!
Could we add this task.refresh() to the docs? It might help other users as well 🙂

OK! Maybe there is a middle ground: for artifacts already registered, simply return the entry, and for artifacts that don't exist yet, contact the server to retrieve them.

Posted 4 years ago by JitteryCoyote63
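The proposed middle ground maps naturally onto Python's `dict.__missing__` hook, which `dict.__getitem__` calls on a subclass only when the key is absent. A sketch, assuming a hypothetical `fetch_remote` callable that returns the server's current `{name: entry}` mapping; `ArtifactsDict` is not part of ClearML.

```python
class ArtifactsDict(dict):
    """Hypothetical middle ground: known keys are plain dict lookups,
    unknown keys trigger one server fetch before giving up."""

    def __init__(self, fetch_remote, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._fetch_remote = fetch_remote  # callable returning {name: entry}
        self.fetch_count = 0

    def __missing__(self, key):
        # Only reached on a cache miss (called by dict.__getitem__).
        self.fetch_count += 1
        self.update(self._fetch_remote())
        if key in self:
            return dict.__getitem__(self, key)
        raise KeyError(key)


remote = {"data_processed": "entry-1"}                   # hypothetical server registry
artifacts = ArtifactsDict(lambda: dict(remote), remote)  # initial snapshot
remote["output"] = "entry-2"                             # registered after the snapshot

print(artifacts["data_processed"])  # entry-1 (cache hit, no fetch)
print(artifacts["output"])          # entry-2 (one server fetch)
print(artifacts["output"])          # entry-2 (cache hit)
print(artifacts.fetch_count)        # 1
```

Only `[]` goes through `__missing__`; `dict.get()` and `in` do not, so in this sketch they still see only the cached entries. A key that never appears costs a round-trip on every lookup.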

> That said, you might have accessed the artifacts before any of them were registered

I called task.wait_for_status() to make sure the task is done.

Posted 4 years ago by JitteryCoyote63

Hi JitteryCoyote63,
If you want to refresh the task object, call task.reload(); it will also refresh the artifacts.
The reason we don't do this every time the .artifacts object is accessed is speed: a server round-trip can be slow compared to a dict lookup, and we assume users expect it to behave like a dict.

Posted 4 years ago by AgitatedDove14
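The speed argument can be illustrated by counting server round-trips. In this sketch (`CountingServer`, `AlwaysFreshTask` and `SnapshotTask` are hypothetical), an `.artifacts` property that always queries the server pays one call per access, while a dict snapshot pays once and relies on an explicit reload().

```python
class CountingServer:
    """Hypothetical backend that counts how many round-trips it serves."""

    def __init__(self, artifacts):
        self._artifacts = artifacts
        self.calls = 0

    def get_artifacts(self):
        self.calls += 1
        return dict(self._artifacts)


class AlwaysFreshTask:
    """Every .artifacts access hits the server: always current, never cheap."""

    def __init__(self, server):
        self._server = server

    @property
    def artifacts(self):
        return self._server.get_artifacts()


class SnapshotTask:
    """.artifacts is a plain dict; reload() refreshes it explicitly."""

    def __init__(self, server):
        self._server = server
        self.reload()

    def reload(self):
        self.artifacts = self._server.get_artifacts()


fresh_server = CountingServer({"output": 1})
snap_server = CountingServer({"output": 1})
fresh, snap = AlwaysFreshTask(fresh_server), SnapshotTask(snap_server)
for _ in range(100):
    fresh.artifacts["output"]
    snap.artifacts["output"]
print(fresh_server.calls, snap_server.calls)  # 100 1
```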

> Downloading the artifacts is done only when actually calling get()/get_local_copy()

Yes, what I meant was: reproduce this behavior for fetching the artifacts' metadata as well 🙂

Posted 4 years ago by JitteryCoyote63
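A sketch of applying the same laziness to metadata, assuming each entry can fetch its own metadata (for example a preview) with one call. `LazyMetadataEntry` and `fake_fetch_metadata` are hypothetical; `functools.cached_property` (Python 3.8+) makes the first access pay and later accesses reuse the result, so with hundreds of artifacts only the ones actually read cost a round-trip.

```python
from functools import cached_property


class LazyMetadataEntry:
    """Hypothetical entry that knows only its name until metadata is needed."""

    def __init__(self, name, fetch_metadata):
        self.name = name
        self._fetch_metadata = fetch_metadata  # one REST-style call per entry

    @cached_property
    def metadata(self):
        # First access performs the (expensive) call; later accesses reuse it.
        return self._fetch_metadata(self.name)


calls = []


def fake_fetch_metadata(name):
    calls.append(name)
    return {"name": name, "preview": f"<preview of {name}>"}


entries = {n: LazyMetadataEntry(n, fake_fetch_metadata) for n in ["a", "b", "output"]}
print(len(calls))                             # 0
print(entries["output"].metadata["preview"])  # <preview of output>
entries["output"].metadata                    # cached, no second call
print(calls)                                  # ['output']
```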

That said, you might have accessed the artifacts before any of them were registered.

Posted 4 years ago by AgitatedDove14

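task.wait_for_status()          # block until the task reaches a final status
task.reload()                   # re-fetch the task object, including its artifacts
task.artifacts["output"].get()  # download and load the "output" artifact

Posted 4 years ago by AgitatedDove14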
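The same recipe wrapped as a small helper. It assumes the `clearml` package and a configured ClearML server; `Task.get_task`, `wait_for_status`, `reload` and `artifacts[...].get()` are ClearML calls, while `fetch_output_artifact` and its `task_id`/`name` parameters are hypothetical. The import sits inside the function so that merely defining the helper needs no server.

```python
def fetch_output_artifact(task_id, name="output"):
    """Wait for a ClearML task to finish, refresh it, and load one artifact.

    `task_id` and `name` are placeholders; requires the clearml package
    and a configured ClearML server.
    """
    from clearml import Task  # imported lazily: defining the helper needs no server

    task = Task.get_task(task_id=task_id)
    task.wait_for_status()             # block until the task reaches a final status
    task.reload()                      # refresh the task object, including .artifacts
    return task.artifacts[name].get()  # download and deserialize the artifact
```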

> That's the issue. I'll make sure wait_for_status() calls reload() at the end, so when the function returns you have an up-to-date object.

That sounds awesome! It will definitely fix my problem 🙂

In the meantime, I now do:

task.wait_for_status()
task._artifacts_manager.flush()
task.artifacts["output"].get()

But I still get KeyError: 'output'... Is that expected? Will it work if I replace the second line with task.refresh()?

Posted 4 years ago by JitteryCoyote63


1K Views
