Retrieving a simple dict artifact uploaded in a previous task

Answered

Hello, I am trying to retrieve, from a second task, a simple dict artifact that was uploaded in a previous task with `task.upload_artifact("my_dict", dict(foo="bar"))`. I tried:

    previous_task = Task.get_task(task.parent)
    my_dict = previous_task.get_registered_artifacts().get("my_dict")

But this gives me an empty JSON. How should I do that?

Posted 5 years ago
JitteryCoyote63

Answers 17

So `get_registered_artifacts()` only works for dynamic artifacts, right? I am looking for something like a `download_artifacts()` that lets me retrieve the static artifacts of a Task.

JitteryCoyote63

Oh, the object is actually available in `previous_task.artifacts`.

JitteryCoyote63

Hi JitteryCoyote63,
`upload_artifact` was designed to upload pre-made artifacts, which actually covers everything.
With `register_artifact` we tried to have something that constantly logs a pandas DataFrame artifact. The use case was the examples used for training and their order, so we could compare the execution of two different experiments and detect dataset contamination, etc.
Not sure it is actually that useful, though...

Retrieving an artifact from a Task is done with:
`Task.get_task(task_id='aaa').artifacts['foo'].get()`
or, if you want the file itself and not the object:
`Task.get_task(task_id='aaa').artifacts['foo'].get_local_copy()`

AgitatedDove14

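A minimal sketch of the retrieval pattern from the answer above, assuming the current `clearml` package (the successor of `trains`; with older `trains` releases the import would be `from trains import Task`). The `fetch_artifact` helper and the project/task names in `main()` are hypothetical: the helper only wraps the `artifacts[name].get()` and `.get_local_copy()` calls, so it also works with any stand-in object that exposes an `artifacts` mapping.

```python
from typing import Any


def fetch_artifact(task: Any, name: str, as_file: bool = False) -> Any:
    """Return a static artifact of `task` (hypothetical helper, not part of clearml).

    `task.artifacts` maps artifact names to artifact objects: `.get()` returns the
    deserialized object, while `.get_local_copy()` downloads the stored file and
    returns its local path.
    """
    if name not in task.artifacts:
        available = ", ".join(sorted(task.artifacts)) or "none"
        raise KeyError(f"no artifact named {name!r} (available: {available})")
    artifact = task.artifacts[name]
    return artifact.get_local_copy() if as_file else artifact.get()


def main() -> None:
    # Assumption: clearml is installed and configured to reach a server.
    from clearml import Task

    # First task: upload a plain dict as a static artifact.
    producer = Task.init(project_name="examples", task_name="producer")
    producer.upload_artifact("my_dict", dict(foo="bar"))
    producer.close()  # close the producer before another process reads its artifacts

    # Second task (or any other process): fetch the first task by id and read the artifact.
    previous_task = Task.get_task(task_id=producer.id)
    print(fetch_artifact(previous_task, "my_dict"))
```

Calling `main()` requires a reachable ClearML server, while `fetch_artifact` on its own can be exercised with fake objects. In the setup from the question, the id would come from `task.parent` rather than `producer.id`.
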
Awesome! Unfortunately, calling `artifact["foo"].get()` gave me:

    Could not retrieve a local copy of artifact foo, failed downloading file:///checkpoints/test_task/test_2.fgjeo3b9f5b44ca193a68011c62841bf/artifacts/foo/foo.json

It tries to get it from local storage, but the JSON is stored in S3 (it does exist), and I did create both tasks with the correct `output_uri` (pointing to S3).

JitteryCoyote63

Oops, I spoke too fast: the JSON is actually not saved in S3.

JitteryCoyote63

So `previous_task` actually ignored the `output_uri`

JitteryCoyote63

and saved the file locally, which is why the second task, not executed on the same machine, cannot access it.

JitteryCoyote63

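The error message earlier in the thread shows an artifact URL starting with `file:///`, i.e. a path on the machine that ran the first task. A small sketch for spotting that situation before a task on another machine tries to download it, assuming each clearml artifact object exposes its stored location as `.url`. The helper names and the set of schemes treated as shared storage are illustrative choices, not a clearml API.

```python
from typing import Any, List, Optional
from urllib.parse import urlparse

# Illustrative choice: URL schemes that other machines (e.g. agents) can normally reach.
SHARED_SCHEMES = {"s3", "gs", "azure", "http", "https"}


def is_shared_location(url: Optional[str]) -> bool:
    """True if `url` points at shared storage rather than a local path or file:// URL."""
    return urlparse(url or "").scheme.lower() in SHARED_SCHEMES


def local_only_artifacts(task: Any) -> List[str]:
    """Names of artifacts whose stored URL is only valid on the machine that produced them."""
    return sorted(
        name
        for name, artifact in task.artifacts.items()
        if not is_shared_location(artifact.url)
    )
```

For example, running `local_only_artifacts` on the first task from this thread would be expected to list `foo`, since its URL begins with `file:///checkpoints/...`.
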
So when I create a task locally using `task = Task.init(project_name=config.get("project_name"), task_name=config.get("task_name"), task_type=Task.TaskTypes.training, output_uri="s3://my-bucket")`, the artifact is correctly logged remotely, but when the task is created remotely (by an agent), the artifact is logged locally (on the agent machine, not on S3).

JitteryCoyote63

It seems that around here, a Task created with `Task.init` in the main process while running remotely gets its `output_uri` parameter ignored.

JitteryCoyote63

Even if I explicitly set `previous_task.output_uri = "s3://my_bucket"`, it is ignored and the JSON file is still saved locally.

JitteryCoyote63

Never mind, the bug might be on my side. I will open an issue if I find an easily reproducible example.

JitteryCoyote63

JitteryCoyote63 okay... but let me explain a bit so you get a better intuition for next time 🙂
When running remotely, the `Task.init` call assumes the Task object already exists in the backend, so it ignores whatever was passed in the code and uses the data stored on the trains-server, similar to what happens with `Task.connect` and the argparser.
This gives you the option of adding or changing the `output_uri` of any Task regardless of the code: in the Execution tab, changing the "Output Destination" is the same as changing `output_uri`. So even if you did not provide it in the original experiment, you can run it remotely and have your artifacts uploaded to your S3.

Makes sense?

AgitatedDove14

Yes, thanks! In my case, I was actually using `TrainsSaver` from pytorch-ignite with a local path. Looking at the code, I understood that under the hood it changes the `output_uri` of the current task; that's why my `previous_task.output_uri = "s3://my_bucket"` had no effect (it was placed BEFORE the training).

JitteryCoyote63

Setting it after the training correctly updated the task, and I was able to store artifacts remotely.

JitteryCoyote63

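A minimal sketch of the ordering that fixed it, assuming only what the thread states: assigning `task.output_uri` sets where later uploads go, and a component such as `TrainsSaver` may assign it again with a local path. The helper `set_output_uri` and the project/task names in `main()` are hypothetical; the point is simply to apply the S3 destination after that component has been set up, so it is the last value written.

```python
from typing import Any


def set_output_uri(task: Any, uri: str) -> str:
    """Point the task's default output destination at `uri` and return the value now set.

    Hypothetical helper: call it after any handler that may overwrite task.output_uri,
    so that this assignment is the last one before artifacts are uploaded.
    """
    if task.output_uri != uri:
        task.output_uri = uri
    return task.output_uri


def main() -> None:
    # Assumption: clearml is installed and configured to reach a server.
    from clearml import Task

    task = Task.init(project_name="examples", task_name="training", output_uri="s3://my-bucket")
    # ... build the trainer and checkpoint handlers here (placeholder) ...
    set_output_uri(task, "s3://my-bucket")  # re-apply after handlers that may overwrite it
    task.upload_artifact("my_dict", dict(foo="bar"))
```
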
Thanks for your help!

JitteryCoyote63

JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think SuccessfulKoala55 already has it on a branch) to address the bug ElegantKangaroo44 found. It should be in an RC next week.

AgitatedDove14

Great, looking forward!

ElegantKangaroo44
