Hi Guys! What Is The Best Way To Access Artifacts From Other Step Of The Pipeline? I Have Step One Returning Dataframe And Step Two Takes It As An Input But When First Step Is Cached I Only Get An Artifact Url. So How Should I Read It From Artifacts Storage?

Answered

Hi guys! What is the best way to access artifacts from another step of the pipeline? Step one returns a dataframe and step two takes it as input, but when the first step is cached I only get an artifact URL. So how should I read it from artifact storage?

  				
Posted 2 years ago by QuizzicalFox36

Answers 25

http://<host>:8081/lp_veh_detect_train_pipeline/.pipelines/vids_pipe/detect_frames.4a80b274007d4e969b71dd03c69d504c/artifacts/videos_df/videos_df.csv.gz
(<host> stands in for the correct hostname)

  				
Posted one year ago by QuizzicalFox36

Hmm, can you send the full log of the pipeline component that failed? This should have worked.
Also, could you test it with the latest clearml Python version (i.e. 1.10.2)?

  				
Posted one year ago by AgitatedDove14

QuizzicalFox36, are you running the steps from the machine whose config you checked?

  				
Posted one year ago by CostlyOstrich36

Here is a log of the pipeline

  				
Posted one year ago by QuizzicalFox36

AgitatedDove14 I could test it, but I recently fixed this issue by caching the previous step, which is where this artifact comes from. Now I'm getting the dataframe itself instead of a link to the artifact.
I don't know whether we should spend more time on this. However, it's very interesting why ability to cache the step impacts artifacts behavior

  				
Posted one year ago by QuizzicalFox36

Hi QuizzicalFox36 ,
You can use StorageManager.download_file() to easily fetch files.
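A minimal sketch of the suggestion above, assuming the clearml package is installed and the artifact URL is reachable from the machine running the step. `fetch_artifact` is a hypothetical helper, not part of ClearML. It calls `StorageManager.get_local_copy`, which downloads a remote URL into the local cache and returns the local path. The `fetch` parameter is an assumption added only so the helper can be tried without a server.

```python
from typing import Callable, Optional


def fetch_artifact(
    url: str,
    fetch: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Download `url` and return the path of the local copy.

    `fetch` is a hypothetical hook for testing; by default the ClearML
    StorageManager is used.
    """
    if fetch is None:
        # Imported lazily so the helper can be imported without clearml.
        from clearml import StorageManager

        fetch = StorageManager.get_local_copy

    local_path = fetch(url)
    if not local_path:
        # The downloader returned nothing, so the URL could not be fetched.
        raise FileNotFoundError(f"Could not download artifact from {url!r}")
    return local_path
```

For a pandas artifact stored as `.csv.gz`, the returned path can then be passed to `pandas.read_csv`, which reads gzip-compressed CSV files directly.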

  				
Posted 2 years ago by CostlyOstrich36

However, it's very interesting why ability to cache the step impacts artifacts behavior

From your log:

videos_df = StorageManager.download_file(videos_df)

It seems that "videos_df" is already the DataFrame. Why are you trying to download the DataFrame? I would expect you to download the pandas file, not a DataFrame object.

  				
Posted one year ago by AgitatedDove14

AgitatedDove14 Hi, I have no idea, as I don't upload the file to the artifacts myself. I return the df from a function in the previous pipeline step and then pass it as a parameter to this step.

  				
Posted one year ago by QuizzicalFox36

I suggest reading all of them, starting with pipeline from tasks 🙂

  				
Posted 2 years ago by CostlyOstrich36

QuizzicalFox36, yes 🙂

  				
Posted one year ago by CostlyOstrich36

CostlyOstrich36 Hi! Sorry for not responding for a long time. I couldn't reproduce my issue until today. Usually I don't need to use StorageManager, as I get the contents of the parameter directly. However, for some reason, instead of the contents of the dataframe I once again got either a PosixPath to the artifact or a string describing a dict with the URL of the artifact. I've implemented an if statement to catch such cases, but I still can't access the artifact.
I tried StorageManager.download_file(str(artifacts_PosixPath)) and got the same path back. When accessing it as a string with StorageManager.download_file(artifact_as_string_dict) (which is what I get if I pass .artifacts.df), I got this error:
ValueError: Requested path does not exist: /home/monika_kazlauskaite/.clearml/venvs-builds.1/3.8/code/{'name': 'videos_df', 'size': 1357, 'type': 'pandas', 'mode': <ArtifactModeEnum.output: 'output'>, 'url': 'http:/11.11.11.11:8081/lp_veh_detect_train_pipeline/.pipelines/vids_pipe/detect_frames.4a80b274007d4e969b71dd03c69d504c/artifacts/videos_df/videos_df.csv.gz',
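The error message suggests the whole dict-like string was passed to the downloader and treated as a relative path. Below is a rough sketch of one way to normalize whatever the step receives before trying to download anything. `locate_artifact` and its regular expression are hypothetical, and the string layout is assumed to match the one shown in the error.

```python
import re
from pathlib import Path
from typing import Any, Optional

# Matches the 'url': '...' entry inside a stringified artifact dict.
_URL_IN_REPR = re.compile(r"""['"]url['"]\s*:\s*['"]([^'"]+)['"]""")


def locate_artifact(value: Any) -> Optional[str]:
    """Return a path or URL to load from, or None if `value` is not a reference.

    None means the caller most likely already holds the object itself
    (for example a DataFrame) and nothing needs to be downloaded.
    """
    if isinstance(value, Path):
        return str(value)
    if isinstance(value, dict):
        return value.get("url")
    if isinstance(value, str):
        match = _URL_IN_REPR.search(value)
        # Fall back to the raw string, assuming it is already a path or URL.
        return match.group(1) if match else value
    return None
```

A step could call this first and only download when the result is not None. Note that it returns the URL exactly as stored, so a malformed URL stays malformed.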

  				
Posted one year ago by QuizzicalFox36


AgitatedDove14 I've checked my configs and it's all good, no / is missing.

  				
Posted one year ago by QuizzicalFox36

AgitatedDove14 it is as expected - a dataframe

  				
Posted one year ago by QuizzicalFox36

Is it in clearml.conf, under api.files_server?

  				
Posted one year ago by QuizzicalFox36

What is the link you are seeing there?

  				
Posted one year ago by AgitatedDove14

I think your "files_server" is misconfigured somewhere; otherwise I cannot explain how you ended up with this broken link...
Can you check the clearml.conf on the machines, or the env vars?
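One way to run that check on each machine, as a sketch rather than an official tool. It assumes the files server is set either through the `CLEARML_FILES_HOST` environment variable or as a `files_server` entry in clearml.conf, looked up at `~/clearml.conf` unless `CLEARML_CONFIG_FILE` points elsewhere. `configured_files_servers` is a hypothetical helper, and the regular expression is a naive line match, not a real HOCON parser.

```python
import os
import re
from pathlib import Path
from typing import Dict, Mapping, Optional, Union

# Naive match for a line such as:  files_server: "http://host:8081"
_FILES_SERVER = re.compile(
    r"""^\s*files_server\s*[:=]\s*["']?([^"'\s]+)""", re.MULTILINE
)


def configured_files_servers(
    conf_path: Optional[Union[str, Path]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect files-server values from the environment and the config file."""
    env = os.environ if env is None else env
    found: Dict[str, str] = {}

    if env.get("CLEARML_FILES_HOST"):
        found["env:CLEARML_FILES_HOST"] = env["CLEARML_FILES_HOST"]

    if conf_path is None:
        conf_path = env.get("CLEARML_CONFIG_FILE", "~/clearml.conf")
    path = Path(conf_path).expanduser()
    if path.is_file():
        match = _FILES_SERVER.search(path.read_text())
        if match:
            found[str(path)] = match.group(1)
    return found


if __name__ == "__main__":
    for source, value in configured_files_servers().items():
        print(f"{source}: {value}")
```

Running it on the machine that launches the pipeline and on every agent machine should show whether any of them carries a different or malformed value.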

  				
Posted one year ago by AgitatedDove14

What do you have in the artifacts of this task id: 4a80b274007d4e969b71dd03c69d504c?

  				
Posted one year ago by AgitatedDove14

Yes. I've also checked the configs on all of my machines, just in case.

  				
Posted one year ago by QuizzicalFox36

(you can find it in the pipeline component page)

  				
Posted one year ago by AgitatedDove14

here's a log of the failing step

  				
Posted one year ago by QuizzicalFox36

Where should I configure it?

  				
Posted one year ago by QuizzicalFox36

Hi QuizzicalFox36

http:/34.67.35.46:8081/...

Notice there is a / missing in the link. How is that possible? It should be http://
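A small sketch of how a step could detect this kind of broken link before trying to download it. `has_host` is a hypothetical helper; it relies on the standard library's `urlparse`, which only fills in `netloc` when the scheme is followed by `//`.

```python
from urllib.parse import urlparse


def has_host(url: str) -> bool:
    """Return True if `url` is an http(s) URL with a non-empty host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(has_host("http:/34.67.35.46:8081/some/file"))   # False: one slash, so no host is parsed
print(has_host("http://34.67.35.46:8081/some/file"))  # True
```

With the single slash, everything after `http:` is parsed as a path, which fits the link being unusable as a download location.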

  				
Posted one year ago by AgitatedDove14

You can take a look at the pipeline examples.
Transferring artifacts between tasks is exactly what they do.

  				
Posted 2 years ago by CostlyOstrich36

I didn't understand how to use it. Everything I've tried failed. Could you give me an example?

  				
Posted 2 years ago by QuizzicalFox36
