Hello. I want to update an artifact in a task (a pandas data frame). I do this with…


Hello. I want to update an artifact in a task (a pandas data frame). I do this with `upload_artifact`. When I access the artifact the next time, I always get the first version of the artifact. This is due to the cache: when I delete my local cache (`~/.clearml/cache`), I get the latest artifact the next time. How do I properly update an artifact and then get the updated version from the task the next time?

  				
Posted by ThickSnake12, 2 years ago

Answers (6)

You can try adding the `force_download=True` flag to `.get()` to ignore the locally cached copy. Let me know if it helps.
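Why the first version keeps coming back can be modeled without a ClearML server. The sketch below rests on an assumption, not on ClearML's actual cache code: it supposes the local cache is keyed by the artifact's remote URL and that re-uploading an artifact under the same name overwrites that same URL, so an existing local copy still looks valid. `UrlKeyedCache` and the `files://example/...` URL are hypothetical. With ClearML itself, the call from this answer is `task.artifacts["test_df"].get(force_download=True)`.

```python
from typing import Callable, Dict


class UrlKeyedCache:
    """Hypothetical local cache keyed only by remote URL, with no notion of versions."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # downloads the current remote content for a URL
        self._store: Dict[str, bytes] = {}   # URL -> locally cached bytes

    def get(self, url: str, force_download: bool = False) -> bytes:
        # A cached entry is reused whenever the URL matches, unless a refresh is forced.
        if force_download or url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


# Simulated remote storage (assumption: re-uploading "test_df" overwrites the same URL).
url = "files://example/test_df.csv.gz"
remote = {url: b"version 1"}
cache = UrlKeyedCache(lambda u: remote[u])

first = cache.get(url)                        # cache miss: downloads version 1
remote[url] = b"version 2"                    # artifact re-uploaded under the same name
stale = cache.get(url)                        # cache hit on the unchanged URL: still version 1
fresh = cache.get(url, force_download=True)   # cache bypassed: downloads version 2
```

Forcing the download on every access gives up cache reuse in exchange for freshness; deleting `~/.clearml/cache`, as described in the question, has a similar effect for every cached artifact at once.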

  				
Posted by EnthusiasticShrimp49, 2 years ago

Hey ThickSnake12, how exactly do you access the artifact the next time? Can you provide a code sample?

  				
Posted by EnthusiasticShrimp49, 2 years ago

This is my workflow; code to reproduce:

import pandas as pd
from clearml import Task


def test_something():
    # 1. Create a new task
    task = Task.create(
        project_name="Playground",
        task_name="test",
    )
    # 2. Create a new pandas data frame and upload it as an artifact
    test_df = pd.DataFrame(
        {
            "col1": [1],
            "col2": [2],
        }
    )
    task.upload_artifact("test_df", test_df)
    task_id = task.id
    task.close()

    task = task.init(
        project_name="Playground",
        task_name="test",
        reuse_last_task_id=task_id,
        continue_last_task=True)

    # 3. Download the pandas data frame
    downloaded_df = task.artifacts["test_df"].get()

    # 4. Add a new row to the data frame and upload again with the same name (the docs say this is then an update)
    new_row = {
        "col1": 3,
        "col2": 4,
    }
    # DataFrame.append was removed in pandas 2.0; pd.concat performs the same row append
    updated_df = pd.concat([downloaded_df, pd.DataFrame([new_row])], ignore_index=True)
    task.upload_artifact("test_df", updated_df)
    task.close()
    task = task.init(
        project_name="Playground",
        task_name="test",
        reuse_last_task_id=task_id,
        continue_last_task=True)

    # 5. Download the pandas data frame again -> it still has one row (first version loaded from cache!)
    downloaded_df = task.artifacts["test_df"].get()
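`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current pandas the row update in step 4 goes through `pd.concat`. A minimal sketch of that step on its own; the helper name `append_row` is hypothetical:

```python
import pandas as pd


def append_row(df: pd.DataFrame, row: dict) -> pd.DataFrame:
    """Return a new DataFrame with `row` appended; `df` itself is left unchanged."""
    # Wrap the dict in a one-row DataFrame so its keys become columns, then stack the two
    # frames and renumber the index (the equivalent of append(..., ignore_index=True)).
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)


test_df = pd.DataFrame({"col1": [1], "col2": [2]})
updated_df = append_row(test_df, {"col1": 3, "col2": 4})
print(updated_df)
```

Returning a new frame mirrors how `append` behaved: the original `test_df` keeps its single row, and only `updated_df` is uploaded.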

  				
Posted by ThickSnake12, 2 years ago

Also, make sure you use `Task.init` instead of `task.init`.
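Since `Task.init` is defined as a classmethod in ClearML, `task.init(...)` called on an instance resolves to the same class-level factory as `Task.init(...)`; calling it on the class makes it explicit that a task object is returned rather than `task` being re-initialized in place. The hypothetical `Tracker` class below illustrates the underlying Python rule; it is not ClearML code.

```python
class Tracker:
    """Hypothetical stand-in for a class whose init() is a classmethod factory."""

    def __init__(self) -> None:
        self.name = ""

    @classmethod
    def init(cls, name: str) -> "Tracker":
        # Builds and returns a new instance; the caller's existing object is not modified.
        obj = cls()
        obj.name = name
        return obj


first = Tracker.init("test")   # idiomatic: call the factory on the class
second = first.init("other")   # also runs: Python binds the classmethod to type(first)

print(first.name, second.name)          # first keeps its own name
print(first.init.__self__ is Tracker)   # the bound "self" is the class, not the instance
```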

  				
Posted by EnthusiasticShrimp49, 2 years ago

Glad I could be of help.

  				
Posted by EnthusiasticShrimp49, 2 years ago

Yes, thanks a lot, it works! I had not seen that parameter, and the `Task.init` point is correct…

  				
Posted by ThickSnake12, 2 years ago
