Answered
Hello. I Want To Update An Artifact In A Task (A Pandas Data Frame). I Do This With

Hello. I want to update an artifact in a task (a pandas DataFrame). I do this with upload_artifact. When I access the artifact the next time, I always get the first version of the artifact. This is due to the cache: when I delete my local cache ~/.clearml/cache, I get the latest artifact the next time. How do I properly update the artifact and get the updated version from the task the next time?

  
  
Posted one year ago

Answers 6


Hey @ThickSnake12, how exactly do you access the artifact next time? Can you provide a code sample?

  
  
Posted one year ago

That is my workflow, code to reproduce:

import pandas as pd
from clearml import Task

def test_something():
    # 1. Create a new task
    task = Task.create(
        project_name="Playground",
        task_name="test",
    )
    # 2. Create a new pandas data frame and upload as artifact
    test_df = pd.DataFrame(
        {
            "col1": [1],
            "col2": [2],
        }
    )
    task.upload_artifact("test_df", test_df)
    task_id = task.task_id
    task.close()

    task = task.init(
        project_name="Playground",
        task_name="test",
        reuse_last_task_id=task_id,
        continue_last_task=True)

    # 3. Download the pandas dataframe
    downloaded_df = task.artifacts["test_df"].get()

    # 4. Add a new row to the data frame and upload again with the same name (doc says this is then an update)
    new_row = {
        "col1": 3,
        "col2": 4,
    }
    updated_df = downloaded_df.append(new_row, ignore_index=True)
    task.upload_artifact("test_df", updated_df)
    task.close()
    task = task.init(
        project_name="Playground",
        task_name="test",
        reuse_last_task_id=task_id,
        continue_last_task=True)


    # 5. Download the pandas dataframe again -> it still has one row (loaded first version from cache !)
    downloaded_df = task.artifacts["test_df"].get()
  
  
Posted one year ago

Glad I could be of help

  
  
Posted one year ago

Yes, thanks a lot, it works! I had not seen the parameter, and Task.init is correct…

  
  
Posted one year ago

Also, make sure you use Task.init instead of task.init

  
  
Posted one year ago

You can try to add the force_download=True flag to .get() to ignore the locally cached content. Let me know if it helps.
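As a minimal sketch of this suggestion, the read in step 5 of the question could be wrapped in a small helper that always passes force_download=True. The helper name get_latest_artifact is hypothetical, and the commented usage assumes the clearml package is installed and a server is configured; only the force_download flag and the artifacts lookup come from this thread.

```python
def get_latest_artifact(task, name):
    """Return the artifact's content, ignoring the locally cached copy."""
    # force_download=True is the flag suggested in the answer above: it asks
    # for the stored file to be fetched again instead of reusing the copy
    # kept in the local cache (~/.clearml/cache in the question).
    return task.artifacts[name].get(force_download=True)


# Hypothetical usage with a real task (needs clearml and a configured server):
# from clearml import Task
# task = Task.get_task(task_id="<your task id>")
# downloaded_df = get_latest_artifact(task, "test_df")
```

The trade-off is that every call downloads the artifact again, so for large artifacts it may be worth using the flag only where a stale copy is actually a risk.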

  
  
Posted one year ago