Answered
Hello. I Want To Update An Artifact In A Task (A Pandas Data Frame). I Do This With

Hello. I want to update an artifact in a task (a pandas DataFrame). I do this with upload_artifact. The next time I access the artifact, I always get the first version. This is due to the cache: when I delete my local cache at ~/.clearml/cache, I get the latest artifact the next time. How do I properly update an artifact and get the updated version from the task the next time?

  
  
Posted one year ago

Answers 6


Glad I could be of help

  
  
Posted one year ago

That is my workflow, code to reproduce:

import pandas as pd
from clearml import Task


def test_something():
    # 1. Create a new task
    task = Task.create(
        project_name="Playground",
        task_name="test",
    )
    # 2. Create a new pandas data frame and upload as artifact
    test_df = pd.DataFrame(
        {
            "col1": [1],
            "col2": [2],
        }
    )
    task.upload_artifact("test_df", test_df)
    task_id = task.task_id
    task.close()

    task = task.init(
        project_name="Playground",
        task_name="test",
        reuse_last_task_id=task_id,
        continue_last_task=True)

    # 3. Download the pandas dataframe
    downloaded_df = task.artifacts["test_df"].get()

    # 4. Add a new row to the data frame and upload again with the same name (doc says this is then an update)
    new_row = {
        "col1": 3,
        "col2": 4,
    }
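    # DataFrame.append was removed in pandas 2.0; pd.concat appends the row instead
    updated_df = pd.concat([downloaded_df, pd.DataFrame([new_row])], ignore_index=True)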
    task.upload_artifact("test_df", updated_df)
    task.close()
    task = task.init(
        project_name="Playground",
        task_name="test",
        reuse_last_task_id=task_id,
        continue_last_task=True)


    # 5. Download the pandas dataframe again -> it still has one row (loaded first version from cache !)
    downloaded_df = task.artifacts["test_df"].get()
  
  
Posted one year ago

You can try to add the force_download=True flag to .get() to ignore the locally cached content. Let me know if it helps.
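
For illustration, here is a minimal sketch of that suggestion. The helper name get_fresh_artifact and the main function are hypothetical and not part of ClearML; the sketch assumes the task already has an artifact named "test_df" and that a ClearML server is configured when main is called.

```python
def get_fresh_artifact(task, name):
    """Return the deserialized artifact `name`, skipping the local cache.

    `task` is expected to be a clearml Task (or any object exposing an
    `artifacts` mapping whose values have a `.get()` method).
    """
    # force_download=True asks ClearML to fetch the stored file again
    # instead of reusing the copy under ~/.clearml/cache
    return task.artifacts[name].get(force_download=True)


def main(task_id):
    # Imported here so the helper above can be used without clearml installed.
    from clearml import Task

    # Hypothetical usage: look up an existing task by its ID
    task = Task.get_task(task_id=task_id)
    df = get_fresh_artifact(task, "test_df")
    print(df)
```

Wrapping the call in a small helper keeps the flag in one place, so every read of the artifact bypasses the cache consistently.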

  
  
Posted one year ago

Yes, thanks a lot, it works! I had not seen that parameter. And the Task.init is correct…

  
  
Posted one year ago

Also, make sure you use Task.init instead of task.init

  
  
Posted one year ago

Hey @ThickSnake12, how exactly do you access the artifact the next time? Can you provide a code sample?

  
  
Posted one year ago