Answered
Hi Folks, One Question: I Have A Script That Looks Like:

Hi folks, one question:

I have a script that looks like:

```python
import clearml as cml
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
import joblib

task = cml.Task.init(
    project_name="Iris",
    task_name="Report",
    task_type=cml.TaskTypes.inference
)

dataset_task = cml.Task.get_task(
    project_name='Iris',
    task_name='Dataset_Generation',
    task_filter={"status": ["completed"]}
)

train = dataset_task.artifacts["training_set"].get()
test = dataset_task.artifacts["test_set"].get()
train_target = train.loc[:, "Species"]
train = train.drop(columns=["Species"])
test_target = test.loc[:, "Species"]
test = test.drop(columns=["Species"])

# ... rest of the code
```
When I execute it locally, everything works. However, when I clone it and execute it remotely after changing some parameter, the experiment fails with this error:

2022-09-05 15:28:53,427 - clearml.util - WARNING - Selected task Dataset_Generation (id=f6ebdbf830474cb3af0c040ce412c316)
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/task_repository/clearml-demo.git/realistic-example/02-model_training.py", line 25, in <module>
    train_target = train.loc[:, "Species"]
AttributeError: 'PosixPath' object has no attribute 'loc'
2022-09-05 17:29:09 Process failed, exit code 1

It looks like the "get()" method doesn't retrieve the artifact as I would have expected; rather, it returns a path to it...

Is this expected, and what would be the best way to retrieve the artifact from the fileserver then?

  
  
Posted 2 years ago

Answers 19


Hi SarcasticSquirrel56 , can you print out the contents of train and see what you get? Is that a path to the actual downloaded artifact?

  
  
Posted 2 years ago

sure, give me a couple of minutes to make the changes

  
  
Posted 2 years ago

actually there are some network issues right now, I'll share the output as soon as I manage to run it

  
  
Posted 2 years ago

OK, so... when executed locally "train" prints:

train:
     SepalLength  SepalWidth  PetalLength  PetalWidth  Species
122          7.7         2.8          6.7         2.0      2.0
86           6.7         3.1          4.7         1.5      1.0
59           5.2         2.7          3.9         1.4      1.0
4            5.0         3.6          1.4         0.2      0.0
77           6.7         3.0          5.0         1.7      1.0
..           ...         ...          ...         ...      ...
57           4.9         2.4          3.3         1.0      1.0
45           4.8         3.0          1.4         0.3      0.0
55           5.7         2.8          4.5         1.3      1.0
140          6.7         3.1          5.6         2.4      2.0
38           4.4         3.0          1.3         0.2      0.0

In a cloned experiment:

train: /root/.clearml/cache/storage_manager/global/9d89b955203e49e57c85893cb6219705.training_set.csv.gz
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/task_repository/clearml-demo.git/realistic-example/02-model_training.py", line 26, in <module>
    train_target = train.loc[:, "Species"]

  
  
Posted 2 years ago

what's the clearml SDK version?

  
  
Posted 2 years ago

Hi Jake, sorry I left the office yesterday. On my laptop I have clearml==1.6.4

  
  
Posted 2 years ago

the same that is available in the agent: - clearml==1.6.4

  
  
Posted 2 years ago

AttributeError: 'PosixPath' object has no attribute 'loc'
SarcasticSquirrel56 I'm assuming the artifact is a pandas DataFrame and you forgot to either import pandas or add it as a requirement for the Task 🙂
This causes the artifact .get() method to fall back to returning the local path to the artifact instead of actually deserializing it.
(We should print a warning though, I'll make sure we do 🙂 )

EDIT: basically clearml failed to realize you also need pandas because it was never imported. See the list of imports here:

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
import joblib

The fix is to either add import pandas or call Task.add_requirements("pandas") before Task.init.
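
For anyone landing here later, the second option can be sketched roughly as below. This is only an illustration of the call order, not official ClearML code: `init_report_task` and its `task_cls` parameter are made-up names, added so the ordering can be tried without a server. By default the real `clearml.Task` is used.

```python
def init_report_task(task_cls=None, extra_requirements=("pandas",)):
    """Register extra packages, then initialise the task.

    task_cls is a hypothetical injection point for experimenting without
    a ClearML server; when omitted, the real clearml.Task is used.
    """
    if task_cls is None:
        from clearml import Task as task_cls  # lazy import on purpose

    # Requirements are registered first, because the thread says this
    # has to happen before Task.init.
    for package in extra_requirements:
        task_cls.add_requirements(package)

    return task_cls.init(project_name="Iris", task_name="Report")
```

In the original script this would replace the `cml.Task.init(...)` call. The `task_type` argument from the question is left out here only to keep the sketch short.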

  
  
Posted 2 years ago

Thanks Martin. I'll add this and check whether it fixes the issue, but I don't quite get this, though. The local code doesn't need to import pandas, because the get method returns a DataFrame object that has a .loc method.
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?

  
  
Posted 2 years ago

but I can confirm that adding the requirement with Task.add_requirements() does the trick

  
  
Posted 2 years ago

I was expecting the remote experiment to behave similarly, why do I need to import pandas there?

The only problem is that the remote environment did not install pandas. Once the package is there, we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense ?
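
Until such a warning exists, a small guard can make this failure easier to spot than the `'PosixPath' object has no attribute 'loc'` error. The sketch below assumes, as described above, that the fallback value is a path-like object. `ensure_deserialized` is a hypothetical helper, not part of clearml.

```python
import os


def ensure_deserialized(obj, artifact_name):
    """Return obj unchanged, or fail loudly if it is only a file path.

    Hypothetical helper: a path-like value is taken to mean the artifact
    was downloaded but not deserialized (e.g. pandas missing remotely).
    """
    if isinstance(obj, os.PathLike):
        raise TypeError(
            f"Artifact '{artifact_name}' came back as a path "
            f"({os.fspath(obj)}); is the package needed to load it "
            f"installed in this environment?"
        )
    return obj
```

Usage would look like `train = ensure_deserialized(dataset_task.artifacts["training_set"].get(), "training_set")`.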

  
  
Posted 2 years ago

Oh I see... for some reason I thought that all the dependencies of the environment would be tracked by ClearML, but it's only the ones that actually get imported...

If locally one detects that pandas is installed and can be used to read the csv, wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?

  
  
Posted 2 years ago

And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?

  
  
Posted 2 years ago

(I mean locally now)

  
  
Posted 2 years ago

wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?

I think you are correct, and if we detect that we are using pandas to upload an artifact, we should try and make sure it is listed in the requirements
(obviously this is easier said than done)

And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?

Yes, call .get_local_copy() and you will always get a path to the locally downloaded artifact
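
As a rough illustration of reading that file without pandas, here is a sketch using only the standard library. It assumes the cached file is a gzip-compressed CSV with a header row, which matches the `.csv.gz` name in the log above but is not otherwise verified. `read_gzipped_csv` is a made-up helper name.

```python
import csv
import gzip


def read_gzipped_csv(path):
    """Read a gzip-compressed CSV into a list of dicts keyed by column name.

    All values come back as strings; convert them as needed.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Intended use with the artifact from the question (needs a ClearML server):
# path = dataset_task.artifacts["training_set"].get_local_copy()
# rows = read_gzipped_csv(path)
```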

  
  
Posted 2 years ago

Thanks Martin! If I end up having some time I'll dig into the code and check if I can bake something!

  
  
Posted 2 years ago

About .get_local_copy... would that then work in the agent though?
Because I understand that there might not be a local copy in the Agent?

  
  
Posted 2 years ago

About .get_local_copy... would that then work in the agent though?

Yes it would work both locally (i.e. without agent) and remotely

Because I understand that there might not be a local copy in the Agent?

If the file does not exist locally it will be downloaded and cached for you

  
  
Posted 2 years ago

Thanks a lot :)

  
  
Posted 2 years ago