Hi Folks, One Question: I Have A Script That Looks Like:

Answered

Hi folks, one question:

I have a script that looks like:

` import clearml as cml
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
import joblib

task = cml.Task.init(
    project_name="Iris",
    task_name="Report",
    task_type=cml.TaskTypes.inference
)

dataset_task = cml.Task.get_task(
    project_name='Iris',
    task_name='Dataset_Generation',
    task_filter={"status": ["completed"]}
)

train = dataset_task.artifacts["training_set"].get()
test = dataset_task.artifacts["test_set"].get()
train_target = train.loc[:, "Species"]
train = train.drop(columns=["Species"])
test_target = test.loc[:,"Species"]
test = test.drop(columns=["Species"])

...

rest of the code `
When I execute it locally, everything works. However, when I clone it and execute it remotely with some parameter changed, the experiment fails with this error:

2022-09-05 15:28:53,427 - clearml.util - WARNING - Selected taskDataset_Generation(id=f6ebdbf830474cb3af0c040ce412c316)
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/task_repository/clearml-demo.git/realistic-example/02-model_training.py", line 25, in <module>
    train_target = train.loc[:, "Species"]
AttributeError: 'PosixPath' object has no attribute 'loc'
2022-09-05 17:29:09 Process failed, exit code 1

It looks like the "get()" method doesn't retrieve the artifact as I would have expected; rather, it returns a path to it.

Is this expected, and what would be the best way to retrieve the artifact from the fileserver then?

  				
Posted 3 years ago by SarcasticSquirrel56


Answers 19

OK, so... when executed locally, "train" prints:
train:
     SepalLength  SepalWidth  PetalLength  PetalWidth  Species
122          7.7         2.8          6.7         2.0      2.0
86           6.7         3.1          4.7         1.5      1.0
59           5.2         2.7          3.9         1.4      1.0
4            5.0         3.6          1.4         0.2      0.0
77           6.7         3.0          5.0         1.7      1.0
..           ...         ...          ...         ...      ...
57           4.9         2.4          3.3         1.0      1.0
45           4.8         3.0          1.4         0.3      0.0
55           5.7         2.8          4.5         1.3      1.0
140          6.7         3.1          5.6         2.4      2.0
38           4.4         3.0          1.3         0.2      0.0

In a cloned experiment:
train: /root/.clearml/cache/storage_manager/global/9d89b955203e49e57c85893cb6219705.training_set.csv.gz
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/task_repository/clearml-demo.git/realistic-example/02-model_training.py", line 26, in <module>
    train_target = train.loc[:, "Species"]

  				
Posted 3 years ago by SarcasticSquirrel56

Thanks Martin. I'll add this and check whether it fixes the issue, but there is something I don't quite get. The local code doesn't need to import pandas, because the get method returns a DataFrame object that has a .loc method.
I was expecting the remote experiment to behave similarly; why do I need to import pandas there?

  				
Posted 3 years ago by SarcasticSquirrel56

AttributeError: 'PosixPath' object has no attribute 'loc'
SarcasticSquirrel56 I'm assuming the artifact is a pandas object and you forgot to either import pandas beforehand or add it as a requirement for the Task 🙂
This causes the artifact .get() method to fall back to returning the local path to the artifact, instead of actually deserializing it
(We should print a warning though, I'll make sure we do 🙂 )

EDIT: basically clearml failed to realize you also need pandas because it was never imported.
See the list of imports here:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
import joblib
To fix it, either add import pandas or call Task.add_requirements("pandas") before Task.init
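
A minimal sketch of the second option, reusing the project and task names from the question. The function name `init_report_task` is hypothetical, and `clearml` is imported inside it only so the snippet can be loaded on a machine without the package; in a real script the import would normally sit at the top.

```python
def init_report_task():
    """Declare pandas as a requirement, then initialise the task."""
    # Lazy import (an assumption made for this sketch, not a ClearML requirement).
    import clearml as cml

    # Has to run before Task.init so the requirement is recorded with the task
    # and installed when the task is cloned and executed by an agent.
    cml.Task.add_requirements("pandas")

    return cml.Task.init(
        project_name="Iris",
        task_name="Report",
        task_type=cml.TaskTypes.inference,
    )
```

Adding a plain `import pandas` at the top of the script is the other route mentioned above; either way the point is that pandas ends up in the task's recorded requirements.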

  				
Posted 3 years ago by AgitatedDove14

Hi Jake, sorry I left the office yesterday. On my laptop I have clearml==1.6.4

  				
Posted 3 years ago by SarcasticSquirrel56

sure, give me a couple of minutes to make the changes

  				
Posted 3 years ago by SarcasticSquirrel56

but I can confirm that adding the requirement with Task.add_requirements() does the trick

  				
Posted 3 years ago by SarcasticSquirrel56

Oh I see... for some reason I thought that all the dependencies of the environment would be tracked by ClearML, but it's only the ones that actually get imported...

If locally one detects that pandas is installed and can be used to read the csv, wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?

  				
Posted 3 years ago by SarcasticSquirrel56

Hi SarcasticSquirrel56 , can you print out the contents of train and see what you get? Is that a path to the actual downloaded artifact?

  				
Posted 3 years ago by SuccessfulKoala55

About .get_local_copy... would that then work in the agent though?

Yes it would work both locally (i.e. without agent) and remotely

Because I understand that there might not be a local copy in the Agent?

If the file does not exist locally it will be downloaded and cached for you

  				
Posted 3 years ago by AgitatedDove14

wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?

I think you are correct, and if we detect that we are using pandas to upload an artifact, we should try and make sure it is listed in the requirements
(obviously this is easier said than done)

And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?

Yes, call .get_local_copy() and you will always get a path to the locally downloaded artifact
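
As a rough illustration of reading the file without pandas, here is a sketch that uses only the standard library. It assumes the artifact is a gzip-compressed CSV, as the `.csv.gz` cache path earlier in the thread suggests; the helper name `read_csv_gz` is made up for this example.

```python
import csv
import gzip
from pathlib import Path


def read_csv_gz(path):
    """Read a gzip-compressed CSV file into a list of dicts, one per row."""
    # Text mode with newline="" is what the csv module expects for its input.
    with gzip.open(Path(path), mode="rt", newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical usage with the artifact from the question:
# path = dataset_task.artifacts["training_set"].get_local_copy()
# rows = read_csv_gz(path)
```

Every value comes back as a string here, so numeric columns would still need converting before any modelling.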

  				
Posted 3 years ago by AgitatedDove14

Thanks Martin! If I end up having some time I'll dig into the code and check if I can bake something!

  				
Posted 3 years ago by SarcasticSquirrel56

And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?

  				
Posted 3 years ago by SarcasticSquirrel56

(I mean locally now)

  				
Posted 3 years ago by SarcasticSquirrel56

Thanks a lot :)

  				
Posted 3 years ago by SarcasticSquirrel56

About .get_local_copy... would that then work in the agent though?
Because I understand that there might not be a local copy in the Agent?

  				
Posted 3 years ago by SarcasticSquirrel56

I was expecting the remote experiment to behave similarly, why do I need to import pandas there?

The only problem is that the remote run did not install pandas; once the package is there we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense ?
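
One way to make this kind of mismatch visible early is to check, at the start of the script, whether the packages needed for deserialization can be imported at all. This is only a sketch using the standard library; the helper name `missing_packages` is hypothetical and the check is not something ClearML does for you.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["pandas"])
if missing:
    # Without these, an artifact's .get() may hand back a path instead of an object.
    print(f"Missing packages: {missing}")
```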

  				
Posted 3 years ago by AgitatedDove14

actually there are some network issues right now, I'll share the output as soon as I manage to run it

  				
Posted 3 years ago by SarcasticSquirrel56

The same one that is available in the agent: clearml==1.6.4

  				
Posted 3 years ago by SarcasticSquirrel56

what's the clearml SDK version?

  				
Posted 3 years ago by SuccessfulKoala55
