wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?
I think you are correct, and if we detect that we are using pandas to upload an artifact, we should try and make sure it is listed in the requirements
(obviously this is easier said than done)
And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?
Yes, call .get_local_copy()
you will always get a path to the locally downloaded artifact
About .get_local_copy... would that then work in the agent though?
Yes, it works both locally (i.e. without an agent) and remotely
Because I understand that there might not be a local copy in the Agent?
If the file does not exist locally it will be downloaded and cached for you
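For reading the file without pandas, here is a minimal sketch using only the standard library. It assumes the path returned by .get_local_copy() points at a CSV that may be gzip-compressed (the cached file in this thread ends in .csv.gz). The helper name read_csv_rows, the task ID and the artifact name in the usage comment are placeholders, not something from the original code.

```python
import csv
import gzip
from pathlib import Path


def read_csv_rows(path):
    """Read a CSV file (optionally gzip-compressed) into a list of dicts."""
    path = Path(path)
    # Pick the opener from the file suffix; .gz files are decompressed on the fly.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical usage with ClearML (task ID and artifact name are placeholders):
# from clearml import Task
# source_task = Task.get_task(task_id="<task-id>")
# local_path = source_task.artifacts["training_set"].get_local_copy()
# rows = read_csv_rows(local_path)
```

All values come back as strings, so numeric columns would still need converting.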
OK, so... when executed locally "train" prints:
train:
     SepalLength  SepalWidth  PetalLength  PetalWidth  Species
122          7.7         2.8          6.7         2.0      2.0
86           6.7         3.1          4.7         1.5      1.0
59           5.2         2.7          3.9         1.4      1.0
4            5.0         3.6          1.4         0.2      0.0
77           6.7         3.0          5.0         1.7      1.0
..           ...         ...          ...         ...      ...
57           4.9         2.4          3.3         1.0      1.0
45           4.8         3.0          1.4         0.3      0.0
55           5.7         2.8          4.5         1.3      1.0
140          6.7         3.1          5.6         2.4      2.0
38           4.4         3.0          1.3         0.2      0.0
in a cloned experiment:
train: /root/.clearml/cache/storage_manager/global/9d89b955203e49e57c85893cb6219705.training_set.csv.gz
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.9/task_repository/clearml-demo.git/realistic-example/02-model_training.py", line 26, in <module>
    train_target = train.loc[:, "Species"]
AttributeError: 'PosixPath' object has no attribute 'loc'
SarcasticSquirrel56 I'm assuming the artifact is a pandas object and you forgot to either import pandas beforehand or add it as a requirement for the Task 🙂
This is causing the artifact's .get() method to fall back to returning the local path to the artifact, instead of actually deserializing it
(We should print a warning though, I'll make sure we do 🙂 )
EDIT: basically clearml failed to realize you also need pandas because it was never imported ....
see the list of imports here:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
import joblib
The fix is to either add import pandas to the script or call Task.add_requirements("pandas") before Task.init()
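A small sketch of the ordering, in case it helps. The wrapper init_task_with_requirements is a hypothetical helper, and the project and task names in the usage comment are placeholders; the only ClearML calls it relies on are Task.add_requirements and Task.init. The Task class is passed in as an argument so the ordering can be exercised without a ClearML server.

```python
def init_task_with_requirements(task_cls, project_name, task_name, packages):
    """Register extra packages, then initialize the task."""
    for package in packages:
        # Must run before init() so the package is part of the recorded requirements.
        task_cls.add_requirements(package)
    return task_cls.init(project_name=project_name, task_name=task_name)


# Hypothetical usage (project and task names are placeholders):
# from clearml import Task
# task = init_task_with_requirements(Task, "examples", "model training", ["pandas"])
```

Without the wrapper this is simply Task.add_requirements("pandas") on the line before Task.init(...).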
sure, give me a couple of minutes to make the changes
the same version that is available in the agent: clearml==1.6.4
Thanks Martin! If I end up having some time I'll dig into the code and check if I can bake something!
Hi SarcasticSquirrel56, can you print out the contents of train and see what you get? Is that a path to the actual downloaded artifact?
Thanks Martin. I'll add this and check whether it fixes the issue, but I don't quite get this though. The local code doesn't need to import pandas, because the get method returns a DataFrame object that has a .loc method.
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
but I can confirm that adding the requirement with Task.add_requirements() does the trick
Hi Jake, sorry I left the office yesterday. On my laptop I have clearml==1.6.4
Oh I see... for some reason I thought that all the dependencies of the environment would be tracked by ClearML, but it's only the ones that actually get imported...
If locally one detects that pandas is installed and can be used to read the csv, wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?
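To illustrate why an import-based scan misses pandas here, this is a rough sketch that collects the top-level modules imported by a script. It is only an illustration under the assumption that requirements are derived from import statements, as described in this thread; it is not ClearML's actual implementation, and imported_top_level_modules is a made-up name. The sample source is the import list quoted earlier in the thread.

```python
import ast


def imported_top_level_modules(source):
    """Return the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" contributes the top-level name "a".
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from a.b import c" contributes "a"; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return modules


SCRIPT = """
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score
import joblib
"""

found = imported_top_level_modules(SCRIPT)
print(sorted(found))      # ['joblib', 'sklearn']
print("pandas" in found)  # False
```

The DataFrame only ever arrives through .get(), so nothing in the script text mentions pandas.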
actually there are some network issues right now, I'll share the output as soon as I manage to run it
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem is that the remote environment did not install pandas; once the package is there we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense?
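Until get() warns about this itself, a guard like the following could make the failure easier to read than the AttributeError above. It is only a sketch: ensure_deserialized is a hypothetical helper, and it assumes the artifact is never legitimately a plain string or path.

```python
import os


def ensure_deserialized(artifact_value, name="artifact"):
    """Fail with a clear message if get() fell back to returning a file path."""
    if isinstance(artifact_value, (str, os.PathLike)):
        raise RuntimeError(
            f"{name!r} came back as a file path ({artifact_value}) instead of an object; "
            "the package needed to deserialize it (for example pandas) may be missing "
            "from the task requirements."
        )
    return artifact_value


# Hypothetical usage (the artifact name is a placeholder):
# train = ensure_deserialized(source_task.artifacts["training_set"].get(), "training_set")
```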