Any news on this? This is kind of creepy: it's something so basic, yet I can't trust my prediction pipeline because it sometimes fails randomly for no apparent reason.
When I specify --packages, I should list them all manually, right?
Version 1.1.1
Snippet of which part exactly?
Yeah, the logs say "file not found". Here is an example:
2021-10-11 10:07:19 ClearML results page:
2021-10-11 10:07:20
Traceback (most recent call last):
  File "tasks/hpo_n_best_evaluation.py", line 256, in <module>
    main(args, task)
  File "tasks/hpo_n_best_evaluation.py", line 164, in main
    trained_models = get_models_from_task(task=hpo_task)
  File "tasks/hpo_n_best_evaluation.py", line 72, in get_models_from_task
    with open(pickle_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/elior/.clearml/c...
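To make this failure easier to diagnose the next time it happens, the bare `open(pickle_path, 'rb')` could be wrapped so that the error also reports what actually exists in the parent directory. This is a minimal sketch; `load_pickle_checked` is a hypothetical helper, not part of ClearML or of the script above:

```python
import pickle
from pathlib import Path


def load_pickle_checked(pickle_path):
    """Load a pickle, raising a more informative error if the file is missing.

    Hypothetical helper for illustration only.
    """
    path = Path(pickle_path)
    if not path.is_file():
        parent = path.parent
        # List what *is* in the parent directory, to help spot cache/path mismatches.
        siblings = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
        raise FileNotFoundError(
            f"Expected pickle at {path} but it does not exist; "
            f"parent dir exists={parent.is_dir()}, contents={siblings}"
        )
    with path.open("rb") as f:
        return pickle.load(f)
```

Comparing the reported directory contents between the failing machine and the working host may show whether the file was never downloaded or was written under a different name.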
Another thing I just noticed: it happens on my personal computer. When I run the same pipeline from the exact same commit with the exact same data on another host, it works without these problems.
I don't know, I'm the one asking the question 😄
There are many other packages in my environment which are not listed.
By the way, my site packages setting is false. Should it be true? You pasted that setting, but I'm not sure what it should be: in the paste it is false, yet you are asking about true.
It's like ps + grep together 😄
Let me repay you with a nice trick.
Anyway, I checked the base task, and this is what it has under installed packages (it seems it doesn't list all the packages that are actually in the environment).
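One way to see exactly which packages are in the environment but absent from the task's installed-packages list is to diff the two. A minimal sketch, assuming the task's list has been copied out as requirements-style text (one `name==version` per line); the function names are hypothetical and not a ClearML API:

```python
import re
from importlib.metadata import distributions

# Leading project name in a requirement line, e.g. "numpy" in "numpy==1.21.2".
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def normalize(name):
    # PEP 503 normalization so "Scikit_Learn" and "scikit-learn" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def listed_names(requirements_text):
    """Package names mentioned in requirements-style text."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blanks, comments and option lines such as "-e ..." or "--index-url ...".
        if not line or line.startswith(("#", "-")):
            continue
        m = _NAME_RE.match(line)
        if m:
            names.add(normalize(m.group(0)))
    return names


def unlisted_installed(requirements_text):
    """Installed distributions that do not appear in the given list."""
    installed = {
        normalize(d.metadata.get("Name"))
        for d in distributions()
        if d.metadata.get("Name")
    }
    return sorted(installed - listed_names(requirements_text))
```

Running `unlisted_installed(pasted_text)` on the failing machine and on the working host would show whether the two environments differ in packages the task does not record.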
I'd rather we debug on my machine (tell me what you want me to check) than create a snippet.
By the way, just from inspecting it: the CUDA version in the nvidia-smi output matches the driver installed on the host, not the container. Look at the image below.
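For context, the "CUDA Version" field in the nvidia-smi header reports the highest CUDA version supported by the installed driver, and the driver always comes from the host, so it is expected to differ from the toolkit version baked into the container image. A minimal sketch that pulls that field out of the nvidia-smi text; `cuda_version_from_nvidia_smi` is a hypothetical helper and the sample line is illustrative, not taken from the screenshot:

```python
import re


def cuda_version_from_nvidia_smi(text):
    """Return (major, minor) from the 'CUDA Version: X.Y' header field, or None."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# Illustrative header line in the usual nvidia-smi layout.
sample = "| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |"
print(cuda_version_from_nvidia_smi(sample))  # (10, 2)
```

On a real machine, the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`, run both on the host and inside the container; both should report the same value because both talk to the host driver.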
What does that mean? How can I access this data?
nvidia/cuda:10.1-base-ubuntu18.04
But I'm naive enough to believe that 10.2 is compatible with 10.1, since it is only a minor version upgrade.
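As a rough rule of thumb, CUDA drivers are backward compatible: a driver reporting CUDA 10.2 can run code built against the 10.1 toolkit, but not the other way around. A minimal sketch of that check, assuming the image tag follows the `nvidia/cuda:<major>.<minor>-...` pattern seen above. Both helpers are hypothetical, and the comparison deliberately ignores finer points such as the forward-compatibility packages and minor-version compatibility rules NVIDIA documents:

```python
import re


def toolkit_version_from_image(image):
    """Parse (major, minor) from a tag like 'nvidia/cuda:10.1-base-ubuntu18.04'.

    Assumes the version directly follows the ':' in the tag.
    """
    m = re.search(r":(\d+)\.(\d+)", image)
    if m is None:
        raise ValueError(f"no CUDA version in image tag: {image!r}")
    return int(m.group(1)), int(m.group(2))


def driver_supports_toolkit(driver_cuda, toolkit_cuda):
    """Simplified rule: the driver's reported CUDA version must be >= the toolkit's."""
    # Tuple comparison: (10, 2) >= (10, 1) is True, (10, 1) >= (10, 2) is False.
    return tuple(driver_cuda) >= tuple(toolkit_cuda)


toolkit = toolkit_version_from_image("nvidia/cuda:10.1-base-ubuntu18.04")
print(toolkit, driver_supports_toolkit((10, 2), toolkit))  # (10, 1) True
```

Under this simplified rule, a host driver reporting 10.2 alongside the 10.1 image should not by itself be the cause of the failures.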