As I suspected, from your log: `agent.package_manager.system_site_packages = false`
This is exactly what causes the missing tensorflow: the agent creates a new venv inside the docker, and without this flag on, the venv does not inherit the packages preinstalled in the docker image.
This flag should have been true.
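To illustrate what the flag controls, here is a minimal sketch using Python's stdlib `venv` module. This is an assumption for illustration only: the agent may create its venv with a different tool, but the concept is the same. A venv created with `system_site_packages=True` records `include-system-site-packages = true` in its `pyvenv.cfg` and can import packages installed for the base interpreter (such as a tensorflow preinstalled in the docker image); with `False` it cannot. The helper names `make_venv` and `venv_inherits_system_packages` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def make_venv(env_dir: Path, system_site_packages: bool) -> Path:
    """Create a venv; with_pip=False keeps the example fast and offline."""
    builder = venv.EnvBuilder(system_site_packages=system_site_packages, with_pip=False)
    builder.create(str(env_dir))
    return env_dir


def venv_inherits_system_packages(env_dir: Path) -> bool:
    """Read pyvenv.cfg and report whether the venv sees the base interpreter's site-packages."""
    cfg = (env_dir / "pyvenv.cfg").read_text()
    for line in cfg.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        isolated = make_venv(Path(tmp) / "isolated", system_site_packages=False)
        inherited = make_venv(Path(tmp) / "inherited", system_site_packages=True)
        print("isolated:", venv_inherits_system_packages(isolated))
        print("inherited:", venv_inherits_system_packages(inherited))
```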
Could it be that the clearml.conf you are providing for the glue includes this value?
(Basically, you should only include the sections that are either credentials or missing from the default; there is no need to pass a full conf file.)
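A quick way to check whether the conf file passed to the glue sets the flag is to scan it for the key. This is a minimal sketch assuming plain-text HOCON where the key and its value sit on the same line; it is not a full HOCON parser, and `find_system_site_packages_overrides` is a hypothetical helper.

```python
import re
from typing import List

# Matches the key whether written flat (agent.package_manager.system_site_packages = false)
# or nested inside a package_manager { ... } block, with either "=" or ":" as separator.
_FLAG_RE = re.compile(r"system_site_packages\s*[:=]\s*(true|false)", re.IGNORECASE)


def find_system_site_packages_overrides(conf_text: str) -> List[bool]:
    """Return every value the config text assigns to system_site_packages, in order."""
    values = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip full-line HOCON comments
        if stripped.startswith(("#", "//")):
            continue
        match = _FLAG_RE.search(stripped)
        if match:
            values.append(match.group(1).lower() == "true")
    return values
```

Any `False` in the returned list points at a line that is probably worth removing from the conf you hand to the glue.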
DeliciousBluewhale87 this is exactly how it works:
The glue creates a k8s job with the requested docker image (the one set on the Task); the k8s job starts the agent inside that docker container, and the agent inside the docker then installs all the required packages.
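The flow above can be sketched as a plain Kubernetes `batch/v1` Job manifest built in Python. This is a hypothetical illustration, not the actual k8s glue code or its pod template: the function name, the job name pattern, the `namespace` default and the startup command are assumptions. The point is that the container image is the Task's docker image and the agent runs inside it (`clearml-agent execute --id <task_id>`), so with system site packages enabled, the venv it builds can see what the image already has installed.

```python
from typing import Any, Dict


def build_agent_job(task_id: str, docker_image: str, namespace: str = "clearml") -> Dict[str, Any]:
    """Hypothetical sketch: a batch/v1 Job that runs the agent inside the Task's docker image."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"clearml-task-{task_id}", "namespace": namespace},
        "spec": {
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "clearml-agent",
                            # Same image that is set on the Task
                            "image": docker_image,
                            # Inside the container: install the agent, then let it create
                            # the venv, install the Task requirements and run the code
                            "command": ["bash", "-c"],
                            "args": [
                                "python3 -m pip install clearml-agent && "
                                f"clearml-agent execute --id {task_id}"
                            ],
                        }
                    ],
                }
            },
        },
    }
```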
DeliciousBluewhale87 could you send the new log?
DeliciousBluewhale87 could you send the full log of the Task?
Hi AgitatedDove14, I just updated that flag, but the problem persists...
```
agent.package_manager.system_site_packages = true
.....
Environment setup completed successfully
Starting Task Execution:
ClearML results page: files_server:
Traceback (most recent call last):
  File "base_template_keras_simple.py", line 15, in <module>
    import tensorflow as tf # noqa: F401
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/binding/import_bind.py", line 59, in __patched_import3
    level=level)
ModuleNotFoundError: No module named 'tensorflow'
```
Just figured it out...
It seems the docker image below didn't have the tensorflow package... 😮 `tensorflow/tensorflow:latest-devel-gpu`
I should have checked beforehand... My bad.
Thanks for the help
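A cheap check before pointing a Task at an image is to ask the image's own interpreter whether the module can be found at all. This minimal sketch uses `importlib.util.find_spec`, which locates a top-level module without importing it; `missing_modules` is a hypothetical helper name.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run inside the image, for example `docker run --rm tensorflow/tensorflow:latest-devel-gpu python3 -c "import importlib.util; print(importlib.util.find_spec('tensorflow'))"`, a `None` result means the image itself does not ship the package.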
Assuming from previous threads that this runs on K8s, I think a configuration setting is missing: use system packages:
https://github.com/allegroai/clearml-agent/blob/cb6bdece39751eaef975287609b8bab603f116e5/docs/clearml.conf#L57
AgitatedDove14 the k8s glue always sets this value to true, see here: https://github.com/allegroai/clearml-agent/blob/cb6bdece39751eaef975287609b8bab603f116e5/clearml_agent/glue/k8s.py#L130
Essentially, while running on k8s_glue, I want to pull the docker image/container, then pip install the additional packages from requirements.txt into it...
DeliciousBluewhale87 great, we have progress; this looks like it is inheriting the system packages.
For example, you can see in the log: `Requirement already satisfied: future>=0.16.0 in /usr/local/lib/python3.6/dist-packages`
Now the question is which docker image it is running, because as you can see at the bottom of the log, tensorflow is not listed as installed, while other packages installed inside the docker are.
wdyt?
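To see which environment a package actually comes from, one option is to ask `importlib.metadata` where each distribution is installed (Python 3.8+, so this would not run as-is on the 3.6 interpreter in this log). In a venv that inherits system packages, `future` should resolve to `/usr/local/lib/python3.6/dist-packages` as in the log line above, while a package missing from the image maps to `None`. `package_locations` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_locations(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to the directory it was found in, or None if not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            dist = metadata.distribution(name)
        except metadata.PackageNotFoundError:
            result[name] = None
            continue
        # locate_file("") typically resolves to the site directory holding the
        # package, e.g. .../site-packages or .../dist-packages
        result[name] = str(dist.locate_file(""))
    return result
```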
AgitatedDove14, full log as requested.