TroubledHedgehog16
but doesn't run when I deploy it using clearml. Here's the log of the error:
...
My guess is that clearml is reimporting keras somewhere, leading to circular dependencies.
It might not be circular, but I would guess it does have something to do with the order of imports. I'm trying to figure out what the difference would be between a local run and a run through an agent.
Is it the exact same TF version?
Interestingly, the example provided on clearml github works in the target agent (a docker container). It imports keras through tensorflow. Importing keras directly works on local, and in the target container. However, that fails as a clearml-task.
I tried:
- Without calling Task.init, without the agent: works
- Without calling Task.init, with the agent: doesn't work
- Calling Task.init, with the agent: doesn't work
It runs directly but leads to the above error with clearml
Both manually (i.e. calling Task.init and running it without the agent) and with the agent? Same exact behavior?
AgitatedDove14 It is the same version. In fact, I am using the same image from tensorflow on docker hub to run the code a) directly, and b) with clearml. It runs directly but leads to the above error with clearml.
What version of clearml / clearml-agent are you using? Are you running in docker mode? Can you add your agent command here?
Also, the log shows File "train_tf/keras_mnist.py", line 8, in <module> import keras, but import keras is not at line 8 in the entry script train_tf/keras_mnist.py. I wonder why this is wrong in the logs.
My guess is that clearml is reimporting keras somewhere, leading to circular dependencies.
The tensorflow and keras versions are 2.11.0 in both cases. I am not noticing any mismatch.
Can you compare the installed packages of the original experiment with those of the cloned one? Do you see anything special or different between the two?
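One way to do that comparison is to copy the installed-packages list from each experiment and diff them. The following is a minimal sketch, assuming both lists are in pip-freeze style (`name==version`, one per line); `parse_requirements` and `diff_requirements` are hypothetical helper names, not part of clearml.

```python
def parse_requirements(text):
    """Map package name -> pinned version ("" when the line has no '==' pin)."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, version = line.partition("==")
        packages[name.strip().lower()] = version.strip() if sep else ""
    return packages


def diff_requirements(original, cloned):
    """Return {name: (version_in_original, version_in_cloned)} for every mismatch.

    None means the package is missing from that side.
    """
    a, b = parse_requirements(original), parse_requirements(cloned)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Made-up package lists, for illustration only
    local_run = "keras==2.11.0\ntensorflow==2.11.0\n"
    agent_run = "keras==2.11.0\ntensorflow==2.10.0\nclearml==1.8.0\n"
    for name, (left, right) in diff_requirements(local_run, agent_run).items():
        print(f"{name}: original={left} cloned={right}")
```

An empty result would suggest the package sets match and the difference lies elsewhere, for example in import order.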
Is clearml importing keras or any of its modules separately? I am not able to reproduce this error outside clearml.
What about calling Task.init without the agent?
I am not using --force-current-version so I suppose it would be pulling the latest clearml-agent version inside the container. From the logs I can see it is installing clearml-agent version 1.4.1 in the container too.
The clearml-agent version is 1.4.1 and the clearml version is 1.8.0.
I am using the following command to run the agent: clearml-agent daemon --detached --queue US3090 USany default --docker
Just adding this here for easier readability:

ClearML results page: https:/xxxxt/projects/xxx/experimentsxxx
2022-11-21 11:02:07.590338: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-21 11:02:07.733169: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Traceback (most recent call last):
  File "train_tf/keras_mnist.py", line 8, in <module>
    import keras
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/import_bind.py", line 54, in __patched_import3
    mod = builtins.__org_import__(
  File "/usr/local/lib/python3.8/dist-packages/keras/__init__.py", line 21, in <module>
    from keras import models
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/import_bind.py", line 54, in __patched_import3
    mod = builtins.__org_import__(
  File "/usr/local/lib/python3.8/dist-packages/keras/models/__init__.py", line 18, in <module>
    from keras.engine.functional import Functional
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/import_bind.py", line 54, in __patched_import3
    mod = builtins.__org_import__(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/functional.py", line 24, in <module>
    import tensorflow.compat.v2 as tf
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/import_bind.py", line 62, in __patched_import3
    hook()
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/frameworks/tensorflow_bind.py", line 1585, in _patch_model_checkpoint
    from keras.engine.sequential import Sequential # noqa
  File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/import_bind.py", line 54, in __patched_import3
    mod = builtins.__org_import__(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/sequential.py", line 49, in <module>
    class Sequential(functional.Functional):
AttributeError: partially initialized module 'keras.engine.functional' has no attribute 'Functional' (most likely due to a circular import)
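Reading the traceback bottom-up, the failure pattern appears to be: keras starts importing, keras/engine/functional.py imports tensorflow, a post-import hook fires on tensorflow, and that hook imports keras.engine.sequential while keras.engine.functional is still half-loaded. Below is a simplified, self-contained model of that pattern. It is not ClearML's actual code: `faketf`, `fakekeras`, and `run_case` are hypothetical stand-ins, and the hook is reduced to a single import.

```python
import builtins
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins: "faketf" plays tensorflow, "fakekeras" plays keras.
FILES = {
    "faketf.py": "",
    "fakekeras/__init__.py": "from fakekeras import functional\n",
    "fakekeras/functional.py": "import faketf\nclass Functional:\n    pass\n",
    "fakekeras/sequential.py": (
        "from fakekeras import functional\n"
        "class Sequential(functional.Functional):\n    pass\n"
    ),
}


def run_case(tf_first):
    """Import the fake packages under a post-import hook; return 'ok' or the error text."""
    # Start from a clean slate so repeated calls do not reuse cached modules
    for name in [m for m in sys.modules if m.split(".")[0] in ("faketf", "fakekeras")]:
        del sys.modules[name]

    with tempfile.TemporaryDirectory() as tmp:
        for rel, src in FILES.items():
            path = Path(tmp, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(src)

        original_import = builtins.__import__
        fired = False

        def patched_import(name, globals=None, locals=None, fromlist=(), level=0):
            nonlocal fired
            module = original_import(name, globals, locals, fromlist, level)
            if name == "faketf" and not fired:
                fired = True
                # Post-import hook: runs once, right after faketf is loaded
                importlib.import_module("fakekeras.sequential")
            return module

        sys.path.insert(0, tmp)
        builtins.__import__ = patched_import
        try:
            if tf_first:
                import faketf  # hook fires before fakekeras has started loading
            import fakekeras
            return "ok"
        except AttributeError as exc:
            return str(exc)
        finally:
            builtins.__import__ = original_import
            sys.path.remove(tmp)


if __name__ == "__main__":
    print("fakekeras imported directly:", run_case(tf_first=False))
    print("faketf imported first:      ", run_case(tf_first=True))
```

In this toy model, importing the tensorflow stand-in before the keras stand-in avoids the error, which is consistent with the observation above that the ClearML example importing keras through tensorflow works. Whether the same import order resolves the real issue is an assumption that would need to be checked on the agent.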