This is odd, and it is marked as failed?
Are all the Tasks marked failed, or is it just this one?
Where can I find more info about why it failed?
Could you download and send the entire log?
File "aicalibration/generate_tfrecord_pipeline.py", line 30, in <module>
task.upload_artifact('train_tfrecord', artifact_object=fn_train)
File "/home/usr_341317_ulta_com/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/task.py", line 1484, in upload_artifact
auto_pickle=auto_pickle, preview=preview, wait_on_upload=wait_on_upload)
File "/home/usr_341317_ulta_com/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/binding/artifacts.py", line 560, in upload_artifact
self._task.set_artifacts(self._task_artifact_list)
File "/home/usr_341317_ulta_com/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/backend_interface/task/task.py", line 1201, in set_artifacts
self._edit(execution=execution)
File "/home/usr_341317_ulta_com/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/backend_interface/task/task.py", line 1771, in _edit
raise ValueError('Task object can only be updated if created or in_progress')
ValueError: Task object can only be updated if created or in_progress
2021-03-05 22:36:13,111 - clearml.Task - INFO - Waiting to finish uploads
2021-03-05 22:36:13,578 - clearml.Task - INFO - Finished uploading
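For reference, the ValueError in the traceback above says the Task can only be updated while its status is created or in_progress. Below is a minimal defensive sketch, assuming `task.get_status()` returns the status as a plain string. `safe_upload` is a hypothetical helper, not part of clearml. It only avoids the exception; it does not explain why the Task left in_progress in the first place.

```python
# Statuses taken from the error message in the traceback above.
UPDATABLE_STATUSES = ("created", "in_progress")


def safe_upload(task, name, artifact_object):
    """Upload an artifact only if the Task can still be updated.

    Hypothetical helper: `task` is expected to behave like a clearml Task,
    i.e. to expose get_status() and upload_artifact().
    Returns True if an upload was attempted, False if it was skipped.
    """
    status = str(task.get_status())
    if status not in UPDATABLE_STATUSES:
        print("Skipping artifact %r: task status is %r" % (name, status))
        return False
    task.upload_artifact(name, artifact_object=artifact_object)
    return True
```

With the script from the traceback this would be called as `safe_upload(task, 'train_tfrecord', fn_train)`.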
I commented out the upload_artifact call at the end of the code and it finishes correctly now
So upload_artifact caused the "failed" issue?
Is this order OK?
Task init
params setup
task.execute_remotely()
real_code here
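The order above can be sketched roughly like this. The project name, task name and artifact name are taken from the tracebacks in this thread. The `run_step` wrapper, the `params` dict and the `"default"` queue name are assumptions made for illustration. The Task class is passed in as an argument so the sketch can be read and exercised without clearml installed.

```python
def run_step(task_cls, train_path, queue_name="default"):
    """Init the Task, set up params, hand off to the agent, then run the real code.

    `task_cls` is expected to be clearml's Task class; `queue_name` is an
    assumed queue, replace it with the one your agent listens on.
    """
    # 1. Task init
    task = task_cls.init(
        project_name="AI Calibration",
        task_name="Pipeline step 1 dataset artifact",
    )

    # 2. Params setup (hypothetical parameter, connected so it shows up on the Task)
    params = {"train_path": train_path}
    params = task.connect(params)

    # 3. When run locally this enqueues the Task and stops the local process;
    #    when run by the agent it returns and execution continues below.
    task.execute_remotely(queue_name=queue_name)

    # 4. Real code goes here; the artifact upload is the last step.
    task.upload_artifact("train_tfrecord", artifact_object=params["train_path"])
    return task
```

In a real script this would be called as `from clearml import Task` followed by `run_step(Task, fn_train)`.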
Hmm, that should not make a difference.
Could you verify it still doesn't work with TF 2.4?
Well, let me try executing one of your samples
Your example with the absl package worked
Okay, could you try to run again with the latest clearml package from GitHub?
pip install -U git+
it would be completed right after the upload
Hmm... any idea what's different with this one?
Yes
Are you trying to upload_artifact to a Task that is already completed?
Just this one... it is marked as completed when executed locally
it's my error: I have tensorflow==2.2 in my venv, and added Task.add_requirements('tensorflow')
which forces tensorflow==2.4:
Storing stdout and stderr log into [/tmp/.clearml_agent_out.kmqde7st.txt]
Traceback (most recent call last):
File "aicalibration/generate_tfrecord_pipeline.py", line 15, in <module>
task = Task.init(project_name='AI Calibration', task_name='Pipeline step 1 dataset artifact')
File "/home/username/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/task.py", line 536, in init
TensorflowBinding.update_current_task(task)
File "/home/username/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/binding/frameworks/tensorflow_bind.py", line 36, in update_current_task
PatchKerasModelIO.update_current_task(task)
File "/home/username/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/binding/frameworks/tensorflow_bind.py", line 1412, in update_current_task
PatchKerasModelIO._patch_model_checkpoint()
File "/home/username/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/binding/frameworks/tensorflow_bind.py", line 1450, in _patch_model_checkpoint
from tensorflow.python.keras.engine.network import Network # noqa
File "/home/username/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/binding/import_bind.py", line 59, in __patched_import3
level=level)
File "/home/username/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 83, in <module>
class Network(base_layer.Layer):
File "/home/username/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 379, in Network
@trackable_layer_utils.cache_recursive_attribute('dynamic')
AttributeError: module 'tensorflow.python.training.tracking.layer_utils' has no attribute 'cache_recursive_attribute'
BTW:
Task.add_requirements('tensorflow', '2.2') will make sure you get the specified version 🙂
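If you would rather not hard-code the version, here is a minimal sketch that pins the requirement to whatever is installed in the local venv. `installed_version` and `pin_to_local_version` are hypothetical helpers, not part of clearml. It also assumes Python 3.8+ for `importlib.metadata`, while the logs above show 3.7, where the `importlib_metadata` backport would be needed instead.

```python
from importlib import metadata  # standard library on Python 3.8+


def installed_version(package_name):
    """Return the locally installed version string, or None if not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def pin_to_local_version(package_name):
    """Ask for the same package version on the agent as in the local venv."""
    # Imported lazily so installed_version() works without clearml installed.
    from clearml import Task

    version = installed_version(package_name)
    if version is None:
        # Not installed locally: fall back to an unpinned requirement.
        Task.add_requirements(package_name)
    else:
        Task.add_requirements(package_name, version)
    return version
```

Call `pin_to_local_version('tensorflow')` before `Task.init(...)`, the same place where `Task.add_requirements` is called now.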
If I have the same version (2.4 or 2.2) in both places there is no issue
I had to downgrade tensorflow 2.4 to 2.2 though... any idea why?