Hi ThankfulHedgehong21,
What versions of ClearML & ClearML-Agent are you using?
Also, can you provide a small code snippet to play with?
clearml version 1.0.5, Server 1.1.0
Code to reproduce:
` import multiprocessing
from machine_learning.clearml_client import Task


def init_clearml_task(patch_set_name, model_name, is_ensemble):
    task_name = f'{patch_set_name} {model_name}'
    task = Task.init(
        project_name=f"bla CV",
        task_name=task_name,
        tags=[model_name, patch_set_name],
        reuse_last_task_id=False
    )
    task.connect({"bla": "bla"}, 'IbexConfig')
    return task

def execute_1():
    print("proc1")
    task = init_clearml_task("alg1", "train1_debug_cml", is_ensemble=False)
    task.close()
    print("done_proc2")

def execute_2():
    print("proc2")
    task = init_clearml_task("alg2", "train2_debug_cml", is_ensemble=False)
    task.close()
    print("done_proc2")

# First task: created and closed inside a child process
proc = multiprocessing.Process(target=execute_1)
proc.start()
proc.join(35000)
print("father_script_done_proc1")

# Second task: created and closed in the parent process
task = init_clearml_task('summary', 'alg1_debug_cml', is_ensemble=False)
task.close()

# Third task: created inside a second child process
proc2 = multiprocessing.Process(target=execute_2)
proc2.start()
proc2.join(35000)
print("done???????????????") `
Logs:
`
-u /home/tomer/.pycharm_helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 47969 --file /home/tomer/ibex-ai-train/patch_level_classification/reproduce_cml_error.py
Connected to pydev debugger (build 221.5921.27)
/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/jwt/utils.py:7: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
from cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
proc1
ClearML Task: created new task id=56993ab96cb64b3089011b6f4d2c7e58
ClearML results page:
/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/x509.py:17: CryptographyDeprecationWarning: This version of cryptography contains a temporary pyOpenSSL fallback path. Upgrade pyOpenSSL now.
utils.DeprecatedIn35,
2022-07-21 06:42:08,534 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2022-07-21 06:42:45,338 - clearml.Task - INFO - Finished repository detection and package analysis
done_proc2
father_script_done_proc1
ClearML Task: created new task id=50bff5797e664d699d6a58b57d54cdb4
ClearML results page:
/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/x509.py:17: CryptographyDeprecationWarning: This version of cryptography contains a temporary pyOpenSSL fallback path. Upgrade pyOpenSSL now.
utils.DeprecatedIn35,
2022-07-21 06:42:49,662 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2022-07-21 06:43:27,494 - clearml.Task - INFO - Finished repository detection and package analysis
proc2
2022-07-21 06:43:29,955 - clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###
Process Process-3:
Traceback (most recent call last):
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/tomer/ibex-ai-train/patch_level_classification/reproduce_cml_error.py", line 24, in execute_2
task = init_clearml_task("alg2", "train2_debug_cml", is_ensemble=False)
File "/home/tomer/ibex-ai-train/patch_level_classification/reproduce_cml_error.py", line 13, in init_clearml_task
task.connect({"bla": "bla"}, 'IbexConfig')
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/clearml/task.py", line 1119, in connect
return method(mutable, name=name)
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/clearml/task.py", line 2747, in _connect_dictionary
self._arguments.copy_from_dict(flatten_dictionary(dictionary), prefix=name)
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/clearml/backend_interface/task/args.py", line 446, in copy_from_dict
__parameters_types=param_types,
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/clearml/backend_interface/task/task.py", line 1126, in update_parameters
self._set_parameters(*args, __update=True, **kwargs)
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/clearml/backend_interface/task/task.py", line 1048, in _set_parameters
self._edit(hyperparams=hyperparams)
File "/home/tomer/miniconda3/envs/ibex/lib/python3.6/site-packages/clearml/backend_interface/task/task.py", line 1853, in _edit
raise ValueError('Task object can only be updated if created or in_progress')
ValueError: Task object can only be updated if created or in_progress
done???????????????
Process finished with exit code 0 `
There is this line:
` WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ### `
After this, the connect call fails (because the task was never opened correctly).
For some reason, with the example I gave it happened only when running in debug mode, maybe a matter of timing, but it also happened in my "real" script when not debugging.
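A minimal sketch for narrowing this down, assuming the task object exposes get_status() and connect() as the clearml Task does. The helper name connect_if_editable and the status strings are assumptions for illustration, taken from the wording of the ValueError in the traceback ("created or in_progress"). It only skips the connect call when the task is no longer editable, which shows the status at the point of failure; it does not fix the underlying abort.

```python
# Statuses in which, according to the ValueError message in the traceback,
# a task can still be updated. The exact strings are an assumption.
EDITABLE_STATUSES = ("created", "in_progress")


def connect_if_editable(task, params, name):
    """Call task.connect(params, name) only if the task can still be updated.

    `task` is any object with get_status() and connect() methods
    (for example a clearml Task). Returns True if connect() was called.
    """
    status = str(task.get_status())
    if status not in EDITABLE_STATUSES:
        # Report the status instead of raising inside connect()
        print(f"skipping connect: task status is {status!r}")
        return False
    task.connect(params, name)
    return True
```

In the script above this would replace the direct task.connect({"bla": "bla"}, 'IbexConfig') call inside init_clearml_task, so a task that was already stopped prints its status rather than crashing the child process.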
I updated to clearml 1.6.2 and Server 1.5.0, and it is still happening: when calling init_clearml_task('summary', 'alg1_debug_cml', is_ensemble=False)
clearml doesn't create a new task, but now the process doesn't crash.
ThankfulHedgehong21, server 1.6.0 is available. Can you try with it as well?
I can't, because we have some experiments running (I didn't update before, I just used another, newer server).
I'll try and see if it reproduces on my side, thanks! 🙂
The sample script you posted runs fine on server 1.6.0. I did, however, comment out from machine_learning.clearml_client import Task
and used from clearml import Task instead.
Can you please try with the regular import?
Yes, that was left in by mistake (it just calls from clearml import Task).
It doesn't change the behavior.
Try this one (even when running without the debugger):
` import multiprocessing
import time
from clearml import Task


def init_clearml_task(patch_set_name, model_name, is_ensemble):
    task_name = f'{patch_set_name} {model_name}'
    task = Task.init(
        project_name=f"bla CV",
        task_name=task_name,
        tags=[model_name, patch_set_name],
        reuse_last_task_id=False
    )
    task.connect({"bla": "bla"}, 'IbexConfig')
    return task

def execute_1():
    print("proc1")
    task = init_clearml_task("alg1", "train1_debug_cml", is_ensemble=False)
    time.sleep(5)
    task.close()
    print("done_proc2")

def execute_2():
    print("proc2")
    task = init_clearml_task("alg2", "train2_debug_cml", is_ensemble=False)
    time.sleep(5)
    task.close()
    print("done_proc2")

# First task: created and closed inside a child process
proc = multiprocessing.Process(target=execute_1)
proc.start()
proc.join(35000)
time.sleep(5)
print("father_script_done_proc1")

# Second task: created and closed in the parent process
task = init_clearml_task('summary', 'alg1_debug_cml', is_ensemble=False)
time.sleep(5)
task.close()
time.sleep(5)

# Third task: created inside a second child process
proc2 = multiprocessing.Process(target=execute_2)
proc2.start()
time.sleep(5)
proc2.join(35000)
time.sleep(5)
print("done???????????????") `
Try spinning up a 1.6.0 server to see if it works there. BTW, what Python version are you using?
Are you saying that when running on 1.6.0 you see 3 tasks (where I see 2)?
I will update to 1.6 after the weekend and check
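To compare the number of tasks each of us gets, one option is to count the "created new task" lines in the console output. This is a minimal sketch, assuming the output still uses the "ClearML Task: created new task id=..." format seen in the log earlier in this thread; the function name created_task_ids is hypothetical.

```python
import re

# Matches the console line clearml printed in the log earlier in this thread.
_CREATED = re.compile(r"ClearML Task: created new task id=([0-9a-f]+)")


def created_task_ids(log_text):
    """Return the distinct task ids reported as created, in order of appearance."""
    seen = []
    for task_id in _CREATED.findall(log_text):
        if task_id not in seen:
            seen.append(task_id)
    return seen
```

Saving the script's console output to a file and passing its contents to created_task_ids gives a list whose length is the number of tasks that were actually created, which can be compared against the three Task.init calls in the script.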