AgitatedDove14 Looks like that. First, I created a toy task to run in the "services" queue (you didn't say so, but I guess you assumed it). I couldn't find how to specify the queue in code ( Task.equeue(task, queue_name='services') returned an error), so I first ran toy.py in the "default" queue, aborted it, and started nntraining in the "default" queue. Then I reset toy.py and enqueued it to the "services" queue. toy.py failed shortly afterwards. I also reset both toy.py and nntraining and enqueued toy.py first (in the "services" queue) and then nntraining (in the "default" queue). In this case, nntraining failed. In both failed cases the error is the same:
Traceback (most recent call last):
  File "c:\users\super\anaconda3\envs\tf22\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\super\anaconda3\envs\tf22\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 2633, in <module>
    main()
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 869, in main
    symlink=options.symlink,
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 1161, in create_environment
    install_python(home_dir, lib_dir, inc_dir, bin_dir, site_packages=site_packages, clear=clear, symlink=symlink)
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 1531, in install_python
    shutil.copyfile(executable, py_executable)
  File "c:\users\super\anaconda3\envs\tf22\lib\shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Super\\.clearml\\venvs-builds\\3.7\\Scripts\\python.exe'
Using base prefix 'c:\\users\\super\\anaconda3\\envs\\tf22'
No LICENSE.txt / LICENSE found in source
New python executable in C:\Users\Super\.clearml\venvs-builds\3.7\Scripts\python.exe
clearml_agent: ERROR: Command '['python', '-m', 'virtualenv', 'C:\\Users\\Super\\.clearml\\venvs-builds\\3.7']' returned non-zero exit status 1.
Hence, whichever process starts first blocks the process that starts second in the other queue. The queue type, "default" or "services", plays no role.
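On specifying the queue from code: in the ClearML SDK the classmethod is spelled `Task.enqueue`, so if the call was literally typed as `Task.equeue`, the missing "n" alone would raise an `AttributeError`. Below is a minimal sketch of enqueuing an existing task by name. The helper name `enqueue_existing_task` and the `task_cls` parameter are hypothetical (the latter only allows a dry run without a server), and the project and task names in the usage comment are assumptions. The task is assumed to be in a state that can be enqueued, for example freshly reset.

```python
def enqueue_existing_task(project_name, task_name, queue_name="services", task_cls=None):
    """Look up an existing task by project and name, then push it into a queue.

    `task_cls` is a hypothetical hook for testing; by default the real
    clearml.Task class is used.
    """
    if task_cls is None:
        from clearml import Task as task_cls  # real ClearML SDK class

    # Fetch the task object that was created by an earlier run.
    task = task_cls.get_task(project_name=project_name, task_name=task_name)
    # Note the spelling: enqueue, not equeue.
    task_cls.enqueue(task, queue_name=queue_name)
    return task


# Example call (assumed project and task names, needs a configured ClearML server):
# enqueue_existing_task("Tom", "toy", queue_name="services")
```

An agent still has to be listening on the chosen queue for the task to leave the pending state.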
Hi everyone. I have an issue with a simple pipeline: it runs two similar NN training steps (tf2.3, Windows 10, Python 3.7) whose only difference is the batch size. I first run each step separately so that they appear on the ClearML project page. Then I run the pipeline controller, which clones each step and runs smoothly. If I run the pipeline from the command line again, it works OK. However, if I clone and enqueue the pipeline, it starts, creates a clone of the first step in the pending state, and then nothing happens. The first step remains pending and never starts. Can anyone help with this issue? Here's the pipeline controller code:
` from clearml import Task
from clearml.automation.controller import PipelineController

# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name='Tom', task_name='test pipeline',
                 task_type=Task.TaskTypes.controller, reuse_last_task_id=False)

pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)
pipe.add_step(name='train_1st_nn_copy', base_task_project='Tom', base_task_name='train_1st_nn',
              parameter_override={'batch_size': 8})
pipe.add_step(name='train_2nd_nn_copy', parents=['train_1st_nn_copy', ],
              base_task_project='Tom', base_task_name='train_2nd_nn',
              parameter_override={'batch_size': 4})

# Starting the pipeline (in the background)
pipe.start()
# Wait until pipeline terminates
pipe.wait()
# cleanup everything
pipe.stop()

print('done') `
If I abort the pipeline controller task, the pending "train_1st_nn" task executes OK.
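One possible reading of this symptom, assuming a single agent serves the "default" queue: the enqueued controller occupies that agent while it waits for its first step, and the step sits in the same queue with no free worker to pick it up. Aborting the controller frees the agent, which matches the step then running. The toy model below uses no ClearML calls; `simulate` and its parameters are hypothetical names, and the "one task per worker" rule is a simplifying assumption.

```python
from collections import deque


def simulate(workers, controller_queue, step_queue, max_ticks=10):
    """Toy model: the controller holds its worker until its single step has run.

    `workers` maps queue name -> number of agents listening on that queue;
    both queue names passed in must be keys of `workers`.
    """
    queues = {name: deque() for name in workers}
    free = dict(workers)
    queues[controller_queue].append("controller")
    controller_running = False
    step_state = "not created"

    for _ in range(max_ticks):
        # Every free worker pulls the next task from its own queue.
        for name, pending in queues.items():
            while free[name] > 0 and pending:
                task = pending.popleft()
                free[name] -= 1
                if task == "controller":
                    controller_running = True
                else:
                    step_state = "running"
        if step_state == "running":
            return "completed"
        # Once running, the controller clones and enqueues its step.
        if controller_running and step_state == "not created":
            queues[step_queue].append("step")
            step_state = "pending"
    return step_state


# One agent on "default", controller and step share the queue: the step never starts.
print(simulate({"default": 1}, "default", "default"))  # pending
# Controller on a separate "services" queue with its own agent: the step runs.
print(simulate({"default": 1, "services": 1}, "services", "default"))  # completed
```

Under that assumption, the step can only start if the controller runs somewhere other than the sole worker on the step queue, for instance on a separate "services" queue or alongside a second agent on "default".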