Answered

Hi everyone. I have an issue with a simple pipeline. It runs two similar NN training steps (TF 2.3, Windows 10, Python 3.7) whose only difference is the batch size. I first ran each step separately so that both appear on the ClearML project page. Then I ran the pipeline controller, which clones each step and runs smoothly. If I run the pipeline from the command line again, it works OK. However, if I clone and enqueue the pipeline, it starts, creates a clone of the first step in pending state, and then nothing happens: the first step remains pending and never starts. Can anyone help with this issue? Here's the pipeline controller code:
` from clearml import Task
from clearml.automation.controller import PipelineController

# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name='Tom', task_name='test pipeline',
                 task_type=Task.TaskTypes.controller, reuse_last_task_id=False)

pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)
pipe.add_step(name='train_1st_nn_copy', base_task_project='Tom', base_task_name='train_1st_nn',
              parameter_override={'batch_size': 8})
pipe.add_step(name='train_2nd_nn_copy', parents=['train_1st_nn_copy', ],
              base_task_project='Tom', base_task_name='train_2nd_nn',
              parameter_override={'batch_size': 4})

# Starting the pipeline (in the background)
pipe.start()
# Wait until pipeline terminates
pipe.wait()
# cleanup everything
pipe.stop()

print('done') `

If I abort the pipeline controller task, the pending "train_1st_nn" task executes OK.
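One way to reason about this symptom is a toy model in plain Python (not ClearML code). It assumes that each agent executes one task at a time and that the enqueued controller keeps its agent busy until its step finishes; the post does not say how many agents are serving the "default" queue, so that number is a parameter here. The function and task names are made up for illustration.

```python
from collections import deque


def simulate(num_agents: int, max_ticks: int = 10) -> str:
    """Toy scheduler: one shared queue, each agent runs one task at a time.

    Assumption (not taken from the post): the controller occupies an agent
    until the step it enqueued has finished.
    """
    queue = deque(["controller"])  # the cloned pipeline controller was enqueued
    busy = []                      # tasks currently occupying an agent
    step_done = False

    for _ in range(max_ticks):
        # Free agents pull the next pending task from the queue.
        while queue and len(busy) < num_agents:
            task = queue.popleft()
            busy.append(task)
            if task == "controller":
                # The controller enqueues a clone of its first step.
                queue.append("step_1")

        # A running step finishes within one tick.
        if "step_1" in busy:
            busy.remove("step_1")
            step_done = True

        # The controller can only finish after its step has finished.
        if step_done and "controller" in busy:
            busy.remove("controller")
            return "completed"

    return "stuck: step_1 pending"


print(simulate(num_agents=1))  # stuck: step_1 pending
print(simulate(num_agents=2))  # completed
```

In this model a single agent never gets to the step because the controller is holding it, which would also match the step starting as soon as the controller is aborted. Whether this is what actually happens in the setup above is a guess.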

  
  
Posted 3 years ago

Answers 31


AgitatedDove14 Looks like it. First, I created a toy task to run in the "services" queue (you didn't say so, but I guess you assumed it). I couldn't find how to specify the queue in code ( Task.equeue(task, queue_name='services') returned an error), so I first ran toy.py in the "default" queue, aborted it, and started nntraining in the "default" queue. Then I reset toy.py and enqueued it to the "services" queue. toy.py failed shortly afterwards. I also reset both toy.py and nntraining and enqueued toy.py first (in the "services" queue) and then nntraining (in the "default" queue). In this case, nntraining failed. In both failed cases the error is the same:
Traceback (most recent call last):
  File "c:\users\super\anaconda3\envs\tf22\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\super\anaconda3\envs\tf22\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 2633, in <module>
    main()
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 869, in main
    symlink=options.symlink,
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 1161, in create_environment
    install_python(home_dir, lib_dir, inc_dir, bin_dir, site_packages=site_packages, clear=clear, symlink=symlink)
  File "c:\users\super\anaconda3\envs\tf22\lib\site-packages\virtualenv.py", line 1531, in install_python
    shutil.copyfile(executable, py_executable)
  File "c:\users\super\anaconda3\envs\tf22\lib\shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Super\\.clearml\\venvs-builds\\3.7\\Scripts\\python.exe'
Using base prefix 'c:\\users\\super\\anaconda3\\envs\\tf22'
No LICENSE.txt / LICENSE found in source
New python executable in C:\Users\Super\.clearml\venvs-builds\3.7\Scripts\python.exe
clearml_agent: ERROR: Command '['python', '-m', 'virtualenv', 'C:\\Users\\Super\\.clearml\\venvs-builds\\3.7']' returned non-zero exit status 1.

Hence, the process that runs first blocks the process that runs second in another queue. The type of queue, either "default" or "services", doesn't play any role.
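On specifying the queue from code, mentioned at the start of this post: the ClearML SDK method for this is spelled `Task.enqueue`, while the call quoted above is written `equeue`. It is not known whether that is only a typo in the post or the cause of the error. Below is a minimal sketch of enqueuing an existing task by name. The helper `enqueue_by_name`, its `task_cls` parameter and the task name 'toy' are hypothetical; as far as I know, the task has to be in draft (reset) state before it can be enqueued.

```python
def enqueue_by_name(project, name, queue, task_cls=None):
    """Look up an existing task by project/name and put it into a queue.

    task_cls defaults to clearml.Task; it is injectable only so the helper
    can be exercised without a ClearML server (hypothetical design choice).
    """
    if task_cls is None:
        from clearml import Task as task_cls  # needs clearml and a configured server

    # Fetch the existing task object.
    task = task_cls.get_task(project_name=project, task_name=name)
    # Note the spelling: enqueue, not equeue.
    return task_cls.enqueue(task, queue_name=queue)


# Usage against a real server; 'toy' is an assumed task name:
# enqueue_by_name('Tom', 'toy', 'services')
```

The same helper could be used to put nntraining into the "default" queue, which would reproduce the two-queue experiment above from a script instead of the web UI.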

  
  
Posted 3 years ago