Unanswered
Hi there, I'm having a slight issue with my Kubernetes pods silently failing after downloading a ClearML registered dataset (which is around 60GB) as part of a model training script. The pods consistently fail after running the


Hmm yeah, I have monitored some of the resource metrics and they didn't seem to be an issue. I'll attempt to install Prometheus / Grafana. This is a PoC, however, so I was hoping not to have to install too many tools.
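
As a lighter-weight check than a full Prometheus / Grafana stack, peak memory can be logged from inside the script itself. A minimal sketch using only the standard library, assuming a Unix container (on Linux `ru_maxrss` is reported in kilobytes); `peak_rss_mb` and `log_peak_memory` are hypothetical helpers, not part of ClearML:

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak_memory(label: str) -> str:
    """Hypothetical helper: print and return a one-line memory checkpoint."""
    line = f"[mem] {label}: peak RSS {peak_rss_mb():.1f} MB"
    print(line, flush=True)  # flush so the line survives an abrupt kill
    return line


log_peak_memory("startup")
```

Calling `log_peak_memory(...)` before and after the dataset download would show whether the Python process itself grows. This only covers the process's own memory; the container limit is enforced on the whole cgroup, so it is a partial picture at best.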

The code running is basically this:
` import time

from clearml import Dataset, Task

if __name__ == "__main__":

    # Initiate the ClearML task
    task = Task.init(
        project_name="hannd-0.1",
        task_name="train-endtoend-0.2",
        auto_connect_streams={'stdout': True, 'stderr': True, 'logging': True}
    )
    task.set_base_docker(docker_image="")
    task.set_script(working_dir="mains/training/", entry_point="train_endtoend.py")

    start_time = time.time()
    args = parse_args(commands)  # Reset batch size if network is stateful

    # Get the dataset and copy it from the cache into the target folder
    dataset = Dataset.get(dataset_id=args.clearml_dataset_id)
    target_folder = dataset.get_mutable_local_copy(target_folder=args.clearml_dataset_loc, max_workers=1, overwrite=True)

    if args.X is not None:
        input_files = make_file_list(target_folder, [".bin", ".mp4"]) `

It seems to fail on / after the ` shutil.copy() ` between the cache and the target folder. I've watched that folder while shelled into the pod, and the files seem to copy over fine. But something goes wrong either during that copy or upon its completion, which causes my pod to exit with error 137. Any thoughts at all?
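
For context on the 137: by shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL). In Kubernetes this is commonly, though not always, the result of the container exceeding its memory limit. A minimal sketch of that decoding, assuming a Linux host; `describe_exit_code` is a hypothetical helper:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit status; values above 128 encode a signal."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "exited normally" if code == 0 else f"exited with error status {code}"


print(describe_exit_code(137))
```

If that is what is happening here, `kubectl describe pod <pod-name>` should report the container's last termination reason, which would help confirm or rule out an out-of-memory kill.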
  
  
Posted one year ago