Hi all, I've successfully run a task locally, and now I'm trying to clone it and send it to a queue. It looks like the environment is built successfully, but it hangs here:


OK, so my train.py now looks like this:

print("Before import")

from pathlib import Path

import hydra
import lightning as L
import torch
from coolname import generate_slug
from omegaconf import DictConfig

from src.datasets import JobDataModule
from src.models import JobModel
from src.utils import LogSummaryCallback, get_num_steps, prepare_loggers_and_callbacks

from clearml import Task

for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_properties(i).name)

print("Before task")

task = Task.init(project_name="ClearML Testing", task_name="FMNIST")
task.set_repo(
    repo="git@ssh.dev.azure.com:v3/mclarenracing/Application%20Engineering/ml-queue-test"
)
task.set_packages("requirements.txt")

print("After task")

And the log looks like this:

Starting Task Execution:
Before import
2024-07-19 09:06:09
NVIDIA H100 80GB HBM3
NVIDIA H100 80GB HBM3
NVIDIA H100 80GB HBM3
NVIDIA H100 80GB HBM3
NVIDIA H100 80GB HBM3
NVIDIA H100 80GB HBM3
NVIDIA H100 80GB HBM3
NVIDIA H100 80GB HBM3
Before task

So it looks like it's getting stuck at Task.init.
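One way to narrow down where the process is blocked, sketched here with only the Python standard library: arm `faulthandler` before the suspect call so that, if the call has not returned after a timeout, the stacks of all threads are dumped to the log. The helper name `call_with_hang_dump` and the 60-second default are assumptions made for illustration; nothing here is part of ClearML, and the sketch only reports where the hang is, it does not fix it.

```python
import faulthandler
import sys


def call_with_hang_dump(fn, timeout_s=60.0, file=None):
    """Call fn() and return its result.

    If fn() has not returned after timeout_s seconds, faulthandler writes
    the traceback of every thread to `file` (default: sys.stderr) and the
    call keeps running. `file` must be a real file with a fileno().
    (Hypothetical helper name, for illustration only.)
    """
    target = file if file is not None else sys.stderr
    # Arm a watchdog: dump all thread stacks once if the timeout expires.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        return fn()
    finally:
        # Disarm the watchdog whether fn() returned or raised.
        faulthandler.cancel_dump_traceback_later()


# Possible use in train.py, replacing the direct Task.init call:
#
#   task = call_with_hang_dump(
#       lambda: Task.init(project_name="ClearML Testing", task_name="FMNIST")
#   )
```

If the call really is stuck, the dumped stack should show the innermost frame the main thread is waiting in, which may help distinguish, for example, a blocked network call from a blocked subprocess.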

  
  
Posted 2 months ago