Answered
Agent running inside an official NVIDIA container sees the GPU driver but reports a GPU count of 0

Hi,
We have an agent running inside an official NVIDIA container. The agent seems to see the GPU driver, but the GPU count is 0.
When I join that container, nvidia-smi reports the GPUs correctly. The agent is launched with: clearml-agent --gpus 0
Versions: clearml-agent v1.7.0 and ClearML v1.14.4
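To cross-check what the agent reports, here is a minimal sketch that asks nvidia-smi itself which GPUs it sees from inside the container, using its --query-gpu=index,name and --format=csv,noheader options. It assumes nvidia-smi is on PATH. The function names are hypothetical, and this is only a manual check, not how clearml-agent detects GPUs.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text: str) -> list[tuple[int, str]]:
    """Parse `nvidia-smi --query-gpu=index,name --format=csv,noheader` output
    into (index, name) pairs, skipping blank lines."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first comma so the GPU name is kept whole.
        index, name = (field.strip() for field in line.split(",", 1))
        gpus.append((int(index), name))
    return gpus


def query_gpus() -> list[tuple[int, str]]:
    """Run nvidia-smi and return the GPUs it reports (raises if it fails)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_list(out)


# Only query the driver when nvidia-smi is actually available.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    gpus = query_gpus()
    print(f"{len(gpus)} GPU(s) visible:", gpus)
```

If this lists the GPUs while the agent still reports 0, the container's GPU access itself is probably fine and the discrepancy is more likely in how the count is being detected.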

  
  
Posted one month ago

Answers 6


@SuccessfulKoala55 Should I raise a GitHub issue?

  
  
Posted one month ago

Oh... maybe the bottleneck is augmentation running on the CPU!
But is it normal that the agent doesn't detect the GPU count and type properly?

  
  
Posted one month ago

@SuccessfulKoala55 It is set to "all":

NV_LIBCUBLAS_VERSION=12.2.5.6-1
NVIDIA_VISIBLE_DEVICES=all
CLRML_API_SERVER_URL=https://<redacted>
HOSTNAME=1b6a5b546a6b
NVIDIA_REQUIRE_CUDA=cuda>=12.2 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526
NV_NVTX_VERSION=12.2.140-1
NV_LIBCUSPARSE_VERSION=12.1.2.141-1
NV_LIBNPP_VERSION=12.2.1.4-1
NCCL_VERSION=2.19.3-1
PWD=/
CLRML_FILE_SERVER_URL=<redacted>/clearml
CLRML_SECRET_KEY=<redacted>
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_LIBNPP_PACKAGE=libnpp-12-2=12.2.1.4-1
NVIDIA_PRODUCT_NAME=CUDA
CLRML_ACCESS_KEY=TZQ8P5RNJ6IDLIZ5M3C0
NV_CUDA_CUDART_VERSION=12.2.140-1
HOME=/root
CLRML_CONTAINER_NAME=clearml
CUDA_VERSION=12.2.2
NV_LIBCUBLAS_PACKAGE=libcublas-12-2=12.2.5.6-1
CLRML_WEB_SERVER_URL=<redacted>
NV_LIBCUBLAS_PACKAGE_NAME=libcublas-12-2
CLRML_GIT_TOKEN=
TERM=xterm
CLRML_DOCKER_IMAGE=<redacted>/agent-image:v6
SHLVL=1
NV_CUDA_LIB_VERSION=12.2.2-1
NVARCH=x86_64
CLRML_ENV=prd
CLRML_STORAGE_ACCOUNT=<redacted>
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/usr/bin/python3.10
NV_CUDA_COMPAT_PACKAGE=cuda-compat-12-2
NV_LIBNCCL_PACKAGE=libnccl2=2.19.3-1+cuda12.2
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
CLRML_GIT_USER=
CLEARML_WORKER_NAME=tff-AIOT-Q470EA-IM-A:<redacted>/agent-image:v6
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
NV_LIBNCCL_PACKAGE_NAME=libnccl2
CLRML_STORAGE_KEY=<redacted>
NV_LIBNCCL_PACKAGE_VERSION=2.19.3-1
OLDPWD=/tmp/tmp.A3X3CWjlZc
_=/usr/local/bin/clearml-agent
root@1b6a5b546a6b:/proc/68#
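For reference, the NVIDIA Container Toolkit documents a small set of values for NVIDIA_VISIBLE_DEVICES: a comma-separated list of GPU indexes or UUIDs, "all", "none" (no GPUs, but driver capabilities enabled), and "void", empty or unset (behaves like plain runc). The hypothetical helper below only maps a value to that documented meaning; it is an interpretation aid and says nothing about how clearml-agent counts GPUs internally. The variable is acted on when the container is created, so reading it inside a running process only shows what was requested.

```python
import os
from typing import Optional


def describe_visible_devices(value: Optional[str]) -> str:
    """Summarize an NVIDIA_VISIBLE_DEVICES value per the NVIDIA Container
    Toolkit's documented semantics (hypothetical helper, for reading only)."""
    if value is None or value.strip() in ("", "void"):
        return "no GPUs and no driver capabilities (same as runc)"
    value = value.strip()
    if value == "none":
        return "no GPUs, but driver capabilities enabled"
    if value == "all":
        return "all GPUs"
    # Otherwise: an explicit list of GPU indexes and/or UUIDs.
    devices = [d.strip() for d in value.split(",") if d.strip()]
    return f"{len(devices)} specific GPU(s): {', '.join(devices)}"


if __name__ == "__main__":
    print(describe_visible_devices(os.environ.get("NVIDIA_VISIBLE_DEVICES")))
```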
  
  
Posted one month ago

Hi @ManiacalLizard2, can you check the value of the NVIDIA_VISIBLE_DEVICES environment variable in the agent's process? You can find it in /proc/<agent-pid>/environ.
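/proc/<pid>/environ stores "NAME=value" entries separated by NUL bytes, which is why a plain cat prints them run together. A small sketch (Linux only; reading another user's process needs sufficient permissions, e.g. root) that splits the file into a dict so a single variable can be looked up. parse_environ and read_process_env are hypothetical helper names.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Split the NUL-separated contents of /proc/<pid>/environ into a dict.

    Only the first '=' separates name from value, so values that themselves
    contain '=' (e.g. NVIDIA_REQUIRE_CUDA) are kept intact."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        name, sep, value = entry.partition(b"=")
        if sep:
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_env(pid: int) -> dict[str, str]:
    """Read the environment a process was started with."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

Usage: read_process_env(pid).get("NVIDIA_VISIBLE_DEVICES") with the agent's PID. The file reflects the environment at process start; changes the process makes to its own environment afterwards are generally not shown. From a shell, tr '\0' '\n' < /proc/<agent-pid>/environ gives the same one-per-line view.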

  
  
Posted one month ago

Hi @ManiacalLizard2, sorry for the late response. Please do 🙏

  
  
Posted 29 days ago

The weird thing is that GPU 0 seems to be in use, as reported by nvtop on the host. But it is 50% slower than when running directly rather than through clearml-agent...

  
  
Posted one month ago
159 Views
one month ago
28 days ago