GrievingKoala83
Moderator
12 Questions, 30 Answers
  Active since 11 June 2023
  Last activity 3 days ago

Reputation

0

Badges 1

28 × Eureka!
0 Votes
2 Answers
137 Views
Hi everyone! I have a ClearML dataset that takes up 10 TB. Downloading a local copy (get_local_copy) takes about a month. Can you tell me how to speed up this proc...
one month ago
0 Votes
4 Answers
45 Views
Hello! Is there a way to launch clearml apps (for example clearml schedulers) via API or code with status tracking on ClearML application tab? If we run clea...
9 days ago
0 Votes
2 Answers
144 Views
Hi! When running a remote task on the agent, ClearML installs additional system packages. How can this be disabled? The variable agent.package_manager.system...
one month ago
0 Votes
28 Answers
138 Views
Hi! I'm running launch_multi_mode with pytorch-lightning: task.execute_remotely(queue_name='my-queue') config = task.launch_multi_node(args.nodes) ddp = DDPSt...
one month ago
0 Votes
7 Answers
236 Views
Hello everyone! The cache for pip does not work for agent in k8s mode. I specify agent.docker_pip_cache as /mnt/pip_cache in the clearml.conf. But nothing is...
2 months ago
0 Votes
2 Answers
19 Views
Hello everyone! Can I create a report via API or SDK? Can the model inference task generate a report that will be displayed in the reports tab?
4 days ago
0 Votes
1 Answers
230 Views
3 months ago
0 Votes
1 Answers
25 Views
Hello! Is there a way to launch clearml apps (for example clearml schedulers) via API or code with status tracking on ClearML application tab? If we run clea...
4 days ago
0 Votes
1 Answers
770 Views
Hello everyone! How can I conveniently pass a large number of parameters to the pipeline in order to re-run it through ui?
one year ago
0 Votes
4 Answers
262 Views
Hello! How to determine the cache for an agent in Kubernetes? I'm going to mount s3 as a cache folder as a local path using s3fs. What variable needs to be s...
2 months ago
0 Votes
1 Answers
687 Views
Why can't I find the task created for the pipeline in the project through the main dashboard?
one year ago
0 Votes
5 Answers
731 Views
one year ago
2 months ago
0 Hello everyone! The cache for pip does not work for agent in k8s mode. I specify agent.docker_pip_cache as /mnt/pip_cache in the clearml.conf. But nothing is saved along this path
kubectl exec -it clearml-agent-85fd8ccc6d-7fdk7 -n clearml bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
root@clearml-agent-85fd8ccc6d-7fdk7:~# cat /root/clearml.conf 
agent.git_user=gitlab_agent
agent.git_pass=682S-pH9ay1nidsxBGyT
agent.cuda_version=118
#agent.docker_internal_mounts.venv_build=/home/s3_cache/venvs-builds
#agent.do...
2 months ago
0 Hi! When running a remote task on the agent, ClearML installs additional system packages. How can this be disabled? The variable agent.package_manager.system_site_packages does not work

Do I understand correctly that it is impossible to disable the installation of system packages without CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL?

one month ago
2 months ago
0 Hello everyone! The cache for pip does not work for agent in k8s mode. I specify agent.docker_pip_cache as /mnt/pip_cache in the clearml.conf. But nothing is saved along this path

If I understand correctly, the cache for pip is stored at /root/.cache/pip. How can I change it? The agent.docker_internal_mounts.pip_cache variable in the config also does not change anything.

2 months ago
0 Hello everyone! Can I create a report via API or SDK? Can the model inference task generate a report that will be displayed in the reports tab?

Hi @SuccessfulKoala55, where can I get examples of REST API requests for creating reports?

3 days ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

For example, here is the global rank from the failed task in the first scenario:
image
image

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 hi! It works! Thanks!

17 days ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 I added os.environ["NCCL_SOCKET_IFNAME"] and managed to run on nccl.
But it seems that the workaround you suggested does not run 2 processes on 2 nodes, but 4 processes on 4 different nodes:
current_conf = task.launch_multi_node(args.nodes*args.gpus)
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
os.environ["NODE_RANK"] = str(current_conf["node_rank"] // args.gpus)
`os.environ["LOCAL_RANK"] = str(current_conf["nod...
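
The rank arithmetic in this workaround can be sketched roughly as follows. This is only an illustration, assuming task.launch_multi_node(nodes*gpus) hands each task one global rank in 0..nodes*gpus-1 and that ranks are laid out node by node. The helper name split_global_rank is hypothetical, and taking the local rank as the remainder is an assumption (the snippet above is cut off before that part); only the integer division for the node rank appears in the post.

```python
from typing import Tuple


def split_global_rank(global_rank: int, gpus_per_node: int) -> Tuple[int, int]:
    """Split a flat global rank into (node_rank, local_rank).

    Hypothetical helper: assumes ranks are assigned node by node, so the
    first gpus_per_node ranks belong to node 0, the next ones to node 1, etc.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    if global_rank < 0:
        raise ValueError("global_rank must be non-negative")
    # Integer division gives the node index (as in the post),
    # the remainder gives the GPU index inside that node (assumption).
    return divmod(global_rank, gpus_per_node)


if __name__ == "__main__":
    nodes, gpus = 2, 2
    for rank in range(nodes * gpus):
        node_rank, local_rank = split_global_rank(rank, gpus)
        print(f"global={rank} node={node_rank} local={local_rank}")
```

With gpus=2 and nodes=2 this maps global ranks 0, 1, 2, 3 to the (node_rank, local_rank) pairs (0, 0), (0, 1), (1, 0), (1, 1).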

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 gloo doesn't work for me either.

But torch works with nccl and task.launch_multi_node.

The problems arise specifically with pytorch-lightning.

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 it works with gpus=1 and node=2, and only two tasks are created

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

Hi @SmugDolphin23! I set NODE_RANK in the environment and now

  • if gpus=2, node=2, task.launch_multi_node(node): three tasks are created, two of which complete and one fails. In this case (gpus*nodes-1) tasks are created, and either some of them crash with an error or they all fail with an error. The behavior is inconsistent.
  • if gpus=2, node=2, task.launch_multi_node(node*gpus): seven tasks are created. In this case, all tasks are failed except t...
one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

The errors that occur in the second case are shown in these screenshots.
image
image

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 with task.launch_multi_node(4), all 4 tasks fail
image

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

Hi @SmugDolphin23, thank you for your reply!
I use 2 machines.
I set these parameters, but unfortunately, the training has not started.

torch.distributed.DistStoreError: Timed out after 1801 seconds waiting for clients. 2/4 clients joined.
one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 Each task shows that the process allocates only 1 GPU out of 2 (all tasks have the same scalar as below)
image

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 yeah, I am running this inside a Docker container and CUDA is available

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23 Two tasks were created when gpus=2, nodes=2, task.launch_multi_node(node). But they stay in the running status indefinitely, and model training does not begin.
image

one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

@SmugDolphin23
Logs of rank0:

Environment setup completed successfully
 
Starting Task Execution:
 
 
1718702244585 gpuvm-01:gpu3,0 DEBUG InsecureRequestWarning: Certificate verification is disabled! Adding certificate verification is strongly advised. See: 

ClearML results page: 
 /projects/0eae440b14054464a3f9c808ad6447dd/experiments/beaa8c380f3c46f0b6f5a3feab514dc8/output/log
task id [beaa8c380f3c46f0b6f5a3feab514dc8]
world=4
...
one month ago
0 Hi! I'm running launch_multi_mode with pytorch-lightning

Hi @AgitatedDove14
I started an experiment with gpus=2 and node=2 and I have the following logs:
image
image
image

one month ago
0 Hi all! I am writing a data processing pipeline. It is necessary to define many hyperparameters that are inconvenient to redefine in a pop-up window when restarting the pipeline from the UI. Is it possible to override the parameters through the configuration file

Hi @AgitatedDove14
I define a pipeline through functions. I have a lot of parameters, about 40. It is inconvenient to overwrite them all from the window shown on the screen.
image

one year ago