GrievingKoala83
Moderator
22 Questions, 45 Answers
  Active since 11 June 2023
  Last activity 5 months ago

Reputation: 0

Badges (1): 43 × Eureka!
0 Votes · 2 Answers · 858 Views
11 months ago
0 Votes · 1 Answer · 477 Views
Hello! As far as I understand, files are sorted by their last modification time during cache cleaning? So files that were downloaded a long time ago b...
5 months ago
0 Votes · 1 Answer · 379 Views
Is it possible to use the Resource Configuration ( None ) in the clearml free tier?
5 months ago
0 Votes · 1 Answer · 1K Views
one year ago
0 Votes · 28 Answers · 1K Views
Hi! I'm running launch_multi_mode with pytorch-lightning task.execute_remotely(queue_name='my-queue' config = task.launch_multi_node(args.nodes)) ddp = DDPSt...
one year ago
0 Votes · 1 Answer · 589 Views
Hello everyone! I need to run the pipeline on schedule. Are there any restrictions on running pipelines through the scheduler? Would hidden project for pipe ...
7 months ago
0 Votes · 3 Answers · 623 Views
Hello everyone! I tried to remove models from ClearML using clearml.Model.remove( model=model_id, delete_weights_file=True, force=True, raise_on_errors=True,...
7 months ago
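
For reference, a minimal sketch of the call quoted in the question above (the model ID is a placeholder; the keyword arguments are the ones the question itself lists):

from clearml import Model

# Delete a registered model; delete_weights_file also removes the stored weights,
# and raise_on_errors=False returns False instead of raising if deletion fails.
Model.remove(
    model="<model_id>",
    delete_weights_file=True,
    force=True,
    raise_on_errors=False,
)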
0 Votes · 7 Answers · 1K Views
Hello everyone! The cache for pip does not work for agent in k8s mode. I specify agent.docker_pip_cache as /mnt/pip_cache in the clearml.conf. But nothing is...
one year ago
0 Votes · 5 Answers · 2K Views
2 years ago
0 Votes · 5 Answers · 851 Views
Hi everyone! I'm trying to use task.launch_multi_node(nodes, devices=gpus, hide_children=True) in conjunction with pytorch-lightning. I am using the latest ve...
9 months ago
0 Votes · 5 Answers · 643 Views
Hello! Can you help me with Model Endpoints tab - how to connect it to existing clearml-serving instance?
6 months ago
0 Votes · 10 Answers · 619 Views
Hello! I need to run clearml pipeline with caching of steps. I specify cache_executed_step=True for each step, but my steps are not cached and ended with the...
6 months ago
0 Votes · 1 Answer · 2K Views
Hello everyone! How can I conveniently pass a large number of parameters to the pipeline in order to re-run it through the UI?
2 years ago
0 Votes · 3 Answers · 1K Views
Hi! When running a remote task on the agent, clearml installs additional system packages. How can this be disabled? The variable agent.package_manager.system...
one year ago
0 Votes · 1 Answer · 2K Views
Why can't I find the task created for the pipeline in the project through the main dashboard?
2 years ago
0 Votes · 0 Answers · 399 Views
Hello! Why does the dependencies_by_order variable in the get_local_copy method ( None ) include only 1 parent dataset while there are 3 (pic. 2)? If I understand it corr...
5 months ago
0 Votes · 4 Answers · 2K Views
Hello! How to determine the cache for an agent in Kubernetes? I'm going to mount s3 as a cache folder as a local path using s3fs. What variable needs to be s...
one year ago
0 Votes · 2 Answers · 728 Views
Hello everyone! Is it possible to transfer data (datasets, models) from one ClearML instance to another? How can I do this?
11 months ago
0 Votes · 1 Answer · 1K Views
Hello! Is there a way to launch clearml apps (for example clearml schedulers) via API or code with status tracking on ClearML application tab? If we run clea...
one year ago
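
As a hedged illustration for the question above: the SDK's TaskScheduler can launch a scheduler from code (the task ID and queue names below are placeholders; whether it appears under the Applications tab depends on the server edition):

from clearml.automation import TaskScheduler

# Clone and enqueue an existing task every day at 07:00.
scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<task_id>",  # task to clone on every trigger
    queue="default",               # queue the clone is pushed into
    minute=0,
    hour=7,
)
# Run the scheduler itself as a long-lived task on the services queue.
scheduler.start_remotely(queue="services")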
0 Votes · 3 Answers · 1K Views
Hello everyone! Can I create a report via API or SDK? Can the model inference task generate a report that will be displayed in the reports tab?
one year ago
0 Votes · 5 Answers · 1K Views
Hello! Is there a way to launch clearml apps (for example clearml schedulers) via API or code with status tracking on ClearML application tab? If we run clea...
one year ago
0 Votes · 2 Answers · 1K Views
Hi everyone! I have a ClearML dataset that takes up 10 TB. Its local download (get_local_copy) takes about a month. Can you tell me how to speed up this proc...
one year ago
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

@<1523701435869433856:profile|SmugDolphin23> I added os.environ["NCCL_SOCKET_IFNAME"] and I managed to run on nccl.
But it seems that the workaround you suggested does not run 2 processes on 2 nodes, but 4 processes on 4 different nodes:
current_conf = task.launch_multi_node(args.nodes*args.gpus)
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
os.environ["NODE_RANK"] = str(current_conf["node_rank"] // args.gpus)
os.environ["LOCAL_RANK"] = str(current_conf["nod...

one year ago
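
A self-contained, hedged sketch of the rank arithmetic described above (the nodes/gpus values and the LOCAL_RANK modulo are assumptions, since the original snippet is cut off):

import os
from clearml import Task

nodes, gpus = 2, 2  # assumed values matching the discussion

task = Task.init(project_name="examples", task_name="multi-node ddp")
# One ClearML task is spawned per process, so nodes*gpus tasks in total.
current_conf = task.launch_multi_node(nodes * gpus)

# Derive the physical node rank and the per-node local rank from the
# global rank returned by launch_multi_node.
os.environ["NODE_RANK"] = str(current_conf["node_rank"] // gpus)
os.environ["LOCAL_RANK"] = str(current_conf["node_rank"] % gpus)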
0 Hi Everyone! I'M Trying To Use

@<1523701435869433856:profile|SmugDolphin23> Everything worked after setting the variables: --env NCCL_IB_DISABLE=1 --env NCCL_SOCKET_IFNAME=ens192 --env NCCL_P2P_DISABLE=1. But previously, these variables were not required for a successful launch. When I run ddp training with two nodes, everything works for me now. But as soon as I increase their number (nodes > 2), I get the following error:

Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.11/code/light...
9 months ago
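
For reference, a hedged sketch of setting the same NCCL variables from inside the training script instead of via docker --env flags (the interface name ens192 is taken from the message above and is machine specific):

import os

# Disable the InfiniBand and peer-to-peer transports and pin the socket interface.
os.environ["NCCL_IB_DISABLE"] = "1"
os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_SOCKET_IFNAME"] = "ens192"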
0 Hi Everyone! I'M Trying To Use

@<1523701435869433856:profile|SmugDolphin23> This error occurs when a secondary task is created with launch_multi_node, and it disappears when I add the reuse_last_task_id=False flag when initializing the task. But now I have a new problem: I can't request more than 2 nodes. The training logs freeze after several iterations of the first epoch with three workers, and if I request four workers I get this error:

DEBUG Epoch 0:   8%|▊         | 200/2484 [04:43<53:55,  0.71it/s, v_num=...
9 months ago
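
The flag mentioned above is a standard Task.init argument; a minimal sketch with placeholder project and task names:

from clearml import Task

# reuse_last_task_id=False prevents ClearML from reusing the previous task
# with the same name, which the message reports fixes the secondary-task error.
task = Task.init(
    project_name="examples",
    task_name="multi-node ddp",
    reuse_last_task_id=False,
)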
0 Hello Everyone! Can I Create A Report Via Api Or Sdk? Сan The Model Inference Task Generate A Report That Will Be Displayed In The Reports Tab?

Hi @<1523701087100473344:profile|SuccessfulKoala55> where can I get examples of REST API requests for creating reports?

one year ago
0 Hi Everyone! I'M Trying To Use

@<1523701435869433856:profile|SmugDolphin23> It is possible to request up to 5 workers in the toy example with Feed Forward and MNIST, BUT it is not possible to request more than 2 workers on a real large model

9 months ago
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

Hi @<1523701205467926528:profile|AgitatedDove14>
I started an experiment with gpus=2 and node=2 and I have the following logs
(three log screenshots attached)

one year ago
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

@<1523701435869433856:profile|SmugDolphin23> Two tasks were created when gpus=2, nodes=2, task.launch_multi_node(node). But their running status does not end, and model training does not begin.
(screenshot attached)

one year ago
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

Hi @<1523701435869433856:profile|SmugDolphin23> ! I set NODE_RANK in the environment and now:

  • if gpus=2, node=2, task.launch_multi_node(node): three tasks are created, two of which complete, but one fails. In this case (gpus*nodes-1) tasks are created, and some of them crash with an error, or they all fail with an error; the behavior is inconsistent.
  • if gpus=2, node=2, task.launch_multi_node(node*gpus): seven tasks are created. In this case, all tasks fail except t...
one year ago
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

Hi @<1523701435869433856:profile|SmugDolphin23> Thank you for your reply!
I use 2 machines.
I set these parameters, but unfortunately, the training has not started.

torch.distributed.DistStoreError: Timed out after 1801 seconds waiting for clients. 2/4 clients joined.
one year ago
0 Hello! I Need To Run Clearml Pipeline With Caching Of Steps. I Specify Cache_Executed_Step=True For Each Step, But My Steps Are Not Cached And Ended With The Status - Completed.

@<1523701070390366208:profile|CostlyOstrich36> If I run the pipeline with the same input parameters, all the steps will also be re-run, nothing will be taken from the cache

6 months ago
0 Hello! Can You Help Me With Model Endpoints Tab - How To Connect It To Existing Clearml-Serving Instance?

@<1523701070390366208:profile|CostlyOstrich36> I have 2 clearml-serving instances with endpoints

6 months ago
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

@<1523701435869433856:profile|SmugDolphin23> yeah, I am running this inside a docker container and cuda is available

one year ago
0 Hello! I Need To Run Clearml Pipeline With Caching Of Steps. I Specify Cache_Executed_Step=True For Each Step, But My Steps Are Not Cached And Ended With The Status - Completed.

I create a pipeline via PipelineController, adding each step as a function:

pipe = PipelineController(
    name=cfg.clearml.pipeline_name,
    project=cfg.clearml.project_name,
    target_project=True,
    version=cfg.clearml.version,
    add_pipeline_tags=True,
    docker=cfg.clearml.dockerfile,
    docker_args=DefaultMLPLATparam().docker_arg,
    packages=packages,
    retry_on_failure=3,
)

for parameter in cfg.clearml.params:
    pipe.add_...
6 months ago
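
For context, a hedged sketch of how the caching flag from the thread title is passed when a step is added as a function to the pipe object defined above (the step name, function, and pipeline parameter reference are placeholders, not the author's actual pipeline):

def preprocess_data(input_url):
    # placeholder step body
    return input_url

pipe.add_function_step(
    name="preprocess",
    function=preprocess_data,
    function_kwargs=dict(input_url="${pipeline.input_url}"),
    function_return=["processed_url"],
    cache_executed_step=True,  # reuse a previous identical execution when inputs match
)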
0 Hello! I Need To Run Clearml Pipeline With Caching Of Steps. I Specify Cache_Executed_Step=True For Each Step, But My Steps Are Not Cached And Ended With The Status - Completed.

@<1523701070390366208:profile|CostlyOstrich36> Above, I provided the code for this pipeline. I specify cache_executed_step=True for each pipeline step, but it doesn't work.

6 months ago
0 Hello Everyone! Is It Possible To Transfer Data (Datasets, Models) From One Clearml Instance To Another? How Can I Do This?

I store my data in S3 and ClearML tracks this data. I want to migrate this data from one ClearML instance to another, that is, transfer it to another S3 bucket and have a new ClearML instance track it.

11 months ago
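
One possible approach, as a heavily hedged sketch rather than a confirmed migration procedure: after copying the objects to the new S3 bucket, re-register them as external files in a dataset on the destination server (assuming the local clearml.conf points at the new instance; bucket and names are placeholders):

from clearml import Dataset

# Create a dataset on the destination ClearML server that references the
# copied S3 objects as external files (no data is re-uploaded).
ds = Dataset.create(dataset_name="migrated-dataset", dataset_project="migration")
ds.add_external_files(source_url="s3://new-bucket/path/to/data/")
ds.upload()    # uploads only the dataset state/metadata
ds.finalize()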
0 Hi All! I Write A Data Processing Pipeline. It Is Necessary To Define Many Hyperparameters That Are Inconvenient To Redefine In A Pop-Up Window When Restarting The Pipeline From Ui. Is It Possible To Override The Parameters Through The Configuration File

Hi @<1523701205467926528:profile|AgitatedDove14>
I define a pipeline through functions. I have a lot of parameters, about 40. It is inconvenient to override them all from the pop-up window shown on the screen.
(screenshot attached)

2 years ago
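
A hedged sketch of one way to avoid retyping ~40 values in the UI: register each entry of a config dict as a pipeline parameter so it is pre-filled (and still overridable) on re-run; all names and values below are placeholders:

from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")

params = {"lr": 0.001, "batch_size": 64, "epochs": 10}  # stand-in for the ~40 values
for key, value in params.items():
    # Each parameter shows up as an editable field when the pipeline is re-launched.
    pipe.add_parameter(name=key, default=value)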
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

@<1523701435869433856:profile|SmugDolphin23> it works with gpus=1 and node=2, and only two tasks are created

one year ago
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

@<1523701435869433856:profile|SmugDolphin23> gloo doesn't work for me either

but torch works with nccl and task.launch_multi_node

problems arise specifically with pytorch-lightning

one year ago
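
For reference, a hedged sketch of forcing the nccl backend in pytorch-lightning (class and argument names are from the public Lightning API, not from this thread):

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# Explicitly select the nccl process-group backend for DDP.
strategy = DDPStrategy(process_group_backend="nccl")
trainer = Trainer(accelerator="gpu", devices=2, num_nodes=2, strategy=strategy)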
0 Hi! I'M Running Launch_Multi_Mode With Pytorch-Lightning

@<1523701435869433856:profile|SmugDolphin23> Each task shows that the process allocates only 1 GPU out of 2 (all tasks have the same scalar as below)
(screenshot attached)

one year ago