ShinyPuppy47 does add_task_init_call help your case? https://clear.ml/docs/latest/docs/references/sdk/task/#taskcreate
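For reference, a minimal sketch of what I mean (the project/task names and script path are just examples):
from clearml import Task

# create a task from an existing script; add_task_init_call (default True)
# injects the Task.init call into the script for you
task = Task.create(
    project_name="examples",
    task_name="script without explicit init",
    script="train.py",  # hypothetical script path
    add_task_init_call=True,
)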
Don't call PipelineController functions after start has finished. Use a post_execute_callback instead:
from clearml import PipelineController

def some_step():
    return

def upload_model_to_controller(controller, node):
    print("Start uploading the model")

if __name__ == "__main__":
    pipe = PipelineController(name="Yolo Pipeline Controller", project="yolo_pipelines", version="1.0.0")
    pipe.add_function_step(
        name="some_step",
        function=some_step,
        post_execute_callback=upload_model_to_controller,
    )
    pipe.start()
That is very odd. Is the script above all you're running?
Try examples/.pipelines/custom pipeline logic instead of pipeline_project/.pipelines/custom pipeline logic
Hi @<1668427963986612224:profile|GracefulCoral77> ! The error is a bit misleading. What it actually means is that you shouldn't attempt to modify a finalized clearml dataset (I suppose that is what you are trying to achieve). Instead, you should create a new dataset that inherits from the finalized one and sync that dataset, or leave the dataset in an unfinalized state
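Roughly like this (the dataset ID, names and paths are placeholders):
from clearml import Dataset

# inherit from the finalized dataset instead of modifying it
parent = Dataset.get(dataset_id="FINALIZED_DATASET_ID")
child = Dataset.create(
    dataset_name="my_dataset_v2",
    dataset_project="my_project",
    parent_datasets=[parent.id],
)
child.sync_folder(local_path="/path/to/data")  # sync the new version
child.finalize(auto_upload=True)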
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! You could get the Dataset Struct configuration object and read the job_size from there, which is the dataset size in bytes. By the way, the task IDs of the datasets are the same as the datasets' IDs, so you can call all the clearml task-related functions on the task you get by doing Task.get_task("dataset_id")
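A rough sketch of what I mean (the dataset ID is a placeholder, and I'm just iterating the struct's entries to show where job_size lives):
from clearml import Task

# the dataset's backing task ID equals the dataset ID
task = Task.get_task(task_id="DATASET_ID")
struct = task.get_configuration_object_as_dict("Dataset Struct")
for entry in struct.values():
    print(entry.get("job_size"))  # dataset size in bytes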
Hi @<1578555761724755968:profile|GrievingKoala83> ! It looks like lightning uses the NODE_RANK env var to get the rank of a node, instead of NODE (which is used by pytorch). We don't set NODE_RANK yet, but you could set it yourself after launch_multi_node:
import os
current_conf = task.launch_multi_node(2)
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
Hope this helps
Any chance you have some uncommitted code changes, such that when they're not included, this works fine?
Hi @<1631102016807768064:profile|ZanySealion18> ! Reporting None is not possible, but you could report np.nan instead.
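For example, a minimal sketch (project/metric names are just examples):
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="nan scalar")
# report np.nan where you would have reported None
task.get_logger().report_scalar(title="metric", series="value", value=np.nan, iteration=0)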
Your object is likely holding some file descriptor or something like that. The pipeline steps all run in separate processes (they can even run on different machines when running remotely), so you need to make sure that the objects you are returning are picklable and can be passed between these processes. You can verify that the logger you are passing around is indeed picklable by calling pickle.dumps on it and then loading it in another run.
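Something along these lines (check_picklable is just a hypothetical helper name):
import pickle

def check_picklable(obj):
    # round-trip through pickle to verify the object can cross process boundaries
    payload = pickle.dumps(obj)  # raises if the object isn't picklable
    return pickle.loads(payload)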
The best practice would ...
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! Looks like remove_files doesn't support lists indeed. It does support paths with wildcards though, if that helps. As a workaround for now, I would remove all the files from the dataset and add back only the ones you need, or just create a new dataset.
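The wildcard variant would look roughly like this (dataset ID and pattern are placeholders):
from clearml import Dataset

ds = Dataset.get(dataset_id="DATASET_ID")
# remove everything matching a wildcard pattern instead of passing a list
ds.remove_files(dataset_path="images/*.png", recursive=True)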
because I think that what you are encountering now is an NCCL error
Hi @<1714451218161471488:profile|ClumsyChimpanzee54> ! We will automatically add the cwd of the pipeline controller to the python path when running locally in a future version.
If running remotely, you can approach this in a few ways:
- add the whole project to a git repo and specify that repo in the pipeline steps
- have a prebuilt docker image that contains your project's code. you may then set the working directory to the path of your project
- if the agent running the docker is running ...
Hi @<1578555761724755968:profile|GrievingKoala83> ! Are you trying to launch 2 nodes, each using 2 GPUs, on only 1 machine? I think that will likely not work because of an NCCL limitation.
Also, I think that you should actually do
current_conf = task.launch_multi_node(nodes)
os.environ["LOCAL_RANK"] = "0"  # this process should fork the other ones
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
os.environ["GLOBAL_RANK"] = str(current_conf.get("node_rank", 0) * gpus)
os.environ["WORLD...
Hi JitteryCoyote63 ! Your clearml-agent is likely run with python3.9. Can you try setting this entry https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L48 in your clearml.conf to python3.8, or to the full path to python3.8 if that doesn't work?
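i.e. something like this in your clearml.conf (the path is just an example):
agent {
    # force the agent to use a specific interpreter
    python_binary: "/usr/bin/python3.8"
}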
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , I think you are right. We will try to look into this asap
Hi @<1726047624538099712:profile|WorriedSwan6> ! At the moment, only the function_kwargs and queue parameters accept such references. We will consider supporting them for other fields as well in the near future.
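For example, a sketch of passing such a reference through function_kwargs (the names and the pipeline parameter are placeholders):
from clearml import PipelineController

def train(dataset_id):
    print(dataset_id)

pipe = PipelineController(name="example", project="example", version="1.0.0")
pipe.add_parameter(name="dataset_id", default="DATASET_ID")
pipe.add_function_step(
    name="train",
    function=train,
    # "${pipeline.dataset_id}" is resolved to the pipeline parameter at runtime
    function_kwargs={"dataset_id": "${pipeline.dataset_id}"},
)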
@<1578555761724755968:profile|GrievingKoala83> Looks like something inside NCCL now fails, which doesn't allow rank0 to start. Are you running this inside a docker container? What is the output of nvidia-smi inside this container?
Thank you! We will take a look and come back to you
Yes, see minio instructions under this: None
Hi @<1523702000586330112:profile|FierceHamster54> ! This is currently not possible, but I have a workaround in mind. You could use the artifact_serialization_function
parameter in your pipeline. The function should return a bytes stream of the zipped content of your data with whichever compression level you have in mind.
If I'm not mistaken, you wouldn't even need to write a deserialization function in your case, because we should be able to unzip your data just fine.
Wdyt?
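Something along these lines (gzip+pickle is just one option; the names are placeholders):
import gzip
import pickle
from clearml import PipelineController

def compress_artifact(obj):
    # serialize the object, then gzip it with whichever level you prefer
    return gzip.compress(pickle.dumps(obj), compresslevel=9)

pipe = PipelineController(
    name="example",
    project="example",
    version="1.0.0",
    artifact_serialization_function=compress_artifact,
)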
Check the output_uri parameter in Task.init
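For example, a minimal sketch (the bucket URI is a placeholder):
from clearml import Task

# send model/artifact uploads to your own storage
task = Task.init(
    project_name="examples",
    task_name="upload destination",
    output_uri="s3://my-bucket/models",
)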
Hi OutrageousSheep60 ! The list_datasets function is currently broken and will be fixed in the next release
Hi @<1570583237065969664:profile|AdorableCrocodile14> ! get_local_copy will always copy/download external files to a folder. To get the external files, there is a property on the dataset called link_entries, which returns a list of LinkEntry objects. Each contains a link attribute, and each such link should point to an external file (in this case, your local paths prefixed with file://)
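i.e. roughly (the dataset ID is a placeholder):
from clearml import Dataset

ds = Dataset.get(dataset_id="DATASET_ID")
# inspect the links instead of downloading a local copy
for entry in ds.link_entries:
    print(entry.link)  # e.g. file:///original/local/path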
Hi OutrageousSheep60 ! Regarding your questions: No, it's not. We will have an RC that fixes that ASAP, hopefully by tomorrow. You can use add_external_files, which you already do. If you wish to upload local files to the bucket, you can specify the output_url of the dataset to point to the bucket you wish to upload the data to. See the parameter here: https://clear.ml/docs/latest/docs/references/sdk/dataset/#upload . Note that you CAN mix external_files and regular files. We don't hav...
Hi @<1523703652059975680:profile|ThickKitten19> ! Could you try increasing max_iteration_per_job and checking if that helps? Also, any chance you are fixing the number of epochs to 10, either through a hyperparameter, e.g. DiscreteParameterRange("General/epochs", values=[10]), or simply by calling something like model.fit(epochs=10)?
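For reference, the kind of setup I mean (the IDs, metric names and values are all placeholders):
from clearml.automation import DiscreteParameterRange, HyperParameterOptimizer, RandomSearch

optimizer = HyperParameterOptimizer(
    base_task_id="BASE_TASK_ID",
    hyper_parameters=[
        # let epochs vary instead of pinning them to a single value
        DiscreteParameterRange("General/epochs", values=[10, 50, 100]),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    max_iteration_per_job=100000,  # the value I suggested increasing
)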
Hi @<1578555761724755968:profile|GrievingKoala83> ! We have released clearml==1.16.3rc1, which should solve the issue now. Just specify task.launch_multi_node(nodes, devices=gpus). For example:
import sys
import os
from argparse import ArgumentParser
import pytorch_lightning as pl
from pytorch_lightning.strategies.ddp import DDPStrategy
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from...
Hi DilapidatedDucks58 ! Browsers display double spaces as a single space by default; this is a common problem. What we could do is add a "copy to clipboard" button (it would copy the text properly). What do you think?
What do you get when you run this code?
from clearml.backend_api import Session
print(Session.check_min_api_server_version("2.17"))
Hi @<1523701949617147904:profile|PricklyRaven28> ! Thank you for the example. We managed to reproduce. We will investigate further to figure out the issue