Try examples/.pipelines/custom pipeline logic instead of pipeline_project/.pipelines/custom pipeline logic
Hi @<1603198163143888896:profile|LonelyKangaroo55> ! Each pipeline component runs in a task. So you first need the IDs of each component you try to query. Then you can use Task.get_task None to get the task object, and then Task.get_status to get its status None .
To get the ids, you can use something like [None](https://clear.ml/docs/...
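A minimal sketch of that flow (the component task IDs below are placeholders; how you collect them depends on your pipeline setup):
```
from clearml import Task

# Placeholder IDs -- collect the component task IDs of your pipeline (e.g. via the docs link above)
step_task_ids = ["<component_task_id_1>", "<component_task_id_2>"]

for task_id in step_task_ids:
    component_task = Task.get_task(task_id=task_id)  # fetch the task object
    # get_status returns e.g. "queued", "in_progress", "completed", "failed"
    print(component_task.id, component_task.get_status())
```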
Indeed, pipelines that were started with pipe.start_locally cannot be cloned and run. We will change this behaviour ASAP so that you can use just 1 queue for your use case.
Hi @<1569496075083976704:profile|SweetShells3> ! Can you reply with some example code on how you tried to use pl.Trainer with launch_multi_node ?
How about if Task.running_locally(): ?
What if you add images to the dataset? Can you see them being previewed? @<1523701168822292480:profile|ExuberantBat52>
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! I have an idea.
The flow would be like this: you create a dataset, the parent of that dataset would be the previously created dataset. The version will auto-bump. Then, you sync this dataset with the folder. Note that sync will return the number of added/modified/removed files. If all of these are 0, then you use Dataset.delete on this dataset and break/continue, else you upload and finalize the dataset.
Something like:
parent =...
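Roughly, the full flow could look like this (dataset/project names and the folder path are placeholders, and the exact counts returned by sync_folder may vary between clearml versions):
```
from clearml import Dataset

# Placeholder names -- the previously created dataset becomes the parent
parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
dataset = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],  # version auto-bumps based on the parent
)

# sync_folder reports how many files were added/modified/removed
counts = dataset.sync_folder(local_path="/path/to/folder")
if sum(counts) == 0:
    # Nothing changed -- drop the empty version instead of finalizing it
    Dataset.delete(dataset_id=dataset.id)
else:
    dataset.upload()
    dataset.finalize()
```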
Hi @<1523701345993887744:profile|SillySealion58> ! We allow finer grained control over model uploads. Please refer to this GH thread for an example on how to achieve that: None
UnevenDolphin73 The task shouldn't disappear when using use_current_task=False . There might be something else that makes it disappear.
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! Can you please try with clearml==1.13.3rc0 ? I believe we fixed this issue
I think we need to set more env vars if we are running with multiple GPUs on 1 node.
Can you try setting:
os.environ["NODE_RANK"] = current_conf["node_rank"] // gpus
os.environ["LOCAL_RANK"] = current_conf["node_rank"] % gpus
os.environ["GLOBAL_RANK"] = current_conf["node_rank"]
MammothParrot39 try to set this https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L82 in your clearml.conf to "22.3.1"
Hi @<1523701240951738368:profile|RoundMosquito25> ! Try using this function None
Don't call PipelineController functions after start has finished. Use a post_execute_callback instead
```
from clearml import PipelineController


def some_step():
    return


def upload_model_to_controller(controller, node):
    print("Start uploading the model")


if __name__ == "__main__":
    pipe = PipelineController(name="Yolo Pipeline Controller", project="yolo_pipelines", version="1.0.0")
    pipe.add_function_step(
        name="some_step",
        function=some_st...
```
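The snippet above is cut off; a hedged sketch of how the callback could be hooked up (the exact remaining arguments are assumptions):
```
pipe.add_function_step(
    name="some_step",
    function=some_step,
    # runs on the controller after the step completes, receiving (controller, node)
    post_execute_callback=upload_model_to_controller,
)
pipe.start_locally(run_pipeline_steps_locally=True)
```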
Then there is likely a problem with those tasks. For example, it could be that the hyperparameters get values that are too low or too high, which breaks the training.
Hi @<1633638724258500608:profile|BitingDeer35> ! Looks like the SDK doesn't currently allow creating steps/controllers with a designated cwd. You will need to call the set_script function on your steps' tasks and on the controller for now.
For the controller: If you are using the PipelineDecorator, you can do something like: PipelineDecorator._singleton._task.set_script(working_dir="something") , before you are running the pipeline function. In the case of regular `PipelineControll...
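For the PipelineDecorator case mentioned above, a minimal sketch (the working_dir value is a placeholder):
```
from clearml.automation.controller import PipelineDecorator

# Set the controller task's working directory (placeholder value) before
# running the decorated pipeline function, as suggested above
PipelineDecorator._singleton._task.set_script(working_dir="my/working/dir")
```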
ok, that is very useful actually
Is it just this script that you are running that breaks? What happens if, instead of pipe.upload_model , you call print(pipe._get_pipeline_task()) ?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! add_files already uses multi-threading, so threads would not help (see the max_workers argument).
If you are using a cloud provider such as S3, it might be useful to set this argument, or to look for config entries in clearml.conf that could speed up the upload (such as aws.s3.boto3.max_multipart_concurrency )
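A small sketch of what that could look like (names, paths, and worker counts are placeholders; check your clearml version for the exact arguments):
```
from clearml import Dataset

dataset = Dataset.create(dataset_project="my_project", dataset_name="my_dataset")
# add_files is already multi-threaded; max_workers controls how many threads are used
dataset.add_files(path="/path/to/files", max_workers=8)
# upload also accepts max_workers in recent clearml versions
dataset.upload(max_workers=8)
dataset.finalize()
```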
Hi SmugSnake6 . clearml==1.7.1 should support empty lists
Or, if you want the steps to be run by the agent, set run_pipeline_steps_locally=False
Hi @<1610083503607648256:profile|DiminutiveToad80> ! You need to somehow serialize the object. Note that we try different serialization methods and default to pickle if none work. If pickle doesn't work then the artifact can't be uploaded by default. But there is a way around it: you can serialize the object yourself. The recommended way to do this is using the serialization_function argument in upload_artifact . You could try using something like dill which can serialize more ob...
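A hedged sketch of that approach (assuming dill is installed and a recent clearml version that exposes serialization_function / deserialization_function; names are placeholders):
```
import dill
from clearml import Task

task = Task.init(project_name="examples", task_name="custom artifact serialization")

# An object that plain pickle cannot handle (a lambda), as an illustration
tricky_object = lambda x: x + 1

# Serialize it ourselves via dill instead of the default pickle path
task.upload_artifact(
    name="tricky_object",
    artifact_object=tricky_object,
    serialization_function=dill.dumps,  # must return bytes
)

# Later, load it back with a matching deserialization function
restored = Task.get_task(task_id=task.id).artifacts["tricky_object"].get(
    deserialization_function=dill.loads
)
```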
Hi @<1578555761724755968:profile|GrievingKoala83> ! Can you share the logs after setting NCCL_DEBUG=INFO of all the tasks? Also, did it work for you 5 months ago because you were on another clearml version? If it works with another version, can you share that version number?
Hi @<1524560082761682944:profile|MammothParrot39> ! A few thoughts:
- You likely know this, but the files may be downloaded to something like /home/user/.clearml/cache/storage_manager/datasets/ds_e0833955ded140a69b4c9c9d8e84986c . The .clearml directory may be hidden, so if you are using a file explorer you might not be able to see it.
- If that is not the issue: are you able to download some other datasets, such as our example one: UrbanSounds example ? I'm wondering if the problem only happens fo...
Hi @<1523721697604145152:profile|YummyWhale40> ! Are you able to upload artifacts of any kind other than models to the CLEARML_DEFAULT_OUTPUT_URI?
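For instance, a quick check could look like this (project/task names are placeholders):
```
from clearml import Task

# Minimal check: does a plain (non-model) artifact land in the configured default output URI?
task = Task.init(project_name="debug", task_name="default output uri check", output_uri=True)
task.upload_artifact(name="sanity_check", artifact_object={"hello": "world"})
task.close()
```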
Hi QuaintJellyfish58 ! What does your clearml.conf look like? How do you run MinIO? Can you download files using boto3 rather than clearml? Could you provide a script that could help us reproduce the issue?