Hi @<1523701713440083968:profile|PanickyMoth78> ! Make sure you are calling Task.init in my_function (this is because the bindings made by clearml will be lost in a spawned process, as opposed to a forked one). Also make sure that, in the spawned process, the CLEARML_PROC_MASTER_ID env var is set to the pid of the master process and CLEARML_TASK_ID is set to the ID of the task initialized in the master process (this should happen automatically)
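For illustration, a minimal sketch of what this could look like (the project and task names are placeholders):
from multiprocessing import get_context
from clearml import Task

def my_function():
    # Re-initialize inside the spawned process; with CLEARML_PROC_MASTER_ID and
    # CLEARML_TASK_ID set (normally done automatically), this attaches to the
    # task created in the master process instead of creating a new one
    task = Task.init(project_name="examples", task_name="spawn-demo")
    task.get_logger().report_text("hello from the spawned process")

if __name__ == "__main__":
    Task.init(project_name="examples", task_name="spawn-demo")
    proc = get_context("spawn").Process(target=my_function)
    proc.start()
    proc.join()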
@<1523701949617147904:profile|PricklyRaven28> thank you for the feedback. We will investigate this further
Hi @<1523701949617147904:profile|PricklyRaven28> ! Thank you for the example. We managed to reproduce. We will investigate further to figure out the issue
Hi @<1671689458606411776:profile|StormySeaturtle98> ! Do you have a sample snippet that could help us diagnose this problem?
The config values are not yet documented, but they all default to 10 (except for max_file_size) and represent the number of images/tables/videos etc. that are reported as previews to the dataset. Setting them to 0 disables previewing
To clear the configurations, you should use something like Dataset.list_datasets
to get all the dataset IDs, then something like:
from clearml import Task
id_ = "229f14fe0cb942708c9c5feb412a7ffe"
task = Task.get_task(id_)
original_status = task.s...
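Putting the listing part together, the loop could look roughly like this (a sketch; the per-dataset clearing step is only indicated by a comment):
from clearml import Dataset, Task

# Dataset.list_datasets() returns a list of dicts; the "id" entry holds the dataset's ID
for dataset_info in Dataset.list_datasets():
    task = Task.get_task(task_id=dataset_info["id"])
    # ... clear the preview configuration of this dataset's task here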
Hi @<1618418423996354560:profile|JealousMole49> ! To disable previews, you need to set all of the values below to 0 in clearml.conf:
dataset.preview.media.max_file_size
dataset.preview.tabular.table_count
dataset.preview.tabular.row_count
dataset.preview.media.image_count
dataset.preview.media.video_count
dataset.preview.media.audio_count
dataset.preview.media.html_count
dataset.preview.media.json_count
Also, I believe you could go through each dataset and remove the `Datase...
have you tried copying the certificate to /usr/local/share/ca-certificates/ ? (On Debian/Ubuntu you would likely also need to run update-ca-certificates afterwards so the system picks it up)
Hi @<1524560082761682944:profile|MammothParrot39> ! A few thoughts:
You likely know this, but the files may be downloaded to something like /home/user/.clearml/cache/storage_manager/datasets/ds_e0833955ded140a69b4c9c9d8e84986c . Note that .clearml may be hidden, so if you are using a file explorer you may not be able to see the directory.
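If it helps, you can also print the exact location a dataset ends up in (a small sketch; the dataset ID below is a placeholder):
from clearml import Dataset

# dataset_id is a placeholder; use the ID of the dataset you are fetching
ds = Dataset.get(dataset_id="e0833955ded140a69b4c9c9d8e84986c")
# get_local_copy() downloads the files (if needed) and returns the local cache path
print(ds.get_local_copy())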
If that is not the issue: are you able to download some other datasets, such as our example one: UrbanSounds example ? I'm wondering if the problem only happens fo...
That makes sense. You should generally have only 1 task (initialized in the master process). The other subprocesses will inherit this task which should speed up the process
Actually, I think you want blop now that you renamed the project (instead of custom pipeline logic)
Hi SmallGiraffe94 ! Dataset.squash doesn't set the ids you specify in dataset_ids as parents. Also, notice that the current behaviour of squash is to pull the files from all the datasets into a temp folder and re-upload them. How about creating a new dataset with id1, id2, id3 as parents instead, i.e. Dataset.create(..., parent_datasets=[id1, id2, id3]) ? Would this fit your use case?
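For example, something along these lines (the dataset name, project and IDs are placeholders):
from clearml import Dataset

# id1, id2, id3 are the IDs of the datasets you want to combine (placeholders here)
id1, id2, id3 = "<dataset-id-1>", "<dataset-id-2>", "<dataset-id-3>"

merged = Dataset.create(
    dataset_name="merged_dataset",   # placeholder name
    dataset_project="my_project",    # placeholder project
    parent_datasets=[id1, id2, id3],
)
merged.upload()
merged.finalize()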
Hi DilapidatedDucks58 ! Browsers display double spaces as a single space by default. This is a common problem. What we could do is add a copy-to-clipboard button (it would copy the text properly). What do you think?
You could try this in the meantime if you don't mind temporary workarounds:
dataset.add_external_files(source_url=" ", wildcard=["file1.csv"], recursive=False)
Hi @<1554638160548335616:profile|AverageSealion33> ! We pull git repos to copy the directory your task is running in. Because you deleted .git , we can't do that anymore. I think that, to fix this, you could just run the agent in the directory where .git previously existed.
@<1654294828365647872:profile|GorgeousShrimp11> Any chance your queue is actually named megan-testing and not megan_testing ?
Hi @<1523703961872240640:profile|CrookedWalrus33> ! The way connect works by default is:
While running locally, all the values (and value changes) of a connected object are sent to the backend.
While running remotely (in your case here), all the values sent in the local run are fetched from the backend and the connected dictionary is populated with these values. The values are read-only; changing them will not have any effect.
To avoid this behaviour, you could use the `ignore_remote_override...
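For reference, here is a minimal sketch of that default behaviour (the project/task names and config values are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="connect-demo")

config = {"lr": 0.1, "batch_size": 32}
# Local run: these values are sent to the backend.
# Remote run: the dict is repopulated with the values stored in the backend,
# so edits to the defaults in code will not take effect.
config = task.connect(config)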
Hi @<1578555761724755968:profile|GrievingKoala83> ! It looks like lightning uses the NODE_RANK env var to get the rank of a node, instead of NODE (which is used by pytorch).
We don't set NODE_RANK yet, but you could set it yourself after calling launch_multi_node:
import os
current_conf = task.launch_multi_node(2)
# current_conf is the per-node configuration returned by launch_multi_node; it includes this node's rank
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
Hope this helps
1 more thing: It's likely that you should do task.launch_multi_node(args.nodes * args.gpus) instead, as I see that the world size set by lightning corresponds to this value
@<1578555761724755968:profile|GrievingKoala83> does it work properly when gpus=1? Also, what are the values found under Initializing distributed: GLOBAL_RANK: , MEMBER:
in the 2 scenarios, for each task?
@<1578555761724755968:profile|GrievingKoala83> Looks like something inside NCCL now fails, which doesn't allow rank 0 to start. Are you running this inside a docker container? What is the output of nvidia-smi inside of this container?
Does it work running this without clearml? @<1578555761724755968:profile|GrievingKoala83>
Because I think that what you are encountering now is an NCCL error
@<1578555761724755968:profile|GrievingKoala83> what error are you getting when using gloo? Is it the same one?
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! This is a known bug, we will likely fix it in the next version
Do you want to remove steps/add steps from the pipeline after it has run, basically? If that is the case, then it is theoretically possible, but we don't expose any methods that would allow you to do that...
What you would need to do is modify all the pipeline configuration entries you find in the CONFIGURATION section (see the screenshot). Not sure if that is worth the effort. I would simply create another version of the pipeline with the added/removed steps
![image](https://clearml-web-asset...
Hi SmugSnake6 ! If you want to delete a project using the APIClient:
from clearml.backend_api.session.client import APIClient
from clearml.backend_interface.util import exact_match_regex

api_client = APIClient()
project_id = api_client.projects.get_all(name=exact_match_regex("pipeline_project/.pipelines/pipeline_name"), search_hidden=True)[0].id
api_client.projects.delete(project=project_id)
Notice that tasks need to be archived
1.10.2 should be old enough
That would be much appreciated