@<1566596968673710080:profile|QuaintRobin7> not for now. Could you please open a GH issue about it? Maybe we can fit this in a future patch.
@<1578555761724755968:profile|GrievingKoala83> what error are you getting when using gloo? Is it the same one?
Hi @<1702492411105644544:profile|YummyGrasshopper29> ! Parameters can belong to different sections. You should prepend the section name to some_parameter: you likely want ${step2.parameters.kwargs/some_parameter}
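For example, a minimal sketch (the pipeline, project, and step names here are assumptions):
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0")
pipe.add_step(name="step2", base_task_project="examples", base_task_name="step 2 task")
pipe.add_step(
    name="step3",
    base_task_project="examples",
    base_task_name="step 3 task",
    parents=["step2"],
    # include the section (kwargs) when referencing the parameter
    parameter_override={"kwargs/some_parameter": "${step2.parameters.kwargs/some_parameter}"},
)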
PanickyMoth78 Something is definitely wrong here. The fix doesn't seem to be trivial either... we will prioritize this for the next version
Hi @<1523701949617147904:profile|PricklyRaven28> ! Thank you for the example. We managed to reproduce. We will investigate further to figure out the issue
Hi @<1695969549783928832:profile|ObedientTurkey46> ! You could try increasing sdk.storage.cache.default_cache_manager_size to a very large number
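For example, in clearml.conf (the value is illustrative):
sdk {
    storage {
        cache {
            default_cache_manager_size: 10000
        }
    }
}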
OutrageousSheep60 that is correct, each dataset is in a different subproject. That is why bug 2 happens as well
Yes, it should work with ClearML if it works with requests
you could also try using gloo as the backend (it uses CPU) just to check that the subprocesses spawn properly
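For example, with torch.distributed (a minimal sketch; MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are assumed to be set by your launcher):
import torch.distributed as dist

# gloo runs on CPU, so this isolates subprocess-spawning issues from NCCL/GPU ones
dist.init_process_group(backend="gloo")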
Hi @<1603198134261911552:profile|ColossalReindeer77> ! The usual workflow is that you modify the fields of your remote run in either the Hyperparameters section or the Configuration section, but not usually both (as in Hydra's case). When using CLI tools, people mostly modify the Hyperparameters section, so we chose to set allow_omegaconf_edit to False by default for parity.
@<1523721697604145152:profile|YummyWhale40> are you able to manually save models from SageMaker using OutputModel? None
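For reference, a minimal sketch (project, task, and file names are assumptions):
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="sagemaker-model-upload")
output_model = OutputModel(task=task, framework="PyTorch")
# register a locally saved model file with the task
output_model.update_weights(weights_filename="model.tar.gz")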
Hi PanickyMoth78 ! This will likely not make it into 1.9.0 (this will be the next version we release, most likely before Christmas). We will try to get the fix out in 1.9.1
or rather than str(self), something like:
def __repr__(self):
    return self.__class__.__name__ + "." + self.name
should work better
That makes sense. You should generally have only 1 task (initialized in the master process). The other subprocesses will inherit this task which should speed up the process
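For example, a sketch with torch.multiprocessing (project and task names are hypothetical):
import torch.multiprocessing as mp
from clearml import Task

def worker(rank):
    # no Task.init here: the subprocess inherits the master's task
    pass

if __name__ == "__main__":
    task = Task.init(project_name="examples", task_name="multiprocess-training")
    mp.spawn(worker, nprocs=2)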
1.10.2 should be old enough
Hi HandsomeGiraffe70 ! You could try setting dataset.preview.tabular.table_count to 0 in your clearml.conf file
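i.e. something like this in clearml.conf (a sketch; assuming the key sits under the sdk section):
sdk {
    dataset {
        preview {
            tabular {
                table_count: 0
            }
        }
    }
}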
Hi NonchalantGiraffe17 ! Thanks for reporting this. It would be easier for us to check if there is something wrong with ClearML if we knew the number and sizes of the files you are trying to upload (content is not relevant). Could you maybe provide those?
@<1545216070686609408:profile|EnthusiasticCow4>
This:
parent = self.clearml_dataset = Dataset.get(
    dataset_name="[LTV] Dataset",
    dataset_project="[LTV] Lifetime Value Model",
)
# generate the local dataset
dataset = Dataset.create(
    dataset_name="[LTV] Dataset",
    parent_datasets=[parent],
    dataset_project="[LTV] Lifetime Value Model",
)
should l...
Hi @<1566596968673710080:profile|QuaintRobin7> ! Sometimes, ClearML is not capable of transforming matplotlib plots to plotly, so we report the plot as an image to Debug Samples. Looks like report_interactive=True makes the plot unparsable
Hi @<1578918167965601792:profile|DistinctBeetle43> ! This is currently not possible. A different task will be created for each instance
UnevenDolphin73 Yes, it makes sense. At the moment, this is not possible. When using use_current_task=True, the task gets attached to the dataset and moved under dataset_project/.datasets/dataset_name. Maybe we could make the task not disappear from its original project in the near future.
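For context, this is the mode in question (dataset and project names are hypothetical):
from clearml import Dataset

ds = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    use_current_task=True,  # the currently running task becomes the dataset task
)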
You could consider downgrading to something like 1.7.1 in the meantime; it should work with that version
Hi @<1523701345993887744:profile|SillySealion58> ! We allow finer grained control over model uploads. Please refer to this GH thread for an example on how to achieve that: None
@<1578555761724755968:profile|GrievingKoala83> Looks like something inside NCCL now fails, which doesn't allow rank0 to start. Are you running this inside a docker container? What is the output of nvidia-smi inside of this container?
can you send the full logs of rank0 and rank1 tasks?
Hi @<1709015393701466112:profile|ScatteredPeacock14> ! I think you are right. We are going to look into fixing this
Hi @<1581454875005292544:profile|SuccessfulOtter28> ! You could take a look at how the HPO was built using optuna: None
Basically: you should create a new class which inherits from SearchStrategy. This class should convert the clearml hyper_parameters to parameters Ray Tune understands, then create a Tuner and run the Ray Tune hyperparameter optimization. The Tuner will optim...
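A very rough skeleton of the idea (everything beyond the SearchStrategy import is an assumption; see the optuna implementation linked above for the actual methods to override):
from clearml.automation.optimization import SearchStrategy

class RayTuneStrategy(SearchStrategy):  # hypothetical class
    def start(self):
        # 1. translate the ClearML hyper_parameters into a Ray Tune search space
        # 2. build a ray.tune.Tuner whose objective clones/enqueues the base task per trial
        # 3. call tuner.fit() and report trial results back to ClearML
        ...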
Hi @<1654294828365647872:profile|GorgeousShrimp11> ! add_tags is an instance method, so you will need the controller instance to call it. To get the controller instance, you can do PipelineDecorator.get_current_pipeline() and then call add_tags on the returned value. So: PipelineDecorator.get_current_pipeline().add_tags(tags=["tag1", "tag2"])
FiercePenguin76 Are you changing the model by pressing the circled button in the first photo? Are you prompted with a menu like in the second photo?
Hi @<1715900760333488128:profile|ScaryShrimp33> ! You can set the log level by setting the CLEARML_LOG_LEVEL env var before importing clearml. For example:
import os
os.environ["CLEARML_LOG_LEVEL"] = "ERROR"  # or str(logging.CRITICAL/whatever level) also works
Note that the ClearML Monitor warning is most likely logged to stdout, in which case this message can't really be suppressed, but model upload related messages should be