Hi @<1532532498972545024:profile|LittleReindeer37> @<1523701205467926528:profile|AgitatedDove14>
I got the session with a bit of "hacking".
See this script:
import boto3, requests, json
from urllib.parse import urlparse

def get_notebook_data():
    log_path = "/opt/ml/metadata/resource-metadata.json"
    with open(log_path, "r") as logs:
        _logs = json.load(logs)
    return _logs

notebook_data = get_notebook_data()
client = boto3.client("sagemaker")
response = client.create_...
Hi @<1523721697604145152:profile|YummyWhale40> ! Are you able to upload artifacts of any kind other than models to the CLEARML_DEFAULT_OUTPUT_URI?
You could try this in the meantime if you don't mind temporary workarounds:
dataset.add_external_files(source_url=" ", wildcard=["file1.csv"], recursive=False)
Hi @<1523701279472226304:profile|SoreHorse95> ! add_external_files will only store the links. If the file changes and you don't have a dataset with updated links, I would expect some caching mechanisms to break, resulting in some files not being cached or not being downloaded again into the cache after getting the dataset.
Hi @<1587615463670550528:profile|DepravedDolphin12> ! get() should indeed return a python object. What clearml version are you using? Also, can you share the code?
Hi @<1715900760333488128:profile|ScaryShrimp33> ! You can set the log level by setting the CLEARML_LOG_LEVEL env var before importing clearml. For example:
import os
os.environ["CLEARML_LOG_LEVEL"] = "ERROR" # or str(logging.CRITICAL/whatever level) also works
Note that the ClearML Monitor warning is most likely logged to stdout, in which case this message can't really be suppressed, but model upload related messages should be.
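If you set this in more than one place, a small helper can keep it tidy. This is only a sketch: set_clearml_log_level is a hypothetical name, not part of clearml, and it assumes (as above) that the env var accepts either a level name or the string form of a numeric level.

```python
import logging
import os


def set_clearml_log_level(level):
    """Store `level` in CLEARML_LOG_LEVEL (hypothetical helper, not a ClearML API).

    `level` can be a level name such as "ERROR" or a numeric level such as
    logging.CRITICAL. Call this before `import clearml`.
    """
    if isinstance(level, int):
        value = str(level)  # e.g. logging.CRITICAL becomes "50"
    else:
        value = str(level).upper()  # normalize names such as "error"
    os.environ["CLEARML_LOG_LEVEL"] = value
    return value


set_clearml_log_level(logging.CRITICAL)
# import clearml  # import only after the variable has been set
```

The important part is the ordering: the variable has to be in the environment before clearml is imported.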
Hi @<1545216070686609408:profile|EnthusiasticCow4> !
So you can inject new command line args that hydra will recognize.
This is true.
However, if you enable _allow_omegaconf_edit_: True, I think ClearML will "inject" the OmegaConf saved under the configuration object of the prior run, overwriting the overrides
This is also true.
Hi @<1570220858075516928:profile|SlipperySheep79> ! What happens if you do this:
import yaml
import argparse
from my_pipeline.pipeline import run_pipeline
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)

if __name__ == '__main__':
    if not Task.current_task():
        args = parser.parse_args()
        with open(args.config) as f:
            config = yaml.load(f, yaml.FullLoader)
        run_pipeline(config)
Hi LittleShrimp86 ! Looks like something is broken. We are looking into it
can you share your config? (make sure to remove any credentials)
Could you please try with an older sdk version just to make sure there were no regressions?
or, if you want the steps to be run by the agent, set run_pipeline_steps_locally=False
Hi @<1555000563244994560:profile|OutrageousSealion55> ! How do you pass base_task_id in the HyperParameterOptimizer?
Hi @<1523703652059975680:profile|ThickKitten19> ! Could you try increasing the max_iteration_per_job and check if that helps? Also, any chance that you are fixing the number of epochs to 10, either through a hyper_parameter e.g. DiscreteParameterRange("General/epochs", values=[10]), or it is simply fixed to 10 when you are calling something like model.fit(epochs=10)?
ok, that is very useful actually
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! Having tqdm installed in your environment might help
The ${step.id} is the most viable way to reference that step imo @<1633638724258500608:profile|BitingDeer35>
@<1719524641879363584:profile|ThankfulClams64> you could try using the compare function in the UI to compare the experiments from the machine where the scalars are not reported properly with the experiments from a machine that runs them properly. I suggest then replicating the environment exactly on the problematic machine.
With that said, can I run another thing by you related to this. What do you think about a PR that adds the functionality I originally assumed schedule_function was for? By this I mean: adding a new parameter (this wouldn't change anything about schedule_function or how .add_task() currently behaves) that also takes a function but the function expects to get a task_id when called. This function is run at runtime (when the task scheduler would normally execute the scheduled task) and use ...
Hi @<1570583237065969664:profile|AdorableCrocodile14> ! get_local_copy will always copy/download external files to a folder. To get the external files, there is a property on the dataset called link_entries which returns a list of LinkEntry objects, which contain a link attribute, and each such link should point to an external file (in this case, your local paths prefixed with file://)
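Something along these lines could turn those links back into local paths. It is a sketch under assumptions: local_paths_from_links is a hypothetical helper, and the SimpleNamespace objects are stand-ins for LinkEntry so the snippet runs without a real dataset. The only thing it relies on is the link attribute mentioned above.

```python
from types import SimpleNamespace

FILE_PREFIX = "file://"


def local_paths_from_links(link_entries):
    """Return the local path for every entry whose link starts with file://."""
    paths = []
    for entry in link_entries:
        link = entry.link
        if link.startswith(FILE_PREFIX):
            # Drop the prefix to get back the original local path
            paths.append(link[len(FILE_PREFIX):])
    return paths


# Stand-ins for LinkEntry objects (assumption: only `.link` is needed)
example_entries = [
    SimpleNamespace(link="file:///data/raw/file1.csv"),
    SimpleNamespace(link="s3://bucket/file2.csv"),
]
local_paths = local_paths_from_links(example_entries)
print(local_paths)
```

With a real dataset you would pass dataset.link_entries instead of example_entries.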
OutrageousSheep60 that is correct, each dataset is in a different subproject. That is why bug 2. happens as well
FierceHamster54 initing the task before the execution of the file like in my snippet is not sufficient ?
It is not, because os.system spawns a whole different process than the one you initialized your task in, so no patching is done on the framework you are using. Child processes need to call Task.init because of this, unless they were forked, in which case the patching is already done.
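A minimal sketch of why that is, using only the standard library. Nothing here is ClearML-specific: patching json.dumps is just a stand-in for the patching Task.init does on a framework, and subprocess.run plays the role of os.system.

```python
import json
import subprocess
import sys


def run_demo():
    """Patch json.dumps in this process, then ask a child process what it sees."""
    original_dumps = json.dumps
    json.dumps = lambda obj, **kwargs: "patched"  # stand-in for framework patching
    try:
        parent_output = json.dumps({"a": 1})
        # Like os.system, this starts a brand-new interpreter, which imports
        # its own unpatched copy of the json module.
        child = subprocess.run(
            [sys.executable, "-c", "import json; print(json.dumps({'a': 1}))"],
            capture_output=True,
            text=True,
            check=True,
        )
        child_output = child.stdout.strip()
    finally:
        json.dumps = original_dumps  # undo the patch
    return parent_output, child_output


parent_output, child_output = run_demo()
print("parent:", parent_output)
print("child: ", child_output)
```

The parent sees the patched function while the child sees the original one, which is the same reason the spawned script has to call Task.init itself.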
` But the training.py already has a ClearML task created under the hood since its integratio...
Hi BoredBat47 ! What jsonschema version are you using?
Hi @<1657918706052763648:profile|SillyRobin38> ! If it is compatible with http/rest, you could try setting api.files_server to the endpoint or sdk.development.default_output_uri in clearml.conf (depending on your use-case).
The only exception is the models if I'm not mistaken, which are stored locally by default.
@<1675675705284759552:profile|NonsensicalAnt77> have you tried setting secure: true and host: storage.yandexcloud.net:443?
Hi RoughTiger69 ! Can you try adding the files using a python script such that we could get an exception traceback, something like this:
` from clearml import Dataset
# or just use the ID of the dataset you previously created instead of creating a new one
parent_dataset = Dataset.create(dataset_name="xxxx", dataset_project="yyyyy", output_uri=" ")
parent_dataset.add_files("folder1")
parent_dataset.upload()
parent_dataset.finalize()
child_dataset = Dataset.create(dataset_name="xxxx", dat...
Hi HomelyShells16 ! How about doing things this way? Does it work for you?
` class ClearmlLightningCLI(LightningCLI):
    def __init__(self, *args, **kwargs):
        # Register requirements and create the task before LightningCLI parses its arguments
        Task.add_requirements("requirements.txt")
        self.task = Task.init(
            project_name="example",
            task_name="pytorch_lightning_jsonargparse",
        )
        super().__init__(*args, **kwargs)

    def instantiate_classes(self, *args, **kwargs):
        super().instantiate_classes(*args, **kwargs)
...