You might want to prefix both the host in the configuration file and the URI in `Task.init` / `StorageHelper.get` with `s3.`, then check if the script above works once you do that.
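For illustration, a minimal sketch (the endpoint, bucket, and names below are made up, and the exact host value depends on your storage setup):
from clearml import Task

# hypothetical endpoint: the URI carries the "s3." prefix on the host,
# matching the "s3."-prefixed host entry in clearml.conf
task = Task.init(
    project_name="examples",
    task_name="s3-prefix-check",
    output_uri="s3://s3.my-endpoint.com:9000/my-bucket",
)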
Regarding 1: are you trying to delete the project from the UI? (I can't see an attached image in your message)
OutrageousSheep60 that is correct, each dataset is in a different subproject. That is why bug 2 happens as well.
Regarding number 2: that is indeed a bug and we will try to fix it as soon as possible.
UnevenDolphin73 Yes, it makes sense. At the moment, this is not possible. When using `use_current_task=True` the task gets attached to the dataset and moved under `dataset_project/.datasets/dataset_name`. Maybe we could make the task not disappear from its original project in the near future.
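For context, a minimal sketch of that call (project/dataset names are made up):
from clearml import Dataset, Task

task = Task.init(project_name="my_project", task_name="build_dataset")

# with use_current_task=True the current task becomes the dataset's task
# and is moved under my_project/.datasets/my_dataset
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    use_current_task=True,
)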
I don't think the version makes the task disappear. You should still see the task in the Datasets section. Maybe there is something you do with that task/dataset that makes it disappear (even though it shouldn't)?
Hi @<1715175986749771776:profile|FuzzySeaanemone21> ! Are you running this remotely? If so, you should work inside a repository so that the agent can clone the repository (which should include the config as well). Otherwise, the script will run as "standalone".
Hi @<1643060801088524288:profile|HarebrainedOstrich43> ! Could you please share some code that could help us reproduce the issue? I tried cloning, changing parameters and running a decorated pipeline, but the whole process worked as expected for me.
Hi BoredHedgehog47 ! We tried to reproduce this, but failed. What we tried is running the attached `main.py`, which `Popen`s `sub.py`.
Can you please run `main.py` as well and tell us if you still encounter the bug? If not, is there anything else you can think of that could trigger this bug besides creating a subprocess?
Thank you!
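For reference, a minimal `main.py` that `Popen`s a `sub.py` would look roughly like this (project/task names are assumptions):
import subprocess
import sys

from clearml import Task

task = Task.init(project_name="examples", task_name="popen-repro")

# spawn sub.py as a child process, mirroring the setup described above
subprocess.Popen([sys.executable, "sub.py"]).wait()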
If the task is running remotely and the parameters are populated, then the local run parameters will not be used; instead, the parameters that are already on the task will be used. This is because we want to allow users to change these parameters in the UI if they want to, so the parameters in the code are ignored in favor of the ones in the UI.
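To make this concrete, a sketch of the usual pattern (the parameter names are made up):
from clearml import Task

task = Task.init(project_name="examples", task_name="params-demo")

# local defaults; when the task runs remotely, connect() replaces these
# with the values already stored on the task (e.g. edited in the UI)
params = {"lr": 0.001, "batch_size": 32}
params = task.connect(params)
print(params)  # remotely, this reflects the task's stored/UI values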
Hi @<1719524641879363584:profile|ThankfulClams64> ! What tensorflow/keras version are you using? I noticed that in the `TensorBoardImage` you are using `tf.Summary`, which no longer exists since tensorflow `2.2.3`; that version is, I believe, too old to work with `tensorboard==2.16.2`.
Also, how are you stopping and starting the experiments? When starting an experiment, are you resuming training? In that case, you might want to consider setting the initial iteration to the last iteration your prog...
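If it helps, the tf2-era replacement for the removed `tf.Summary` image logging looks roughly like this (the log dir and image tensor are made up):
import tensorflow as tf

# tf.Summary (the old protobuf API) is gone; TF2 writes images
# through a summary file writer instead
writer = tf.summary.create_file_writer("./logs")
image = tf.random.uniform([1, 64, 64, 3])  # dummy [batch, h, w, c] image
with writer.as_default():
    tf.summary.image("sample", image, step=0)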
FiercePenguin76 Are you changing the model by pressing the circled button in the first photo? Are you prompted with a menu like in the second photo?
Hi @<1694157594333024256:profile|DisturbedParrot38> ! We weren't able to reproduce, but you could find the source of the warning by adding the following code at the top of your script:
import traceback
import warnings
import sys

def warn_with_traceback(message, category, filename, lineno, file=None, line=None):
    # print the stack that triggered the warning, then the warning itself
    log = file if hasattr(file, "write") else sys.stderr
    traceback.print_stack(file=log)
    log.write(warnings.formatwarning(message, category, filename, lineno, line))

# route all warnings through the handler above
warnings.showwarning = warn_with_traceback
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! I have an idea.
The flow would be like this: you create a dataset whose parent is the previously created dataset. The version will auto-bump. Then you sync this dataset with the folder. Note that sync will return the number of added/modified/removed files. If all of these are 0, you use `Dataset.delete` on this dataset and break/continue; otherwise you upload and finalize the dataset.
Something like:
parent =...
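A fuller sketch of that flow (names and paths are illustrative; check your SDK version for the exact return shape of `sync_folder`):
from clearml import Dataset

# the parent is the previously created dataset; the version auto-bumps
parent = Dataset.get(dataset_name="my_dataset", dataset_project="my_project")
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    parent_datasets=[parent.id],
)

# sync_folder reports the number of files added/modified/removed
changes = dataset.sync_folder("/path/to/folder")
if not any(changes):
    # nothing changed: drop this empty version
    Dataset.delete(dataset.id)
else:
    dataset.upload()
    dataset.finalize()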
Hi @<1523701345993887744:profile|SillySealion58> ! We allow finer-grained control over model uploads. Please refer to this GH thread for an example of how to achieve that: None
Hi @<1578555761724755968:profile|GrievingKoala83> ! It looks like lightning uses the `NODE_RANK` env var to get the rank of a node, instead of `NODE` (which is used by pytorch).
We don't set `NODE_RANK` yet, but you could set it yourself after `launch_multi_node`:
import os

current_conf = task.launch_multi_node(2)
# propagate the node rank reported by launch_multi_node to the env var lightning reads
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
Hope this helps
SmallGiraffe94 You should use `dataset_version="2022-09-07"` (not `version=...`). This should work for your use case. `Dataset.get` shouldn't actually accept a `version` kwarg, but it does because it accepts some `**kwargs` used internally. We will make sure to warn users in case they pass values to `**kwargs` from now on.
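For reference, the intended call looks like this (dataset name/project are illustrative):
from clearml import Dataset

dataset = Dataset.get(
    dataset_name="my_dataset",
    dataset_project="my_project",
    dataset_version="2022-09-07",  # dataset_version, not version
)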
Anyway, this issue still exists, but in another form: `Dataset.get` can't get datasets with a non-semantic version, unless the version is sp...
Hi @<1715900760333488128:profile|ScaryShrimp33> ! You can set the log level by setting the `CLEARML_LOG_LEVEL` env var before importing clearml. For example:
import os

# must be set before `import clearml`
os.environ["CLEARML_LOG_LEVEL"] = "ERROR"  # or str(logging.CRITICAL/whatever level) also works

Note that the `ClearML Monitor` warning is most likely logged to stdout, in which case this message can't really be suppressed, but model upload related messages should be.
Something like:
dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project, parent_datasets=[dataset.id])
That is a clear bug to me. Can you please open a GH issue?
@<1675675705284759552:profile|NonsensicalAnt77> Can you try using None to set up the credentials? Maybe there is an issue parsing/finding the conf file.
UnevenDolphin73 did that fix the logging for you? It doesn't seem to work on my machine. This is what I'm running:
from clearml import Task
import logging

def setup_logging():
    level = logging.DEBUG
    logging_format = "[%(levelname)s] %(asctime)s - %(message)s"
    logging.basicConfig(level=level, format=logging_format)

t = Task.init()
setup_logging()
logging.info("HELLO!")
t.close()
logging.info("HELLO2!")
UnevenDolphin73 Looking at the code again, I think it is actually correct. It's a bit hackish, but we do use `deferred_init` as an int internally. Why do you need to close the task exactly? Do you have a script that would highlight the behaviour change between `<1.8.1` and `>=1.8.1`?
Hi @<1693795212020682752:profile|ClumsyChimpanzee88> ! Not sure I understand the question. If the commit ID does not exist remotely, then it can't be pulled. How would you pull the commit to another machine otherwise, is this possible using your current workflow?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! You could get the `Dataset Struct` configuration object and read the `job_size` from there, which is the dataset size in bytes. By the way, the task IDs of the datasets are the same as the datasets' IDs, so you can call all the clearml task-related functions on the task you get by doing `Task.get_task("dataset_id")`.
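A sketch of that lookup (assuming the `Dataset Struct` object parses as JSON; the dataset ID below is a placeholder):
import json
from clearml import Task

# a dataset's task ID equals the dataset ID
task = Task.get_task(task_id="dataset_id")

struct = json.loads(task.get_configuration_object("Dataset Struct"))
for entry in struct.values():
    print(entry.get("job_size"))  # dataset size in bytes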