or, if you want the steps to be run by the agent, set run_pipeline_steps_locally=False
@<1578555761724755968:profile|GrievingKoala83> did you call task.launch_multi_node(4) or task.launch_multi_node(2)? I think the right value is 4 in this case
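For reference, a minimal sketch of the call I mean (project/task names are placeholders, and I'm assuming the returned dict exposes the node rank as shown):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="multi-node")
# Request 4 nodes in total; the call returns this node's multi-node config
config = task.launch_multi_node(4)
print(config.get("node_rank"))  # rank of the current node
```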
Hi @<1555000563244994560:profile|OutrageousSealion55> ! How do you pass base_task_id in the HyperParameterOptimizer?
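For context, a rough sketch of how base_task_id is usually passed to the optimizer constructor (the parameter range and metric names below are placeholders):

```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id="<template-task-id>",  # the task cloned for each trial
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="accuracy",
    objective_metric_series="validation",
    objective_metric_sign="max",
)
```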
Is there any way to look at all the tasks that used that version of the dataset?
Not easily. You could query the runtime properties of all tasks and check for datasets used.
But what I would do is tag the task that uses a certain dataset, and then you should be able to query by tags
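For example (the tag name is illustrative):

```python
from clearml import Task

# In the job that consumes the dataset: tag its task
task = Task.current_task()
task.add_tags(["uses-dataset-v1.2"])

# Later: find every task carrying that tag
tasks = Task.get_tasks(tags=["uses-dataset-v1.2"])
```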
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! This is actually very weird. Does your pipeline fail when running the first step? What if you run the pipeline via "raw" python (i.e. by doing python3 your_script.py)?
Hi FlutteringWorm14 ! Looks like we indeed don't wait for report_period_sec when reporting data. We will fix this in a future release. Thank you!
Can you please update it to the latest version? pip install -U jsonschema
MammothParrot39 try to set this https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L82 in your clearml.conf to "22.3.1"
Oh I see what you mean. start will enqueue the pipeline, in order for it to be run remotely by an agent. I think that what you want to call is pipe.start_locally(run_pipeline_steps_locally=True) (and get rid of the wait).
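Something like this (pipeline name/project are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples")
# ... add steps here ...

# Run the controller in this process and execute the steps locally too;
# no agent and no wait() needed
pipe.start_locally(run_pipeline_steps_locally=True)
```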
Hi @<1523715429694967808:profile|ThickCrow29> ! We identified the issue. We will soon release a fix for it
this is likely a UI bug. We should have a fix soon. In the meantime, yes, you can edit the configuration under the pipeline task to achieve the same effect
Hi @<1724235687256920064:profile|LonelyFly9> ! ClearML does not allow for those to be configured, but you might consider setting AWS_RETRY_MODE and AWS_MAX_ATTEMPTS env vars. Docs from boto3: None
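For example, setting them before the process creates its S3 client (values are illustrative):

```python
import os

# Standard AWS SDK environment variables, not ClearML-specific settings
os.environ["AWS_RETRY_MODE"] = "standard"  # "legacy", "standard" or "adaptive"
os.environ["AWS_MAX_ATTEMPTS"] = "5"       # total attempts per request
```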
Hi @<1570220858075516928:profile|SlipperySheep79> ! What happens if you do this:
import yaml
import argparse
from my_pipeline.pipeline import run_pipeline
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)

if __name__ == '__main__':
    if not Task.current_task():
        args = parser.parse_args()
        with open(args.config) as f:
            config = yaml.load(f, yaml.FullLoader)
        run_pipeline(config)
basically, I think that the pipeline run starts from __main__ and not the pipeline function, which causes the file to be read
How about using if Task.running_locally(): instead?
it's the same file you added your s3 creds to
@<1654294828365647872:profile|GorgeousShrimp11> Any chance your queue is actually named megan-testing and not megan_testing?
Hi @<1719162259181146112:profile|ShakySnake40> ! It looks like you are trying to update an already finalized dataset. Finalized datasets cannot be updated. In general, you should create a new dataset that inherits from the dataset you want to update (via the parent_datasets argument in Dataset.create) and operate on that dataset instead
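A minimal sketch of that (dataset names and paths are placeholders):

```python
from clearml import Dataset

parent = Dataset.get(dataset_name="my_dataset")  # the finalized dataset
child = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    parent_datasets=[parent.id],
)
child.add_files("/path/to/changed_files")  # operate on the child instead
child.upload()
child.finalize()
```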
Hi @<1545216070686609408:profile|EnthusiasticCow4> ! I have an idea.
The flow would be like this: you create a dataset whose parent is the previously created dataset. The version will auto-bump. Then, you sync this dataset with the folder. Note that sync will return the number of added/modified/removed files. If all of these are 0, then you use Dataset.delete on this dataset and break/continue; else you upload and finalize the dataset.
Something like:
parent =...
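Expanding that into a rough sketch (names and paths are placeholders, and I'm assuming sync_folder returns the added/modified/removed counts in this order):

```python
from clearml import Dataset

parent = Dataset.get(dataset_name="my_dataset")  # previously created dataset
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    parent_datasets=[parent.id],  # version auto-bumps
)
added, modified, removed = dataset.sync_folder("/path/to/folder")
if added == modified == removed == 0:
    # nothing changed: drop the empty version and break/continue
    Dataset.delete(dataset_id=dataset.id)
else:
    dataset.upload()
    dataset.finalize()
```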
Hi @<1578555761724755968:profile|GrievingKoala83> ! It looks like lightning uses the NODE_RANK env var to get the rank of a node, instead of NODE (which is used by pytorch). We don't set NODE_RANK yet, but you could set it yourself after launch_multi_node:
import os
current_conf = task.launch_multi_node(2)
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
Hope this helps
@<1578555761724755968:profile|GrievingKoala83> does it work properly when gpus=1? Also, what are the values found under Initializing distributed: GLOBAL_RANK: , MEMBER: in the 2 scenarios, for each task?
Each step is a separate task, with its own separate logger. You will not be able to reuse the same logger. Instead, you should get the logger inside the step that needs it, by calling Logger.current_logger()
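For instance, inside a step function (metric names are placeholders):

```python
from clearml import Logger

def my_step():
    # the logger of the task backing this step
    logger = Logger.current_logger()
    logger.report_scalar(title="loss", series="train", value=0.1, iteration=0)
```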
Hi @<1546303293918023680:profile|MiniatureRobin9> The PipelineController has a property called id, so just doing something like pipeline.id should be enough
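E.g. (names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples")
print(pipe.id)  # the ID of the controller's task
```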
Could you try adding region under credentials as well?
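A sketch of the relevant clearml.conf section (bucket, keys and region are placeholders):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"
                    key: "ACCESS_KEY"
                    secret: "SECRET_KEY"
                    region: "us-east-1"
                }
            ]
        }
    }
}
```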
Hi PanickyMoth78 ! This will likely not make it into 1.9.0 (this will be the next version we release, most likely before Christmas). We will try to get the fix out in 1.9.1
Hi @<1523701345993887744:profile|SillySealion58> ! We allow finer grained control over model uploads. Please refer to this GH thread for an example on how to achieve that: None
Hi @<1523701868901961728:profile|ReassuredTiger98> ! Looks like the task actually somehow gets run by both an agent and locally at the same time, so one of them is aborted. Any idea why this might happen?
There might be something wrong with the agent using ubuntu:22.04. Anyway, good to know everything works fine now