Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! To help us debug this: are you able to simply use the boto3 python package to interact with your cluster?
If so, what does that code look like? This would give us some insight into how the config should actually look or what changes need to be made.
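If it helps, something along these lines is what we have in mind (the endpoint, credentials and bucket name below are placeholders for your setup):

import boto3

# placeholder endpoint and credentials -- replace with your actual values
s3 = boto3.client(
    "s3",
    endpoint_url="https://my-storage-host:9000",
    aws_access_key_id="mykey",
    aws_secret_access_key="mysecret",
)

# list a few objects to verify connectivity
response = s3.list_objects_v2(Bucket="my-bucket", MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"])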
Hi @<1578555761724755968:profile|GrievingKoala83> ! It looks like lightning uses the NODE_RANK env var to get the rank of a node, instead of NODE (which is used by pytorch).
We don't set NODE_RANK yet, but you could set it yourself after calling launch_multi_node :
import os
current_conf = task.launch_multi_node(2)
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
Hope this helps
@<1634001100262608896:profile|LazyAlligator31> it looks like the args get passed to a python thread, so they should be specified the same way as you would pass them to the args argument of a thread (i.e. a tuple of positional arguments): func_args=("something", "else") . It looks like passing kwargs is not directly supported, but you could build a partial :
from functools import partial
scheduler.add_task(schedule_function=partial(clone_enqueue, arg_1="something", arg_2="else"), ...)
Hi HandsomeGiraffe70 ! You could try setting dataset.preview.tabular.table_count to 0 in your clearml.conf file
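For example, something along these lines (assuming the setting lives under the sdk section of your clearml.conf -- double-check against your file's structure):

sdk {
    dataset {
        preview {
            tabular {
                # disable tabular previews for datasets
                table_count: 0
            }
        }
    }
}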
Hi @<1539417873305309184:profile|DangerousMole43> ! You need to mark the task you want to upload an artifact to as running. You can use task.mark_started(force=True) to do so, then mark it back as completed using task.mark_completed(force=True)
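A rough sketch of that flow (the task ID, artifact name and content below are just placeholders):

from clearml import Task

# placeholder task ID -- replace with the ID of the task you want to update
task = Task.get_task(task_id="aabbccdd")

task.mark_started(force=True)  # temporarily move the task back to "running"
task.upload_artifact(name="my_artifact", artifact_object={"some": "data"})
task.flush(wait_for_uploads=True)  # make sure the upload finishes
task.mark_completed(force=True)  # restore the "completed" status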
Hi PetiteRabbit11 . This snippet works for me:
from clearml import Task
from pathlib2 import Path

t = Task.init()
config = t.connect_configuration(Path("config.yml"))
print(open(config).read())

Note that you need to use the return value of connect_configuration when you open the configuration file.
Hi GiganticMole91 . You could use something like
from clearml.automation import DiscreteParameterRange

HyperParameterOptimizer(
    ...,
    hyper_parameters=[DiscreteParameterRange("epochs", values=[100]), ...]  # epochs is static, ... represent the other params
)

to get the same behaviour --params-override provides
@<1526734383564722176:profile|BoredBat47> Yeah. This is an example:
s3 {
    key: "mykey"
    secret: "mysecret"
    region: "us-east-1"
    credentials: [
        {
            bucket: "..."
            key: "mykey"
            secret: "mysecret"
            region: "us-east-1"
        },
    ]
}
# some other config
default_output_uri: "..."
Hi @<1533257278776414208:profile|SuperiorCockroach75> Try setting packages in your pipeline component to your requirements.txt, or simply add the list of packages (with their specific versions).
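For example, a minimal sketch with the decorator syntax (the function names and package versions below are just placeholders):

from clearml.automation.controller import PipelineDecorator

# either point the component to a requirements file...
@PipelineDecorator.component(packages="./requirements.txt")
def preprocess(data):
    import pandas as pd
    return pd.DataFrame(data)

# ...or list the packages (and versions) explicitly
@PipelineDecorator.component(packages=["pandas==2.0.3", "scikit-learn==1.3.0"])
def train(df):
    from sklearn.linear_model import LinearRegression
    ...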
Hi @<1570220858075516928:profile|SlipperySheep79> ! What happens if you do this:
import yaml
import argparse
from my_pipeline.pipeline import run_pipeline
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, required=True)

if __name__ == '__main__':
    # only parse the CLI args when no ClearML task is running yet (i.e. local runs)
    if not Task.current_task():
        args = parser.parse_args()
        with open(args.config) as f:
            config = yaml.load(f, yaml.FullLoader)
    run_pipeline(config)
@<1554638160548335616:profile|AverageSealion33> looks like hydra pulls the config relative to the script's directory, and not the current working directory. The pipeline controller actually creates a temp file in /tmp when it pulls the step, so the script's directory will be /tmp and when searching for ../data , hydra will search in / . The .git likely caused your repository to be pulled, so your repo structure was recreated in /tmp , which caused the step to run correctly...
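If you want the step to be independent of whatever working directory it ends up running in, one option (not ClearML-specific, just a sketch assuming your repo layout) is to resolve the data path from the script's own location:

import os

# resolve ../data relative to this file instead of the current working directory
DATA_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "data"))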
@<1590514584836378624:profile|AmiableSeaturtle81> note that we zip the files before uploading them as artifacts to the dataset task. Any chance you are specifying the default output uri as being a local path, such as /tmp ?
@<1523701083040387072:profile|UnevenDolphin73> are you composing the code you want to execute remotely by copy-pasting it from various cells into one standalone cell?
hi OutrageousSheep60 ! We didn't release an RC yet, we will a bit later today tho. We will ping you when it's ready, sorry for the delay
thank you! we will take a look and get back to you
Regarding pending pipelines: please make sure a free agent is bound to the queue you wish to run the pipeline in. You can check queue information by accessing the INFO section of the controller (as in the first screenshot)
Then, by clicking on the queue, you should see the worker status. There should be at least one worker with a blank "CURRENTLY EXECUTING" entry

So the flow is like: MASTER PROCESS -> (optional) calls task.init -> spawns some children; CHILD PROCESS -> calls Task.init. The init is deferred even tho it should not be?
If so, we need to fix this for sure
Hi UnevenDolphin73 ! We were able to reproduce the issue. We'll ping you once we have a fix as well 👍
HomelyShells16 looks like some changes have been made to jsonargparse and pytorch_lightning since we released this binding feature. Could you try with jsonargparse==3.19.4 and pytorch_lightning==1.5.0 ? (no namespace parsing hack should be needed with these versions, I believe)
What OS are you running the scripts on, Abed?