Hi PompousBeetle71 - this can be done using the argparser integration. See https://allegro.ai/docs/examples/examples_tasks/ under "argparse parameters"
SuccessfulKoala55 No, that's not what I mean.
Take a look at the process on the machine:
/home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py
That's how the process starts. Therefore, when I try to get sys.argv, all I get is path/to/script/my_script.py.
I'm talking about allowing arguments that are not injected into argparse, so it would look like:
/home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py --num_proc 2
PompousBeetle71 the code is executed without arguments; at run-time trains / trains-agent will pass the arguments (as defined on the task) to the argparser. This means you get the ability to change them, and also type checking 🙂
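(For reference, a minimal sketch of that flow; the project/task names and the --batch_size / --lr arguments below are illustrative, not from the actual script:)
```python
from argparse import ArgumentParser
from trains import Task

# Illustrative arguments only, the real script defines its own.
parser = ArgumentParser()
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--lr', type=float, default=0.001)

# Task.init hooks the parser; when trains-agent executes the task,
# the argument values stored on the task are injected at parse time.
task = Task.init(project_name='examples', task_name='argparse integration')
args = parser.parse_args()
```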
PompousBeetle71 if you are not using argparse, how do you parse the arguments from sys.argv? Manually?
If that's the case, post-parsing you can connect a dictionary to the Task and you will have the desired behavior:
task.connect(dict_with_arguments_from_argv)
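(A minimal sketch, assuming the script parses sys.argv by hand; the parsing loop below is a stand-in for whatever the script already does:)
```python
import sys
from trains import Task

task = Task.init(project_name='examples', task_name='manual argv parsing')

# Stand-in for the script's own manual parsing of "--key value" pairs.
dict_with_arguments_from_argv = {
    key.lstrip('-'): value
    for key, value in zip(sys.argv[1::2], sys.argv[2::2])
}

# Connecting the dictionary logs the values on the task; when trains-agent
# runs the task, the values stored on the task are written back into the dict.
task.connect(dict_with_arguments_from_argv)
```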
AgitatedDove14 I'm using both argparse and sys.argv to start different processes, each of which interacts with a single GPU. Each process has a specific argument with a different value to differentiate between them (only the main process interacts with trains). At the moment I'm having issues getting the arguments from the processes I spawn. I'm explicitly calling python my_script.py --args...
and each process knows to interact with the others. It's a bit complicated to explain; I'm working with the PyTorch distributed module.
PompousBeetle71 a few questions:
is this like using PyTorch distributed, only manually? Why don't you call trains.init in all the sub-processes? We had a few threads on that; it seems like a recurring question, so I'll make sure we have an example on GitHub. Basically trains will take care of passing the arg-parser commands to the sub-processes, and also of the torch node settings. It will also make sure they all report to the same experiment. What do you think?
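(Something along these lines, just a sketch; it assumes, based on the distributed examples linked later in this thread, that Task.init called from a spawned worker attaches to the experiment created by the main process rather than opening a new one:)
```python
# inside each spawned worker process
from trains import Task

# Assumption: in a sub-process spawned by the main script, Task.init picks up
# the already-created experiment, so every worker reports to the same place.
task = Task.init(project_name='examples', task_name='distributed worker')
```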
I created a wrapper to work like executing python -m torch.distributed.launch --nproc_per_node 2 ./my_script.py, but from my script. I do call trains.init in the subprocesses. The only difference between the subprocesses, in terms of arguments, is supposed to be local_rank, that's all. It may also be that I'm not distributing the model between the GPUs in an optimal way, or at least not in a way that matches your framework.
If you have an example, it would be great.
PompousBeetle71 you can check this example:
https://github.com/allegroai/trains/blob/master/examples/distributed/example_torch_distributed.py
I think it should help. If you want a more manual approach, you can check the Popen subprocesses here:
https://github.com/allegroai/trains/blob/master/examples/distributed/example_subprocess.py
AgitatedDove14 thanks, I'll check it out.
PompousBeetle71 let me know if it solves your problem
AgitatedDove14 It will probably take me a few days, but I'll let you know.
AgitatedDove14 Hi, so I solved that by passing the arguments injected into argparse to the created processes as part of the command line. The examples helped.
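(For anyone hitting the same issue, the fix boils down to something like the sketch below; the script name and the --num_proc / --batch_size arguments are illustrative:)
```python
import subprocess
import sys
from argparse import ArgumentParser
from trains import Task

parser = ArgumentParser()
parser.add_argument('--num_proc', type=int, default=2)     # illustrative
parser.add_argument('--batch_size', type=int, default=32)  # illustrative

task = Task.init(project_name='examples', task_name='spawning workers')
args = parser.parse_args()  # trains-agent may have injected the task's values here

# Rebuild the command line from the (possibly injected) argparse values and
# pass it to each spawned worker, adding a per-process local_rank.
for local_rank in range(args.num_proc):
    subprocess.Popen([
        sys.executable, '-u', 'my_worker_script.py',  # illustrative script name
        '--batch_size', str(args.batch_size),
        '--local_rank', str(local_rank),
    ])
```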