PompousBeetle71 you can check this example:
I think it should help. If you want a more manual approach, you can check the Popen subprocesses here:
I created a wrapper that works like executing
python -m torch.distributed.launch --nproc_per_node 2 ./my_script.py but from within my script. I do call
trains.init in the subprocesses. The only actual difference between the subprocesses, in terms of arguments, is supposed to be
local_rank, that's all. It may be that I'm not distributing the model between the GPUs in an optimal way, or at least not in a way that matches your framework.
If you have an example it would be great.
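For reference, a wrapper like the one described above could be sketched roughly as follows. This is not the actual wrapper from this thread, just a minimal single-node sketch: the helper names (`build_worker_commands`, `build_worker_env`, `launch`) are hypothetical, and the environment variables and the `127.0.0.1` / `29500` defaults are assumptions modelled on what `torch.distributed.launch` sets up.

```python
import os
import subprocess
import sys


def build_worker_commands(script, nproc_per_node, script_args=()):
    """Return one argv list per worker; only --local_rank differs between them."""
    return [
        [sys.executable, "-u", script, "--local_rank={}".format(rank), *script_args]
        for rank in range(nproc_per_node)
    ]


def build_worker_env(rank, world_size, master_addr="127.0.0.1", master_port=29500):
    """Copy the current environment and add the rendezvous variables.

    Assumption: a single node, so the global RANK equals LOCAL_RANK.
    """
    env = dict(os.environ)
    env.update({
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": str(world_size),
        "RANK": str(rank),
        "LOCAL_RANK": str(rank),
    })
    return env


def launch(script, nproc_per_node, script_args=()):
    """Start one subprocess per GPU and wait for all of them to finish."""
    commands = build_worker_commands(script, nproc_per_node, script_args)
    procs = [
        subprocess.Popen(cmd, env=build_worker_env(rank, nproc_per_node))
        for rank, cmd in enumerate(commands)
    ]
    return [proc.wait() for proc in procs]


# Hypothetical usage, roughly equivalent to --nproc_per_node 2:
# exit_codes = launch("./my_script.py", 2)
```

With this layout each worker receives the same arguments apart from `--local_rank`, which matches the description above.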
AgitatedDove14 I'm using both argparser and sys.argv to start different processes, each of which interacts with a single GPU. So each process has a specific argument with a different value to differentiate between them (only the main process interacts with trains). At the moment I'm having issues getting the arguments from the processes I spawn. I'm explicitly calling
python my_script.py --args... and each process knows how to interact with the others. It's a bit complicated to explain, I'm working with the PyTorch distributed module.
PompousBeetle71 the code is executed without arguments; at run-time trains / trains-agent will pass the arguments (as defined on the task) to the argparser. This means that you get the ability to change them, and also type checking 🙂
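To illustrate the argparser flow described above, here is a minimal sketch. The argument names, project name and task name are made up for the example, and it assumes `Task.init` is called before `parse_args` so that trains can hook the parser.

```python
import argparse


def build_parser():
    """A hypothetical training-script parser; names and defaults are illustrative."""
    parser = argparse.ArgumentParser(description="hypothetical training script")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=0.01)
    return parser


def main(argv=None):
    # Imported lazily so the parser can be used without trains installed.
    from trains import Task

    # Assumption: initializing the task before parsing lets trains log the
    # argparse values, and override them when run by trains-agent.
    Task.init(project_name="examples", task_name="argparse demo")
    return build_parser().parse_args(argv)


# Hypothetical usage inside the script:
# args = main()
```

The `type=` declarations are what give the type checking mentioned above: a value edited on the task is still converted by argparse before the script sees it.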
PompousBeetle71 if you are not using argparser, how do you parse the arguments from sys.argv? Manually?
If that's the case, then after parsing you can connect a dictionary to the Task and you will get the desired behavior
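A rough sketch of that suggestion, assuming the arguments are parsed by hand: `parse_argv` and `connect_params` are hypothetical helpers, the project and task names are placeholders, and the values stay as strings because no type information is available.

```python
import sys


def parse_argv(argv):
    """Turn ['--key', 'value', '--flag', '--k=v'] into a plain dictionary."""
    params = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("--"):
            key = token[2:]
            if "=" in key:
                # --key=value form
                key, value = key.split("=", 1)
                params[key] = value
            elif i + 1 < len(argv) and not argv[i + 1].startswith("--"):
                # --key value form: consume the next token as the value
                params[key] = argv[i + 1]
                i += 1
            else:
                # bare flag with no value
                params[key] = True
        i += 1
    return params


def connect_params(params):
    # Imported lazily so parse_argv can be used without trains installed.
    from trains import Task

    task = Task.init(project_name="examples", task_name="manual argv demo")
    # Connecting the dictionary registers its values on the task; use the
    # returned object from here on, since values may be overridden remotely.
    return task.connect(params)


# Hypothetical usage inside the script:
# params = connect_params(parse_argv(sys.argv[1:]))
```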
PompousBeetle71 a few questions:
Is this like using PyTorch distributed, only manually? Why don't you call
trains.init in all the sub processes? We had a few threads on that, it seems like a recurring question, so I'll make sure we have an example on GitHub. Basically trains will take care of passing the arg-parser commands to the sub processes, and also the torch node settings. It will also make sure they all report to the same experiment. What do you think?
SuccessfulKoala55 No, that's not what I mean.
Take a look at the process on the machine:
/home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py
That's how the process starts. Therefore, when I try to get
sys.argv all I get is
I'm talking about allowing arguments that are not injected into argparse, so the command would look like:
/home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py --num_proc 2