AgitatedDove14 Hi, I solved that by passing the arguments injected into the argparser to the created processes as part of the command line. The examples helped.
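Here is a minimal sketch of that fix. It assumes the parent parsed its options with argparse, turns the parsed Namespace back into command-line flags, and passes them to each child process. The helper names are hypothetical. The sketch also assumes every flag is spelled --<dest> and that booleans are store_true switches.

```python
import argparse
import subprocess
import sys


def namespace_to_argv(args: argparse.Namespace) -> list:
    """Rebuild command-line flags from a parsed Namespace (hypothetical helper).

    Assumes each destination maps to a '--<dest>' flag; booleans are treated
    as store_true switches and None values are skipped.
    """
    argv = []
    for name, value in vars(args).items():
        flag = "--" + name
        if isinstance(value, bool):
            if value:
                argv.append(flag)
        elif value is not None:
            argv.extend([flag, str(value)])
    return argv


def spawn_child(script: str, args: argparse.Namespace) -> subprocess.Popen:
    # Start the child with the same interpreter and the parent's arguments.
    cmd = [sys.executable, script] + namespace_to_argv(args)
    return subprocess.Popen(cmd)
```

For example, with --lr and --verbose defined on the parser, parsing --lr 0.1 --verbose gives ['--lr', '0.1', '--verbose']. The child can then parse that list with the same parser definition.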
AgitatedDove14 It will probably take me a few days, but I'll let you know.
PompousBeetle71 let me know if it solves your problem
AgitatedDove14 thanks, I'll check it out.
PompousBeetle71 you can check this example:
https://github.com/allegroai/trains/blob/master/examples/distributed/example_torch_distributed.py
I think it should help. If you want a more manual approach, you can check the Popen subprocess example here:
https://github.com/allegroai/trains/blob/master/examples/distributed/example_subprocess.py
I created a wrapper that, from within my script, works like executing python -m torch.distributed.launch --nproc_per_node 2 ./my_script.py
I do call trains.init in the subprocesses. In terms of arguments, the only difference between the subprocesses is supposed to be local_rank, that's all. It's possible that I'm not distributing the model between the GPUs in an optimal way, or at least not in a way that matches your framework.
If you have an example, that would be great.
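A wrapper like the one described can be sketched with the standard library alone. It starts one child per process, and each child gets its own --local_rank value. The function names are hypothetical. This sketch only covers the command lines. The real torch.distributed.launch also sets up process-group environment variables, which are left out here.

```python
import subprocess
import sys


def build_rank_commands(script, nproc, extra_args=()):
    """Return one command per local rank; they differ only in --local_rank."""
    return [
        [sys.executable, script, "--local_rank", str(rank), *extra_args]
        for rank in range(nproc)
    ]


def launch(script, nproc, extra_args=()):
    """Start every rank, wait for all of them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in build_rank_commands(script, nproc, extra_args)]
    return [p.wait() for p in procs]
```

Calling launch("my_script.py", 2) would run two copies of my_script.py, one with --local_rank 0 and one with --local_rank 1.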
PompousBeetle71 a few questions:
Is this like using PyTorch distributed, only manually? Why don't you call trains.init in all the subprocesses? We've had a few threads on this and it seems to be a recurring question, so I'll make sure we have an example on GitHub. Basically, trains will take care of passing the argparser arguments to the subprocesses, and also of the torch node settings. It will also make sure they all report to the same experiment. What do you think?
AgitatedDove14 I'm using both argparser and sys.argv to start different processes, each of which interacts with a single GPU. Each process gets a specific argument with a different value to tell them apart (only the main process interacts with trains). At the moment I'm having trouble getting the arguments in the processes I spawn. I'm explicitly calling python my_script.py --args...
and each process knows how to interact with the others. It's a bit complicated to explain; I'm working with the PyTorch distributed module.
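On the worker side, the setup described can be sketched as follows. Each process reads its own --local_rank, maps it to one GPU, and only the main process talks to trains. The rank-0-is-main convention and the "cuda:N" device mapping are assumptions for illustration, and the function names are hypothetical.

```python
import argparse


def parse_worker_args(argv=None):
    parser = argparse.ArgumentParser()
    # Set by the launcher; each process receives a different value.
    parser.add_argument("--local_rank", type=int, default=0)
    return parser.parse_args(argv)


def is_main_process(args):
    # Assumed convention: rank 0 is the only process that reports to trains.
    return args.local_rank == 0


def device_for(args):
    # Hypothetical one-process-per-GPU mapping, e.g. "cuda:1" for rank 1.
    return "cuda:{}".format(args.local_rank)
```

A worker would then call trains only when is_main_process(args) is true, and place its model on device_for(args).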
PompousBeetle71 the code is executed without arguments; at run time, trains / trains-agent passes the arguments (as defined on the task) to the argparser. This means you get the ability to change them, and also type checking 🙂
PompousBeetle71 if you are not using argparser, how do you parse the arguments from sys.argv? Manually?
If that's the case, after parsing you can connect a dictionary to the Task and you will get the desired behavior: task.connect(dict_with_arguments_from_argv)
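Here is a rough sketch of that suggestion. It parses sys.argv by hand into a plain dict, then connects the dict to a Task. The parser is a hypothetical helper that understands only --key value, --key=value and bare --flag forms, and it keeps all values as strings. The trains part imports lazily so the parsing runs without trains installed. The project and task names are placeholders.

```python
import sys


def parse_argv(argv):
    """Parse '--key value', '--key=value' and bare '--flag' tokens into a dict."""
    params = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("--"):
            raise ValueError("unexpected token: {!r}".format(token))
        key = token[2:]
        if "=" in key:
            key, value = key.split("=", 1)
            params[key] = value
            i += 1
        elif i + 1 < len(argv) and not argv[i + 1].startswith("--"):
            params[key] = argv[i + 1]
            i += 2
        else:
            # A flag with no value after it is treated as a boolean switch.
            params[key] = True
            i += 1
    return params


def connect_params(params):
    """Connect the dict to a trains Task (needs trains and a configured server)."""
    from trains import Task  # imported lazily so parse_argv works without trains

    task = Task.init(project_name="examples", task_name="manual argv")  # placeholder names
    task.connect(params)
    return params


def main():
    return connect_params(parse_argv(sys.argv[1:]))
```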
SuccessfulKoala55 No, that's not what I mean.
Take a look at the process on the machine: /home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py
That's how the process starts. Therefore, when I read sys.argv, all I get is path/to/script/my_script.py.
I'm talking about allowing arguments that are not injected into the argparser, so it would look like: /home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py --num_proc 2
Hi PompousBeetle71 - this can be done using the argparser integration. See https://allegro.ai/docs/examples/examples_tasks/ under "argparse parameters"
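Following that suggestion, here is a sketch that declares --num_proc (the argument from the question) with argparse instead of reading sys.argv directly. The trains ordering is an assumption modeled on its examples, which call Task.init before parse_args(). Per the thread, when trains-agent starts the script without command-line arguments, the values stored on the task are what parse_args() returns. The parser is built in a function so it can be tried without trains, and the names are placeholders.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="distributed training entry point")
    # Declared via argparse so the trains argparse integration can record it.
    parser.add_argument("--num_proc", type=int, default=2,
                        help="number of worker processes to spawn")
    return parser


# In the real script (assumed order, placeholder names):
#   from trains import Task
#   task = Task.init(project_name="examples", task_name="distributed")
#   args = build_parser().parse_args()
```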