AgitatedDove14 Yes, you're right. It was 10.2 or 10.1, if I recall.
AgitatedDove14 I'm using both argparse and sys.argv to start different processes, each of which interacts with a single GPU. Each process gets a specific argument with a different value to differentiate between them (only the main process interacts with trains). At the moment I'm having issues getting the arguments from the processes I spawn. I'm explicitly calling python my_script.py --args... and each process knows how to interact with the others. It's a bit complicated to explai...
AgitatedDove14 Hi, I solved that by passing the arguments injected into argparse to the created processes as part of the command line. The examples helped.
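A minimal sketch of that approach, assuming the main process parses its arguments with argparse and starts one worker per GPU. The helper `build_worker_command` and the options `--batch-size` and `--gpu-id` are hypothetical names used only for illustration, and the sketch handles only simple `--name value` options.

```python
import argparse
import sys


def build_worker_command(args, gpu_id, script="my_script.py"):
    """Rebuild an explicit command line for one worker from the parsed arguments."""
    cmd = [sys.executable, script]
    for name, value in sorted(vars(args).items()):
        if name == "gpu_id" or value is None:
            continue  # gpu_id is set per worker; None means "not provided"
        cmd += ["--" + name.replace("_", "-"), str(value)]
    # Hypothetical per-process argument that differentiates the workers.
    cmd += ["--gpu-id", str(gpu_id)]
    return cmd


parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=int, default=32)  # hypothetical option
parser.add_argument("--gpu-id", type=int, default=None)
args = parser.parse_args(["--batch-size", "64"])

# One command per GPU; each could then be started with subprocess.Popen(cmd).
commands = [build_worker_command(args, gpu_id) for gpu_id in range(2)]
```

Because every value is spelled out on the command line, each spawned process sees the same arguments in sys.argv regardless of how the parent process itself was launched.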
AgitatedDove14 It will take me probably a few days but I'll let you know.
AgitatedDove14 I tried the drastic measure suggested above, as I had a 1 GB log file filled with the message: trains.frameworks - WARNING - Could not retrieve model location, skipping auto model logging
It didn't work :S
I use torch and yes, I use save so your code will catch it.
SuccessfulKoala55 No, that's not what I mean.
Take a look at the process on the machine: /home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py
That's how the process starts. Therefore, when I try to read sys.argv, all I get is path/to/script/my_script.py.
I'm talking about allowing arguments that are not injected into argparse, so that it would look like:
` /home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py --...
AgitatedDove14 I can't try the new agent at the moment. The OS is Ubuntu 18.04, more specifically amazon/Deep Learning Base AMI (Ubuntu 18.04) Version 22.0, with no docker. It runs directly on the machine.
AgitatedDove14 I'm using this code in the meantime
` # This script counts the GPUs and builds a list like 0,1,2,...
# It then prefixes that list with '--gpus'.
NUM_GPUS=$(nvidia-smi -L | wc -l)
# Highest GPU index; -1 means no GPU was found.
NUM_GPUS=$((NUM_GPUS - 1))
OUT=()
if [ "$NUM_GPUS" -ge 0 ]
then
for i in $(seq 0 "$NUM_GPUS"); do OUT+=( "$i" ); done
# Join the indices with commas and prepend the flag.
echo "${OUT[*]}" | tr ' ' ',' | awk '{print "--gpus "$1}'
else
echo ""
fi `
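The same logic can be sketched in Python, assuming `nvidia-smi -L` prints one line per GPU as the shell script above relies on. The function names `gpus_argument` and `count_gpus` are hypothetical.

```python
import subprocess


def gpus_argument(num_gpus):
    """Return '--gpus 0,1,...' for num_gpus devices, or '' when there are none."""
    if num_gpus <= 0:
        return ""
    return "--gpus " + ",".join(str(i) for i in range(num_gpus))


def count_gpus():
    """Count the non-empty lines printed by `nvidia-smi -L`; 0 if it is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "-L"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return 0
    return sum(1 for line in out.splitlines() if line.strip())
```

Calling `gpus_argument(count_gpus())` yields the string to append to the command line, and an empty string on a machine without GPUs.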
AgitatedDove14
These are the logger names I can see when running the code locally; they might differ when running remotely.
['trains.utilities.pyhocon.config_parser', 'trains.utilities.pyhocon', 'trains.utilities', 'trains', 'trains.config', 'trains.storage', 'trains.metrics', 'trains.Repository Detection']
Regarding reproducing it: I have a long data-processing step after initializing the task and before setting the input model/output model.
I actually tried to print the logging.getLogger("trains.frameworks").level and it was ERROR as expected. Therefore I'm not quite sure that's the problem... next I thought to patch your functions.
The solution that worked: [logging.getLogger(name).setLevel(logging.ERROR) for name in logging.root.manager.loggerDict if "trains" in name]
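The one-liner above can be written as a small helper. This is a sketch using only the standard `logging` module; the function name `silence_loggers` is hypothetical, and the loggers created below are stand-ins for the ones the library registers.

```python
import logging


def silence_loggers(substring, level=logging.ERROR):
    """Set `level` on every already-registered logger whose name contains `substring`."""
    # Materialize the names first so the registry is not iterated while being updated.
    names = [name for name in logging.root.manager.loggerDict if substring in name]
    for name in names:
        logging.getLogger(name).setLevel(level)
    return sorted(names)


# Stand-in loggers for the demonstration only.
logging.getLogger("trains.frameworks")
logging.getLogger("trains.storage")
logging.getLogger("other.module")

changed = silence_loggers("trains")
```

Note that this only affects loggers that already exist when it is called, so it has to run after the library has been imported and initialized; loggers created later keep their own level.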
AgitatedDove14 Well, after starting a new project it works. I guess it's a bug.
AgitatedDove14 Drastic indeed. I believe I would lose all the trains logs that way. In that case I prefer to keep the redundant logs.
If you find a more specific solution, I'd love to know what it is 🙂
I think it's either parent project or parent experiment; you don't need both.
AgitatedDove14 Thanks Martin, I know that. I'm just saying it's a bug.
AgitatedDove14 Yes, I can. I didn't delete the previous project yet.
The version of the agent (the worker that received the job) was 0.14.1.
The one that created the template was 0.14.2.
I think that if there's a default value, it should override the type; otherwise, go with the type.
Yes, there's a use for empty strings. For example, in text generation you may generate the next word given some prefix, and the prefix may be an empty string.
The trains version is still 0.14; it will take time to switch it.
AgitatedDove14 v0.14
AgitatedDove14 When the default is None I expect the default value to be None even if the type is str. But I'll use your recommendation 🙂
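For reference, a minimal sketch of how plain argparse treats this case when nothing overrides the values. The option name `--prefix` is hypothetical. It shows only standard-library behaviour and says nothing about how the arguments are injected when the script runs remotely.

```python
import argparse

parser = argparse.ArgumentParser()
# `--prefix` is a hypothetical option name used only for illustration.
parser.add_argument("--prefix", type=str, default=None)

# Option not given: the default is used as-is and stays None.
not_given = parser.parse_args([]).prefix

# Option given with an explicit empty string: the value is "" rather than None.
empty = parser.parse_args(["--prefix", ""]).prefix
```

Here `not_given` is `None` and `empty` is an empty string, so the two cases remain distinguishable even though the declared type is `str`.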
AgitatedDove14 No, there's no reason in my case to pass an empty string. That's why I removed the type=str part.
I was thinking of changing to a connected dictionary, though.