
TimelyPenguin76 the tag names are 'Epoch 1', 'Step 5705'
the return value of InputModel(<Put a string copy from the UI with the tag id>).tags is an empty array.
Yeah, I thought about using an artifact; I wondered if I could avoid it or, on the other hand, use only it, just to define "the model" as a folder.
Thanks.
SuccessfulKoala55 No, that's not what I mean.
Take a look at the process on the machine: /home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py
That's how the process starts. Therefore, when I try to get sys.argv, all I get is path/to/script/my_script.py.
I'm talking about allowing arguments that are not injected into argparse. So it will look like:
` /home/ubuntu/.trains/venvs-builds/3.6/bin/python -u path/to/script/my_script.py --...
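For context, plain argparse can already tolerate flags it was never told about. This is only a minimal sketch, and it assumes the extra flags actually reach sys.argv, which is exactly the part in question here. The flag names --lr and --gpu-id are made up for illustration.

```python
import argparse


def split_args(argv):
    """Parse the flags argparse knows about and return the rest untouched."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.1)  # hypothetical flag
    # parse_known_args does not fail on unknown flags; it returns them as a list.
    known, extra = parser.parse_known_args(argv)
    return known, extra


# --gpu-id is not registered with the parser, so it ends up in `extra`.
known, extra = split_args(["--lr", "0.5", "--gpu-id", "1"])
print(known.lr, extra)
```

In a real script the call would be split_args(sys.argv[1:]), so whatever the launcher appended after the script path would land in the second return value.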
AgitatedDove14 Well, after starting a new project it works. I guess it's a bug.
yes, there's a use for empty strings, for example in text generation you may generate the next word given some prefix, the prefix may be an empty string.
AgitatedDove14 no, there's no reason in my case to pass an empty string. That's why I removed the type=str part.
I've solved the first part by importing trains after parsing the arguments. Still not sure about the second part of my question.
AgitatedDove14 I'm using both argparse and sys.argv to start different processes, each of which interacts with a single GPU. So each process has a specific argument with a different value to differentiate between them (only the main one interacts with trains). At the moment I encounter issues with getting the arguments from the processes I spawn. I'm explicitly calling python my_script.py --args... and each process knows how to interact with the others. It's a bit complicated to explai...
SteadyFox10 AgitatedDove14 Thanks, I really did change the name.
TimelyPenguin76 yes, both 0.15.1
AgitatedDove14 You were right. I can get them as system tags.
I've written a class that wraps a training session and the interaction with trains, as upon loading/saving the experiment I need more than just the 'model.bin'.
So I use these tags to match the specific aux files that were saved with their model.
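A rough sketch of the tag matching idea, assuming the tags look like the 'Epoch 1' / 'Step 5705' ones mentioned earlier. The helper names and the aux file naming scheme are hypothetical, not what the wrapper class actually does.

```python
def parse_tags(tags):
    """Turn tags like 'Epoch 1' into {'epoch': 1}; ignore tags that don't fit."""
    parsed = {}
    for tag in tags:
        # Split on the last space: 'Step 5705' -> ('Step', ' ', '5705')
        name, _, value = tag.rpartition(" ")
        if name and value.isdigit():
            parsed[name.lower()] = int(value)
    return parsed


def aux_filename(tags):
    """Build a hypothetical aux file name from the epoch and step tags."""
    info = parse_tags(tags)
    return "aux_epoch{epoch}_step{step}.json".format(**info)


print(aux_filename(["Epoch 1", "Step 5705"]))
```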
I actually tried to print logging.getLogger("trains.frameworks").level and it was ERROR, as expected. Therefore I'm not quite sure that's the problem... Next I thought to patch your functions.
AgitatedDove14 I've tried the drastic measure suggested above, as I had a 1 GB log file filled with: trains.frameworks - WARNING - Could not retrieve model location, skipping auto model logging
It didn't work :S
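One more thing that could be tried with only the standard logging module is a filter that drops that specific message. This is a sketch under the assumption that the warning is emitted through the regular "trains.frameworks" logger; if trains writes it some other way, the filter will not catch it. The class name is made up.

```python
import logging


class DropMessage(logging.Filter):
    """Reject any record whose message contains the given text."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle

    def filter(self, record):
        # Returning False tells logging to discard the record.
        return self.needle not in record.getMessage()


# Attach the filter to the logger that produces the noisy warning.
logging.getLogger("trains.frameworks").addFilter(
    DropMessage("Could not retrieve model location")
)
```

A filter on the logger only applies to records logged on that exact logger, so it leaves the rest of the trains output alone.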
AgitatedDove14 My solution actually works better when I want to copy the model + aux to a different s3 folder for deployment as the aux is very light and I can copy the model without downloading it. But thanks for the suggestion.
I created a wrapper that works like executing python -m torch.distributed.launch --nproc_per_node 2 ./my_script.py but from within my script. I do call trains.init in the subprocesses. The only actual difference between the subprocesses, in terms of arguments, is supposed to be local_rank, that's all. It may be that I'm not distributing the model between the GPUs in an optimal way, or at least not in a way that matches your framework.
If you have an example it would be great.
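To make the wrapper idea concrete, here is a minimal single-node sketch of what I mean. It is not the actual wrapper; the function names are hypothetical, and the environment variables are my understanding of what torch.distributed.launch sets, so treat them as assumptions.

```python
import os
import subprocess
import sys


def build_workers(script, nproc, master_addr="127.0.0.1", master_port=29500):
    """Return one (command, env) pair per local process."""
    workers = []
    for local_rank in range(nproc):
        env = dict(os.environ)
        env.update({
            "MASTER_ADDR": master_addr,
            "MASTER_PORT": str(master_port),
            "WORLD_SIZE": str(nproc),
            "RANK": str(local_rank),  # single node: global rank == local rank
            "LOCAL_RANK": str(local_rank),
        })
        # The only per-process difference on the command line is local_rank.
        cmd = [sys.executable, "-u", script, f"--local_rank={local_rank}"]
        workers.append((cmd, env))
    return workers


def launch(script, nproc):
    """Start all workers and wait for them; returns their exit codes."""
    procs = [subprocess.Popen(cmd, env=env)
             for cmd, env in build_workers(script, nproc)]
    return [p.wait() for p in procs]

# Usage (not executed here): launch("./my_script.py", 2)
```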
AgitatedDove14 It will take me probably a few days but I'll let you know.
AgitatedDove14 When the default is None I expect the default value to be None even if the type is str. But I'll use your recommendation 🙂
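This is the behavior I'm comparing against, shown with plain argparse only (no trains involved). The --prefix flag is a made-up name for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# type=str with default=None: argparse only applies `type` to string defaults,
# so a None default is left as None.
parser.add_argument("--prefix", type=str, default=None)  # hypothetical flag

no_flag = parser.parse_args([])
empty = parser.parse_args(["--prefix", ""])

assert no_flag.prefix is None   # flag omitted: default stays None
assert empty.prefix == ""       # explicit empty string is preserved
```

So with plain argparse the two cases stay distinguishable, which is what I expected to see when the arguments are overridden remotely as well.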
AgitatedDove14
These were the logger names I can see when running the code locally; they might differ when running remotely.
['trains.utilities.pyhocon.config_parser', 'trains.utilities.pyhocon', 'trains.utilities', 'trains', 'trains.config', 'trains.storage', 'trains.metrics', 'trains.Repository Detection']
Regarding reproducing it: I have a long data processing step after initializing the task and before setting the input model/output model.
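For reference, a list like the logger names above can be collected with the standard logging module. This is a small sketch; loggerDict is a semi-internal attribute of the logging manager, and the helper name is made up.

```python
import logging


def logger_names(prefix):
    """Return the names of all registered loggers under the given prefix."""
    registered = logging.root.manager.loggerDict
    return sorted(
        name for name in registered
        if name == prefix or name.startswith(prefix + ".")
    )


# Loggers only show up once something has called getLogger for them,
# so this should run after the library has been imported and initialized.
print(logger_names("trains"))
```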
I think if there's a default value it should override the type, otherwise go with the type
AgitatedDove14 ArgParser argument
yes, it was.
the version of the agent (the worker that received the job) was 0.14.1
the one that created the template was 0.14.2
the trains version is still 0.14; it will take time to switch it