
I had to manually create a dump of the Mongo data and import it into 4.4. I was just referring to adding a note to the documentation for other users.
I cannot execute step 4 because I can't get past step 3. Does that make sense?
There was some complication during the upgrade so I had to resort to the manual process.
I have now been able to upgrade by dumping the mongodb data and restoring it independently.
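A minimal sketch of the dump-and-restore route described above, assuming the standard `mongodump` and `mongorestore` command-line tools are installed. The host, ports, and dump directory are hypothetical placeholders, not values from this thread; the helpers only build the command lines and do not run them.

```python
import shlex


def dump_cmd(host, port, out_dir):
    """Build a mongodump command that writes a dump of the old server to out_dir."""
    return ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]


def restore_cmd(host, port, dump_dir):
    """Build a mongorestore command that loads dump_dir into the new server."""
    return ["mongorestore", "--host", host, "--port", str(port), dump_dir]


if __name__ == "__main__":
    # Hypothetical values: old MongoDB on 27017, new 4.4 instance on 27018.
    commands = [
        dump_cmd("localhost", 27017, "/tmp/mongo_dump"),
        restore_cmd("localhost", 27018, "/tmp/mongo_dump"),
    ]
    for cmd in commands:
        # Print shell-safe command lines so they can be reviewed before running.
        print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute it; printing first makes it easy to confirm the targets before touching any data.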
The docker container in step 3 does not run because of the incompatibility.
(Note that even though you can spin up two agents on the same GPU, the NVIDIA drivers cannot share allocated GPU memory, so if one Task consumes too much memory, the other will not have enough free GPU memory to run.)
Basically the same restriction as manually launching two processes using the same GPU
That makes sense. Currently, I use Python multiprocessing to launch multiple experiments on the same GPU device. I'm guessing using trains-agent will be similar.
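For illustration, here is a rough sketch of launching several experiments pinned to one GPU. It uses `subprocess` rather than `multiprocessing` to keep the example short, and relies on the `CUDA_VISIBLE_DEVICES` environment variable; the script name `train.py` in the usage note is a hypothetical placeholder. As noted above, the processes still compete for the same GPU memory.

```python
import os
import subprocess
import sys


def gpu_env(gpu_id, base_env=None):
    """Return a copy of the environment with only the given GPU visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch(script, gpu_id, n_procs):
    """Start n_procs copies of script, all pinned to the same GPU."""
    return [
        subprocess.Popen([sys.executable, script], env=gpu_env(gpu_id))
        for _ in range(n_procs)
    ]
```

Calling `launch("train.py", gpu_id=0, n_procs=2)` and then `wait()` on each returned process would run two experiments side by side on GPU 0.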
Ok, so Git credentials are present at two locations: 1) outside the agent config and 2) inside it. I updated the credentials at both locations and now I'm seeing agent.git_user = <username> in the dump, but I still have the same issue.
# Set GIT user/pass credentials
# leave blank for GIT SSH credentials ...
Got it. I haven't tried setting up trains-agent yet, so I don't know much about the overhead of launching the agent. I'd imagine that if it has to create the full environment (installing requirements, etc.), the overhead might not be that low. But from what I'm reading, it looks like I can use a docker image with the full env. Is my understanding correct?
SuccessfulKoala55
For security reasons I don't want to have my password written out in a file. I'm trying to use a personal access token (PAT) from GitHub (https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token) but I get an authentication error. Is there an issue with using a PAT?
Yes, I tried to run steps 1, 2, 3, 4 in order but got stuck at 3.
Yes, I am using Pool. Here is what I think is happening: clearml launches a subprocess, which I assume is a daemonic process. That process in turn launches a subprocess for training, which causes the error I mentioned.
I'm using docker to run the experiment. Could it be that the config in the docker container doesn't have the git credentials?
That makes sense. The configuration file is located at ~/trains.conf, which I believe is the default location.
No, I can't see my username printed out in the dump.
The second subprocess is by design. It becomes the primary process when clearml does not use multiprocessing. I hope I'm not confusing you further
fatal: could not read Username for '': terminal prompts disabled
error: Could not fetch origin
Why is trains-agent trying to read from the terminal prompt instead of trains.conf?
SuccessfulKoala55 Yes, I am using the --docker flag.
You are right about the Keyring. Once I make sure credentials are stored in a secure way, it works as expected. Thanks :)
Steps 1 and 2 on this page https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration/ say to back up /opt/clearml/data/mongo and uncompress it into /opt/clearml/data/mongo_4. Isn't that just copying the old data files?
2. interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why python has this limitation, let me check it...)
Yes, I'm not sure either. I have banged my head against the wall trying to have multiple levels of subprocesses, but it gets too complicated with Python. Let me know what you find out.
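The "thread mode" idea can be sketched roughly as follows. Python refuses to let a daemonic process start child processes (it raises an error saying daemonic processes are not allowed to have children), so one workaround is to check the current process and fall back to a thread pool. This is only an illustration of the idea, not how clearml implements it; `pool_factory` and `square` are hypothetical names.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool


def pool_factory(is_daemon=None):
    """Return a pool constructor that is safe for the current process.

    Daemonic processes cannot spawn children, so they get a thread pool;
    everything else gets a regular process pool.
    """
    if is_daemon is None:
        is_daemon = multiprocessing.current_process().daemon
    return ThreadPool if is_daemon else multiprocessing.Pool


def square(x):
    """Toy workload standing in for a real training function."""
    return x * x
```

Inside a daemonic process, `pool_factory()(2)` yields a `ThreadPool` whose `map` call has the same interface as `multiprocessing.Pool.map`, so the calling code does not need to change. Threads share one interpreter, though, so CPU-bound work will not run in parallel.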
I was getting the error in step number 3
Hi AgitatedDove14 Thanks, I'll check these out.
What is the exact use case you have in mind?
I want to store some additional data that is not relevant to training a model. For example, I want to store inference results, explanations, etc., and then use them in a different process. I currently use a separate database for this.
Btw, I had been busy with another project and hadn't logged in here for some time. I see that you guys have made a lot of progress in the last two months! I'm excited to di...