Hi AppetizingMouse58
Yes, I tried to perform steps 3-10; however, step 3 raised an error because the mongo data files are incompatible between 3.6 and >4.0
The mongo 4.4 image does not launch a container if the data in the mongo dir was written by a previous version. We should add a note about that to the documentation
I had to manually create a dump for the mongo data and import it into 4.4. I was just referring to adding a note to the documentation for other users.
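For other users hitting the same issue, here is a rough sketch of the dump-and-import workaround described above. It assumes the old and the new mongo instances are both reachable over the network and that the `mongodump` and `mongorestore` tools are installed; the hosts, ports and dump directory are placeholders, not values from the ClearML docs.

```python
import subprocess
from typing import List, Tuple


def dump_command(host: str, port: int, out_dir: str) -> List[str]:
    """Build the mongodump call that exports all databases to out_dir."""
    return ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]


def restore_command(host: str, port: int, dump_dir: str) -> List[str]:
    """Build the mongorestore call that imports a dump directory."""
    return ["mongorestore", "--host", host, "--port", str(port), dump_dir]


def migrate(old: Tuple[str, int], new: Tuple[str, int], dump_dir: str) -> None:
    """Dump from the old server, then restore into the new one."""
    subprocess.run(dump_command(old[0], old[1], dump_dir), check=True)
    subprocess.run(restore_command(new[0], new[1], dump_dir), check=True)
```

Something like `migrate(("localhost", 27017), ("localhost", 27018), "/tmp/mongo_dump")` would then run both steps, with the old 3.6 container on one port and the new 4.4 container on the other (both ports are assumptions).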
Steps 1 and 2 on this page https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration/ say to back up /opt/clearml/data/mongo and uncompress it into /opt/clearml/data/mongo_4. Isn't that just copying the old data files?
SuccessfulKoala55
For security reasons I don't want to have my password written out in a file. I'm trying to use a personal access token (PAT) from GitHub (https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token) but I get an authentication error. Is there an issue with using a PAT?
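One way to narrow this down is to check the token outside the agent first. The sketch below is only a debugging aid, not part of trains-agent: it assumes an HTTPS repository URL and uses the PAT in place of the password with `git ls-remote`. The function names are made up for illustration.

```python
import os
import subprocess
from urllib.parse import quote


def authenticated_url(repo_url: str, user: str, token: str) -> str:
    """Insert user:token credentials into an https:// repository URL."""
    prefix = "https://"
    if not repo_url.startswith(prefix):
        raise ValueError("only https:// URLs are supported")
    # Percent-encode both parts so special characters do not break the URL.
    creds = f"{quote(user, safe='')}:{quote(token, safe='')}"
    return f"{prefix}{creds}@{repo_url[len(prefix):]}"


def token_can_read(repo_url: str, user: str, token: str) -> bool:
    """Return True if `git ls-remote` succeeds with the given token."""
    # GIT_TERMINAL_PROMPT=0 makes git fail instead of asking for a password.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    result = subprocess.run(
        ["git", "ls-remote", authenticated_url(repo_url, user, token)],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.returncode == 0
```

If `token_can_read(...)` fails with the token read from an environment variable, the problem is probably the token itself (for example missing repo scope) rather than the agent configuration.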
Also it might be better (although not necessary) to have a separate collection for storing inference results for better organization.
2. interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why python has this limitation, let me check it...)
Yes, I'm not sure either. I have banged my head against the wall trying to have multiple levels of subprocesses, but it gets too complicated with python. Let me know what you find out
What would be the query? Are you reporting 100+ diff scalars?
At the moment I am not reporting any scalars related to inference. I'm only reporting data related to training a model. But I would like to report records that result from an inference process. For example the record would contain key_1, key_2, datetime, pred_1, pred_2 ... pred_n. I would have about 20 scalars if each of these fields is reported as a scalar.
The query can be a simple filtering criterion matching some keys ...
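To make the record shape and the kind of query concrete, here is a small sketch in plain python. It does not use any trains API; the field names follow the example above and the sample values are made up.

```python
from datetime import datetime
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical inference records: two keys, a timestamp and n predictions.
records: List[Record] = [
    {"key_1": "a", "key_2": 1, "datetime": datetime(2020, 11, 1), "pred_1": 0.2, "pred_2": 0.8},
    {"key_1": "a", "key_2": 2, "datetime": datetime(2020, 11, 2), "pred_1": 0.6, "pred_2": 0.4},
    {"key_1": "b", "key_2": 1, "datetime": datetime(2020, 11, 3), "pred_1": 0.9, "pred_2": 0.1},
]


def matches(record: Record, criteria: Dict[str, Any]) -> bool:
    """True if every requested key is present with the requested value."""
    return all(key in record and record[key] == value for key, value in criteria.items())


def query(rows: List[Record], **criteria: Any) -> List[Record]:
    """Return the records whose fields match all the given key/value pairs."""
    return [row for row in rows if matches(row, criteria)]
```

For example `query(records, key_1="a", key_2=2)` returns only the second record, which is the kind of lookup I would need on stored inference results.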
(Do notice that even though you can spin two agents on the same GPU, the nvidia drivers cannot share allocated GPU memory, so if one Task consumes too much memory the other will not have enough free GPU memory to run)
Basically the same restriction as manually launching two processes using the same GPU
That makes sense. Currently, I use python multiprocessing to launch multiple experiments on the same GPU device. I'm guessing using trains-agent will be similar
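Roughly what the manual setup looks like, as a sketch. This version uses `subprocess` instead of `multiprocessing` to keep it short, and it assumes the experiment code honours `CUDA_VISIBLE_DEVICES`; the helper names are made up. Every process sees the same device, so they all compete for the same GPU memory.

```python
import os
import subprocess
import sys
from typing import List


def launch_on_gpu(gpu_id: int, python_args: List[str]) -> subprocess.Popen:
    """Start a python process that can only see the given GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return subprocess.Popen(
        [sys.executable, *python_args],
        env=env,
        stdout=subprocess.PIPE,
        text=True,
    )


def run_all(gpu_id: int, experiments: List[List[str]]) -> List[str]:
    """Launch all experiments at once on one GPU and collect their stdout."""
    procs = [launch_on_gpu(gpu_id, args) for args in experiments]
    return [proc.communicate()[0].strip() for proc in procs]
```

For instance `run_all(0, [["train.py", "--lr", "0.1"], ["train.py", "--lr", "0.01"]])` would start two runs side by side on GPU 0, where `train.py` and its flags are placeholders for the actual experiment script.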
Ok, so Git credentials are present at two locations: 1) outside the agent config and 2) inside it. I updated the credentials at both locations and now I'm seeing agent.git_user = <username> in the dump, but I still have the same issue.
# Set GIT user/pass credentials
# leave blank for GIT SSH credentials ...
Hi AgitatedDove14 , I'll wait for you to reply on Github before I add my comments to these points.
Hmm, ok. Yes that would make it easier.
From an architectural point of view - say I know I'll be running the experiment on a trains-agent, when I initialize and execute the experiment locally, how hard would it be to instead send all the execution details and env to the trains agent and run it directly there? Can the configuration be packaged when we initialize the Task? Does the question make sense?
Got it. I haven't tried setting up trains-agent yet so I don't know much about the overhead of launching the agent. I'd imagine if it has to create the full environment (installing requirements, etc), the overhead might not be that low. But as I'm reading, it looks like I can use a docker image with the full env. Is my understanding correct?
I'm using docker to run the experiment. Could it be that the config in the docker container doesn't have the git credentials?
Yes, I tried to run steps 1, 2, 3, 4 in order but got stuck at step 3