Yes, I am using Pool. Here is what I think is happening: ClearML launches a subprocess, which I assume is a daemonic process. That process in turn launches a subprocess for training, which causes the error I mentioned.
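A minimal sketch that reproduces this limitation, assuming a POSIX system (the "fork" start method is used so the module is not re-imported in the workers). `multiprocessing.Pool` workers are daemonic, and Python refuses to start a child process from a daemonic one. `grandchild`, `pool_worker`, and `run_demo` are hypothetical stand-ins for the nested training job, not ClearML code.

```python
import multiprocessing as mp

# Assumption: POSIX system, so the "fork" start method is available.
CTX = mp.get_context("fork")


def grandchild() -> None:
    """Hypothetical stand-in for the nested training subprocess."""


def pool_worker(_: int) -> str:
    # Pool workers are daemonic, so starting another Process here is refused.
    proc = CTX.Process(target=grandchild)
    try:
        proc.start()
    except AssertionError as exc:
        return str(exc)
    proc.join()
    return "started"


def run_demo() -> list:
    with CTX.Pool(processes=1) as pool:
        return pool.map(pool_worker, [0])


if __name__ == "__main__":
    print(run_demo())
```

Running it should print a list containing `daemonic processes are not allowed to have children`, the check CPython's `multiprocessing` performs in `Process.start()`.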
2. Interesting error. Maybe we can revert to "thread mode" when running under a daemon. (I have to admit I'm not sure why Python has this limitation; let me check it...)
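The "thread mode" fallback suggested above could look roughly like this sketch. It assumes a POSIX system for the "fork" start method, and `launch`, `job`, `pool_worker`, and `runner_kind_in_pool` are hypothetical helpers, not ClearML API. The check relies on `multiprocessing.current_process().daemon`, which is `True` inside `Pool` workers.

```python
import multiprocessing as mp
import threading

# Assumption: POSIX system, so the "fork" start method is available.
CTX = mp.get_context("fork")


def launch(target, *args):
    """Start `target` in a subprocess, or in a thread when running inside a daemon."""
    if mp.current_process().daemon:
        # Daemonic processes (e.g. Pool workers) may not create children.
        runner = threading.Thread(target=target, args=args)
    else:
        runner = CTX.Process(target=target, args=args)
    runner.start()
    return runner


def job(name):
    """Hypothetical job body."""
    print(f"running {name}")


def pool_worker(name):
    runner = launch(job, name)
    runner.join()
    return type(runner).__name__


def runner_kind_in_pool():
    with CTX.Pool(processes=1) as pool:
        return pool.map(pool_worker, ["nested"])[0]


if __name__ == "__main__":
    top = launch(job, "top-level")  # a real subprocess
    top.join()
    print(runner_kind_in_pool())  # Thread
```

The trade-off is that a thread shares the GIL and memory with its parent, so this only avoids the crash; it does not give the nested job its own process.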
Yes, I'm not sure either. I have banged my head against the wall trying to use multiple levels of subprocesses, but it gets too complicated in Python. Let me know what you find out.
OK, so Git credentials are present in two locations: 1) outside the agent config and 2) inside it. I updated the credentials in both locations and now I see `agent.git_user = <username>` in the dump, but I still have the same issue.
`# Set GIT user/pass credentials`
`# leave blank for GIT SSH credentials ...`
That makes sense. The configuration file is located at `~/trains.conf`, which I believe is the default location.
No, I can't see my username printed in the dump.
I'm using Docker to run the experiment. Could it be that the config inside the Docker container doesn't have the Git credentials?
SuccessfulKoala55, yes, I am using the `--docker` flag.
You were right about the keyring. Once I made sure the credentials were stored securely, it worked as expected. Thanks :)
What would the query be? Are you reporting 100+ different scalars?
At the moment I am not reporting any scalars related to inference; I'm only reporting data related to training a model. But I would like to report records that result from an inference process. For example, a record would contain key_1, key_2, datetime, pred_1, pred_2 ... pred_n. I would have about 20 scalars if each of these fields were reported as a scalar.
The query can be a simple filter matching on some keys...
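A plain-Python sketch of the record layout and the key-based filter described above, independent of any ClearML API. `InferenceRecord` and `filter_records` are hypothetical names, keeping `pred_1 ... pred_n` in one dict (instead of about 20 separate scalars) is a design assumption, and the sample values are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class InferenceRecord:
    """One inference result; field names follow the example in the message."""
    key_1: str
    key_2: str
    timestamp: datetime
    preds: Dict[str, float] = field(default_factory=dict)  # pred_1 ... pred_n


def filter_records(records: Iterable[InferenceRecord], **criteria) -> List[InferenceRecord]:
    """Keep records whose attributes equal every given criterion (logical AND)."""
    return [r for r in records if all(getattr(r, k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    records = [
        InferenceRecord("a", "x", datetime(2020, 11, 1), {"pred_1": 0.9, "pred_2": 0.1}),
        InferenceRecord("a", "y", datetime(2020, 11, 2), {"pred_1": 0.4, "pred_2": 0.6}),
        InferenceRecord("b", "x", datetime(2020, 11, 3), {"pred_1": 0.2, "pred_2": 0.8}),
    ]
    for rec in filter_records(records, key_1="a", key_2="x"):
        print(rec.timestamp.date(), rec.preds)
```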
I was getting the error in step 3.
Hi AppetizingMouse58
Yes, I tried to perform steps 3-10; however, step 3 raised an error because the MongoDB data files were incompatible between 3.6 and 4.0+.
Steps 1 and 2 basically copy the MongoDB 3.6 data into a new directory, `mongo_4`, but the MongoDB 4.4 image does not accept that data. So I had to perform the following steps:

1. Launch a Docker container with MongoDB 3.6 and dump the data using `mongodump`.
2. Launch a Docker container with MongoDB 4.4 and an empty `mongo_4` data directory.
3. Restore the dumped data using `mongorestore`.

This made sure the data is now compatible with MongoDB 4.0 or greater.
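The three steps above can be sketched as a dry-run command plan. This is an illustration under assumptions, not a tested migration script: it assumes the official `mongo` Docker images (which ship `mongodump`/`mongorestore`), the container names and `/path/to/...` directories are hypothetical placeholders, and it does not wait for `mongod` to accept connections before each `docker exec`, which a real run would need to do.

```python
from typing import List


def build_migration_commands(old_dir: str, new_dir: str, dump_dir: str) -> List[List[str]]:
    """Return, in order, the docker commands for a MongoDB 3.6 -> 4.4 dump/restore."""
    return [
        # 1. Start MongoDB 3.6 on the existing data files and dump everything.
        ["docker", "run", "-d", "--name", "mongo36",
         "-v", f"{old_dir}:/data/db", "-v", f"{dump_dir}:/dump", "mongo:3.6"],
        ["docker", "exec", "mongo36", "mongodump", "--out", "/dump"],
        ["docker", "rm", "-f", "mongo36"],
        # 2. Start MongoDB 4.4 on an empty data directory (e.g. mongo_4).
        ["docker", "run", "-d", "--name", "mongo44",
         "-v", f"{new_dir}:/data/db", "-v", f"{dump_dir}:/dump", "mongo:4.4"],
        # 3. Restore the dump into the new instance, then stop it.
        ["docker", "exec", "mongo44", "mongorestore", "/dump"],
        ["docker", "rm", "-f", "mongo44"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    plan = build_migration_commands("/path/to/mongo", "/path/to/mongo_4", "/path/to/mongo_dump")
    for cmd in plan:
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the paths and readiness checks are adapted to the actual server layout.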
I cannot execute step 4 because I can't get past step 3. Does that make sense?
The Docker container in step 3 does not start because of the incompatibility.
Yes, I tried to run steps 1, 2, 3, and 4 in order but got stuck at step 3.
I get the same error even when I follow the instructions to the letter.
Here is one line from the MongoDB Docker container output: "This version of MongoDB is too recent to start up on the existing data files. Try MongoDB 4.2 or earlier."
Hi AgitatedDove14, I'll wait for your reply on GitHub before adding my comments on these points.
Yes, the 'training' is my main code. You can think of it as launching a job (training or inference). My main code launches multiple jobs using multiprocessing. Each job is a separate ClearML task that gets logged. Does that make sense?
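The structure described above (a main script fanning out jobs with `multiprocessing`, one ClearML task per job) can be sketched as follows. This is a hedged illustration, not the poster's code: `run_job`, `launch_jobs`, and the job specs are hypothetical, a POSIX "fork" start method is assumed, and the ClearML call is only indicated in a comment so the sketch runs without a server. Because `Pool` workers are daemonic, any subprocess started inside `run_job` would hit the limitation discussed at the top of the thread.

```python
import multiprocessing as mp
from typing import Dict, List

# Assumption: POSIX system, so the "fork" start method is available.
CTX = mp.get_context("fork")


def run_job(spec: Dict[str, str]) -> str:
    """Run one training or inference job; each job maps to its own ClearML task."""
    task_name = f"{spec['kind']}-{spec['name']}"
    # In the real code a ClearML task would be created here, e.g. with
    # clearml.Task.init(project_name=..., task_name=task_name); it is omitted
    # so this sketch runs without a ClearML server.
    # ... training or inference work for this job ...
    return task_name


def launch_jobs(specs: List[Dict[str, str]]) -> List[str]:
    """Run all jobs in a process pool and return their task names in input order."""
    with CTX.Pool(processes=max(1, min(len(specs), 4))) as pool:
        return pool.map(run_job, specs)


if __name__ == "__main__":
    jobs = [
        {"kind": "training", "name": "model_a"},
        {"kind": "inference", "name": "batch_1"},
    ]
    print(launch_jobs(jobs))
```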
I posted the question on Stack Overflow along with the answer: https://stackoverflow.com/questions/64636294/trains-reusing-previous-task-id/64636297#64636297 :)
The second subprocess is by design: it becomes the primary process when ClearML does not use multiprocessing. I hope I'm not confusing you further.