I'm so glad you mentioned the cron job, it would have taken us hours to figure out
but I also make sure to write the trains.conf to the root directory in this bash script:
echo "
sdk.aws.s3.key = ***
sdk.aws.s3.secret = ***
" > ~/trains.conf
...
python3 -m trains_agent --config-file "~/trains.conf"
...
And I can verify that ~/trains.conf exists in the su home folder
I will probably just use an absolute path everywhere, to be robust against different machine user accounts: /home/user/trains.conf
That sounds like good practice
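For reference, a minimal Python sketch of the absolute-path idea, assuming the file sits at ~/trains.conf for whichever user runs the script. A tilde inside double quotes is not expanded by the shell, so expanding it explicitly removes the guesswork about who resolves it. `resolve_conf_path` is a hypothetical helper name, not part of trains.

```python
import os


def resolve_conf_path(path="~/trains.conf"):
    """Return an absolute path with a leading '~' expanded for the current user."""
    return os.path.abspath(os.path.expanduser(path))


if __name__ == "__main__":
    conf = resolve_conf_path()
    # Report whether the resolved file is actually there for this user.
    print(conf, "exists" if os.path.isfile(conf) else "is missing")
```

Running this as the same user as the cron job shows which file that account would really pick up.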
Other than the wrong trains.conf, I can't think of anything else... Well, maybe if you have AWS environment variables with credentials? They will override the conf file:
AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION
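A quick way to check this from Python, as a sketch: it only lists which of the three variables above are set in the current environment and never prints their values. `aws_env_overrides` is a hypothetical helper name used for illustration.

```python
import os

AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def aws_env_overrides(environ=None):
    """Return the names of the AWS variables that are set and non-empty."""
    env = os.environ if environ is None else environ
    return [name for name in AWS_ENV_VARS if env.get(name)]


if __name__ == "__main__":
    found = aws_env_overrides()
    # Only the variable names are printed, never the secret values.
    print("set:", found if found else "none")
```

Since cron jobs usually run with a different environment than an interactive shell, it is worth running the check from inside the same script the cron job launches.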
JitteryCoyote63 when the agent is running a job, it prints its configuration at the beginning. Do you see the correct credentials there? (You will not see the secret, but you will see the access key.)
AgitatedDove14 That's a good point: the experiment failing with this error does show the correct AWS key:
...
sdk.aws.s3.key = *****
sdk.aws.s3.region =
...
region is empty, I never entered it and it worked
What's the exact error you are getting?
(Maybe this is a privilege error on the cache folder. Which folders is it using? You can see them in the configuration as well.)
File "devops/valid.py", line 80, in <module>
  valid(parse_args)
File "devops/valid.py", line 41, in valid
  valid_task.output_uri = args.artifacts
File "/data/.trains/venvs-builds/3.6/lib/python3.6/site-packages/trains/task.py", line 695, in output_uri
  ", check configuration file ~/trains.conf".format(value))
ValueError: Could not get access credentials for 's3://ml-artefacts' , check configuration file ~/trains.conf
So the problem comes when I do my_task.output_uri = "s3://my-bucket": trains checks in the background whether it has access to this bucket, and it is not able to find/read the creds
AgitatedDove14 This seems to be consistent even if I specify the absolute path to /home/user/trains.conf
I'll try to pass these values using the env vars
JitteryCoyote63 are you calling my_task.output_uri = "s3://my-bucket" in the code itself?
Why not with Task.init(output_uri=...)?
Also, since this is running remotely, there is no need for that: use Execution -> Output -> Destination and put it there, it will do everything for you 🙂
After some investigation, I think it could come from the way you catch errors when checking the creds in trains.conf: when I passed the AWS creds using env vars, another error popped up, linked to boto3: https://github.com/boto/botocore/issues/2187
(btw, yes, I adapted to use Task.init(...output_uri=))
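For anyone following along, a minimal sketch of passing the destination through Task.init instead of assigning task.output_uri afterwards. The project and task names are placeholders, `init_task` is a hypothetical wrapper, and the call assumes the trains package is installed and configured to reach a server.

```python
try:
    from trains import Task
except ImportError:  # trains is not installed in this environment
    Task = None


def init_task(output_uri, project_name="examples", task_name="validation"):
    """Create the task with its output destination set up front."""
    if Task is None:
        raise RuntimeError("the trains package is required")
    # output_uri is passed at creation time rather than set on the task later.
    return Task.init(project_name=project_name, task_name=task_name, output_uri=output_uri)
```

Usage would look like init_task("s3://my-bucket"), with the bucket name taken from the messages above.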
JitteryCoyote63 what am I missing?
What are the errors you are getting (with / without the envs)?
without the envs, I had the error: ValueError: Could not get access credentials for 's3://my-bucket' , check configuration file ~/trains.conf
After using envs, I got error: ImportError: cannot import name 'IPV6_ADDRZ_RE' from 'urllib3.util.url'
the second seems like a botocore issue :
https://github.com/boto/botocore/issues/2187
JitteryCoyote63 see if upgrading the packages as they suggest somehow fixes it.
I have the feeling this is the same problem (the first error might be trains masking the original error)
AgitatedDove14 Yes exactly, I tried the fix suggested in the github issue urllib3>=1.25.4
and the ImportError disappeared 🙂
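A small sketch for checking whether an installed urllib3 meets the urllib3>=1.25.4 bound from the issue. `version_tuple` and `meets_minimum` are hypothetical helpers that compare only the numeric parts of the version string; a proper version parser would be more thorough.

```python
import re


def version_tuple(version):
    """Turn '1.25.4' into (1, 25, 4), using the leading digits of each component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first component without leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum="1.25.4"):
    """True if version is at least the minimum, compared component by component."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import urllib3
        print(urllib3.__version__, meets_minimum(urllib3.__version__))
    except ImportError:
        print("urllib3 is not installed")
```

Running it inside the agent's virtual environment matters here, since that environment can hold a different urllib3 than the system Python.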
So most likely trains was masking the original error, it might be worth investigating to help other users in the future
Yes, hopefully they have a different exception type so we could differentiate ... :) I'll check
ImportError sounds so out of place, it should not be a problem :)
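To illustrate the kind of masking discussed above, here is a hypothetical sketch. It is not the actual trains code: `load_s3_driver` just stands in for an import that breaks, and the two check functions show a broad handler next to one that treats ImportError separately.

```python
def load_s3_driver():
    """Stand-in for an import that fails because of a broken dependency."""
    raise ImportError("cannot import name 'IPV6_ADDRZ_RE' from 'urllib3.util.url'")


def check_access_masking(uri):
    # A broad except turns every failure into the same credentials message,
    # so the final error points at trains.conf even though an import failed.
    try:
        load_s3_driver()
    except Exception:
        raise ValueError("Could not get access credentials for '{}'".format(uri))


def check_access_chained(uri):
    # Handling ImportError on its own and chaining it with 'from' keeps the
    # original failure attached as __cause__ and visible in the message.
    try:
        load_s3_driver()
    except ImportError as exc:
        raise RuntimeError("S3 support failed to import for '{}'".format(uri)) from exc
```

With the second variant, the user would see the import problem directly instead of being sent to check credentials.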