I think it comes from the web UI of version 1.2.0 of clearml-server, because I didn't change anything else
how would it interact with the clearml-server api service? would it be completely transparent?
yes, in the code, I do:
task._wait_for_repo_detection()
REQS_TASK = ["torch==1.3.1", "pytorch-ignite @ git+ ", "."]
task._update_requirements(REQS_TASK)
task.execute_remotely(queue_name=args.queue, clone=False, exit_process=True)
Thanks a lot, I will play with that!
AgitatedDove14 Yes exactly! it is shown in the recording above
After some investigation, I think it could come from the way you catch errors when checking the creds in trains.conf: when I passed the AWS creds using env vars, another error popped up: https://github.com/boto/botocore/issues/2187 , linked to boto3
Would you like me to open an issue for that or will you fix it?
I’d like to move to a setup where I don’t need these tricks
So most likely trains was masking the original error, it might be worth investigating to help other users in the future
Does what you suggested here > https://github.com/allegroai/trains-agent/issues/18#issuecomment-634551232 also apply to containers used by the services queue?
I will probably just use an absolute path everywhere to be robust against different machine user accounts: /home/user/trains.conf
I'll try to pass these values using the env vars
And after the update, the loss graph appears
without the envs, I had error:
ValueError: Could not get access credentials for 's3://my-bucket', check configuration file ~/trains.conf
After using envs, I got error:
ImportError: cannot import name 'IPV6_ADDRZ_RE' from 'urllib3.util.url'
what would be the name of these vars?
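[Editor's note: a minimal sketch, assuming trains defers to boto3/botocore for S3 credentials when none are found in trains.conf — these are the standard boto3 environment variable names; the values below are placeholders:]

```shell
# Standard boto3/botocore credential environment variables,
# picked up by the S3 client when trains.conf has no creds.
export AWS_ACCESS_KEY_ID="AKIA_PLACEHOLDER"
export AWS_SECRET_ACCESS_KEY="SECRET_PLACEHOLDER"
export AWS_DEFAULT_REGION="us-east-1"   # optional; region can stay empty
```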
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
but I also make sure to write the trains.conf to the root directory in this bash script:
echo "
sdk.aws.s3.key = ***
sdk.aws.s3.secret = ***
" > ~/trains.conf
...
python3 -m trains_agent --config-file "~/trains.conf" ...
Hi NonchalantHedgehong19 , thanks for the hint! What should be the content of the requirements file then? Can I specify my local package inside? How?
So the problem comes when I do:
my_task.output_uri = "s3://my-bucket"
trains in the background checks if it has access to this bucket and it is not able to find/read the creds
And I can verify that ~/trains.conf exists in the su home folder
File "devops/valid.py", line 80, in <module>
    valid(parse_args)
File "devops/valid.py", line 41, in valid
    valid_task.output_uri = args.artifacts
File "/data/.trains/venvs-builds/3.6/lib/python3.6/site-packages/trains/task.py", line 695, in output_uri
    ", check configuration file ~/trains.conf".format(value))
ValueError: Could not get access credentials for 's3://ml-artefacts' , check configuration file ~/trains.conf
region is empty, I never entered it and it worked
AgitatedDove14 Yes exactly, I tried the fix suggested in the github issue urllib3>=1.25.4 and the ImportError disappeared 🙂
(btw, yes, I adapted the code to use Task.init(...output_uri=))
Now, I know the experiments having the most metrics. I want to downsample these metrics by 10, i.e. only keep iterations that are multiples of 10. How can I query (to delete) only the documents ending with 0?
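[Editor's note: since wildcard queries are rejected on the long-typed [iter] field (the error quoted above), one workaround is a script query on iter modulo 10. A sketch of a _delete_by_query body — the index name is a placeholder for the actual events index — that deletes every document whose iteration is NOT a multiple of 10, i.e. keeps only those "ending with 0":]

```json
POST /events-placeholder-index/_delete_by_query
{
  "query": {
    "bool": {
      "must_not": {
        "script": {
          "script": {
            "source": "doc['iter'].value % 10 == 0"
          }
        }
      }
    }
  }
}
```

Flipping must_not to filter would instead select the documents that are multiples of 10, if selecting rather than deleting them is the goal.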
So it looks like it tries to register a batch of 500 documents
I get the following error: