Yes, it should work with ClearML if it works with requests
Hi @<1523701842515595264:profile|PleasantOwl46> ! This looks like a python problem. A useful SO thread: None
First, I would verify that I can access the api server without using the SDK. To do so, run this code after filling the credentials yourself (just login should be enough to verify that the api server is reachable)
import requests

api_server = ""  # fill in
access_key = ""  # fill in
secret_key = ""  # fill in
token_req = requests.get(api_server + "/auth.login", auth=(access_key, secret_key))
token = token_req.json()["data"]["token"]
If it doesn't work, it means that requests can't validate the certificates properly.
If that is the case, then I would try to install certifi, both via pip install -U certifi and apt install -y python3-certifi (commands might vary), and then try to make the get with the argument verify=certifi.where(). Then setting the REQUESTS_CA_BUNDLE env var to the right certifi path should allow the requests to go through.
If this also doesn't work, then I would look for the right certificates, because some certificates (Comodo ones, for example) may not be shipped with certifi: (for example: None ).
These are just a few suggestions, not sure if they will help
have you tried copying the certificate to /usr/local/share/ca-certificates/ ?
I'm using an RPM-based machine, but I get your direction:
put the cert in the right place for Python to look for it automatically
can I assume that if it works smoothly with requests or urllib3, it will work for the ClearML API?
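To see where Python would look for CA certificates on a given machine, something like the following could help. It is only a diagnostic sketch: describe_trust_stores is a hypothetical helper name, certifi is treated as optional, and the two environment variables are the ones mentioned in this thread. Note that requests normally verifies against the certifi bundle unless one of those variables (or verify=) says otherwise, so the OpenSSL defaults shown here matter mostly for plain ssl/urllib code.

```python
import os
import ssl


def describe_trust_stores():
    """Collect the CA locations Python may consult for TLS verification."""
    paths = ssl.get_default_verify_paths()
    info = {
        # Compile-time OpenSSL defaults (may point at files that do not exist)
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Resolved locations; None when the file or directory is missing
        "effective_cafile": paths.cafile,
        "effective_capath": paths.capath,
        # Overrides discussed in this thread
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
        "CURL_CA_BUNDLE": os.environ.get("CURL_CA_BUNDLE"),
    }
    try:
        import certifi  # optional: only available if certifi is installed
        info["certifi_bundle"] = certifi.where()
    except ImportError:
        info["certifi_bundle"] = None
    return info


if __name__ == "__main__":
    for name, value in describe_trust_stores().items():
        print(f"{name}: {value}")
```

If the organization's CA is not in any of the listed locations, a "unable to get local issuer certificate" failure would be the expected result.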
in case this helps someone else: I did not have root access to the training machine to add the cert to the store
you can point your python to your own CA using:
export CURL_CA_BUNDLE=/path/to/CA.pem
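The same thing can be done from inside a Python process instead of the shell. The sketch below assumes the lookup order that requests applies when verify is left at its default (REQUESTS_CA_BUNDLE first, then CURL_CA_BUNDLE); resolve_ca_bundle and export_ca_bundle are hypothetical helper names, and /path/to/CA.pem is the placeholder from the message above.

```python
import os


def resolve_ca_bundle(env=None):
    """Return the CA bundle path taken from the environment, or None.

    Mirrors the order requests uses when verify is not set explicitly:
    REQUESTS_CA_BUNDLE first, then CURL_CA_BUNDLE.
    """
    env = os.environ if env is None else env
    return env.get("REQUESTS_CA_BUNDLE") or env.get("CURL_CA_BUNDLE") or None


def export_ca_bundle(ca_path, env=None):
    """Point this process at a custom CA file (hypothetical helper)."""
    env = os.environ if env is None else env
    if not os.path.isfile(ca_path):
        raise FileNotFoundError(f"CA bundle not found: {ca_path}")
    # Set both names so either lookup finds the same file
    env["REQUESTS_CA_BUNDLE"] = ca_path
    env["CURL_CA_BUNDLE"] = ca_path
    return resolve_ca_bundle(env)
```

Calling export_ca_bundle("/path/to/CA.pem") before the first request is made (for example, before Task.init) should have the same effect as the export line, but only for that process.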
SDK version: 1.14.4
clearml-server version: Server: 1.14.0-431 • API: 2.28
so I think I'm in the right direction
adding verify= and pointing to my CA.pem looks like the right approach
now, how do I use it with ClearML API?
cleanup_service
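from clearml import Task

# `tasks` is the list of task objects collected earlier in the cleanup service
for task in tasks:
    try:
        deleted_task = Task.get_task(task_id=task.id)
        print(deleted_task.name)
        deleted_task.delete(
            delete_artifacts_and_models=True,
            skip_models_used_by_other_tasks=True,
            raise_on_error=False,
        )
    except Exception as ex:
        print(f"Failed deleting task {task.id}: {ex}")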
it throws the SSL error for each task that has files on the fileserver that it's trying to delete
same when uploading anything to the fileserver:
from PIL import Image
from clearml import Task

task = Task.init(project_name=project_name, task_name=exp_name, continue_last_task=True)
logger = task.get_logger()
img = Image.open("./1_model.png").convert("RGB")
logger.report_image(title="cfg_0", series="Model", iteration=1, image=img)
@<1523701087100473344:profile|SuccessfulKoala55> looks OK (?)
>>> StorageHelper.get(Task._get_default_session().get_files_server_host())._container.session.verify
InsecureRequestWarning: Certificate verification is disabled! Adding certificate verification is strongly advised. See:
True
@<1523701435869433856:profile|SmugDolphin23> thanks for good pointers
it did not work on the first attempt: requests did not validate the certs properly
I have added this:
token_req = requests.get(api_server + "/auth.login", verify="<my_org_CA>", auth=(access_key, secret_key))
print(token_req)
I got back
<Response [200]>
which I believe is good, right?
when adding token = token_req.json()["data"]["token"]
I got errors from json decoder, which I believe is expected?
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
let me dig in more and hopefully can share successful results
thanks!
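A 200 response whose body is not JSON would produce exactly this kind of decoder error, so it can help to look at the content type before decoding. The following is a minimal sketch under that assumption; extract_token is a hypothetical helper, and it only relies on the response behaving like requests.Response (status_code, headers, text, json()).

```python
def extract_token(response):
    """Return the login token, or raise with a hint if the body is not JSON.

    `response` is expected to behave like requests.Response.
    """
    if response.status_code != 200:
        raise RuntimeError(f"login failed with HTTP {response.status_code}")
    content_type = response.headers.get("Content-Type", "")
    if "json" not in content_type.lower():
        # Show the start of the body so an HTML page or proxy notice is obvious
        preview = response.text[:80].replace("\n", " ")
        raise ValueError(
            f"expected JSON but got {content_type or 'no content type'}: {preview!r}"
        )
    return response.json()["data"]["token"]
```

With the snippet above this would be called as extract_token(token_req). If it raises the ValueError, the address in api_server may be answering with something other than the API (this is a guess, not something confirmed in the thread).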
@<1523701435869433856:profile|SmugDolphin23> working! here is what I have on Fedora/RHEL
- copy certs to /etc/pki/ca-trust/source/anchors/
- run update-ca-trust
Hi @<1523701842515595264:profile|PleasantOwl46> , can you check what this results in?
from clearml import Task
from clearml.storage.helper import StorageHelper
StorageHelper.get(Task._get_default_session().get_files_server_host())._container.session.verify
looks like I can't interact with the fileserver
?
the sudo update-ca-certificates? maybe this will work
I have tried a small task that only uploads a single file:
logger = task.get_logger()
img = Image.open("./1_model.png").convert("RGB")
logger.report_image(title="cfg_0", series="Model", iteration=1, image=img)
ended with:
Retrying (Retry(total=0, connect=5, read=5, redirect=5, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)'))': /
2024-03-03 09:53:54,987 - clearml.metrics - WARNING - Failed uploading to
(HTTPSConnectionPool(host='clearml-hrl-03.haifa.ibm.com', port=8081): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)'))))
2024-03-03 09:53:54,988 - clearml.metrics - ERROR - Not uploading 1/1 events because the data upload failed
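The failure in this log is against the fileserver port (8081), not the API server, so it may be worth testing the TLS handshake for that host and port on its own, without ClearML or requests in the way. Below is a standard-library sketch, assuming Python 3.7 or newer; check_tls is a hypothetical helper name, and the CA path in the usage note is the placeholder used earlier in the thread.

```python
import socket
import ssl


def check_tls(host, port, cafile=None, timeout=5.0):
    """Try a verified TLS handshake and report whether the chain was accepted.

    With cafile=None the system default trust store is used; otherwise only
    the given CA file is trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "certificate chain verified"
    except ssl.SSLCertVerificationError as err:
        # Same class of failure as the CERTIFICATE_VERIFY_FAILED lines above
        return False, f"verification failed: {err.verify_message}"
    except OSError as err:
        # Covers refused connections, timeouts, DNS errors and other SSL errors
        return False, f"connection failed: {err}"
```

For the host in the log this would be check_tls("clearml-hrl-03.haifa.ibm.com", 8081, cafile="/path/to/CA.pem"). If it succeeds with the CA file but fails with cafile=None, the system store still does not contain the issuing CA.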