Hi SuccessfulKoala55 , just to add, my clearml.conf (client) and clearml.agent.conf (agent) can have differing values. I'm not sure which one takes precedence and if this could be the cause.
Well, by default, the SDK and agent both use the clearml.conf file. If those files reside on different machines, there should be no confusion.
SubstantialElk6 your agent configuration file does not have any AWS credentials defined - is that on purpose?
Yes, it's on purpose; each user would have their own AWS credentials for default_output_uri.
My assumption is that the agent will have pulled that off the client's clearml.conf.
It won't, as it's not on the same machine... and credentials are never sent to the server, so basically the experiment running in the agent will have no credentials
I see. Can I take it that when the client uses task.execute_remotely(queue_name="1gpu", exit_process=True), then none of the content in its clearml.conf will be used, except for the API part, and ClearML simply uses whatever is on the agent side?
` api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
    # Credentials are generated using the webapp,
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "12345", "secret_key": "67890"},
    verify_certificate: false
} `
execute_remotely simply creates a task and enqueues it - nothing of the original clearml.conf file is used in the new task - the clearml.conf file is always local to the executing machine.
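To illustrate, a minimal sketch of that flow (the project and queue names here are just placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="remote-run-demo")

# Registers the task and enqueues it on "1gpu"; the local process exits here.
# Everything after this call is executed by the agent, which reads its own local
# clearml.conf - only the api credentials above were needed to create/enqueue the task.
task.execute_remotely(queue_name="1gpu", exit_process=True)

# ... training code below runs on the agent's machine ... `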
Ok thanks, that explains a lot. We have been doing this wrongly the whole time, thinking that the clearml.conf on the client side would be acknowledged by the remote agent execution. In reality, only the API section is utilised.
Hi, we are still not getting the model repo to work, mainly due to clearml.storage failing to save the models.
We tried vanilla boto3 code and it works, but we can't figure out why we get ConnectionResetError 104 when ClearML does it.
How do we configure clearml in correspondence to following boto code?
s3 = boto3.resource('s3', endpoint_url='https://ecs.ai', aws_access_key_id='mykey', aws_secret_access_key='mysecret', config=Config(signature_version='s3v4'), region_name='us-east-1', verify='/mycerts/ca.crt')
Hi SubstantialElk6
If you are using boto to access anything that is not AWS S3, you have to add both address and port, and make sure you configure the "secure" flag.
See example in clearml.conf :
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L247
` aws {
    s3 {
        credentials: [
            {
                host: "my-minio-host:9000"
                key: "12345678"
                secret: "12345678"
                multipart: false
                secure: false
            }
        ]
    }
} `
And in the ` output_uri ` you should have something like: s3://my_host:9000/bucket/folder
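For example, to set that per task from code (just a sketch; the project, task and bucket names are placeholders, and the same value can also be set globally via default_output_uri in clearml.conf or per task in the UI):
` from clearml import Task

# output_uri is where models/artifacts for this task get uploaded;
# the matching key/secret/secure settings for that host are read from the clearml.conf
# of the machine that actually executes the task (the agent, for remote runs).
task = Task.init(
    project_name="examples",
    task_name="minio-output-demo",
    output_uri="s3://my_host:9000/bucket/folder",
) `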
Ok, let me check this out first thing on Monday, thanks AgitatedDove14 .
Hi.
We tried as advised above and it still didn't work.
Host: http://ecs.ai:443
output_uri = S3://ecs.ai:443/bucketname
This time round the client gave this error.
botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: 'http://ecs.ai/bucketname/.clearml.test'.
It's quite apparent that whatever ClearML passed to boto3 ends up as an http call instead of https, which is wrong.
SubstantialElk6
Notice that if you are using a manual setup, the default is "secure: false"; you have to change it to "secure: true":
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L251
Since you actually need https, you should use the "secure" property in the bucket configuration in clearml.conf
Thanks. We set this configuration and the client ran and submitted the job for remote execution (agent running k8s glue). However, when the job runs and tries to save into the model repo, this error came up.
clearml.storage - ERROR - Failed creating storage object s3://ecs.ai Reason: Missing key and secret for S3 storage access (s3://ecs.ai).
I remember being told that the clearml.conf on the client will not be used in a remote execution like the above, so I think this was the problem. I also tried to set the same credentials in 'Web App Cloud Access' under the profile page. That didn't help either.
The implication is that all models can only be saved to a single model repo defined on the ClearML agent end; users have no choice in defining their own repo destination.
Is there any way out of this? I find it strange that users can't choose a model repo destination of their choice and have to rely on just one public bucket.
I remember being told that the clearml.conf on the client will not be used in a remote execution like the above, so I think this was the problem.
SubstantialElk6 the configuration should be set on the agent's machine (i.e. clearml.conf that is on the machine running the agent)
- Users have no choice of defining their own repo destination of choice.
In the UI you can specify, in the "Execution" tab, Output "Destination", a different destination for the models/artifacts. Is this what you are looking for?
Setting the credentials on the agent machine means the users cannot use their own credentials, since a k8s glue agent serves multiple users.
Referencing your suggestion, we can configure output_uri on task.set_base_docker() but how should we do this for the credentials?
Setting the credentials on the agent machine means the users cannot use their own credentials, since a k8s glue agent serves multiple users.
Correct, I think the "vault" option is only available on the paid tier 😞
but how should we do this for the credentials?
I'm not sure how to pass them; wouldn't it make sense to give the agent all-access credentials?
Do you have more info on vault?
Actually it only makes sense if the entire department or organisation is saving their models in a common repo. In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future). So each department, and even each project, will need its own repo.
In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future).
Hmm I see, wouldn't it make more sense to separate clients like a multi-tenant SaaS solution?
It would make sense on a very large resource cluster. Unfortunately we have fewer than 50 GPUs to share. A multi-tenant SaaS would cut the resources into even smaller clusters and not help with efficiency. Or would you have a suggestion?
Hmm that makes sense. I "think" the enterprise offering has a solution for that as well (i.e. full separation over a static cluster), but probably the best way to pursue this avenue is to talk to Sales (I'm assuming they'll set up a call to discuss the details).
Going back to the open source version, I think that adding the credentials as part of the source code might allow the "credentials" to auto-populate as part of the remote execution, wdyt?
Going back to the open source version, I think that adding the credentials as part of the source code might allow the "credentials" to auto-populate as part of the remote execution, wdyt?
Not sure how this will work when I can't supply the credentials to ClearML programmatically.
I thought of another potential way but not sure if the SDK supports it.
We will perform a manual save and upload of the model using vanilla boto3, with credentials passed in as env vars, and then use the ClearML SDK to update the model repo with the location of the model, without ClearML uploading it explicitly.
Would the above work?
Hi SubstantialElk6, you can do it by:
1. Uploading the model file yourself
2. Creating an OutputModel object, associated with your task (it will automatically be associated with the main task unless otherwise specified)
3. Calling the model's update_weights() method with the register_uri argument pointing to the URI of the previously uploaded model file
WDYT?
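Roughly something like this (just a sketch; the endpoint, cert path, bucket, object key and model name are placeholders, and boto3 is assumed to pick up your credentials from the usual AWS environment variables):
` import boto3
from botocore.client import Config
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual-model-upload")

# 1. Upload the model file yourself with vanilla boto3
#    (credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars)
s3 = boto3.resource('s3', endpoint_url='https://ecs.ai',
                    config=Config(signature_version='s3v4'), verify='/mycerts/ca.crt')
s3.Bucket('bucketname').upload_file('model.pt', 'models/model.pt')

# 2. Create an OutputModel associated with the task
output_model = OutputModel(task=task, name="my-model")

# 3. Register the already-uploaded file; ClearML records the URI without uploading anything
output_model.update_weights(register_uri='s3://ecs.ai:443/bucketname/models/model.pt') `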
Hi Jake, thanks for the suggestion, let me try it out.