This just keeps getting better and better.... 🤩
So just to be clear - the file server has nothing to do with the storage?
Okay Jake, so that basically means I don't have to touch any server configuration regarding the file server on the trains server? It will simply get ignored, and all I/O initiated by clients with the right configuration will cover for it?
And once this is done, what is the file server IP good for? Will it redirect to the bucket?
No, it'll just be there 🙂 You can obviously edit your docker-compose.yml and remove it, if you'd like (although it takes close to no resources)
I just tried setting the conf in the section Martin mentioned, and it works perfectly
To be clearer - how do I refrain from using the built-in file server altogether, and use MINIO for any storage need?
So just to be clear - the file server has nothing to do with the storage?
Think of it as a quick and dirty "minio", storing files and serving them over http. If you have minio (or any object storage) you can replace it altogether 🙂
You might need to turn off the secure option... Let me check
To store all the debug samples. It can also store all the models (if you configure output_uri='http://file_server_here:8081').
Yes: instead of the file server, have 's3://<ip_of_minio>:9000/bucket' - make sure you add the credentials for the minio in the trains.conf.
Yes, basically once you have the credentials in the trains.conf, you could do StorageManager.get_local_copy('s3://<minio>:9000/bucket/file') (also upload of course 🙂 )
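A minimal sketch of that SDK usage, assuming the minio credentials are already set in trains.conf (the host, bucket and file names below are placeholders):

```python
from trains import StorageManager

# Download a local copy of an object stored on the minio bucket
# (the S3 credentials are picked up from trains.conf)
local_path = StorageManager.get_local_copy(
    remote_url="s3://<minio_host>:9000/bucket/file"
)
print(local_path)  # path of the locally cached copy

# Uploading works the same way, in the other direction
StorageManager.upload_file(
    local_file="model.pkl",
    remote_url="s3://<minio_host>:9000/bucket/models/model.pkl"
)
```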
In your trains.conf, change the value: files_server: 's3://ip:port/bucket'
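For reference, a sketch of how that setting looks in trains.conf, assuming a minio setup like the one above (ip, port and bucket are placeholders):

```
api {
    # point the files server at the minio bucket instead of the built-in file server
    files_server: "s3://<minio_ip>:9000/bucket"
}
```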
Isn't this a client configuration?
No, that's just the thing - in order to use minio, each client needs to have the credentials configured
WackyRabbit7 this section is what you need, unmark it, and fill it in
https://github.com/allegroai/trains/blob/c9fac89bcd87550b7eb40e6be64bd19d4384b515/docs/trains.conf#L88
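Filled in for a minio host, that section would look roughly like this (key, secret and host are placeholders; secure: false is the "secure option" mentioned above, needed when minio is served over plain http):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # per-host entry covering every bucket on this minio server
                    host: "<minio_ip>:9000"
                    key: "<minio_access_key>"
                    secret: "<minio_secret_key>"
                    multipart: false
                    secure: false  # minio is served over http, not https
                }
            ]
        }
    }
}
```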
EnviousStarfish54 Notice that you can configure it on the agent machine only, so in development you are not "wasting" storage when uploading debug checkpoints/models 🙂
Wow! Just what I needed. I am surprised that I don't need to configure anything on the server side
I tried what you said in the previous response, setting sdk.aws.s3.key and sdk.aws.s3.secret to the ones in my MINIO. Yet when I try to download an object, I get the following:
>>> result = manager.get_local_copy(remote_url="s3://***********:9000/test-bucket/test.txt")
2020-10-15 13:24:45,023 - trains.storage - ERROR - Could not download s3://***********:9000/test-bucket/test.txt , err: SSL validation failed for https://***********:9000/test-bucket/test.txt [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)
Hi WackyRabbit7 ,
Just to expand on #2:
make sure you add the credentials for the minio in the trains.conf
In trains.conf, set your minio credentials (key, secret, region) in sdk.aws.s3.key, sdk.aws.s3.secret etc. You can also use the standard AWS env vars, which are automatically parsed by Trains (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION)
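As a sketch, those keys sit at the top level of the s3 section in trains.conf (values are placeholders; a per-host entry like the one shown earlier works as well):

```
sdk {
    aws {
        s3 {
            # default credentials, used for any bucket without a specific entry
            key: "<minio_access_key>"
            secret: "<minio_secret_key>"
            region: ""
        }
    }
}
```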
Martin: In your trains.conf, change the value: files_server: 's3://ip:port/bucket'
Isn't this a client configuration (trains-init)? Shouldn't there be any change to the server configuration (/opt/trains/config...)?
basically the default_output_uri will cause all models to be uploaded to this server (with specific subfolder per project/task)
You can have the same value there as the files_server.
The files_server is where you have all your artifacts / debug samples
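A sketch of how the two settings can point at the same bucket in trains.conf (placeholders again):

```
api {
    # where artifacts / debug samples go
    files_server: "s3://<minio_ip>:9000/bucket"
}
sdk {
    development {
        # where models / task outputs go by default
        default_output_uri: "s3://<minio_ip>:9000/bucket"
    }
}
```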
I know I can configure the file server on trains-init - but that only touches the client side. What about the container on the trains server?
And once this is done, what is the file server IP good for? will it redirect to the bucket?
Continuing on this discussion... What is the relationship between configuring files_server, everything else we just talked about, and the default_output_uri?