Hello Guys, I Read About Trains Some Days Ago And Think It Is Exactly What I Was Looking For, So I Ran The Docker Image And Started Thinking Of What I Would Like To Do And The Processing Steps I Would Like To Automate Which I Currently Run Manually Trigge

Answered

Hello guys,
I read about Trains a few days ago and think it is exactly what I was looking for, so I ran the Docker image and started thinking about what I would like to do and which processing steps, currently triggered manually, I would like to automate ...
I think I would like to start directly with a Kubernetes serverless environment, but unfortunately I have never used it before. I read a lot about it months ago, but my focus moved .... I have worked with Docker and created my own images, but only in the 'normal' way, without Compose or Swarm.
I tried to visualize the steps I have (or partly plan to have) so far in my private project. The result is shown in the attached picture :male-artist:
The red circles could be packaged in separate Docker images. How to handle the transition from one step (running in a red box) to the next is still quite unclear to me, but I expect Kubernetes / Rancher will have standard solutions for event queues and for distributing events to participating workers.
Basically the idea is to have a distributed file system that synchronizes the raw data from my data provider ( http://eoddata.com ) via wget (FTP connection). The data is also under git control and pushed to http://Gitlab.com (private project). Syncing all data from the provider takes too long, so I decided to involve Gitlab in case I need to get a copy of it (if my system dies/upgrades/...). Also, the data provider only offers historical data for a defined time span, and I would like to keep the data that disappears from the view on the server.
Once that is done, I use a couple of scripts to pre-process the data (generate SQLite DBs and parquet files for fast loading in Python), NA handling, normalization, whatever is needed to make the ML algorithms happy.
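
A minimal sketch of what such a pre-processing step could look like, using only the Python standard library. Everything here is an assumption for illustration: the table layout, the `DEMO` symbol, the forward-fill strategy for missing values and the min-max scaling are not taken from the actual scripts. Writing parquet files would need an extra library and is left out.

```python
import sqlite3


def forward_fill(values):
    """Replace None with the most recent non-missing value (leading gaps stay None)."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def min_max_normalize(values):
    """Scale non-missing values to [0, 1]; a constant series maps to 0.0."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    low, span = min(present), max(present) - min(present)
    return [
        None if v is None else (0.0 if span == 0 else (v - low) / span)
        for v in values
    ]


def store_prices(conn, symbol, rows):
    """Store (date, close) rows with NA handling and a normalized column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "symbol TEXT, date TEXT, close REAL, close_norm REAL, "
        "PRIMARY KEY (symbol, date))"
    )
    closes = forward_fill([close for _, close in rows])
    normalized = min_max_normalize(closes)
    conn.executemany(
        "INSERT OR REPLACE INTO prices VALUES (?, ?, ?, ?)",
        [(symbol, date, c, n) for (date, _), c, n in zip(rows, closes, normalized)],
    )
    conn.commit()


# Hypothetical sample data with one missing closing price.
conn = sqlite3.connect(":memory:")
raw = [("2020-01-02", 10.0), ("2020-01-03", None), ("2020-01-06", 14.0)]
store_prices(conn, "DEMO", raw)
result = conn.execute(
    "SELECT date, close, close_norm FROM prices ORDER BY date"
).fetchall()
```

Using `INSERT OR REPLACE` with a primary key on symbol and date keeps the step safe to re-run after each sync.
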
Finally I use the data to generate models and compare them with each other. This is where I was searching for a tool and found Trains.
At the end, the process flow should make use of stored models and provide a report.

The reason I'm writing all of this is that I need someone to ask the stupid questions which will come up when I start working with Kubernetes/Rancher. The first questions would be which distributed FS to use and how, how to signal that a step (e.g. synchronization of data) has finished... until I finally get to the point where I can make use of the fancy features of Trains and its agents.
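
One simple way to signal that a step has finished, independent of Kubernetes or Rancher, is a marker file on the shared volume: the sync step writes it when done and the next step waits for it. The sketch below assumes both steps see the same directory. The marker name `_SYNC_DONE` and the function names are hypothetical, and an event queue would be the more scalable alternative.

```python
import os
import tempfile
import time
from pathlib import Path

MARKER_NAME = "_SYNC_DONE"  # hypothetical marker file name


def mark_done(data_dir):
    """Create the marker atomically: write a temp file, then rename it."""
    data_dir = Path(data_dir)
    tmp = data_dir / (MARKER_NAME + ".tmp")
    tmp.write_text(str(time.time()))
    os.replace(tmp, data_dir / MARKER_NAME)


def wait_until_done(data_dir, timeout=60.0, poll_interval=1.0):
    """Poll for the marker; return True if it appears before the timeout."""
    marker = Path(data_dir) / MARKER_NAME
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if marker.exists():
            return True
        time.sleep(poll_interval)
    return marker.exists()


# Usage: the sync step calls mark_done(), the pre-processing step waits.
with tempfile.TemporaryDirectory() as shared_dir:
    before = wait_until_done(shared_dir, timeout=0.0)
    mark_done(shared_dir)
    after = wait_until_done(shared_dir, timeout=1.0, poll_interval=0.01)
```

The rename makes the marker appear in one step, so a waiting worker never sees a half-written file.
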

Do you think this channel is the right place for it?

  				
Posted 4 years ago by WickedGoat98


Answers 5

Cool
I'm already impressed by what Trains does with just 2 lines of code
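
For reference, the two lines in question are usually the import and the `Task.init` call from the `trains` package. In this sketch the project and task names are hypothetical placeholders, and the lines are wrapped in a function only so the snippet can be loaded without a running Trains server.

```python
def init_tracking(project_name="eod-models", task_name="baseline"):
    """Start Trains experiment tracking (names above are placeholders)."""
    from trains import Task  # line 1: import the Task class

    # line 2: register this run with the Trains server
    return Task.init(project_name=project_name, task_name=task_name)
```

Calling `init_tracking()` at the top of a training script is enough to have the run show up in the Trains web UI.
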

  				
Posted 4 years ago by WickedGoat98

The data I'm syncing comes from a data provider which supports only an FTP connection....

Right ... that makes sense :)

No worries WickedGoat98 , feel free to post questions when they arise. BTW: we are now improving the k8s glue, so by the time you get there the integration will be even easier 🙂

  				
Posted 4 years ago by AgitatedDove14

Hi Martin,

thanks for the reply.
The data I'm syncing comes from a data provider which supports only an FTP connection....
I started a Rancher training and will need some time to set up my cluster before I can start using Trains ;)

  				
Posted 4 years ago by WickedGoat98

The picture seems to be missing.
Sorry, I tried but can't upload the picture here, so I'm adding a link to it: https://drive.google.com/file/d/1HYYKDOY09hnE-DeCTPdZXpKy7537g5Ka/view?usp=sharing

  				
Posted 4 years ago by WickedGoat98

Hi WickedGoat98
This sounds like a great design (obviously you have scale in mind 😉 ). Feel free to ask "stupid" questions; based on what you already wrote, I doubt they will be.
A few questions come to mind (probably a few others will follow):
You mentioned FS synchronization: from where? I.e. what is the single source of truth? K8s (Rancher 2.0 is basically a k8s manager) can take care of mounting volumes, so there is no need to sync. Is this a valid solution?

BTW: (you can drag and drop an image here, Slack will host and embed it)

  				
Posted 4 years ago by AgitatedDove14
