Cloud Storage as a Potential Solution for Temporary Storage of Sensor Data
The service in question gathers weather information from more than 40,000 weather stations, each of which reports data every second. The sheer volume of data would overwhelm the online storage available on any fixed set of servers. The data needs to be stored temporarily before being fed into a processing pipeline, which calculates weather conditions and creates weather reports that can then be accessed through the service's web interface. Live weather data is used to update a database with current and future conditions. An unprocessed historical record of the raw values is also kept, so that historical data can later be displayed to online users.
Analysing the requirements
The base requirement for this venture is a high-speed interface that can handle up to 50,000 requests in parallel and pipeline around 30 GB of data per second. In addition, the final block storage must store this data reliably and without loss. The accepted error rate would need to be below 0.0001% in order to minimize retries when storing the weather data packages, each of which contains one second of sensor data from a given weather station.
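To put these figures in perspective, the short sketch that follows derives two numbers from them: the average size of a single station report and the number of failed writes per day that the error budget still allows. It assumes decimal units and that each station's per-second report is stored as exactly one write (40,000 writes per second, below the 50,000-request peak). The constant and function names are ours, chosen for illustration.

```python
# Back-of-the-envelope figures derived from the stated requirements.
# ASSUMPTION: one write per station report, decimal units (1 GB = 10**9 bytes).
STATIONS = 40_000              # weather stations, one report per second each
INGEST_BYTES_PER_S = 30e9      # ~30 GB of data per second
MAX_ERROR_RATE = 0.0001 / 100  # "<0.0001 %" expressed as a fraction (1e-6)
SECONDS_PER_DAY = 86_400


def bytes_per_report(stations: int, ingest_bytes_per_s: float) -> float:
    """Average payload of one per-second station report."""
    return ingest_bytes_per_s / stations


def failed_writes_per_day(writes_per_s: float, error_rate: float) -> float:
    """Expected number of failed (and therefore retried) writes per day."""
    return writes_per_s * error_rate * SECONDS_PER_DAY


if __name__ == "__main__":
    size_kb = bytes_per_report(STATIONS, INGEST_BYTES_PER_S) / 1e3
    failures = failed_writes_per_day(STATIONS, MAX_ERROR_RATE)
    print(f"average report size: {size_kb:.0f} kB")
    print(f"failed writes/day:   {failures:.0f}")
```

Under these assumptions each report averages about 750 kB, and even at the target error rate roughly 3,456 writes per day (about one every 25 seconds) would still need a retry.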
Besides being stored, the data must be retrieved sequentially and fed into a processing pipeline for aggregation. This near-real-time requirement roughly doubles the number of Input/Output (IO) requests.
A secondary process manages the archival of information and, in parallel, keeps backups of the acquired data for up to 90 days. After that the data is permanently deleted, since by then all required data has been stored long-term in a NoSQL database (Cassandra).
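The 90-day retention rule could be enforced by a periodic sweep along these lines. This is a minimal sketch, assuming each stored object is tracked by a key and its write time; the key format and the `expired_keys` helper are hypothetical, and the actual deletion call against the storage API is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

RETENTION = timedelta(days=90)  # raw data and backups are kept for up to 90 days


def expired_keys(objects: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the keys of objects written before the retention cut-off.

    `objects` maps a (hypothetical) object key to its write time. Anything
    older than RETENTION is assumed to be stored in Cassandra already and
    may therefore be deleted permanently.
    """
    cutoff = now - RETENTION
    return sorted(key for key, written in objects.items() if written < cutoff)


if __name__ == "__main__":
    now = datetime(2016, 6, 30, tzinfo=timezone.utc)
    stored = {
        "station-00001/2016-03-01T00:00:00Z": datetime(2016, 3, 1, tzinfo=timezone.utc),
        "station-00001/2016-06-29T00:00:00Z": datetime(2016, 6, 29, tzinfo=timezone.utc),
    }
    # The cut-off is 2016-04-01, so only the March object is due for deletion.
    print(expired_keys(stored, now))
```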
In summary, a minimum of 150,000 IOPS would need to be provisioned for this amount of data to be stored and retrieved at will.
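One way to arrive at the 150,000 IOPS figure is to add up the three IO streams described above. The sketch assumes that ingest peaks at the 50,000 parallel requests from the requirements and that pipeline reads double that load; the extra operation per ingested write for archival and backups is our assumption, not a figure stated by the client.

```python
def required_iops(write_ops: int, read_factor: float, archive_factor: float) -> int:
    """Rough IOPS budget: ingest writes + pipeline reads + archival/backup IO.

    read_factor:    pipeline reads per ingested write ("near to double" -> 1.0)
    archive_factor: archival/backup IO per ingested write (ASSUMED value)
    """
    return round(write_ops * (1 + read_factor + archive_factor))


if __name__ == "__main__":
    print(required_iops(50_000, read_factor=1.0, archive_factor=1.0))
```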
These requirements only summarize the actual data we received from you-weather.com; they are a brief compilation of the most important requirements for the purposes of this article.
Potential solutions and brainstorming
One common way to store large amounts of data in a computer network is a NAS or SAN storage array. At 30 GB per second, the average storage need works out to 1.8 TB of data per minute, about 108 TB per hour and roughly 2.6 PB (petabytes) per day, or about 78 PB for a 30-day window. As mentioned above, the data must be kept for 90 days, which brings us to a dazzling 233 PB of storage. Taking this into account, neither NAS nor SAN arrays would work for this problem, as provisioning hundreds of petabytes on premises would require a huge investment.
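The storage figures follow directly from the ingest rate. This sketch assumes a constant 30 GB per second and decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes); real traffic will fluctuate, and compression or replication would change the totals.

```python
INGEST_BYTES_PER_S = 30e9              # ~30 GB per second, as stated in the requirements
MINUTE, HOUR, DAY = 60, 3_600, 86_400  # window lengths in seconds


def stored_bytes(seconds: int, rate: float = INGEST_BYTES_PER_S) -> float:
    """Raw bytes accumulated over `seconds` at a constant ingest rate."""
    return rate * seconds


if __name__ == "__main__":
    windows = [("1 minute", MINUTE), ("1 hour", HOUR), ("1 day", DAY),
               ("30 days", 30 * DAY), ("90 days", 90 * DAY)]
    for label, secs in windows:
        # Report every window in terabytes for easy comparison.
        print(f"{label:>8}: {stored_bytes(secs) / 1e12:>10,.1f} TB")
```

Running it prints 1.8 TB per minute, 108.0 TB per hour and 2,592.0 TB per day, growing to 233,280.0 TB (about 233 PB) over the 90-day retention period.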
When we asked ourselves which solution would best fit the problem, the whole team of storage experts involved agreed that cloud storage was the only viable option, and probably the single best choice we had to fulfil the client's requirements.
We knew we needed a self-contained online storage environment able to process and store data in our own block storage locations. That alone was not enough: we also needed an API to interface with the solutions our client already had in place.
Since cloud storage was our only option for handling such large amounts of data with a nearly unlimited IOPS requirement, we moved on to planning the setup, which had to sustain 100,000 to 120,000 parallel operations per second. Each of our own servers can handle around 20,000 parallel IO operations, so we settled on 10 servers (about 200,000 operations per second in total, which also covers the 150,000 IOPS minimum) behind a capable load balancer to manage incoming and outgoing traffic. Sufficient bandwidth was just as important. 1 Gbit network cards were not enough, so we ultimately decided to use optical fibre to interconnect the load balancers and internal servers, and to dedicate a 500 Gbit private network connecting our Datacentres through the Level3 global network.
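The sizing step can be written down as two small formulas: the number of servers for a given IOPS target, and the network throughput needed for ingest plus pipeline reads. The 20,000 IOPS per server and 30 GB per second come from the text; the 30% headroom is an illustrative assumption of ours, used only to show how a safety margin turns 8 servers into 10.

```python
import math

PER_SERVER_IOPS = 20_000  # parallel IO operations one server can handle


def servers_needed(required_iops: int, headroom: float = 0.0) -> int:
    """Smallest server count covering required_iops plus a fractional headroom."""
    return math.ceil(required_iops * (1 + headroom) / PER_SERVER_IOPS)


def link_gbit_per_s(bytes_per_s: float, read_factor: float = 1.0) -> float:
    """Network throughput in Gbit/s for ingest plus pipeline reads."""
    return bytes_per_s * (1 + read_factor) * 8 / 1e9


if __name__ == "__main__":
    print(servers_needed(150_000))                # bare minimum
    print(servers_needed(150_000, headroom=0.3))  # with an ASSUMED 30 % margin
    print(link_gbit_per_s(30e9))                  # ingest + reads, in Gbit/s
```

The sketch yields 8 servers without headroom, 10 with it, and 480 Gbit/s of traffic. That fits within the 500 Gbit private network, and, assuming traffic is spread evenly, it still means roughly 48 Gbit/s per server, far more than a 1 Gbit network card can carry.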
The final solution
As outlined above, we went for an elegant approach that provides our client with a future-proof and scalable solution for storing weather data. In the course of this project we created an API for mass storage of data, based on a minimalistic REST interface that accepts PUT, GET and DELETE requests. These requests are issued against a set of 10 private cloud servers behind a load balancer, running in a Datacentre chosen by our client near its corporate facility, while the storage servers themselves are geographically distributed. This made the data accessible while keeping the setup hybrid, and it kept the overall cost of running the service minimal compared with running a block storage cluster on premises. The ability of XXL Cloud for Business to run on practically any equipment and any kind of server has proven worth the development effort. The project of helping you-weather.com solve their online storage problem has also shown us the need for a reliable, heavy-duty API that can serve hundreds of thousands of IO requests per second in parallel.
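To illustrate such a minimal interface, here is an in-memory sketch of how a single storage node might dispatch PUT, GET and DELETE requests, plus one possible way to map object keys onto the 10 servers. Everything here is hypothetical: the `/data/<station>/<timestamp>` path scheme, the `SensorStore` class, the chosen status codes and the hash-based placement are ours for illustration and do not describe the API we actually built. The HTTP transport and the load balancer are left out.

```python
import hashlib
from typing import Dict, Optional, Tuple


class SensorStore:
    """In-memory stand-in for one storage node behind the REST API (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def handle(self, method: str, path: str,
               body: Optional[bytes] = None) -> Tuple[int, bytes]:
        """Dispatch one request and return (HTTP status, response body)."""
        if not path.startswith("/data/"):
            return 404, b"not found"
        key = path[len("/data/"):]
        if method == "PUT":
            if body is None:
                return 400, b"missing body"
            created = key not in self._blobs
            self._blobs[key] = body
            return (201 if created else 204), b""  # 201 Created / 204 replaced
        if method == "GET":
            blob = self._blobs.get(key)
            return (200, blob) if blob is not None else (404, b"not found")
        if method == "DELETE":
            if self._blobs.pop(key, None) is None:
                return 404, b"not found"
            return 204, b""
        return 405, b"method not allowed"  # only PUT, GET and DELETE are accepted


def server_for(key: str, servers: int = 10) -> int:
    """Pick a storage server for a key via a stable hash (illustrative placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % servers


if __name__ == "__main__":
    store = SensorStore()
    path = "/data/station-00042/1466035200"
    print(store.handle("PUT", path, b'{"temp_c": 21.4}'))
    print(store.handle("GET", path))
    print(store.handle("DELETE", path))
    print("server:", server_for("station-00042/1466035200"))
```

Hash-based placement keeps every request for a given key on the same server without a lookup table; a production setup would more likely use consistent hashing, so that adding servers moves fewer keys.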
You-Weather.com is an inventive weather forecast service that strives to visualize weather. The relevant weather intelligence is gathered from 40,000 private weather stations that deliver sensor updates every second. A proprietary algorithm then processes and aggregates the weather data to deliver human-readable short- and long-term forecasts. You-weather.com is currently in public beta and open for public testing (desktop browsers only). We encourage XXL Cloud users to try it and see live the performance that high-performance cloud storage makes possible.
In an effort to solve storage problems for enterprise customers, XXL Cloud strives to provide the very best in cloud and online storage. We are dedicated to developing a fully functional API, which we expect to release to home and enterprise users by the end of June 2016. If your company is looking to solve its secure data storage problems, contact us for a free consultation.