I have written and presented a lot on time series data management in the past. Mostly, I focused on reducing storage, cost and development effort, and on improving performance. However, revisiting the topic from the angle of sustainability gives an interesting perspective.
Sustainability in IT is directly
related to how efficiently you manage data: primarily, how efficiently you
process data, store data and transfer data over the network. Selecting the right
database technology can influence the first two directly.
In this blog, I would like to
talk about handling time series data efficiently, which can drastically
reduce carbon footprint by reducing storage requirements by up to 70% and improving
processing performance by 30 times.
Often, a time series database is misunderstood
as a NoSQL database. Unfortunately, very few people know that time series is a
specific technology that is all about how time series data can be stored and
processed efficiently.
So let us understand the time
series world and how selecting the right database technology can help drastically
reduce carbon footprint.
Who generates time series data?
Traditionally, time series are used
in statistics, signal processing, pattern recognition, econometrics,
mathematical finance, weather forecasting, earthquake prediction,
electroencephalography, control engineering, astronomy, communication
engineering and, more broadly, in any domain of applied science and engineering that
involves temporal measurements. With the emergence of the Internet of Things (IoT)
and the proliferation of connected devices, more and more time series
data is being generated by sensors. Hence, irrespective of the domain, time
series data is generated almost everywhere: Capital Markets, Energy and
Utility, Telecommunications, Manufacturing, Logistics, Scientific Research,
Intelligent Transportation and many more.
What does it look like?
Time series data has an internal structure that differs
from relational data. Many applications need to store data at frequent
intervals, which requires massive storage capacity. For these reasons, it is not
sufficient to manage time series data using the traditional relational approach of
storing one row for each time series entry. That approach sharply increases storage
and processing requirements and, with them, the carbon footprint.
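To make the overhead of one row per entry concrete, here is a minimal Python sketch. It is a toy comparison under stated assumptions, not a description of how Informix lays out data on disk: the meter ID, the 15-minute interval, the 8-byte field sizes and both function names are hypothetical. It assumes a regularly spaced series, so the key, origin and interval can be stored once and each timestamp derived from the position of the value.

```python
import struct
from datetime import datetime, timedelta, timezone

# Hypothetical workload: one meter reporting every 15 minutes for one day.
METER_ID = 42
ORIGIN = datetime(2024, 1, 1, tzinfo=timezone.utc)
INTERVAL = timedelta(minutes=15)
readings = [float(i % 10) for i in range(96)]


def row_per_sample_bytes(meter_id, origin, interval, values):
    """Row-per-entry layout: every row repeats the key and the timestamp."""
    total = 0
    for i, value in enumerate(values):
        ts = int((origin + i * interval).timestamp())
        # 8-byte id + 8-byte timestamp + 8-byte value, no padding ("<").
        total += len(struct.pack("<qqd", meter_id, ts, value))
    return total


def packed_series_bytes(meter_id, origin, interval, values):
    """Series layout: key, origin and interval stored once, values packed in order."""
    header = struct.pack(
        "<qqq", meter_id, int(origin.timestamp()), int(interval.total_seconds())
    )
    body = struct.pack(f"<{len(values)}d", *values)
    return len(header) + len(body)


def timestamp_at(origin, interval, index):
    """Recover the timestamp of the index-th value from its position alone."""
    return origin + index * interval


rows = row_per_sample_bytes(METER_ID, ORIGIN, INTERVAL, readings)
packed = packed_series_bytes(METER_ID, ORIGIN, INTERVAL, readings)
print(rows, packed)  # 2304 792
```

The saving in this toy case comes only from not repeating the key and timestamp; real databases add per-row headers and index entries on top, so the numbers here should not be read as a benchmark.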
Informix TimeSeries handles it efficiently:
IBM's Informix TimeSeries feature provides a solution to this
problem with breakthrough technology. The Informix TimeSeries feature is a
combination of a TimeSeries data type and a large set of built-in analytical
functions. How it manages time series data
is explained in my decade-old article published in DBTA Magazine - Managing
Time Series Data with Informix - Database Trends and Applications (dbta.com).
It can reduce the storage requirement by more than 50% and improve performance by orders of magnitude. With the integration of the TimeSeries feature with NoSQL/JSON and in-memory data warehouse capabilities, it can handle heterogeneous and unstructured time series data and run real-time analytics at the speed of thought. The capability to store data at up to hertz frequency further extends its reach across industries. The rolling window feature eases the challenge of periodically purging huge volumes of data. Moreover, because of the way TimeSeries is structured, one can keep inserting millions of records and performance will remain consistent without any database tuning, which drastically reduces processing requirements.
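The idea behind a rolling window can be sketched in a few lines of Python. This is only an illustration of the general technique, dropping whole time windows at once instead of deleting entry by entry; it is not the Informix API, and the class name `RollingWindowStore` and its parameters are hypothetical.

```python
class RollingWindowStore:
    """Groups readings into fixed-size time windows and keeps only the newest few."""

    def __init__(self, window_seconds, max_windows):
        self.window_seconds = window_seconds
        self.max_windows = max_windows
        self.windows = {}  # window start (epoch seconds) -> list of (ts, value)

    def insert(self, ts, value):
        start = ts - ts % self.window_seconds
        if start not in self.windows:
            self.windows[start] = []
            # Purge by dropping the oldest windows whole, not entry by entry.
            while len(self.windows) > self.max_windows:
                del self.windows[min(self.windows)]
        # A reading older than every retained window was purged above; skip it.
        if start in self.windows:
            self.windows[start].append((ts, value))


# Hypothetical usage: hourly readings for five days, retaining three daily windows.
DAY = 86400
store = RollingWindowStore(window_seconds=DAY, max_windows=3)
for hour in range(5 * 24):
    store.insert(hour * 3600, float(hour))
print(sorted(store.windows))  # [172800, 259200, 345600]
```

Because old data leaves in whole windows, the cost of purging does not grow with the number of individual entries being discarded.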
For more details and more solutions, refer to the IBM Redbook I co-authored - Solving Business Problems with Informix TimeSeries (ibm.com)
Conclusion
Time series data doesn't necessarily mean NoSQL data. It has its own structure, and if the right database technology, such as Informix TimeSeries, is chosen to handle it, one can drastically reduce the carbon footprint.