Research Perspectives for Time Series Management Systems

Werner Dreyer, Angelika Kotz Dittrich, Duri Schmidt
UBILAB, Union Bank of Switzerland, Zurich
{dreyer, dittrich, schmidt}@ubilab.ubs.ch

Abstract

Empirical research based on time series is a data intensive activity that needs a data base management system (DBMS). We investigate the special properties a time series management system (TSMS) should have. We then show that currently available solutions and related research directions are not well suited to handle the existing problems. Therefore, we propose the development of a special purpose TSMS, which will offer particular modeling, retrieval, and computation capabilities. It will be suitable for end users, offer direct manipulation interfaces, and allow data exchange with a variety of data sources, including other databases and application packages. We intend to build such a system on top of an off-the-shelf object-oriented DBMS.

Introduction

At Union Bank of Switzerland, several departments work intensively with time series. They encounter a number of problems:

• The large volume of time series (several thousand) makes their management a difficult task.
• In large time series bases, researchers have problems finding the time series relevant to their work.
• When a time series base becomes too large, data quality management is impossible without the help of a [...]
• Researchers usually do not work with all the time series an institution collects. Instead, there is a need to build and maintain project-specific time series bases.
• Such a time series base must contain data from public and company databases as well as project-pro[...]
• The same time series are used with many different tools, e.g. statistics packages, spreadsheets, and desktop publishing programs. Without the help of a DBMS that can cope with the different data formats these tools use, researchers are forced to duplicate the data.
• Researchers need specific system support for tasks like transforming the periodicity of time series, filtering, or computing new time series.
• As current TSMS are difficult to use, economists cannot work without the help of a database specialist.
• Synchronization problems may arise when time series are used concurrently by different users.

Research in time series management has to address these problems and to provide adequate solutions.

The remainder of the paper is organized as follows: First of all, we address the properties of TSMS. Then, we show an example of a time series base. We look at current solutions and at research related to our problem domain. Finally, we discuss the goals and the research issues of our project.

Requirements for TSMS

Data Model Requirements

Structural Elements

A data model for time series contains the following structural elements: Events, time series, groups of time series, other data, and time series bases.

Events

Events are the basic building blocks of time series. An event consists of the event data, which is time-variant. Examples of event data are the opening, low, high, and closing prices of a share. The event data can have an atomic or a structured data type. Atomic data types are scalar types like integer or string. Records or arrays of atomic types are examples of structured types.

Time Series

A time series consists of a header and a sequence of events ordered chronologically. Header data are shared by all events. They may be time-invariant and describe common properties of the time series as a whole (e.g. the location of the stock exchange), or they may be time-variant and derived from the events (e.g. the average closing price).

Normally, all the events of a time series are of the same type, but they can also vary over time.
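The structural elements described so far (events with atomic or structured data, and a time series made of a header plus chronologically ordered events) could be sketched as follows. This is only an illustrative sketch: the class names `Event` and `TimeSeries`, the dictionary-based header, and the `average` helper are assumptions made for the example, not part of any system described in this paper.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List


@dataclass(frozen=True)
class Event:
    """One observation: a time stamp plus atomic or structured event data."""
    timestamp: date
    data: Any  # e.g. a single float, or a record such as {"open": ..., "close": ...}


@dataclass
class TimeSeries:
    """Header data shared by all events, plus the chronologically ordered events."""
    header: Dict[str, Any]  # time-invariant properties, e.g. location of the exchange
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        # Hypothetical policy: reject events that would break chronological order.
        if self.events and event.timestamp < self.events[-1].timestamp:
            raise ValueError("events must be appended in chronological order")
        self.events.append(event)

    def average(self, key: str) -> float:
        """A time-variant header value derived from the events (record-like events only)."""
        if not self.events:
            raise ValueError("cannot derive a value from an empty time series")
        return sum(e.data[key] for e in self.events) / len(self.events)
```

With two daily events whose closing prices are 102.0 and 104.0, `average("close")` yields the derived header value 103.0, while the exchange location stays a plain time-invariant entry in `header`.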
Data values are either base or derived values. Base values record measured facts. Derived values are computed directly or indirectly from base values.

Data values differ in what they measure. One example are stock values, i.e. values which measure a value at some point in time, e.g. a price of a share. Stock values can be differentiated further. They may measure the value at the beginning or end of a period, or the highest, lowest or average value of a period. Another example are flow values. They measure a value over a period of time, like a cash flow. These different kinds of values have different periodicity transformations. An example is the transformation from a monthly to a quarterly periodicity: For the high selling price, one has to choose the maximum of the three monthly values, for the closing price the last value and for the cash flow the sum of the three values.

A calendar is associated with each time series. The calendar does the mapping between events and the time when the event occurred. In the case of time series with constant time between events, the calendar also defines the [...]

Time series differ in their density, i.e. in how many events are recorded. Missing events may arise from diverse reasons, such as that the event did not happen or the [...]

A further property is the ordering of the events over time. The events of a time series may be completely ordered, i.e. there is at most one event per point in time. In contrast, the events of other time series may be only partially ordered, i.e. more than one event with the same time stamp may exist in the time series. An example for such a situation is the use of estimation methods for future values. A researcher may record several events per point in time, because he or she makes repeated estimations for certain data values after having received more ac[...]

The cardinality of a time series, i.e. the number of events of a time series, varies with the application area. In the case of economic time series, the cardinality ranges from several tens in the case of annual data to several thousand [...]

On the one hand, time series data are accessed along time, i.e. event sequences of one time series are examined. On the other hand, researchers also use cross-sectional analysis, whereby several time series are explored at a certain point in time. The data model must allow efficient storage and access methods for both ways.

Groups

Detecting time series in large time series bases relevant to the interests of a user is an important issue. One way to facilitate this is to partition the universe of all time series of a time series base into several groups according to various criteria, e.g. branches, country, or size of the company. A TSMS must support a flexible, powerful grouping [...]

A group consists of its header and its member set. The header contains data shared by all members, e.g. the group name, or data derived from some members, such as the covariance matrix of some time series contained in that group. The member set contains the time series and subgroups belonging to the group. It is necessary that a group can also contain other groups, that it is possible to define the members of a group by enumeration or by computation, and that members can belong to more than one group.

Other Data

Applications which use time series may also have to manage other persistent data that are not related to time series or groups. Examples are results of simulations or constants. These data are often not time series themselves. There are also meta data, i.e. data describing the actual events, time series, and groups. Unfortunately, there is no clear definition of what meta data are. Often, one user's meta data are the other user's data, and vice versa. It is not clear to what extent a TSMS should be able to cope with these data or how a TSMS should be combined [...]

Time Series Bases

All the time series, groups and other data managed as one logical unit make up a time series base. Its size depends on the number of time series, their periodicity, the observation interval, and the size of the events.

Functional Requirements

Distribution of Functional Capabilities

As mentioned before, researchers use their time series with several programs, such as statistics packages, spreadsheets, and charting programs. These specialized programs provide a thorough functionality in their application area. Therefore, it makes no sense for the TSMS to replicate all the functionality these programs already offer. It is more sensible to build an environment where the TSMS works as a repository for the time series, and the tools use this repository to provide their special services. The emphasis of the TSMS should, therefore, be on data management capabilities and on basic transformation capabilities to prepare the data for further analysis.

Operations on Events

A TSMS should support the usual operations on atomic types, and must provide read and update access to the components of structured event types.

Operations on Time Series

The most important functional capabilities of a TSMS are the definition of new time series, the storing of data in time series, and the retrieval of data from time series.
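The monthly-to-quarterly example given earlier (maximum for the high price, last value for the closing price, sum for the cash flow) can be illustrated with a small sketch. The function name `to_quarterly`, the `AGGREGATORS` table, and the representation of a series as a list of `(date, value)` pairs are assumptions chosen for the illustration; a real TSMS would take the period boundaries from the calendar of the series.

```python
from collections import defaultdict
from datetime import date

# Aggregation rule per kind of value, following the three cases named in the text.
AGGREGATORS = {
    "high": max,                        # highest value within the period
    "close": lambda values: values[-1], # value at the end of the period
    "flow": sum,                        # flow values add up over the period
}


def to_quarterly(events, kind):
    """Transform monthly (date, value) pairs, given in chronological order,
    into a list of ((year, quarter), value) pairs."""
    buckets = defaultdict(list)
    for day, value in events:
        quarter = (day.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        buckets[(day.year, quarter)].append(value)
    aggregate = AGGREGATORS[kind]
    return [(period, aggregate(values)) for period, values in sorted(buckets.items())]
```

The same monthly input therefore produces different quarterly series depending on what the values measure, which is why the kind of value has to be part of the data model rather than left to each client tool.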
Another important capability of a TSMS is the derivation of new time series from existing ones. Examples are the computation of the difference of two time series, the application of a moving average or other filters to a time series, and the transformation of the periodicity of a time series.

A TSMS should provide decent query capabilities. It should at least support a select operation for time series similar to the select operation of the relational algebra. It would also be useful to be able to apply not only logical operations in the selection condition but also other operations, such as arithmetic and time-related operations. An optimizer should support users in query processing to [...]

Operations on Groups

Additional operations are required to define groups, to store and to retrieve group data, to add and remove members, and to enumerate them. The manipulation of the member set should be facilitated by set operations. One should also be able to apply operations to all members of a group without explicitly iterating over them.

Calendar

A TSMS must support calendar-related operations and date arithmetic, e.g. computing the difference between two dates, even if the two time series have different periodicities. Different time series may be based on different calendars, e.g. a business calendar with five business days per week or a calendar that can cope with local holidays.

Browsing and Editing

Two other important functions are browsing and editing. Browsing means interactively exploring the time series available in a time series base and investigating their contents. Such a browsing tool also helps to get an overview of the group structures. Editing means interactively building new or updating existing time series.

A browser-editor also needs some presentation facilities capable of displaying time series in table or chart formats. However, these presentation facilities are rather basic. For a more sophisticated presentation, one can use specialized [...]

So far, we have described requirements of a TSMS data model. One of the goals of the project will be defining a data model and deciding which of the structures and functional capabilities mentioned should be incorporated.

Data Exchange Requirements

Import of Data from Various Data Sources

An important issue in system architecture concerns the distribution of databases. Researchers have a private time series base for their individual research, but they also need access to other databases, e.g. time series bases of other members within the research group, or company-wide financial databases. Most of these databases do not use the same data model or the same schema as the private time series base. They may reside on the same computer, on the same local area network or on a remote [...]

Researchers frequently want to copy a subset of such other databases into their private time series base. Such a copy operation may be triggered manually, by some time event or by some other event of interest, e.g. a price crossing a certain threshold. Therefore, specifying various update policies and events in a TSMS must be easy.

In addition to databases, there are various other data sources. Data may be gathered manually. At UBS, for example, the Gross National Product of several countries is estimated and entered manually. Other data are received as files from external data providers like OECD. Furthermore, data is also received as a continuous data stream from other computers. Stock prices, for example, can be received as a real time data feed from ticker services.

Data sources differ in the amount of data and in the frequency of updates. Manual entry generates only a small amount of new data and the frequency of updates is low. In contrast, a real time data feed often generates a large amount of data. However, for research purposes, it is normally sufficient if the TSMS can cope with fewer updates [...]

Usually, the format of data from various sources also differs. A TSMS must be able to handle all these formats.

Normally, the available file formats for data exchange are not satisfying, because they do not define how to format header data like periodicity, etc. It might be necessary to develop a specific format to make lossless data exchange possible.

Export of Data to Client Applications

When we export data to programs like statistical packages or spreadsheets, we are faced with similar problems [...]

Means of Data Exchange

The different sources for data import and sinks for data export provide different ways of data exchange. Interfaces between components can be realized, for example, via program-to-program communication or simple file transfer. The choice of the appropriate solution depends on the connection facilities: e.g. there may be a distributed environment that allows RPCs, a file transfer protocol like FTP, or just a connection via electronic mail. Therefore, a TSMS should support different ways of data exchange.
Data Quality Management Requirements

Time series management makes high demands on data quality management. New raw data has to be checked for consistency with older events, outliers have to be detected, noise has to be filtered to clean up the data, etc. To track the value of calibrated data back to the raw data, it is often necessary to retain the raw and the calibrated data.

Old estimations have to be replaced by newer ones when new information is available, but one might like to retain the older data to review and improve the estimation process. This requires a versioning concept.

It should also be possible to store information on the quality of the data, whether for the entire time series or just for individual events.

Synchronization Requirements

Usually, there is only one process writing a time series while all other processes just read it. However, it is not yet clear whether it is enough to support only a single writer / multiple reader transaction model.

Ordinarily, new data is appended to the end of a time series and all other events are rarely modified. This might facilitate the synchronization of multiple users.

Example of a Time Series Base

An example of a time series base is HIKU (Historische Kurse, i.e. historical rates) from the Swiss company Telekurs. It contains approximately 10,000 time series from the financial world. The system has been operational since 1992, and the time series run from 1986 (1983 for Swiss shares). The quality of the data is checked by a [...]

The time series are updated daily. Data are compressed and transferred to customers via file transfer over X.25.

Telekurs offers software for workstations and PCs which handles the communication and the compression/decompression of the data. The size of the whole time series base is approximately 4 GByte.

There are three kinds of time series in HIKU: Prices of financial instruments with raw values, prices of financial instruments with adjusted values, and adjustment factor time series. All their headers and events have lengths from 18 to 44 bytes. The first two kinds of time series have basically the same format. The header contains data like the number of the financial instrument, the type (e.g. stock or share), the currency, the location of the stock exchange, the branch code, and the number of events.

The time series with raw values contain nominal prices; they do not consider changes in prices resulting from splits of shares, the emission of bonus shares, etc. The time series with adjusted values take these changes into account. An event contains, among others: the date, different prices (closing, high, and low; opening prices are not yet available), trading volume, etc. The header of an adjustment factor time series contains a subset of the header described above. The events contain the date and the adjustment factor.

Time series are selected according to standard selection files which offer some predefined selection criteria, such as Blue Chips International, European Stock Exchange, Swiss Indices, and Foreign Exchange Rates. These criteria may serve to define groups. However, users cannot select groups according to their own criteria, e.g. all bank [...]

Current Solutions and Related Research

Time series management with files

When time series are stored in simple files, there are several drawbacks: The advantages of a DBMS get lost. The concepts of groups and time series bases are not really supported. Functional capabilities have to be implemented as separate programs on top of the file system. The same is true for data exchange as well as for data [...]

Relational DBMS

Time series management with RDBMS also has considerable disadvantages, e.g. time series are based on sequences, whereas the relational data model uses a set concept. An RDBMS is neither suited to model recursive structures nor to handle heterogeneous sets. The time concept within current RDBMS is not very sophisticated. This means that data processing specialists would have to write specific applications for the economic researchers [...]

Object-oriented DBMS

OODBMS [Catt 91] provide a lot more capabilities to model and implement time series than RDBMS. The main advantage is that arbitrary data types can be modeled as classes. Information hiding, inheritance, and reusability are further strengths of OODBMS. Very useful concepts for time series management are the declaration of data types such as ordered collections or sets and the handling of recursive structures and heterogeneous sets. A sophisticated time concept, including calendar functionality, can be modeled as special classes, and methods can realize complex operations on time series. Therefore, an OODBMS is a good basis on top of which a [...]

Specialized TSMS

To our knowledge, there are currently only very few commercial DBMSs specialized for time series management. The most mature and interesting of them is the FAME system [Kotz 92]. FAME has many useful features, but also disadvantages, e.g. poor search and retrieval facilities, and no mechanisms for data quality management and data consistency control. The data model is not powerful enough: each event may have only one scalar field, and the group concept (lists of time series names from which members may be selected by pattern matching with simple wild-cards, not by content) is too limited. Finally, the 4GL requires a lot of experience and [...]

Related research work

A lot of work has been done in the field of temporal databases ([SA 86], [Tans 93], [SS 92]). However, most temporal database systems use an interval model, whereas a TSMS needs a time point model.

The temporal data model closest to our requirements is described in [SS 87], [SS 93], [CS 93] and [SC 93]. In the former version of the model, the main constructs are time sequences (TS) and time sequence collections (TSC). Though being central modeling concepts, TSs and TSCs are treated as weak entities only. As a TSC includes the time series of all objects of a class, a time series cannot be directly accessed but has always to be selected from the TSC. TSCs must be homogeneous regarding all properties, i.e. the properties of time sequences cannot vary within the same collection (e.g. to record economic values of two countries at different granularity). Furthermore, no means are provided to arbitrarily group time series.

In the newer version of the model, these characteristics no longer apply. However, as the model is based on a predefined set of extended relational operations, its expressional power with respect to time series transformation, filtering etc. is limited. For example, the model in [SC 93] does provide an interpolation function for time series (the "type" of a time series), but neither an aggregation function nor individual interpolation functions for different attributes of the same series. The notion of "Concept" found in [SC 93] mixes three orthogonal ideas: inheritance (IS-A hierarchy of Concepts), grouping of time series according to some common usage feature and a view mechanism based on event construction.

Statistical databases [Mich 91], [HF 92], scientific databases [Mich 91], [HF 92], and spatial databases [LT 92] address certain problems that a TSMS has to deal with, too. However, they give no preference to the time dimension.

Goals for a Research Project in Time Series Management

The current state of the art mentioned above led us to the conclusion that it is worthwhile to start a project on time series management with the following goals:

• The TSMS must be suited to end users.
• The data model must support managing many time series with record-like events and partitioning the time series of a time series base into various groups.
• The system has to support the derivation of new time series and groups from existing ones.
• Retrieval of time series and groups must be provided through a general search mechanism and adequate [...]
• Facilities for efficient and easy to use data quality [...]
• The system must be able to handle data exchange between a collection of loosely coupled time series bases, a variety of other data sources, and data sinks.

In the following, we will discuss some of these goals.

A system suited to end users

End users require simple basic concepts (e.g. with respect to the data model) and user-friendly manipulation facilities (e.g. a graphical user interface consisting of several browsers and editors, as described in chapter 2.1.2).

Data model

We propose a data model that comprises three basic elements: events, time series, and groups. These elements [...]

Our TSMS will not only allow time series and groups to be stored and retrieved, but it will also support the transformation of time series and the derivation of new time series [...]

Retrieval of time series and groups

In addition to the browsers explained in chapter 2.1.2, more traditional query facilities allow users to search for groups and time series based on their data contents.

Data exchange

The different components and the conditions under which data exchange takes place have been described in chapter [...]

Our intent is to build a generic specification formalism that describes all the parameters necessary for interoperability, e.g. the data formats, the source and destination of the data, means of data exchange and exchange frequency. The data exchange will be set up automatically according to the specification [Drey 93].

A system for personal and group use

Concurrent updates to time series will be synchronized by a concurrency control mechanism. As stated in chapter 2.4, such a mechanism can probably be realized with a rather simple approach, because there is normally just one [...]

Research Issues
The current situation and the goals we set for our project result in a number of research challenges:

• Definition of a time series data model: The data model must have enough expressive power to model a broad spectrum of time series types and group types. It must incorporate the functional capabilities to support the derivation of new time series, the filtering [...]
• Implementation of a specialized DBMS on top of an off-the-shelf OODBMS: We intend to build the TSMS on top of a commercially available, unmodified OODBMS. It will be interesting to experience in what respect the OODBMS facilitates the construction and which features are missing.
• Data exchange concept: There are many discussions going on in the area of database interoperability. These discussions concern topics like data replication or schema integration. However, the question of how to best handle multiple private and public databases which loosely cooperate via data exchange has [...]
• Queries over heterogeneous collections: In our TSMS, a group can contain various types of time series and groups. The realization of a flexible yet efficient query facility that can handle such heterogeneous sets is an interesting problem.
• Data quality management: Some conceptual work has been done in the area of data quality management [Wang 93], but few systems seamlessly integrate data quality concepts. The existing solutions [...]

Conclusions

Time series management raises a variety of questions ranging from data structures and functional capabilities to user interfaces and interoperability. Existing solutions are not satisfying, and related research work is only partially directed towards our problem domain.

It is for these reasons that a research project in time series management seems to be a worthwhile undertaking. The system's major goals will lie in the fields of retrieval and transformation capabilities, end user orientation, data exchange and data quality management. We also hope to gain insight into the building of special-purpose data management systems, which could ease the development of [...]

References

[Catt 91] R.G.G. Cattell: Object Data Management - Object-Oriented and Extended Relational Database Systems. Addison-Wesley, 1991.
[...] Financial Data in an Extensible Database. Proc. of the 19th VLDB Conf., Dublin 1993.
[Drey 93] W. Dreyer: Interoperability Issues in Time [...] Schweizer Informatiker Gesellschaft, Data Bases - Theory and Application; to be published in fall 1993.
[HF 92] H. Hinterberger, J.C. French (eds.): Proc. 6th Intl. Working Conference on Scientific and [...]
[Mich 91] Z. Michalewicz: Statistical and Scientific Databases. Ellis Horwood Ltd., 1991.
[...] bases. IEEE Computer, 19 (9), Sept. 1986.
[...] Time-Series Analysis. Workshop on Current Issues in Databases and Applications, Rutgers Univ., Oct 1992. In: Advanced Database Systems, editors: N. Adam and B. [...] Science Series, Springer Verlag, 1993.
[SS 87] A. Segev, A. Shoshani: Logical Modeling of Temporal Data. In Proc. of the ACM SIG[...]
[SS 92] M.D. Soo, R.T. Snodgrass: Multiple Calendar Support for Conventional Database Management Systems. Univ. of Arizona, Dept. of Comp. Science, TR 92-07, Feb. 1992.
[...] Benjamin/Cummings Publ. Comp., 1993.
[Wang 93] R. Y. Wang et al.: Data Quality Require[...] the 9th Int. Conf. on Data Engineering, April 1993.