Ubilab.net

Research Perspectives
for Time Series Management Systems
Werner Dreyer, Angelika Kotz Dittrich, Duri Schmidt
UBILAB, Union Bank of Switzerland, Zurich
{dreyer, dittrich, schmidt}@ubilab.ubs.ch
Abstract
Empirical research based on time series is a data-intensive activity that needs a database management system (DBMS). We investigate the special properties a time series management system (TSMS) should have. We then show that currently available solutions and related research directions are not well suited to handle the existing problems. Therefore, we propose the development of a special-purpose TSMS, which will offer particular modeling, retrieval, and computation capabilities. It will be suitable for end users, offer direct manipulation interfaces, and allow data exchange with a variety of data sources, including other databases and application packages. We intend to build such a system on top of an off-the-shelf object-oriented DBMS.
Introduction
At Union Bank of Switzerland, several departments work intensively with time series. They encounter a number of problems:
• The large volume of time series (several thousand) makes their management a difficult task.
• In large time series bases, researchers have problems finding the time series relevant to their work.
• When a time series base becomes too large, data quality management is impossible without the help of a [...]
• Researchers usually do not work with all the time series an institution collects. Instead, there is a need to build and maintain project-specific time series bases.
• Such a time series base must contain data from public and company databases as well as project-pro[...]
• The same time series are used with many different tools, e.g. statistics packages, spreadsheets, and desktop publishing programs. Without the help of a DBMS that can cope with the different data formats these tools use, researchers are forced to duplicate the data.
• Researchers need specific system support for tasks like transforming the periodicity of time series, filtering, or computing new time series.
• As current TSMS are difficult to use, economists cannot work without the help of a database specialist.
• Synchronization problems may arise when time series are used concurrently by different users.
Research in time series management has to address these problems and to provide adequate solutions.
The remainder of the paper is organized as follows: First of all, we address the properties of TSMS. Then, we show an example of a time series base. We look at current solutions and at research related to our problem domain. Finally, we discuss the goals and the research issues of our project.
Requirements for TSMS
Data Model Requirements
Structural Elements
A data model for time series contains the following structural elements: events, time series, groups of time series, other data, and time series bases.
Events are the basic building blocks of time series. An event consists of the event data, which is time-variant. Examples of event data are the opening, low, high, and closing prices of a share. The event data can have an atomic or a structured data type. Atomic data types are scalar types like integer or string. Records or arrays of atomic types are examples of structured types.
Time Series
A time series consists of a header and a sequence of events ordered chronologically. Header data are shared by all events. They may be time-invariant and describe common properties of the time series as a whole (e.g. the location of the stock exchange), or they may be time-variant and derived from the events (e.g. the average closing price).
Normally, all the events of a time series are of the same type, but they can also vary over time.
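The structural elements described above (events with atomic or structured data, and time series made of a header plus chronologically ordered events) can be illustrated with a minimal Python sketch. It is not the data model the paper proposes. The class names `Event` and `TimeSeries`, the explicit `when` time stamp, and the field names `high`, `low`, and `close` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any


@dataclass
class Event:
    """One observation: an assumed explicit time stamp plus structured event data."""
    when: date
    data: dict[str, float]  # record-like event data, e.g. high/low/close prices


@dataclass
class TimeSeries:
    """A header shared by all events plus a chronologically ordered event sequence."""
    header: dict[str, Any]  # time-invariant header data (hypothetical keys)
    events: list[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        # Keep the sequence ordered by time stamp regardless of insertion order.
        self.events.append(event)
        self.events.sort(key=lambda e: e.when)

    def average(self, name: str) -> float:
        """A time-variant header value derived from the events."""
        values = [e.data[name] for e in self.events]
        if not values:
            raise ValueError("time series has no events")
        return sum(values) / len(values)


# Illustrative data only; the values are made up.
share = TimeSeries(header={"exchange": "Zurich", "currency": "CHF"})
share.add(Event(date(1993, 3, 2), {"high": 102.0, "low": 99.0, "close": 101.0}))
share.add(Event(date(1993, 3, 1), {"high": 101.0, "low": 98.0, "close": 100.0}))
```

Here the average closing price plays the role of a header value that is derived from the events, while the exchange location is time-invariant header data.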
Data values are either base or derived values. Base values record measured facts. Derived values are computed directly or indirectly from base values.
Data values differ in what they measure. One example is stock values, i.e. values which measure a value at some point in time, e.g. a price of a share. Stock values can be differentiated further. They may measure the value at the beginning or end of a period, or the highest, lowest or average value of a period. Another example is flow values. They measure a value over a period of time, like a cash flow. These different kinds of values have different periodicity transformations. An example is the transformation from a monthly to a quarterly periodicity: For the high selling price, one has to choose the maximum of the three monthly values, for the closing price the last value, and for the cash flow the sum of the three values.
A calendar is associated with each time series. The calendar does the mapping between events and the time when the event occurred. In the case of time series with constant time between events, the calendar also defines the [...]
Time series differ in their density, i.e. in how many events are recorded. Missing events may arise for diverse reasons, such as that the event did not happen or the [...]
A further property is the ordering of the events over time. The events of a time series may be completely ordered, i.e. there is at most one event per point in time. In contrast, the events of other time series may be only partially ordered, i.e. more than one event with the same time stamp may exist in the time series. An example of such a situation is the use of estimation methods for future values. A researcher may record several events per point in time, because he or she makes repeated estimations for certain data values after having received more ac[...]
The cardinality of a time series, i.e. the number of events of a time series, varies with the application area. In the case of economic time series, the cardinality ranges from several tens in the case of annual data to several thousand [...]
On the one hand, time series data are accessed along time, i.e. event sequences of one time series are examined. On the other hand, researchers also use cross-sectional analysis, whereby several time series are explored at a certain point in time. The data model must allow efficient storage and access methods for both ways.
Detecting time series in large time series bases relevant to the interests of a user is an important issue. One way to facilitate this is to partition the universe of all time series of a time series base into several groups according to various criteria, e.g. branches, country, or size of the company. A TSMS must support a flexible, powerful grouping [...]
A group consists of its header and its member set. The header contains data shared by all members, e.g. the group name, or data derived from some members, such as the covariance matrix of some time series contained in that group. The member set contains the time series and subgroups belonging to the group. It is necessary that a group can also contain other groups, that it is possible to define the members of a group by enumeration or by computation, and that members can belong to more than one group.
Other Data
Applications which use time series may also have to manage other persistent data that are not related to time series or groups. Examples are results of simulations or constants. These data are often not time series themselves. There are also meta data, i.e. data describing the actual events, time series, and groups. Unfortunately, there is no clear definition of what meta data are. Often, one user's meta data are the other user's data, and vice versa. It is not clear to what extent a TSMS should be able to cope with these data or how a TSMS should be combined [...]
Time Series Bases
All the time series, groups and other data managed as one logical unit make up a time series base. Its size depends on the number of time series, their periodicity, the observation interval, and the size of the events.
Functional Requirements
Distribution of Functional Capabilities
As mentioned before, researchers use their time series with several programs, such as statistics packages, spreadsheets, and charting programs. These specialized programs provide a thorough functionality in their application area. Therefore, it makes no sense for the TSMS to replicate all the functionality these programs already offer. It is more sensible to build an environment where the TSMS works as a repository for the time series, and the tools use this repository to provide their special services. The emphasis of the TSMS should, therefore, be on data management capabilities and on basic transformation capabilities to prepare the data for further analysis.
Operations on Events
A TSMS should support the usual operations on atomic types, and must provide read and update access to the components of structured event types.
Operations on Time Series
The most important functional capabilities of a TSMS are the definition of new time series, the storing of data in time series, and the retrieval of data from time series.
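The monthly-to-quarterly transformation described earlier (maximum for the high price, last value for the closing price, sum for the cash flow) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is a plain list of monthly values that starts at a quarter boundary and has no missing events, and the names `AGGREGATORS` and `to_quarterly` are hypothetical.

```python
from typing import Callable, Sequence

Aggregator = Callable[[Sequence[float]], float]

# Each kind of value needs its own periodicity transformation.
AGGREGATORS: dict[str, Aggregator] = {
    "high": max,                         # highest price of the period
    "close": lambda values: values[-1],  # closing price: last value of the period
    "cash_flow": sum,                    # flow value: sum over the period
}


def to_quarterly(monthly: Sequence[float], kind: str) -> list[float]:
    """Collapse consecutive groups of three monthly values into one quarterly value."""
    if len(monthly) % 3 != 0:
        raise ValueError("expected whole quarters (a multiple of 3 months)")
    aggregate = AGGREGATORS[kind]
    return [aggregate(monthly[i:i + 3]) for i in range(0, len(monthly), 3)]


# Illustrative data only: six monthly values, i.e. two quarters.
monthly = [10.0, 12.0, 11.0, 9.0, 8.0, 13.0]
quarterly_high = to_quarterly(monthly, "high")
quarterly_close = to_quarterly(monthly, "close")
quarterly_flow = to_quarterly(monthly, "cash_flow")
```

Keeping the aggregation rule in a table keyed by the kind of value mirrors the point made in the text: the transformation depends on what the value measures, not on the time series as a whole.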
Another important capability of a TSMS is the derivation of new time series from existing ones. Examples are the computation of the difference of two time series, the application of a moving average or other filters to a time series, and the transformation of the periodicity of a time series.
A TSMS should provide decent query capabilities. It should at least support a select operation for time series similar to the select operation of the relational algebra. It would also be useful to be able to apply not only logical operations in the selection condition but also other operations, such as arithmetic and time-related operations. An optimizer should support users in query processing to [...]
Operations on Groups
Additional operations are required to define groups, to store and to retrieve group data, to add and remove members, and to enumerate them. The manipulation of the member set should be facilitated by set operations. One should also be able to apply operations to all members of a group without explicitly iterating over them.
Calendar
A TSMS must support calendar-related operations and date arithmetic, e.g. computing the difference between two dates, even if the two time series have different periodicities. Different time series may be based on different calendars, e.g. a business calendar with five business days per week or a calendar that can cope with local holidays.
Browsing and Editing
Two other important functions are browsing and editing. Browsing means interactively exploring the time series available in a time series base and investigating their contents. Such a browsing tool also helps to get an overview of the group structures. Editing means interactively building new or updating existing time series.
A browser-editor also needs some presentation facilities capable of displaying time series in table or chart formats. However, these presentation facilities are rather basic. For a more sophisticated presentation, one can use specia[...]
So far, we have described requirements of a TSMS data model. One of the goals of the project will be defining a data model and deciding which of the structures and functional capabilities mentioned should be incorporated.
Data Exchange Requirements
Import of Data from Various Data Sources
An important issue in system architecture concerns the distribution of databases. Researchers have a private time series base for their individual research, but they also need access to other databases, e.g. time series bases of other members within the research group, or company-wide financial databases. Most of these databases do not use the same data model or the same schema as the private time series base. They may reside on the same computer, on the same local area network or on a remote [...]
Researchers frequently want to copy a subset of such other databases into their private time series base. Such a copy operation may be triggered manually, by some time event or by some other event of interest, e.g. a price crossing a certain threshold. Therefore, specifying various update policies and events in a TSMS must be easy.
In addition to databases, there are various other data sources. Data may be gathered manually. At UBS, for example, the Gross National Product of several countries is estimated and entered manually. Other data are received as files from external data providers like OECD. Furthermore, data is also received as a continuous data stream from other computers. Stock prices, for example, can be received as a real time data feed from ticker services.
Data sources differ in the amount of data and in the frequency of updates. Manual entry generates only a small amount of new data and the frequency of updates is low. In contrast, a real time data feed often generates a large amount of data. However, for research purposes, it is normally sufficient if the TSMS can cope with fewer updates [...]
Usually, the format of data from various sources also differs. A TSMS must be able to handle all these formats.
Normally, the available file formats for data exchange are not satisfying, because they do not define how to format header data like periodicity, etc. It might be necessary to develop a specific format to make lossless data exchange possible.
Export of Data to Client Applications
When we export data to programs like statistical packages or spreadsheets, we are faced with similar problems [...]
Means of Data Exchange
The different sources for data import and sinks for data export provide different ways of data exchange. Interfaces between components can be realized, for example, via program-to-program communication or simple file transfer. The choice of the appropriate solution depends on the connection facilities: e.g. there may be a distributed environment that allows RPCs, a file transfer protocol like FTP, or just a connection via electronic mail. Therefore, a TSMS should support different ways of data exchange.
Data Quality Management Requirements
Time series management makes high demands on data quality management. New raw data has to be checked for consistency with older events, outliers have to be detected, noise has to be filtered to clean up the data, etc. To track the value of calibrated data back to the raw data, it is often necessary to retain the raw and the calibrated [...]
Old estimations have to be replaced by newer ones when new information is available, but one might like to retain the older data to review and improve the estimation process. This requires a versioning concept.
It should also be possible to store information on the quality of the data, whether for the entire time series or just for individual events.
Synchronization Requirements
Usually, there is only one process writing a time series while all other processes just read it. However, it is not yet clear whether it is enough to support only a single writer / multiple reader transaction model.
Ordinarily, new data is appended to the end of a time series and all other events are rarely modified. This might facilitate the synchronization of multiple users.
Example of a Time Series Base
An example of a time series base is HIKU (Historische Kurse, i.e. historical rates) from the Swiss company Telekurs. It contains approximately 10'000 time series from the financial world. The system has been operational since 1992, and the time series run from 1986 (1983 for Swiss shares). The quality of the data is checked by a [...] The time series are updated daily. Data are compressed and transferred to customers via file transfer over X.25. Telekurs offers software for workstations and PCs which handles the communication and the compression/decompression of the data. The size of the whole time series base is approximately 4 GByte.
There are three kinds of time series in HIKU: prices of financial instruments with raw values, prices of financial instruments with adjusted values, and adjustment factor time series. All their headers and events have lengths from 18 to 44 Byte. The first two kinds of time series have basically the same format. The header contains data like the number of the financial instrument, the type (e.g. stock or share), the currency, the location of the stock exchange, the branch code, and the number of events.
The time series with raw values contain nominal prices; they do not consider changes in prices resulting from splits of shares, the emission of bonus shares, etc. The time series with adjusted values take these changes into account. An event contains, among others: the date, different prices (closing, high, and low; opening prices are not yet available), trading volume, etc. The header of an adjustment factor time series contains a subset of the header described above. The events contain the date and the adjustment factor.
Time series are selected according to standard selection files which offer some predefined selection criteria, such as Blue Chips International, European Stock Exchange, Swiss Indices, and Foreign Exchange Rates. These criteria may serve to define groups. However, users cannot select groups according to their own criteria, e.g. all bank [...]
Current Solutions and Related Research
Time series management with files
When time series are stored in simple files, there are several drawbacks: The advantages of a DBMS get lost. The concepts of groups and time series bases are not really supported. Functional capabilities have to be implemented as separate programs on top of the file system. The same is true for data exchange as well as for data [...]
Relational DBMS
Time series management with RDBMS also has considerable disadvantages, e.g. time series are based on sequences, whereas the relational data model uses a set concept. An RDBMS is neither suited to model recursive structures nor to handle heterogeneous sets. The time concept within current RDBMS is not very sophisticated. This means that data processing specialists would have to write specific applications for the economic researchers [...]
Object-oriented DBMS
OODBMS [Catt 91] provide a lot more capabilities to model and implement time series than RDBMS. The main advantage is that arbitrary data types can be modeled as classes. Information hiding, inheritance, and reusability are further strengths of OODBMS. Very useful concepts for time series management are the declaration of data types such as ordered collections or sets and the handling of recursive structures and heterogeneous sets. A sophisticated time concept, including calendar functionality, can be modeled as special classes, and methods can realize complex operations on time series. Therefore, an OODBMS is a good basis on top of which a [...]
Specialized TSMS
To our knowledge, there are currently only very few commercial DBMSs specialized for time series management. The most mature and interesting of them is the FAME system [Kotz 92]. FAME has many useful features, but also disadvantages, e.g. poor search and retrieval facilities, and no mechanisms for data quality management and data consistency control. The data model is not powerful enough: each event may have only one scalar field, and the group concept (lists of time series names from which members may be selected by pattern matching with simple wild-cards, not by content) is too limited. Finally, the 4GL requires a lot of experience and [...]
Related research work
A lot of work has been done in the field of temporal databases ([SA 86], [Tans 93], [SS 92]). However, most temporal database systems use an interval model, whereas a TSMS needs a time point model.
The temporal data model closest to our requirements is described in [SS 87], [SS 93], [CS 93] and [SC 93]. In the former version of the model, the main constructs are time sequences (TS) and time sequence collections (TSC). Though being central modeling concepts, TSs and TSCs are treated as weak entities only. As a TSC includes the time series of all objects of a class, a time series cannot be directly accessed but always has to be selected from the TSC. TSCs must be homogeneous regarding all properties, i.e. the properties of time sequences cannot vary within the same collection (e.g. to record economic values of two countries at different granularity). Furthermore, no means are provided to arbitrarily group time series [...]
In the newer version of the model, these characteristics no longer apply. However, as the model is based on a predefined set of extended relational operations, its expressive power with respect to time series transformation, filtering etc. is limited. For example, the model in [SC 93] does provide an interpolation function for time series (the "type" of a time series), but neither an aggregation function nor individual interpolation functions for different attributes of the same series. The notion of "Concept" found in [SC 93] mixes three orthogonal ideas: inheritance (IS-A hierarchy of Concepts), grouping of time series according to some common usage feature, and a view mechanism based on event construction.
Statistical databases [Mich 91], [HF 92], scientific databases [Mich 91], [HF 92], and spatial databases [LT 92] address certain problems that a TSMS has to deal with, too. However, they give no preference to the time dimension.
Goals for a Research Project in Time Series Management
The current state of the art mentioned above led us to the conclusion that it is worthwhile to start a project on time series management with the following goals:
• The TSMS must be suited to end users.
• The data model must support managing many time series with record-like events and partitioning the time series of a time series base into various groups.
• The system has to support the derivation of new time series and groups from existing ones.
• Retrieval of time series and groups must be provided through a general search mechanism and adequate [...]
• Facilities for efficient and easy to use data quality [...]
• The system must be able to handle data exchange between a collection of loosely coupled time series bases, a variety of other data sources, and data sinks.
In the following, we will discuss some of these goals.
A system suited to end users
End users require simple basic concepts (e.g. with respect to the data model) and user-friendly manipulation facilities (e.g. a graphical user interface consisting of several browsers and editors, as described in chapter 2.1.2).
Data model
We propose a data model that comprises three basic elements: events, time series, and groups. These elements [...]
Our TSMS will not only allow time series and groups to be stored and retrieved, but it will also support the transformation of time series and the derivation of new time series [...]
Retrieval of time series and groups
In addition to the browsers explained in chapter 2.1.2, more traditional query facilities allow users to search for groups and time series based on their data contents.
Data exchange
The different components and the conditions under which data exchange takes place have been described in chapter [...]
Our intent is to build a generic specification formalism that describes all the parameters necessary for interoperability, e.g. the data formats, the source and destination of the data, means of data exchange and exchange frequency. The data exchange will be set up automatically according to the specification [Drey 93].
A system for personal and group use
Concurrent updates to time series will be synchronized by a concurrency control mechanism. As stated in chapter 2.4, such a mechanism can probably be realized with a rather simple approach, because there is normally just one [...]
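The observation that a time series usually has one writer that appends at the end, while other processes only read, suggests a very simple synchronization scheme. The sketch below is one possible illustration, not the mechanism the project intends to build. The class `AppendOnlySeries`, its method names, and the use of ISO date strings as time stamps are assumptions. A single lock is used for both appends and snapshot reads, so readers are briefly serialized. This is simpler than a full single-writer/multiple-reader protocol.

```python
import threading


class AppendOnlySeries:
    """Time series that only grows at the end, guarded by one lock."""

    def __init__(self) -> None:
        self._events: list[tuple[str, float]] = []
        self._lock = threading.Lock()

    def append(self, timestamp: str, value: float) -> None:
        """Add a new event at the end of the series.

        Time stamps are assumed to be ISO date strings, which compare
        chronologically. Equal time stamps are allowed, since a series
        may be only partially ordered.
        """
        with self._lock:
            if self._events and timestamp < self._events[-1][0]:
                raise ValueError("events must be appended in chronological order")
            self._events.append((timestamp, value))

    def snapshot(self) -> list[tuple[str, float]]:
        """Return a copy of the series that later appends do not affect."""
        with self._lock:
            return list(self._events)


# Illustrative data only.
series = AppendOnlySeries()
series.append("1993-01-04", 100.0)
series.append("1993-01-05", 101.5)
reader_view = series.snapshot()     # a reader takes its copy here
series.append("1993-01-06", 99.0)   # the writer continues appending
```

Because existing events are never modified, a reader's snapshot stays consistent without further coordination. Supporting in-place updates of older events would need a stronger scheme.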
Research Issues
The current situation and the goals we set for our project result in a number of research challenges:
• Definition of a time series data model: The data model must have enough expressive power to model a broad spectrum of time series types and group types. It must incorporate the functional capabilities to support the derivation of new time series, the filtering [...]
• Implementation of a specialized DBMS on top of an off-the-shelf OODBMS: We intend to build the TSMS on top of a commercially available, unmodified OODBMS. It will be interesting to experience in what respect the OODBMS facilitates the construction and which features are missing.
• Data exchange concept: There are many discussions going on in the area of database interoperability. These discussions concern topics like data replication or schema integration. However, the question of how to best handle multiple private and public databases which loosely cooperate via data exchange has [...]
• Queries over heterogeneous collections: In our TSMS, a group can contain various types of time series and groups. The realization of a flexible yet efficient query facility that can handle such heterogeneous sets is an interesting problem.
• Data quality management: Some conceptual work has been done in the area of data quality management [Wang 93], but few systems seamlessly integrate data quality concepts. The existing solutions [...]
Conclusions
Time series management raises a variety of questions ranging from data structures and functional capabilities to user interfaces and interoperability. Existing solutions are not satisfying, and related research work is only partially directed towards our problem domain.
It is for these reasons that a research project in time series management seems to be a worthwhile undertaking. The system's major goals will lie in the fields of retrieval and transformation capabilities, end user orientation, data exchange and data quality management. We also hope to gain insight into the building of special-purpose data management systems, which could ease the development of [...]
References
[Catt 91] R.G.G. Cattell: Object Data Management - Object-Oriented and Extended Relational Database Systems. Addison-Wesley, 1991.
[...] Financial Data in an Extensible Database. Proc. of the 19th VLDB Conf., Dublin 1993.
[Drey 93] W. Dreyer: Interoperability Issues in Time [...] Schweizer Informatiker Gesellschaft, Data Bases - Theory and Application; to be published in fall 1993.
[HF 92] H. Hinterberger, J.C. French (eds.): Proc. 6th Intl. Working Conference on Scientific and [...]
[Mich 91] Z. Michalewicz: Statistical and Scientific Databases. Ellis Horwood Ltd., 1991.
[...] bases. IEEE Computer, 19 (9), Sept. 1986.
[...] Time-Series Analysis. Workshop on Current Issues in Databases and Applications, Rutgers Univ., Oct 1992. In: Advanced Database Systems, editors: N. Adam and B. [...] Science Series, Springer Verlag, 1993.
[SS 87] A. Segev, A. Shoshani: Logical Modeling of Temporal Data. In Proc. of the ACM SIG[...]
[SS 92] M.D. Soo, R.T. Snodgrass: Multiple Calendar Support for Conventional Database Management Systems. Univ. of Arizona, Dept. of Comp. Science, TR 92-07, Feb. 1992.
[...] Benjamin/Cummings Publ. Comp., 1993.
[Wang 93] R. Y. Wang et al.: Data Quality Require[...] the 9th Int. Conf. on Data Engineering, April 1993.

Source: http://www.ubilab.net/publications/print_versions/pdf/dre94a.pdf
