StreamMine [PDF ~1.7MB] Principles and Applications of Distributed Event-Based Systems,
2010
Authors:
Christof Fetzer and
Andrey Brito and Robert Fach and Zbigniew Jerzak
StreamMine is a novel distributed event processing system that we are currently
developing. It is architected for running on large clusters and uses content-based
publish/subscribe middleware for the communication between event processing
components. Event processing components need to enforce application-specific ordering
constraints, e.g., all events need to be processed in order of some given time stamp. To
harness the power of modern multi-core computers, we support the speculative execution
of events. Ordering conflicts are detected dynamically and rolled backed using a
Software Transactional Memory.
To ensure sustainable manufacturing the energy consumption
of production processes must be minimized. SAP Research
is active in two European Community's Seventh Framework
Program (FP7) projects: GINSENG and KAP. The combined
aim of GINSENG and KAP projects is to allow for
event-based monitoring of the Production Performance Indicators
(PPI) that can be used to, e.g., shut down idle
machines or to optimize the material flow.
In this demo we illustrate how the energy consumption
for lightening and heating can be controlled with the use of
wireless sensor nodes that measure temperature, light and
humidity. Events describing the indoor climate, originating
from TelosB motes, are processed using the middleware
developed within the GINSENG project and subsequently
visualized using a dashboard. Visualization includes also
warnings regarding excessive energy consumption, which can
be simulated with light beams or hand warmth.
Just as packet switched networks constituted a major breakthrough in our
perception of the information exchange in computer networks so have the
decoupling properties of publish/subscribe systems revolutionized the way
we look at networking in the context of large scale distributed systems. The
decoupling of the components of publish/subscribe systems in time, space and
synchronization has created an appealing platform for the asynchronous information
exchange among anonymous information producers and consumers.
Moreover, the content-based nature of publish/subscribe systems provides a
great degree of flexibility and expressiveness as far as construction of data
flows is considered.
However, a number of challenges and not yet addressed issued still exists in the
area of the publish/subscribe systems. One active area of research is directed
toward the problem of the efficient content delivery in the content-based publish/subscribe networks.
Routing of the information based on the information
itself, instead of the explicit source and destination addresses poses challenges
as far as efficiency and processing times are concerned. Simultaneously, due
to their decoupled nature, publish/subscribe systems introduce new challenges
with respect to issues related to dependability and fail-awareness.
This thesis seeks to advance the field of research in both directions. First, it
shows the design and implementation of routing algorithms based on the end-to-end
systems design principle. Proposed routing algorithms obsolete the need to
perform content-based routing within the publish/subscribe network, pushing
this task to the edge of the system. Moreover, this thesis presents a fail-aware
approach towards construction of the content-based publish/subscribe system
along with its application to the creation of the soft state publish/subscribe
system. A soft state publish/subscribe system exposes the self stabilizing
behavior as far as transient timing, link and node failures are concerned. The
result of this thesis is a family of the XSiena content-based publish/subscribe
systems, implementing the proposed concepts and algorithms. The family
of the XSiena content-based publish/subscribe systems has been a subject to
rigorous evaluation, which confirms the claims made in this thesis.
Building survivable content-based publish/subscribe systems
is difficult. Every node in a distributed publish/subscribe
system stores a significant amount of routing state which can
be easily corrupted due to message omissions, link and node
failures. In this paper, we show how to build a soft state
content-based publish/subscribe system where the whole
state is stored at the edge of the publish/subscribe network, at
the entity which is utilizing the state. This results in a robust
and resilient system, as the routing state is permanently lost
or corrupted only if the endpoint entity associated with the
given state permanently fails.
This paper presents our experiences with building of the
soft state XSiena publish/subscribe system. We provide
a brief overview of our approach towards the soft state in
publish/subscribe systems along with the detailed discussion
of the liveness and safety issues as well as the choice of the
API and handling of the unsubscription and unadvertisement
messages.
Existing clock synchronization algorithms assume a
bounded clock reading error. This, in turn, results in an inflexible design
that typically requires node crashes whenever the given bound might be
violated. We propose a novel, adaptive internal clock synchronization
algorithm which allows to compute the deviation between the clocks during
runtime. The computed deviation can be propagated to the application layer
to allow it to adapt its behavior according to the current clock deviation.
The contributions of this paper are: (1) a new specification of a relaxed
clock synchronization problem, and (2) a new clock synchronization algorithm
with a novel approach to dealing with crash failures.
Achieving expressive and efficient content-based
routing in publish/subscribe systems is a difficult problem. Traditional
approaches prove to be either inefficient or severely limited in their
expressiveness and flexibility. We present a novel routing method, based on
Bloom filters, which shows high efficiency while simultaneously preserving
the flexibility of content-based schemes. The resulting implementation is a
fast, flexible and fully decoupled content-based publish/subscribe
system.
StreamMine is a scalable middleware for massive
real-time data streaming. In this paper we present the BFSiena: a
communication substrate for the StreamMine. BFSiena is a content-based,
publish/subscribe communication system which provides support for arbitrary
predicate based messaging in the acyclic peer to peer networks. BFSiena is a
low latency, high throughput communication system which is well suited for
the processing frameworks, like StreamMine.
In this paper we present a wide area distributed
system using a content-based publish/subscribe communication middleware
which can deterministically detect and report failures with respect to
timely message delivery and message omission. Our approach does not require
external clock synchronization nor does it impose any constraints on the
publish/subscribe middleware. We show that our system performs better and is
safer than when using NTP for external clock synchronization. We provide a
proof of concept implementation and present results of experiments carried
out in the PlanetLab environment.
We present a prefix forwarding algorithm for
content-based publish/subscribe systems. Our algorithm performs only one
content-based match per message regardless of the number of routers (hops)
traversed from the source to the destination. Moreover, prefix forwarding
preserves the decoupling properties of publish/subscribe system. Prefix
forwarding does not put any restriction on the content of the messages. The
presented algorithm does not introduce any false negatives and allows to
tune the false positive rate to balance the bandwidth and processing
overheads. We provide experimental results confirming the properties of the
proposed approach.
This paper proposes a new approach for handling
overload in Publish/Subscribe systems. We focus on the fact that every
service has to cope with the limitations imposed by the external
environment (e.g., network congestion) and the limitations resulting
directly from within the service itself – e.g., the maximum available
computational power. This work seeks to aid Publish/Subscribe mechanisms
with the ability to handle both aforementioned constraints in a graceful
way, simultaneously ensuring for the most valuable information to be given
the highest chance of a successful delivery.
2005
Highly Available Publish/Subscribe [PDF ~200KB] Supplement Proceedings of the High Assurance Systems Engineering Conference - HASE2005,
pp. 11-12, Heidelberg, Germany, October 2005
Authors: Zbigniew Jerzak
We propose a novel approach for ensuring the
availability of a publish/subscribe (P/S) service with limited resources.
Our approach complies with the fully decoupled nature of P/S services.
The user-perceived web server performance has a
strong bearing on popularity of a portal, site or service as users may
choose not to return to a particular web page if it takes a long time to
retrieve its content. We present a new approach to how a server can
identify incoming clients that suffer from a poor performance and how
can it react in order to improve a user-perceived connection quality
based on that identification.
In this article we propose a new tool, named Raw
Packet Sender (RPS), for testing the performance of WWW servers. Our
solution allows for testing with arbitrary number of source IP addresses
although the traffic originates from only one physical NIC. In order to
better mimic the real life environment we implemented an HTTP session
interarrival time generator based on Markov Modulated Poisson Process
(MMPP), which can closely match auto-covariance and the marginal
distribution of recorded web traffic traces.