Experimenting with a big data infrastructure for multimodal stream processing
FFI-Report
2020
About the publication
Report number
20/00480
ISBN
978-82-464-3254-0
Format
PDF document
Size
4 MB
Language
English
Monitoring Norwegian land areas, the airspace, the sea and cyberspace is an important part of the
Armed Forces' activities. This surveillance supports both traditional tasks, such as defending
sovereignty and managing crises and conflicts, and civil-military tasks such as rescue services
and environmental preparedness. The overall response time of the Armed Forces, as well as the
quality of its operational decisions, depends on the ability to perceive a situation, interpret it and
understand it, that is, on the level of situational awareness.
New communication technologies and the ever-increasing availability of computing power today make
it possible to utilize data of a variety and volume that can significantly enrich situational awareness in
the future.
From a computational point of view, progress in this area depends on whether we have computational
models that can translate data into relevant real-time intelligence, and on whether we are able to
coordinate machine clusters that, working together, are capable of adapting to potentially very large
spikes in the quantity or complexity of available information (complexity being understood as the
amount of processing power it takes to convert data into actionable intelligence).
In this report we take a closer look at some general properties such a machine cluster could
reasonably be expected to have, as well as the matching characteristics a surveillance algorithm
must have in order to exploit it efficiently. Although it is not reasonable to assume that all types of
surveillance tasks and information needs can be served with identical system support, the working
hypothesis in this report is that some general systemic properties will be sufficient for most cases.
Examples include loose coupling, scalability, fault tolerance and parallelizability.
We do not claim to have proved or refuted this hypothesis (i.e. that some general systemic properties
will be sufficient for most cases), but will be content for now to adduce some experimental evidence
in support of it. In other words, the method we adopt is empirical. More specifically, we conduct an
experimental case study of high-velocity stream reasoning supported by a scalable, coordinated
machine cluster running a variety of software components and algorithms. The various components
and algorithms will be described in more detail as we go. By stream reasoning, we shall mean the
operation of turning continuously incoming data into actionable intelligence in real time. The case
study, more specifically, attempts to detect illegal, unreported, and unregulated fishing from a stream
of AIS reports, supplemented with geographic information as well as with additional on- and offline
information about ships, landing bills and more.
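To make the notion of stream reasoning concrete, the sketch below shows one minimal detection step in Python: flagging slow-moving vessels inside a protected zone, with low speed as a crude proxy for active fishing. The zone coordinates, the speed threshold and the `AisReport` fields are illustrative assumptions for this sketch, not rules or data taken from the report.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class AisReport:
    mmsi: int          # vessel identifier from the AIS message
    lat: float
    lon: float
    speed_knots: float

# Hypothetical bounding box for a protected zone (illustrative values only).
ZONE = (70.0, 72.0, 15.0, 20.0)  # lat_min, lat_max, lon_min, lon_max

def in_zone(r: AisReport) -> bool:
    lat_min, lat_max, lon_min, lon_max = ZONE
    return lat_min <= r.lat <= lat_max and lon_min <= r.lon <= lon_max

def suspicious(reports: Iterable[AisReport],
               max_speed: float = 4.0) -> Iterator[AisReport]:
    """Yield reports of slow-moving vessels inside the zone; a real
    pipeline would combine many such rules with ship registries,
    landing bills and other offline sources."""
    for r in reports:
        if in_zone(r) and r.speed_knots <= max_speed:
            yield r

stream = [
    AisReport(257000001, 71.0, 17.0, 2.5),   # slow, inside zone -> flagged
    AisReport(257000002, 71.5, 18.0, 12.0),  # fast transit -> ignored
    AisReport(257000003, 60.0, 5.0, 1.0),    # slow, but outside zone -> ignored
]
flagged = [r.mmsi for r in suspicious(stream)]
print(flagged)  # [257000001]
```

Because `suspicious` is a generator, it consumes reports one at a time and can run unchanged over an unbounded incoming stream rather than a finite list.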
The experiment was designed to see to what extent standard software components can be utilised
to build a stream processing infrastructure meeting the identified requirements. The outcome
of the experiment was a confirmation that the chosen core components essentially provided a
streaming infrastructure with the desired properties, mainly due to the characteristics of the core
component Apache Kafka. The main deviation concerned the infrastructure's fault tolerance: during
the experiment, further study of Kafka's handling of network partitioning cast doubt on its ability
to handle such situations. As this experiment was conducted on a robust network, the infrastructure's
tolerance for network partitioning was not tested. This is, however, an interesting avenue for further
work, as network partitioning is a characteristic of tactical networks.
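The loose coupling and scalability attributed to Kafka above come from its core abstraction: a partitioned, append-only log in which producers append keyed messages and consumers pull from offsets they manage themselves. The toy class below sketches that idea in plain Python as an illustration of the concept; it is not Kafka's actual API, and the message contents are invented.

```python
class Log:
    """Toy append-only, partitioned log in the style of Kafka: producers
    append to partitions, consumers track their own read offsets, so the
    two sides stay loosely coupled and can scale independently."""

    def __init__(self, partitions: int = 3):
        self.partitions = [[] for _ in range(partitions)]

    def append(self, key: str, value: str) -> None:
        # Keyed partitioning keeps all messages for one vessel in order
        # within a single partition.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)

    def read(self, partition: int, offset: int) -> list:
        # Consumers pull from an offset they manage themselves; the log
        # itself keeps no per-consumer state.
        return self.partitions[partition][offset:]

log = Log(partitions=1)  # one partition for a deterministic demo
log.append("mmsi-257000001", "pos@71.0,17.0")
log.append("mmsi-257000001", "pos@71.1,17.2")
print(log.read(0, 0))  # both messages, from the start of the log
print(log.read(0, 1))  # only the second, resuming after offset 1
```

Because consumers keep their own offsets, a crashed consumer can resume where it left off, which is one ingredient of the fault tolerance discussed above; handling a partitioned network is a separate and harder problem.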