Exploring data reuse using a big data infrastructure

FFI-Report 2020
This abstract and publication is only available in Norwegian
Jonas Halvorsen, Bjørn Jervell Hansen

Making good military decisions requires a high level of situational awareness, which in turn depends on access to as much relevant information as possible. This information can reach a decision maker via many different avenues, one of which is the reuse of information already collected or prepared for other purposes.

Both NATO and the Norwegian Armed Forces acknowledge data reuse as an important ingredient in how a military organization fulfills its information needs. Over the last 15 years, both have sought to shift their data strategies from the traditional need-to-know paradigm to the more open responsibility-to-share paradigm.

Ubiquitous information sharing and reuse do, however, have certain prerequisites. For example, the sharer of data must be able to trust that only authorized users will have access to it. The potential user, on the other hand, must be able to determine the provenance and reliability of the data, and whether it is in a suitable format, before using it.

This report documents a technical experiment that explores whether it is feasible to build, from open-source components, a big data infrastructure that meets the requirements for data reuse in the military domain. The exploration is supported by an experimental setup that expands on a previously explored big data infrastructure based on open-source components, extending it with components suited to facilitating data reuse. Specifically, the two lines of inquiry explored in this report are:

  1. Simplifying the re-purposing and joining of data sets by publishing data as linked data, a structured representation that makes data easy to interlink with other data.
  2. Utilizing lineage-based data governance for provenance tracking and fine-grained access control in a big data ecosystem that comprises many different components.
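
The two lines of inquiry can be illustrated with a minimal sketch. It is not the report's implementation and does not use the components of the experimental setup; it only shows the underlying ideas in plain Python. The `ex:` identifiers, the `Dataset` class, the numeric clearance levels and all function names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

# --- 1. Linked data: facts as (subject, predicate, object) triples. ---
# The "ex:" identifiers are made-up stand-ins for real URIs.
news_triples = {
    ("ex:event/1", "ex:occurredIn", "ex:place/Oslo"),
    ("ex:event/1", "ex:headline", "Bridge closed"),
}
background_triples = {
    ("ex:place/Oslo", "ex:locatedIn", "ex:country/Norway"),
}


def join_on_shared_uri(left, right):
    """Link triples where the object of a left triple is the subject of a right triple."""
    return sorted(
        (s, p, o, p2, o2)
        for (s, p, o) in left
        for (s2, p2, o2) in right
        if o == s2
    )


# --- 2. Lineage-based governance: derived data inherits restrictions. ---
@dataclass
class Dataset:
    name: str
    clearance: int  # assumed scale: higher means more restricted
    parents: list["Dataset"] = field(default_factory=list)


def effective_clearance(ds):
    """Strictest clearance found anywhere in the data set's lineage."""
    return max([ds.clearance] + [effective_clearance(p) for p in ds.parents])


def can_read(user_clearance, ds):
    """A user may read a data set only if cleared for its whole lineage."""
    return user_clearance >= effective_clearance(ds)


news = Dataset("news", clearance=0)
background = Dataset("background", clearance=2)
merged = Dataset("merged", clearance=0, parents=[news, background])

linked = join_on_shared_uri(news_triples, background_triples)
```

Because both data sets refer to the same identifier for the place, they can be joined without a bespoke mapping step. The merged data set, in turn, inherits the stricter requirement of its background source, so a user cleared only for the news stream would be denied access to it.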

The technical exploration is performed against a fictitious backdrop of real-time news analysis, where a team of analysts keeps track of events in a region in support of an ongoing military operation. This case requires merging information from real-time news streams with static background knowledge. The technical infrastructure is laid out and explained at a conceptual level, including brief introductions to the components used. Key features, and how they address the outlined data reuse issues, are explained and highlighted using the underlying news analysis case.

As identified in a previous FFI report, there is no single generic big data infrastructure that fits all cases; the choice of big data components is largely dictated by the case and problem at hand. The setup explored in this report, which was crafted for a specific case, is no exception. The main contribution of this report is an example of how today’s open-source, off-the-shelf big data technologies from the civilian sector can be utilized in the military domain to facilitate data reuse, governance and fine-grained access control. The results thus provide supporting evidence that building such an infrastructure is feasible, and may be useful to personnel considering different architectural approaches to information management in a military setting.
