Inside Negative Databases
In a negative database the negative image of
a set of data records is represented rather than the records
themselves.
|
|
|
|
In a negative database, a collection of data is represented by its
logical complement. The term positive information
denotes the elements of the category or set of interest (e.g., self)
while negative information denotes all the other elements of
the universe. A negative database is then a representation
of the negative information.
|
|
|
|
|
Initially, we assume a universe U of
finite-length records (or strings), all of the same length l,
and defined over a binary alphabet. We logically divide the
space of possible strings into two disjoint sets: DB
representing the positive records (holding the information of
interest), and U - DB denoting the set of all strings
not in DB. We assume that
DB is uncompressed (each record is represented explicitly), but
we allow U - DB to be stored in a compressed form
called NDB.
|
|
|
|
|
In order to create a database NDB that is
reasonable in size, we must compress the information contained in
U - DB but retain the ability to answer
queries. We introduce one additional symbol to our binary
alphabet, known as a "don't care," written as *. The
entries in NDB will thus be l-length strings over the alphabet {0, 1, *}. The don't care symbol has the usual interpretation and
will match either a one or a zero at the bit position where the
* appears.
|
|
|
At this point, before
moving to the current state of negative database technology, we
encourage you try the 'batch'
demo on a simple example or two. This introductory
prototype allows you to define a small database and query
it. We provide links so that you can see the negative
database in its three stages of development:
|
|
|
|
- NDB - first stage results of the Prefix Algorithm
- KNDB - second stage finds the essential bits, the c_key,
for each NDB record. A c-key defines a minimal
pattern in that the removal of any bit yields a pattern in DB.
- RNDB - final stage randomizes and disguises the minimal negative
data. This is the file that is used for queries and the SAT
Solver.
|
|
|
|
Plots of the three negative database show the distribution of the
defined bit positions in a permuted state and after reversal---the goal is
that there will be no discernable pattern. Lastly, the SAT
Solver link will attempt to "solve" the negative database to discover
positive information. We use these results to meter
the difficulty of reversibility.
Next, the 'on-line' demo, based on
[EAFH04], demonstrates the latest
developments in our current research. This version allows
you to define a database, query it, and update it
incrementally. In addition to the bit distribution and SAT
Solver links, you can plot the growth of the negative database with
each update, and experiment with the effects of the cleanup and morph
operations [EAFH05]. The
web-based demo provides limits for the runtime of each update, the
maximum record length, and the number of random positive DB records in
Option 1. The number of extra bits the algorithm may use,
n, affects the complexity of a negative database and applies to
the initial NDB as well as subsequent updates unless changed during a
"morph". The value of n is set to one for
Options 1 & 2 and is
user-specified in Option 3. Lastly, Option 4
allows you to recall a recent negative database using the ID
displayed at the bottom of the page.
The NDB software that supports these demonstrations is
available for download.
|