{updated: Thu Aug 24, 2006}

Inside Negative Databases

In a negative database the negative image of a set of data records is represented rather than the records themselves.
   
    In a negative database, a collection of data is represented by its logical complement.  The term positive information denotes the elements of the category or set of interest (e.g., self) while negative information denotes all the other elements of the universe.  A negative database is then a representation of the negative information.
   
       Initially, we assume a universe U of finite-length records (or strings), all of the same length l, and defined over a binary alphabet.  We logically divide the space of possible strings into two disjoint sets: DB representing the positive records (holding the information of interest), and U - DB denoting the set of all strings not in DB.  We assume that DB is uncompressed (each record is represented explicitly), but we allow U - DB to be stored in a compressed form called NDB.
   
       In order to create a database NDB that is reasonable in size, we must compress the information contained in U - DB but retain the ability to answer queries.  We introduce one additional symbol to our binary alphabet, known as a "don't care," written as *.  The entries in NDB will thus be l-length strings over the alphabet {0, 1, *}.  The don't care symbol has the usual interpretation and will match either a one or a zero at the bit position where the * appears.
   
   At this point, before moving to the current state of negative database technology, we encourage you try the 'batch' demo on a simple example or two.  This introductory prototype allows you to define a small database and query it.  We provide links so that you can see the negative database in its three stages of development:
   
   
  1. NDB - first stage results of the Prefix Algorithm
  2. KNDB - second stage finds the essential bits, the c_key, for each NDB record.  A c-key defines a minimal pattern in that the removal of any bit yields a pattern in DB.
  3. RNDB - final stage randomizes and disguises the minimal negative data.  This is the file that is used for queries and the SAT Solver.
   
   
Plots of the three negative database show the distribution of the defined bit positions in a permuted state and after reversal---the goal is that there will be no discernable pattern.  Lastly, the SAT Solver link will attempt to "solve" the negative database to discover positive information.  We use these results to meter the difficulty of reversibility.  
  
   Next, the 'on-line' demo, based on [EAFH04], demonstrates the latest developments in our current research.  This version allows you to define a database, query it, and update it incrementally.  In addition to the bit distribution and SAT Solver links, you can plot the growth of the negative database with each update, and experiment with the effects of the cleanup and morph operations [EAFH05].  The web-based demo provides limits for the runtime of each update, the maximum record length, and the number of random positive DB records in Option 1.  The number of extra bits the algorithm may use, n, affects the complexity of a negative database and applies to the initial NDB as well as subsequent updates unless changed during a "morph".  The value of n is set to one for Options 1 & 2 and is user-specified in Option 3.  Lastly, Option 4 allows you to recall a recent negative database using the ID displayed at the bottom of the page.

   The NDB software that supports these demonstrations is available for download.


For more information about Negative Databases, please contact
Fernando Esponda at fesponda@cs.unm.edu