Simple Data Obfuscation of Text Documents

What is data obfuscation? It is the masking of data to prevent the disclosure of private information. There are many ways to achieve it. In the text domain, for example, a user can specify the words (parameters) that they wish to replace in a document. The program then parses the document, compares each word against a lookup table of sensitive words, and applies either a non-reversible or a reversible transformation to the matching text.
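The parsing and comparison step can be sketched as follows. This is a minimal illustration, not the author's code: the helper name find_sensitive_words, the tokenizing regular expression and the case-insensitive comparison are all assumptions.

```python
import re

# Hypothetical tokenizer: runs of word characters, optionally joined by
# hyphens so that a word such as "super-natural" stays a single token.
TOKEN = re.compile(r"\w+(?:-\w+)*")

def find_sensitive_words(text, sensitive):
    """Return (offset, word) for every token of text found in the lookup table."""
    hits = []
    for match in TOKEN.finditer(text):
        word = match.group()
        # Assumption: the lookup table holds lower-case entries and the
        # comparison ignores case.
        if word.lower() in sensitive:
            hits.append((match.start(), word))
    return hits

lookup = {"ufo", "mars"}
print(find_sensitive_words("The UFO came from Mars.", lookup))
# [(4, 'UFO'), (18, 'Mars')]
```

Using a set for the lookup table keeps each comparison a constant-time membership test, however many sensitive words there are.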

For example, deleting the sensitive data is a non-reversible transformation, whereas replacing it with words from a lookup table is reversible: anyone who knows the lookup table can recreate the original sensitive document.
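A small sketch of the two kinds of transformation, assuming the lookup table is a Python dictionary that maps each sensitive word to its replacement (the function names are hypothetical). Reversal only works if the table is one-to-one and the replacement words do not already occur in the original document; otherwise restoring would also change text that was never sensitive.

```python
import re

def _pattern(words):
    # Whole-word alternation; longer entries first so a phrase wins over a
    # shorter entry it contains. re.escape guards against regex metacharacters.
    ordered = sorted(words, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")

def delete_words(text, words):
    """Non-reversible: every sensitive word is removed and cannot be recovered."""
    return _pattern(words).sub("", text)

def substitute(text, table):
    """Reversible: each sensitive word is swapped for its lookup-table entry."""
    return _pattern(table).sub(lambda m: table[m.group()], text)

def restore(text, table):
    """Recreate the original by applying the inverted lookup table."""
    inverse = {replacement: word for word, replacement in table.items()}
    return substitute(text, inverse)

table = {"UFO": "fishing vessel", "aliens": "fisherman"}
secret = "The UFO crashed and the aliens fled."
public = substitute(secret, table)
print(public)                            # The fishing vessel crashed and the fisherman fled.
print(restore(public, table) == secret)  # True
print(delete_words(secret, table))       # The  crashed and the  fled.
```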

The basic flow of such a program is:

1. Input the sensitive words.
2. Store the sensitive words and generate a replacement for each.
3. Parse the sensitive document and identify the sensitive words.
4. Replace or delete the sensitive words.
5. Output the cleansed document.


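The flow above can be sketched end to end as below. This is an illustrative sketch only: the placeholder scheme TERM1, TERM2, ... for the generated replacement words, the comma-separated input format and the names build_table and cleanse are assumptions, not part of the original design.

```python
import re

def build_table(sensitive_words):
    """Store the sensitive words and generate a replacement for each one.
    The TERM1, TERM2, ... placeholders are a hypothetical naming scheme."""
    table = {}
    for word in sensitive_words:
        if word and word not in table:  # skip blanks and duplicates
            table[word] = f"TERM{len(table) + 1}"
    return table

def cleanse(document, sensitive_words, delete=False):
    """Identify the sensitive words in the document, then replace or delete them.
    Returns the cleansed document together with the lookup table."""
    table = build_table(sensitive_words)
    if not table:
        return document, table  # nothing to hide
    ordered = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
    if delete:
        cleansed = pattern.sub("", document)
    else:
        cleansed = pattern.sub(lambda m: table[m.group()], document)
    return cleansed, table

# A comma-separated string stands in for the "input sensitive words" step.
sensitive = [w.strip() for w in "UFO, Mars".split(",")]
text, table = cleanse("A UFO from Mars landed.", sensitive)
print(text)   # A TERM1 from TERM2 landed.
print(table)  # {'UFO': 'TERM1', 'Mars': 'TERM2'}
```

Returning the table alongside the cleansed text keeps the choice open: keep it private to allow reversal later, or discard it so the replacement cannot be undone.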

The goal of obfuscation is to remove or replace sensitive text in a confidential document so that the desensitized document, once released, exposes neither identifying information nor confidential information.



For example, take the following sentence:

In 1952 the government located a UFO from Mars. The UFO was constructed of supernatural materials and the aliens did not survive.

Keywords: UFO, Mars, supernatural, aliens

Replacements: fishing vessel, wooden, fisherman
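Applying this example in code needs one assumption: the example lists four keywords but only three replacements. The sketch below pairs UFO with fishing vessel, supernatural with wooden and aliens with fisherman, and deletes Mars, which has no replacement. The function name obfuscate and the whitespace tidying are also illustrative choices.

```python
import re

DOCUMENT = ("In 1952 the government located a UFO from Mars. The UFO was "
            "constructed of supernatural materials and the aliens did not survive.")

# Assumed pairing of keywords and replacements; "Mars" maps to "" (deleted).
REPLACEMENTS = {"UFO": "fishing vessel", "supernatural": "wooden",
                "aliens": "fisherman", "Mars": ""}

def obfuscate(text, table):
    """Replace every whole-word keyword with its entry from the table."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, table)) + r")\b")
    result = pattern.sub(lambda m: table[m.group()], text)
    # Tidy up after deletions: drop spaces left before punctuation,
    # then collapse any runs of spaces.
    result = re.sub(r" +([.,;:!?])", r"\1", result)
    return re.sub(r" {2,}", " ", result)

print(obfuscate(DOCUMENT, REPLACEMENTS))
# In 1952 the government located a fishing vessel from. The fishing vessel
# was constructed of wooden materials and the fisherman did not survive.
```

The dangling "from." shows a weakness of plain deletion: the surrounding grammar still hints that something was removed, which a well-chosen replacement word avoids.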



MATLAB code to follow.
