In the ancient Indian parable of the blind men and the elephant, six blind men each touch a different part of an elephant and come away with six very different descriptions of the same animal. Compare this to a data warehouse that receives data from six different sources. The title “Harry Potter and the Sorcerer’s Stone”, stored as a field in a database, might appear as “HP and the Sorcerer’s Stone”, as “Harry Potter I”, or simply as “Sorcerer’s Stone”. To the data warehouse, these are four separate movie titles. To a Harry Potter fan, they are the same movie. Now extend the list to cover the entire Harry Potter series, and then include fifty languages. You now have a set of titles that might perplex even a true Harry Potter aficionado.
What does this have to do with data analytics?