Explaining Query Answers in Probabilistic Databases.
DOI:
https://doi.org/10.9781/ijimai.2023.07.005Keywords:
Causality, Conjunctive Queries, Explanation, Probabilistic Databases, Query AnswersAbstract
Probabilistic databases have emerged as an extension of relational databases that can handle uncertain data under possible worlds semantics. Although the problems of creating effective means of probabilistic data representation as well as probabilistic query evaluation have been addressed so widely, low attention has been given to query result explanation. While query answer explanation in relational databases tends to answer the question: why is this tuple in the query result? In probabilistic databases, we should ask an additional question: why does this tuple have such a probability? Due to the huge number of resulting worlds of probabilistic databases, query explanation in probabilistic databases is a challenging task. In this paper, we propose a causal explanation technique for conjunctive queries in probabilistic databases. Based on the notions of causality, responsibility and blame, we will be able to address explanation for tuple and attribute uncertainties in a complementary way. Through an experiment on the real-dataset of IMDB, we will see that this framework would be helpful for explaining complex queries results. Comparing to existing explanation methods, our method could be also considered as an aided-diagnosis method through computing the blame, which helps to understand the impact of uncertain attributes.
Downloads
References
Carnegie Mellon University, "Read the Web" Research Project Website Accessed: Oct. 19, 2022. [online]. available: http://rtw.ml.cmu.edu/rtw//
D. Suciu, D. Olteanu, C. Re, C. Koch, Probabilistic Databases. Morgan and Claypool Publishers, 2010.
P. Bosc, O. Pivert, “Modeling and querying uncertain relational databases: A survey of approaches based on the possible worlds semantics,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 18, no. 5, pp. 565–603, 2010.
N. Dalvi, D. Suciu, “Efficient query evaluation on probabilistic databases,” The International Journal on Very Large Data Bases, vol. 16, no. 4, pp. 523–544, 2007.
X. Dong, Y. Luming, “Study on consistent query answering in inconsistent databases,” Frontiers of Computer Science in China, vol. 1, no. 4, pp. 493–501, 2007.
M. Arenas, L. E. Bertossi, J. Chomicki, “Consistent query answers in inconsistent databases,” in Proceedings of the 37th ACM SIGMOD-SIGACTSIGAI Symposium on Principles of Database Systems, 1999, pp. 68–79.
F. Parisi, J. Grant, “On repairing and querying inconsistent probabilistic spatio-temporal databases,” International Journal of Approximate Reasoning, vol. 84, pp. 41–74, 2017.
M. Calautti, L. Libkin, A. Pieris, “An operational approach to consistent query answering,” in Proceedings of the 37th ACM SIGMOD-SIGACTSIGAI Symposium on Principles of Database Systems, 2018, pp. 239–251.
X. Wang, X. L. Dong, A. Meliou, “Data x-ray: A diagnostic tool for data errors,” in Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2015, pp. 1231–1245.
C. Berkholz, M. Merz, “Probabilistic databases under updates: Boolean query evaluation and ranked enumeration,” in Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2021, p. 402–415.
L. Antova, T. Jansen, C. Koch, D. Olteanu, “Fast and simple relational processing of uncertain data,” in In Proc. 24th IEEE International Conference on Data Engineering, 2008, pp. 983–992.
G. V. den Broeck, D. Suciu, “Query processing on probabilistic data: A survey,” Foundations and Trends in Databases, vol. 7, no. 4, pp. 197–341, 2015.
B. Kanagal, J. Li, A. Deshpande, “Sensitivity analysis and explanations for robust query evaluation in probabilistic databases,” in ACM SIGMOD International Conference on Management of Data, 2011, pp. 841–852.
C. Re, D. Suciu, “Approximate lineage for probabilistic databases,” The International Journal on Very Large Data Bases, vol. 1, no. 1, pp. 797–808, 2008.
I. Ceylan, S. Borgwardt, T. Lukasiewicz, “Most probable explanations for probabilistic database queries,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 2017, pp. 950–956.
J. Y. Halpern, J. Pearl, “Causes and explanations: A structural-model approach part i: Causes,” in Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 2001, pp. 194–202.
J. Y. Halpern, J. Pearl, “Causes and explanations: A structural-model approach. part ii: Explanations,” British Journal for the Philosophy of Science, vol. 56, no. 4, pp. 889–911, 2008.
H. Chockler, J. Y. Halpern, “Responsibility and blame: a structural model approach,” Journal of Artificial Intelligence Research (JAIR), vol. 22, no. 1, pp. 93–115, 2004.
A. Meliou, W. Gatterbauer, J. Y. Halpern, C. Koch, “Causality in databases,” IEEE Data Engineering Bulletin, vol. 33, no. 3, pp. 59–67, 2010.
L. Bertossi, B. Salimi, “Causes for query answers from databases: Datalog abduction, view-updates, and integrity constraints,” International Journal of Approximate Reasoning, vol. 90, pp. 226–252, 2017.
L. Antova, T. Jansen, C. Koch, D. Olteanu, “Fast and simple relational processing of uncertain data,” in IEEE 24th International Conference on Data Engineering, 2008, pp. 983–992.
L. Antova, C. Koch, D. Olteanu, “Maybms:managing incomplete information with probabilistic world–set decompositions,” in In Proc. 23rd IEEE International Conference on Data Engineering, 2007, pp. 1479–1480.
D. Suciu, “Probabilistic databases for all,” Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2020, p. 19–31.
IMDb.com, Inc., IMDb Website. Accessed: Oct. 19, 2022. [Online]. Available: https://www.imdb.com//
A. Meliou, W. Gatterbauer, K. Moore, D. Suciu, “The complexity of causality and responsibility for query answers and non-answers,” The International Journal on Very Large Data Bases, vol. 4, no. 1, pp. 34–45, 2010.
T. J. Green, V. Tannen, “Models for incomplete and probabilistic information,” in Proceedings of the international conference on Current Trends in Database Technology, 2006, pp. 278–296.
T. J. Green, G. Karvounarakis, V. Tannen, “Provenance semirings,” in Proceedings of the 2007 ACM SIGMOD- SIGACT-SIGAI Symposium on Principles of Database Systems, 2007, pp. 31–40.
R. Diestelkämper, S. Lee, M. Herschel, B. Glavic, To Not Miss the Forest for the Trees - A Holistic Approach for Explaining Missing Answers over Nested Data, p. 405–417. 2021.
B. Salimi, Quer-Answer Causality in Dataabses And its Connections with Reverse Reasoning Tasks in Data And Knowledge Management. PhD dissertation, Carleton University, 2015.
L. Bertossi, B. Salimi, “From causes for database queries to repairs and model-based diagnosis and back,” Theory of Computing Systems, vol. 61, no. 1, pp. 191–232, 2017.
A. Meliou, A. Meliour, S. Nath, D. Suciu, “Tracing data errors with viewconditioned causality,” in Proceedings of the 2011 ACM SIGMOD-SIGACTSIGAI Symposium on Principles of Database Systems, 2011, pp. 505–516.
X. Lian, L. Chen, “Causality and responsibility: probabilistic queries revisited in uncertain databases,” in Proceedings of the 22nd ACM international conference on Information and Knowledge Management, 2013, pp. 349–358.
K. Mu, “Responsibility for inconsistency,” International Journal of Approximate Reasoning, vol. 61, pp. 43–60, 2015.
Z. Miao, Q. Zeng, B. Glavic, S. Roy, “Going beyond provenance: Explaining query answers with pattern-based counterbalances,” in Proceedings of the 2019 International Conference on Management of Data, SIGMOD ’19, 2019, p. 485–502.
L. Antova, C. Koch, D. Olteanu, “From complete to incomplete information and back,” in ACM SIGMOD International Conference on Management of Data, 2007, pp. 713–724.
J. Boulos, N. Dalvi, B. Mandhani, S. Mathur, C. Ré, D. Suciu, “Mystiq: a system for finding more answers by using probabilities,” in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005, pp. 891–893.
O. Benjelloun, A. D. Sarma, A. Halevy, J. Widom, “Databases with uncertainty and lineage,” in The International Journal on Very Large Data Bases, 2006, pp. 953–964.
O. Benjelloun, A. D. Sarma, C. Hayworth, J. Widom, “An introduction to uldbs and the trio system,” IEEE Data Engineering Bulletin, vol. 29, no. 1, pp. 5–16, 2006.
T. Imielin´ski, W. Lipski, “Incomplete information in relational databases,” Journal of the ACM (JACM), vol. 31, no. 4, pp. 761–791, 1984.
P. Sen, A. Deshpande, L. Getoor, “Read-once functions and query evaluation in probabilistic databases,” The International Journal on Very Large Data Bases, vol. 3, no. 1, pp. 1068–1079, 2010.
D. Olteanu, J. Huang, “Using obdds for efficient query evaluation on probabilistic databases,” in 2nd International Conference on Scalable Uncertainty Management, 2008, pp. 109–121.
A. Darwiche, P. Marquis, “A knowledge compilation map,” Journal of Artificial Intelligence Research, vol. 17, no. 1, pp. 229–264, 2002.
C. Re, N. Dalvi, D. Suciu, “Efficient top-k query evaluation on probabilistic data,” in 2007 IEEE 23rd International Conference on Data Engineering, 2007, pp. 108–122.
C. Koch, D. Olteanu, “Conditioning probabilistic databases,” The International Journal on Very Large Data Bases, vol. 1, no. 1, pp. 313–325, 2008.
A. Meliou, W. Gatterbauer, K. Moore, D. Suciu, “Why so? or why no? functional causality for explaining query answers,” Department of Computer Science and Engineering, University of Washington, Seattle, 2010.
MayBMS Project, MayBMS - A Probabilistic Database Management System. Accessed: Oct. 19, 2022. [Online]. Available: http://maybms.sourceforge.net//
Downloads
Published
-
Abstract174
-
PDF26