Data-centric Languages and Systems
Overview
The data-centric languages and systems thematic aims at designing and developping programming langages as well as systems that seriously take into account massive data. The purpose is to build robust and efficient platforms on well founded theoretical grounds. Internet explosion and the ever growing importance of data in applications as well as the recent emergence of cloud computing, has given birth to a whirlwind of new data models (XML, JSON) and languages (XPath, XQuery, Pig, Jaql...). Whether they are developed under the banner of NoSQL (which stands for Not Only SQL), for BigData Analytics, for Cloud computing or as domain specific languages (DSL) embedded in a host language, most of them share a common subset of SQL and/or the ability to handle semistructured data. Such languages can greatly benefit from formal uniform foundations, and we argue that such foundations should account for novel features critical to various application domains. Also, most of those languages provide limited type checking, or ignore it altogether. We believe type checking is essential for many applications, with usage ranging from error detection to optimization.
Regarding the language topic, we are currently interested in the XML format, this led us to the design and implementation of CDuce an XML centric programming language .
The Datacert thematic is highly interested in the certification of data intensive systems with the Coq proof assistant. As data, in all forms, is increasingly large in volume, as a result of computers capturing more and more of the companies activity, the scientists' work and also as a result of better and more powerful automatic data gathering tools (e.g. space telescopes, focused crawlers, archived experimental data mandatory in some types of government-funded research programs). The availability and reliability of such large data volumes is a gold mine for companies, scientists and simply citizens. It is then of crucial importance to protect data integrity and reliability by means of robust methods and tools. Program verification and certification have been intensively studied in the last decades yielding very impressive results and highly reliable software (e.g., the Compcert project). However, and surprisingly, while the amount of data stored and managed by data engines has drastically increased, little attention has been devoted to ensure that such (complex) systems are indeed reliable. The aim of this research track is to certify data intensive systems such as relational database systems, XQuery and semi structured engines using formal tools such as SMT provers (e.g., Alt-ergo) embedded in the Why(3) platform as well as interactive theorem provers (e.g., Coq).
In general we are interested in the application of static analysis techniques (type theory, abstract interpretation...) and theorem proving to data-related problems (query optimisation, verification of integrity contraints). We also focus on adapting mainstream compilation techniques to data-centric language evaluators (just-in-time compilation of queries, evalutation of pattern-matching and path expression in the presence of XML indexes).
People
Véronique Benzaken, Évelyne Contejean, Stéfania Dumbrava (PhD), Hyeonseung Im (Post Doc), Kim Nguyen,Internship proposals
HereSoftware development
Datacert: a Coq library for the relational data model Coq Documentation [CoqDoc file]
Tree Automata Toolkit (TAToo): An XPath engine based on tree automata TaToo's development site
CDuce: an XML-centric functional programming language , part of major Linux distributions (Debian/Ubuntu, Fedora, Mandriva).
Related Grants
- Projet ANR - Blanc Typex Typeful certified XML: integrating language, logic, and data-oriented best practices.
- Projet ANR - DEFIS Codex
- Projet exploratoire RNTL GraphDuce.
- Action Spécifique du CNRS Langages, types, securite et integrite pour donnees semi-structurees
- ACI Securite Informatique : Projet CASC
- ACI Masse de Donnees Projet TRALALA
Related Publications
Books and books chapters
- XML Typechecking , V. Benzaken and G. Castagna and H. Hosoya and B-C Pierce and S. Vansummeren (Invited chapter in) Encyclopedia of Database Systems, Springer Verlag 2009.
International Journals
- The Farthest Spatial Skyline Queries Gae-won You, Mu-Woong Lee, Hyeonseung Im, and Seung-won Hwang Information Systems, 38(3):286-301, May 2013
- Optimizing XML Querying using Type-based Document Projection V. Benzaken, G. Castagna, D. Colazzo, K. Nguyen. In ACM Transactions on Database Systems (TODS) March 2013.
- XPath whole query optimization, S. Maneth and K. Nguyen. In Processing of the VLDB Endowment, Volume 3, Issue 1(Septembre 2010).
- Semantic Subtyping: dealing set-theoretically with function, union, intersection and negation types , V. Benzaken, G. Castagna and A. Frisch . In Journal of the ACM (JACM) Volume 55 , Issue 4 (September 2008).
- Static Verification of dynamical constraints a semantics-based approach V. Benzaken, S. Cerrito and S. Praud. In NIS Journal 2000.
- Themis: a Database Programming Language Handling Integrity Constraints , V.Benzaken and A. Doucet. In the VLDB Journal, Volume 4, Number 3, July-August 1995.
International Conferences
- A Coq formalization of the Relational Data Model V. Benzaken, E. Contejean, S. Dumbrava. in 23rd European Symposium on Programming Languages ESOP Grenoble 2014
- Polymorphic Functions with Set-Theoretic Types. Part 1: Syntax, Semantics, and Evaluation. G. Castagna, K. Nguyen, Z. Xu, H. Im, S. Lenglet, and L. Padovani In POPL'14, 41th ACM Symposium on Principles of Programming Languages, January, 2014.
- Contractive Signatures with Recursive Types, Type Parameters, and Abstract Types Hyeonseung Im, Keiko Nakata, and Sungwoo Park In Proceedings of the 40th International Colloquium on Automata, Language and Programming (ICALP 2013), July 2013
- Static and dynamic semantics for NoSQL Languages V. Benzaken, G. Castagna, K. Nguyen, J. Simeon. in ACM International Conference on Principles of Programming Languages POPL Roma 2013
- Fast in-memory XPath search using compressed indexes, D. Arroyuelo, F. Claude, S. Maneth, Veli Mäkinen, G. Navarro, K. Nguyen, J. Sirén, N. Välimäki, 26th International Conference on Data Engineering, ICDE 2010
- Typed iterators for XML, G. Castagna and K. Nguyen, 13th ACM SIGPLAN International Conference on Functional Programming, ICFP 2008
- Pattern by Example: type-driven visual programming of XML queries V.Benzaken, G. Castagna, D. Colazzo and C. Miachon 10th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming PPDP 2008
- Type-Based XML Projection, V.Benzaken, G. Castagna, D. Colazzo and K. Nguyen, In 32nd International Conference on Very Large Databases VLDB 2006
- A Full Pattern-based Paradigm for XML Query Processing V.Benzaken,G.Castagna and C.Miachon, In the Seventh International Symposium on Practical Aspects of Declarative Languages PADL 2005
- Information Flow Security for XML Transformations, V. Benzaken, M. Burelle and G. Castagna, in Proc of the Eighth Asian Computing Science Conference (ASIAN'03) Collocated with ICLP'03 and FSTTCS'03, Lecture Notes in Computer Science, Springer-Verlag, Tata Institute of Fundamental Research, Mumbai, India, December 10-13, 2003.
- CDuce: an XML-centric general purpose programming language, V. Benzaken, G. Castagna and A. Frisch in Proc of the international conference on functional programming (ICFP'03), Upsala, Sweden, 25-29 august 2003.
- Semantic subtyping, A. Frisch, G. Castagna and V. Benzaken in Proc of the Seventeenth Annual IEEE Symposium on Logic in Computer Science (LICS'2002), pages 137-146, Copenhagen, Denmark,22th-25th July.
- Static management of integrity constraint in object-oriented database systems: design and implementation. V. Benzaken and X. Schaefer, In G. Alonso and H-J Schek, editors, Proc. of the 6th International Conference on Extending Database Technology (EDBT'98), Lecture Notes in Computer Science, Valencia, Spain, March 1998. Springer-Verlag.
- Static integrity constraint management in object-oriented database programming languages via predicate transformers. V. Benzaken and X. Schaefer. In M. Aksit and B. Magnusson, editors, Proc. of the 11th European Conference on Object-Oriented Programming (ECOOP'97), Lecture Notes in Computer Science, Jyvaskyla Finland, June 1997. Springer-Verlag.