Juan Reutter and Tony Tan

University of Edinburgh Abstract. Graph databases are directed graphs in which the edges are labeled with symbols from a ﬁnite alphabet. In this paper we introduce a logic for such graphs in which the domain is the set of edges. We compare its expressiveness with the standard logic in which the domain the setof vertices. Furthermore, we introduce a robust model of computation for such logic, the so called graph pebble automata.

1 Introduction

The study of graph structured data has received much attention lately, due to numerous applications in areas such as biological networks [12, 15], social networks [17], and the semantic Web [9]. The common database model proposed by such applications is mostcommonly denoted as graph databases, in which nodes are objects, and edge labels deﬁne relationships between those objects [2]. For querying graph structured data, one normally wishes to specify certain types of paths between nodes. Most common examples of these queries are conjunctive regular path queries [1, 14, 6, 3]. Those querying formalisms have been thoroughly studied, and theiralgorithmic properties are more or less understood. On the other hand, there has been much less work devoted on other formalisms other than graph reachability patterns, say, for example, the integrity constraints such as labels with unique names, typing constraints on nodes, functional dependencies, domain and range of properties. See, for instance, the survey [2] for more examples of integrity constraints.Our intention is to study formalisms for such graph databases which is capable of expressing these integrity constraints, while at the same time still feature manageable model checking properties. Obviously such formalisms depend on how the underlying directed graph of the databases are represented in the ﬁrst place. The standard representation of directed graphs is simply a set of nodes,together with a binary relation on these nodes to represent the edge among them. The labeling of the edges is represented as a function from the edges to the ﬁnite alphabet of symbols. We call such representation the vertex representation. Another less common way to represent directed graphs is to take the edges as the domain, together with some well deﬁned binary relations on these edges to indicate howtwo edges intersect. The labeling of the edges is represented as a set of unary predicates on the domain. We call such representation the edge representation. In the ﬁrst part of our paper we propose a vocabulary for the edge representation, which we call E-vocabulary. We call V-vocabulary the vocabulary for the vertex representation. We study the expressive power of E and compare it toV-vocabulary. In this respect our contributions are the following.

– The logic that we propose for edge representation is robust, in the sense that for each graph database in the vertex representation, there exists a unique (up to isomorphism) graph database in the edge representation that have the same underlying directed graph. Vice versa, for each graph database in the edge representation, thereexists a unique (up to isomorphism) graph database in the vertex representation that have the same underlying directed graph. – Next, we turn our attention to expressivity of E-vocabulary. For ﬁrst-order logic (FO) we show that it is equivalent to V-vocabulary. On the other hand, for the existential monadic second-order logic (∃MSO), as well as monadic second-order logic (MSO), the E-vocabulary ismore expressive than the V-vocabulary. That is, there are ∃MSO and MSO sentences in E-vocabulary that cannot be expressed in sentences in V-vocabulary in ∃MSO and MSO logics, respectively. In the second part of our paper we introduce a notion of automata for graph databases. We follows the direction in [19] by deﬁning pebble automata for directed graphs. Pebble automata was initially introduced...