

In p2p-based data management applications, it is unrealistic to rely on a centralized schema or ontology. The p2p paradigm is more than a new underlying infrastructure: it supports an emergent approach to data management in which data is generated and inserted into the network in a decentralized fashion. Thus, each peer or group of peers will have its own schema for storing the data. Moreover, the user querying the data will use yet another schema to formulate the request. The vision of emergent schema management is to resolve these heterogeneities automatically in a self-organizing way by taking advantage of overlaps and mediators scattered over the network. The emerging schema information can be used in various ways, e.g., to drive the construction of an overlay network and to route queries through the network.
In this article, we start by explaining the challenges that this vision raises. We look at the problem both from the viewpoint of the database community, which describes schemas as entity-relationship models, and from the viewpoint of the knowledge representation community, which uses logic-based formalisms. We then survey existing p2p-based approaches dealing with semantics, schemas, and mediation. After describing our own approach to p2p schema management, we conclude with an outlook on open problems in the field.