Balazs Feil, Janos Abonyi, and Ferenc Szeifert
University of Veszprem, Department of Process Engineering, Veszprem, P.O. Box 158, H-8201, Hungary
email@example.com, http://www.fmt.vein.hu/softcomp
Preprint submitted to Elsevier Science
3 December 2003
Abstract
Selecting the order of an input-output model of a dynamical system is a key step toward the goal of system identification. The false nearest neighbors (FNN) algorithm is a useful tool for estimating the order of linear and nonlinear systems. Advanced FNN uses nonlinear input-output data-based models to select the threshold constant that is used to compute the percentage of false neighbors, but the computational effort of the method increases with the number of data points and the dimension of the model. To increase the efficiency of this method, in this paper we propose a clustering-based algorithm. Clustering is applied to the product space of the input and output variables. The model structure is then estimated on the basis of the eigenvalues of the cluster covariance matrices. The main advantage of the proposed solution is that it is model-free: no particular model needs to be constructed in order to select the order of the model, whereas most other techniques are ‘wrapped’ around a particular model construction method. This saves computational effort and avoids a possible bias due to the particular construction method used. Three simulation examples are given to illustrate the proposed technique: estimation of the model structure for a linear system, a polymerization reactor, and the van der Vusse reactor.
Key words: system identification, model order selection, false nearest neighbors, fuzzy clustering, Minimum Description Length (MDL)
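The eigenvalue idea stated in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: it assumes a single cluster (plain sample covariance instead of fuzzy clustering), a hypothetical second-order linear test system, and an arbitrary threshold `drop_ratio`. All function names are introduced here for illustration only. When the candidate order is sufficient, the data in the product space of regressors and output lie close to a hyperplane, so the smallest covariance eigenvalue becomes very small.

```python
import numpy as np


def simulate_linear_system(n_samples, noise_std=0.01, seed=0):
    """Simulate a hypothetical second-order linear system (illustrative only)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n_samples)              # white-noise input
    e = noise_std * rng.standard_normal(n_samples)  # small equation noise
    y = np.zeros(n_samples)
    for k in range(2, n_samples):
        y[k] = (1.5 * y[k - 1] - 0.7 * y[k - 2]
                + 0.5 * u[k - 1] + 0.2 * u[k - 2] + e[k])
    return u, y


def product_space_data(u, y, order):
    """Stack [y(k-1..k-order), u(k-1..k-order), y(k)] row-wise."""
    n = len(y)
    cols = [y[order - i:n - i] for i in range(1, order + 1)]
    cols += [u[order - i:n - i] for i in range(1, order + 1)]
    cols.append(y[order:])
    return np.column_stack(cols)


def smallest_covariance_eigenvalue(z):
    """Smallest eigenvalue of the sample covariance of the rows of z."""
    cov = np.cov(z, rowvar=False)
    return float(np.linalg.eigvalsh(cov)[0])  # eigvalsh sorts ascending


def estimate_order(u, y, max_order=4, drop_ratio=0.01):
    """Return the first order whose smallest eigenvalue is negligible.

    The rule `eigenvalue < drop_ratio * var(y)` is an assumption made for
    this sketch, not a criterion taken from the paper.
    """
    eigs = {n: smallest_covariance_eigenvalue(product_space_data(u, y, n))
            for n in range(1, max_order + 1)}
    threshold = drop_ratio * np.var(y)
    for n in range(1, max_order + 1):
        if eigs[n] < threshold:
            return n, eigs
    return max_order, eigs


if __name__ == "__main__":
    u, y = simulate_linear_system(2000)
    order, eigs = estimate_order(u, y)
    print("estimated order:", order)
    for n, value in eigs.items():
        print(f"order {n}: smallest eigenvalue = {value:.3e}")
```

For a nonlinear system a single hyperplane is not adequate, which is why the paper applies clustering in the product space and examines the eigenvalues of each cluster covariance matrix.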
Most data-driven identification algorithms assume that the model structure is known a priori or that it is selected by a higher-level ‘wrapper’ structure-selection algorithm. Several information-theoretic criteria have been proposed for structure selection in linear dynamic input–output models. Examples of the classical criteria are the Final Prediction-Error (FPE) and the Akaike Information Criterion (AIC). Later, the Minimum Description Length (MDL) criterion developed by Schwarz and Rissanen was proven to produce consistent estimates of the structure of linear dynamic models. With these tools, determining the structure of linear systems is a rather straightforward task. However, relatively little research has been done on structure selection for nonlinear models. In the paper of Aguirre and Billings, the concepts of term clusters and cluster coefficients are defined and used in the context of system identification. It is argued that if a certain type of term in a nonlinear model is spurious, the respective cluster coefficient is small compared with the coefficients of the other clusters represented in the model. In , this approach is applied to the structure selection of polynomial models. In , an alternative solution to the model structure selection problem is introduced, which first conducts a forward search through the many possible candidate model terms and then performs an exhaustive all-subset model selection on the resulting model. A backward search approach based on orthogonal parameter estimation has also been applied [17,1]. As can be seen, these techniques are ‘wrapped’ around a particular model construction method. Hence, the resulting estimate can be biased by the particular construction method used. To avoid this problem, in this paper a ‘model-free’ approach is followed, in which no particular model needs to be constructed in order to select the order of the model. The advantage of this approach is that the estimate is based on geometrical/embedding procedures and does not depend on the model representation that will be used a...