Learning in Non-(geo)metric Spaces

Learning in Non-(geo)metric Spaces - Workshop @ ICML 2010

Motivation and Goal

Traditional pattern recognition techniques are intimately linked to the notion of "feature spaces." Adopting this view, each object is described in terms of a vector of numerical attributes and is therefore mapped to a point in a Euclidean (geometric) vector space so that the distances between the points reflect the observed (dis)similarities between the respective objects. This kind of representation is attractive because geometric spaces offer powerful analytical as well as computational tools that are simply not available in other representations. Indeed, classical pattern recognition methods are closely tied to geometrical concepts, and numerous powerful tools have been developed during the last few decades, starting from the maximum likelihood method in the 1920s, to perceptrons in the 1960s, to kernel machines in the 1990s.

However, the geometric approach suffers from a major intrinsic limitation, which concerns the representational power of vectorial, feature-based descriptions. In fact, there are numerous application domains where either it is not possible to find satisfactory features or they are inefficient for learning purposes. This modeling difficulty typically occurs in cases when experts cannot define features in a straightforward way (e.g., protein descriptors vs. alignments), when data are high dimensional (e.g., images), when features consist of both numerical and categorical variables (e.g., person data, like weight, sex, eye color, etc.), and in the presence of missing or inhomogeneous data. But, probably, this situation arises most commonly when objects are described in terms of structural properties, such as parts and relations between parts, as is the case in shape recognition.

In the last few years, interest in purely similarity-based techniques has grown considerably. For example, within the supervised learning paradigm (where expert-labeled training data is assumed to be available) the now famous "kernel trick" shifts the focus from the choice of an appropriate set of features to the choice of a suitable kernel, which is related to object similarities. However, this shift of focus is only partial, as the classical interpretation of the notion of a kernel is that it provides an implicit transformation of the feature space rather than a purely similarity-based representation. Similarly, in the unsupervised domain, there has been increasing interest in pairwise or even multiway algorithms, such as spectral and graph-theoretic clustering methods, which avoid the use of features altogether.

By departing from vector-space representations one is confronted with the challenging problem of dealing with (dis)similarities that do not necessarily exhibit Euclidean behavior or do not even obey the requirements of a metric. The lack of the Euclidean and/or metric properties undermines the very foundations of traditional pattern recognition theories and algorithms, and poses entirely new theoretical and computational questions and challenges.
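To make the notion of a non-metric dissimilarity concrete, the following is a minimal sketch, not part of the workshop material, that checks a small dissimilarity matrix against the usual metric axioms (zero self-dissimilarity, non-negativity, symmetry, triangle inequality). The function name `metric_violations`, the tolerance `tol`, and the toy matrix are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def metric_violations(d, tol=1e-12):
    """List which metric axioms the square dissimilarity matrix d breaks.

    d is a list of lists of numbers; tol is an assumed numerical tolerance.
    """
    n = len(d)
    report = {"nonzero_diagonal": [], "negative": [], "asymmetric": [], "triangle": []}
    for i in range(n):
        # Self-dissimilarity should be zero.
        if abs(d[i][i]) > tol:
            report["nonzero_diagonal"].append(i)
        for j in range(n):
            # Dissimilarities should be non-negative.
            if d[i][j] < -tol:
                report["negative"].append((i, j))
            # Symmetry: d(i, j) should equal d(j, i).
            if i < j and abs(d[i][j] - d[j][i]) > tol:
                report["asymmetric"].append((i, j))
    for i, j, k in permutations(range(n), 3):
        # Triangle inequality: d(i, k) <= d(i, j) + d(j, k).
        if d[i][k] > d[i][j] + d[j][k] + tol:
            report["triangle"].append((i, j, k))
    return report


# Hypothetical toy dissimilarities among three objects: objects 0 and 2 are
# each close to object 1 but far from each other.
toy = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
report = metric_violations(toy)
print(report["triangle"])
```

In this toy case the matrix is symmetric and non-negative, yet the pair (0, 2) is farther apart than the detour through object 1 allows, so no arrangement of points in a metric space can reproduce it.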

The aim of this workshop is to consolidate research efforts in this area, and to provide an informal discussion forum for researchers and practitioners interested in this important yet diverse subject. The discussion will revolve around two main themes, which basically correspond to the two fundamental questions that arise when abandoning the realm of vectorial, feature-based representations, namely:

  • How can one obtain suitable similarity information from data representations that are more powerful than, or simply different from, the vectorial?
  • How can similarity information be used in order to perform learning and classification tasks?

Accordingly, topics of interest include (but are not limited to):

  • Embedding and embeddability
  • Graph spectra and spectral geometry
  • Indefinite and structural kernels
  • Characterization of non-(geo)metric behaviour
  • Foundational issues
  • Measures of (geo)metric violations
  • Learning and combining similarities
  • Multiple-instance learning
  • Applications


  • Joachim M. Buhmann, ETH Zurich, Switzerland
  • Robert P. W. Duin, Delft University of Technology, The Netherlands
  • Mario A. T. Figueiredo, Instituto Superior Técnico, Lisbon, Portugal
  • Edwin R. Hancock, University of York, UK
  • Vittorio Murino, University of Verona, Italy
  • Marcello Pelillo, Ca' Foscari University, Venice, Italy (chair)


The workshop is planned to be a one-day meeting. The program will feature a panel discussion on the topic "Is non-(geo)metricity an issue for machine learning?," invited oral presentations, a contributed poster session, and poster spotlights. We feel that the more informal the better and we would like to solicit open and lively discussions and exchange of ideas from researchers with different backgrounds and perspectives. Plenty of time will be allocated to questions, discussions, and breaks.

Researchers who want to contribute a poster should submit a 2-page abstract of their work by email to Marcello Pelillo (pelillo@dsi.unive.it), by May 16, 2010.

The organizers will review all submissions. Notification of acceptance will be sent out by June 6, 2010.

We plan to run a special issue devoted to the workshop's topic in a major machine learning journal soon after the workshop.