| 09:00 - 09:10 |
Introduction and overview of the SIMBAD project
Marcello Pelillo (University of Venice)
|
| 09:10 - 09:40 |
Clustering without any subjective similarity information
Shai Ben-David (University of Waterloo)
Abstract:
Consider the task of clustering university web pages based on the graph of links between these pages.
Can clusters of "functionally similar" pages be detected from just this link structure? Note that this
is a clustering task in which one starts without any prior knowledge of any similarity or distance
measure between the domain elements. All the information in the input comes as objective, observed,
binary relations among the objects. These relations are not similarity links. For example, the cluster
of professors pages have very internal links, whereas the cluster of service pages have lots of internal
links. What we are looking for are clusters whose members share similar link patterns with respect to
the other clusters.
We propose a formal model for such clustering tasks. Our model is based on an objective function that
measures the homogeneity of between-clusters links. I shall discuss the computational complexity of
finding a clustering with minimal objective cost and describe some hardness results as well as efficient
approximation algorithms.
The talk is (partly) based on work with Sharon Wulff.
|
| 09:40 - 10:10 |
Learning with similarity functions
Maria-Florina Balcan (Georgia Tech)
Abstract:
Kernel functions have become an extremely popular tool in machine learning, with many applications
and an attractive theory. This theory views a kernel as performing an implicit mapping of data points
into a possibly very high dimensional space, and describes a kernel function as being good for a
given learning problem if data is separable by a large margin in that implicit space.
In this talk I will describe an alternative, more general, theory of learning with similarity functions (i.e.,
sufficient conditions for a similarity function to allow one to learn well) that does not require reference to
implicit spaces, and does not require the function to be positive semi-definite (or even symmetric).
In particular, I will describe a notion of a good similarity function for a given learning problem that (a) is
fairly natural and intuitive (it does not require an implicit space and allows for functions that are not
positive semi-definite), (b) is a sufficient condition for learning well, and (c) strictly generalizes the
notion of a large-margin kernel function in that any such kernel is also a good similarity function,
though not necessarily vice-versa.
|
| 10:10 - 10:40 |
On probabilistic hypergraph matching
Amnon Shashua (The Hebrew University of Jerusalem)
Abstract:
We consider the problem of finding a matching between
two sets of features, given complex relations among them,
going beyond pairwise. We derive the hyper-graph matching problem in a probabilistic setting represented by a convex optimization. First, we formalize a soft matching criterion that emerges from a probabilistic interpretation of the problem input and output, as opposed to previous methods that treat soft matching as a mere relaxation of the hard matching problem.
Second, the model induces an algebraic relation between the hyper-edge weight matrix and the desired vertex-to-vertex probabilistic matching. Third, the model explains some of the graph matching normalization proposed in the past on a heuristic basis such as doubly stochastic normalizations of the edge weights. A key benefit of the model is that the global optimum of the matching criteria can be found via an iterative successive projection algorithm. The algorithm reduces to the well known Sinkhorn row/column matrix normalization procedure in the special case when the two graphs have the same number of vertices and a complete matching is desired. Another benefit of our model is the straightforward scalability from graphs to hyper-graphs.
The work was done with Ron Zass (and made its debut in CVPR 2008)
|
| 10:40 - 11:00 |
Coffee break
|
| 11:00 - 11:30 |
From collaborative filtering to multitask learning
Alex Smola (Yahoo! Research)
Abstract:
Recent work on collaborative filtering has led to a large number of both scalable and theoretically well
founded algorithms. In this paper, we show that collaborative filtering and multitask learning are innately
closely connected. In particular, the 'learning the kernel' paradigm in multitask learning turns out to be
identical to a Ky-Fan norm minimization. This allows us to “import” collaborative filtering techniques into
multitask learning and vice versa; in particular, we solve a multitask learning problem where the tasks also
have features. We show the feasibility of our approach on two large real-world multitask learning
applications.
Joint work with Markus Weimer, Wei Chu, Deepayan Chakrabarti.
|
| 11:30 - 12:00 |
A metric notion of dimension and its applications to learning
Robert Krauthgamer (The Weizmann Institute of Science)
Abstract:
Let us define the dimension of a metric space as the minimum k>0 such
that every ball in the metric space can be covered by 2^k balls of
half the radius. This definition has several attractive features
besides being applicable to every metric space. For instance, it
coincides with the standard notion of dimension in Euclidean spaces,
but captures also nonlinear structures such as manifolds.
Metric spaces of low dimension (under the above definition) occur
naturally in many contexts. I will discuss recent theoretical results
regarding such metric spaces, including questions such as
embeddability, dimension reduction, Nearest Neighbor Search, and
large-margin classification, the common thread being that low
dimension implies algorithmic efficiency.
|
| 12:00 - 14:00 |
Lunch
|
| 14:00 - 14:30 |
Scalable algorithms for learning on graphs
Nicolò Cesa-Bianchi (University of Milan)
Abstract:
Networked data are found in a variety of domains: Web, social networks, biological
networks, and many others. In learning tasks, networked data are often represented
as a weighted graph whose edge weights reflect the similarity between incident nodes.
In this talk, we consider the problem of classifying the nodes of an arbitrary given
graph in the game-theoretic mistake bound model. We characterize the optimal predictive
performance in terms of the cutsize of the graph's random spanning tree, and describe
a randomized prediction algorithm achieving the optimal performance while running in
expected time sublinear in the graph size (on most graphs).
These results are then extended to the active learning model, where training labels
are obtained by querying nodes selected by the algorithm. We describe a fast query
placement strategy that, in the special case of trees, achieves the optimal number
of mistakes when classifying the non-queried nodes.
Joint work with: Claudio Gentile, Fabio Vitale and Giovanni Zappella.
|
| 14:30 - 15:00 |
A note of caution regarding distances on graphs
Ulrike von Luxburg (Max Planck Institute for Biological Cybernetics)
Abstract:
Non-geometric data is often represented in form of a graph where edges represent
similarity or local relationships between instances. One elegant way to exploit
the global structure of the graph is implemented by the commute distance (also
known as resistance distance). Supposedly it has the property that vertices from
the same cluster are "close" to each other whereas vertices from different
clusters are "far" from each other. We study the behavior of the commute distance
as the size of the underlying graph increases. We prove that the commute distance
converges to an expression that does not take into account the structure of the
graph and that is completely meaningless as a distance function on the graph.
Consequently, the use of the raw commute distance for machine learning purposes
is strongly discouraged for large graphs and in high dimensions. We suspect that
a similar behavior holds for several other distances on graphs.
|
| 15:00 - 16:00 |
Poster session (includes refreshment)
|
|
Dissimilarity-based representation for local parts
A. Carli, U. Castellani, M. Bicego, and V. Murino (University of Verona, IIT Genova)
|
|
Learning near-isometric matching models
Julian J. McAuley and Tiberio Caetano (NICTA)
|
|
Measuring similarity non-metrically
Michael H. Coen, Nathanael Fillmore, and M. Hidayath Ansari (University of Wisconsin-Madison)
|
|
The earth mover’s distance -- Beyond nearest neighbor classification?
Ofir Pele and Michael Werman (The Hebrew University of Jerusalem)
|
|
Geometric embedding for learning combinatorial structures
Terran Lane, Ben Yackley, Sergey Plis, and Blake Anderson (University of New Mexico)
|
|
Spherical embedding, Ricci flow and Non-metric similarity data
Edwin R. Hancock et al (University of York, UK)
|
|
Clustering as a non-cooperative game
M. Pelillo and S. Rota Bulò (University of Venice)
|
|
Extending a distance metric learning approach to cover non-geometric spaces
Aparna S. Varde and Jianyu Liang ( Montclair State University, Worcester Polytechnic Institute, USA)
|
|
Computational TMA Analysis and Cell Nucleus
Classification of Renal Cell Carcinoma
Peter J. Schueffler, Thomas J. Fuchs, Cheng Soon Ong, Joachim M. Buhmann
(ETH Zurich)
|
| 16:00 - 17:00 |
Panel discussion:
Is non-(geo)metricity an issue for machine learning ?
Panelists:
- Shai Ben-David (University of Waterloo)
- Joachim Buhmann (ETH Zurich)
- Edwin Hancock (University of York, UK)
- Alex Smola (Yahoo! Research)
Moderator: Marcello Pelillo (University of Venice)
|