# Coresets for Clustering in Graphs of Bounded Treewidth

@inproceedings{Braverman2020CoresetsFC, title={Coresets for Clustering in Graphs of Bounded Treewidth}, author={Vladimir Braverman and Lingxiao Huang and Shaofeng H.-C. Jiang and Robert Krauthgamer and Xuan-Wei Wu}, booktitle={ICML}, year={2020} }

We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems (on graph metrics) are essential to data analysis and used for example in road networks and data visualization. Specifically, we consider $(k, z)$-Clustering, where given a metric space $(V, d)$, the goal is to minimize, over all $k$-point center sets $C$, the objective $\sum_{x \in V}{d^z(x, C)}$. This problem is a well-known generalization of both… Expand

#### Topics from this paper

#### 7 Citations

A new coreset framework for clustering

- Computer Science
- STOC
- 2021

A new, simple coreset framework is presented that simultaneously improves upon the best known bounds for a large variety of settings, ranging from Euclidean space, doubling metric, minor-free metric, and the general metric cases. Expand

On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications

- Computer Science
- ICALP
- 2021

The new coreset construction scheme is fairly general and gives rise to coresets for a wide range of constrained clustering problems, which leads to improved constant-approximations for these problems in general metrics and near-linear time $(1+\epsilon)$-app approximations in the Euclidean metric. Expand

Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal

- Mathematics, Computer Science
- STOC
- 2020

A unified two-stage importance sampling framework that constructs an ε-coreset for the (k,z)-clustering problem and relies on a new dimensionality reduction technique that connects two well-known shape fitting problems: subspace approximation and clustering, and may be of independent interest. Expand

Coresets for k-median clustering under Fréchet and Hausdorff distances

- Computer Science
- ArXiv
- 2021

The authors' is the first such result, where the size of the coreset is independent of the number of input curves/point sets to be clustered (although it still depends on the maximum complexity of each input object). Expand

One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning

- Computer Science
- ArXiv
- 2021

A novel optimization framework is proposed to analyze this problem and an efficient approximation algorithm under the framework of Mini-batch gradient descent as a practical solution is provided. Expand

Coresets for Clustering in Excluded-minor Graphs and Beyond

- Computer Science, Mathematics
- SODA
- 2021

This work obtains an efficient coreset construction in high-dimensional Euclidean spaces, thereby matching and simplifying state-of-the-art results (Sohler and Woodruff, FOCS 2018; Huang and Vishnoi, STOC 2020). Expand

Coresets for Clustering with Missing Values

- Computer Science, Mathematics
- ArXiv
- 2021

The first coreset for clustering points in R that have multiple missing values (coordinates) is provided, which exhibits a flexible tradeoff between coreset size and accuracy, and generally outperforms the uniformsampling baseline. Expand

#### References

SHOWING 1-10 OF 69 REFERENCES

Coresets for Ordered Weighted Clustering

- Mathematics, Computer Science
- ICML
- 2019

The main result is a construction of a simultaneous coreset of size $O_{\epsilon, d}(k^2 \log^2 |X|)$ for Ordered k-Median, which translates to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights. Expand

On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications

- Mathematics, Computer Science
- SIAM J. Comput.
- 2009

These are the first streaming algorithms, for those problems, that have space complexity with polynomial dependency on the dimension, using $O(d^2k^2\varepsilon^{-2}\log^8n)$ space. Expand

Efficient approximation schemes for uniform-cost clustering problems in planar graphs

- Mathematics, Computer Science
- ESA
- 2019

A new construction of a "coreset for facilities" for planar graphs that in polynomial time one can compute a subset of facilities of size F_0 with a guarantee that there is a $(1+\epsilon)$-approximate solution contained in $F_0$. Expand

Epsilon-Coresets for Clustering (with Outliers) in Doubling Metrics

- Computer Science, Mathematics
- 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS)
- 2018

The first relation between the doubling dimension of M(X, d) and the shattering dimension of the range space induced by the distance d is established and an upper bound of O(ddim(M)⋅ log(1/ε) +log log 1/τ) is proved for the probabilistic shattering dimension for even weighted doubling metrics. Expand

Quick k-Median, k-Center, and Facility Location for Sparse Graphs

- Computer Science, Mathematics
- SIAM J. Comput.
- 2004

We present $\tilde{O}(m)$ time and space constant factor approximation algorithms for the k-median, k-center, and facility location problems with assignment costs being shortest path distances in a… Expand

VC-dimension and Erdős-Pósa property

- Mathematics, Computer Science
- Discret. Math.
- 2015

The goal is to generalize the proof of Chepoi et?al. (2007) with the unique assumption of bounded distance VC-dimension of neighborhoods, in other words, the set of balls of fixed radius in a graph with bounded distanceVC-dimension has the Erd?s-Posa property. Expand

On coresets for k-means and k-median clustering

- Mathematics, Computer Science
- STOC '04
- 2004

This paper shows the existence of small coresets for the problems of computing k-median/means clustering for points in low dimension, and improves the fastest known algorithms for (1+ε)-approximate k-means and k- median. Expand

k-Means Clustering of Lines for Big Data

- Computer Science, Mathematics
- NeurIPS
- 2019

It is proved that there is always a weighted subset (called coreset) of $dk^{O(k)}\log (n)/\epsilon^2$ lines in $L$ that approximates the sum of squared distances from £L to any given set of $k$ points, which implies results for a streaming set of lines to machines in one pass. Expand

Coresets for Clustering with Fairness Constraints

- Computer Science, Mathematics
- NeurIPS
- 2019

An approach to clustering with fairness constraints that involve multiple, non-disjoint types, that is also scalable and achieves a speed-up to recent fair clustering algorithms by incorporating the first known coreset construction for theFair clustering problem with thek-median objective. Expand

Shortest Paths in Digraphs of Small Treewidth. Part I: Sequential Algorithms

- Computer Science, Mathematics
- Algorithmica
- 2000

Abstract. We consider the problem of preprocessing an n -vertex digraph with real edge weights so that subsequent queries for the shortest path or distance between any two vertices can be efficiently… Expand