Jan. 13 3:30pm, Wean 8220

Charalampos Tsourakakis, CMU

Two 25-Minute Talks

Charalampos Tsourakakis, CMU

Two 25-Minute Talks

Efficient Triangle Counting in Large Graphs via Degree Based Vertex Partitioning

Abstract:

The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real world applications such as spam detection, uncovering of the hidden thematic structure of the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable.
In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. and the partitioning of the set of vertices into a high degree and a low degree subset respectively as in Alon, Yuster and Zwick treating each set appropriately. We obtain a running time $O \left( m + \frac{m^{3/2} \Delta \log{n} }{t \epsilon^2} \right)$ and an $\epsilon$ approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges and $\Delta$ the maximum number of triangles an edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage $O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right)$ and a constant number of passes (three) over the graph stream. We apply our methods in various networks with several millions of edges and we obtain excellent results (e.g., for the Orkut graph with ~120M edges, and ~286M triangles our method runs in ~5sec with 98% accuracy). Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.

Joint work with Mihail Kolountzakis, Gary Miller and Richard Peng

Approximate Dynamic Programming Using Halfspace Queries and Multiscale Monge Decomposition and Denoising aCGH data

Abstract:

Dynamic Programming, since its introduction by Richard Bellman in the 1940s, is one of the most important problem solving techniques, with numerous applications in operations research, databases (histogram construction) times series analysis, speech recognition, robotics, biology and in many other fields. In this talk we will present two new techniques for performing dynamic programming approximately, for a recurrence not treated efficiently by existing methods. The basis of our first algorithm is the definition of a constant-shifted variant of the objective function that can be efficiently approximated using state of the art methods for range searching. Our technique approximates the optimal value of our objective function within additive $\epsilon$ error and runs in $\tilde{O}(n^{4/3+\delta} \log{ (\frac{U}{\epsilon}) )}$ time, where $\delta$ is an arbitrarily small positive constant and $U = \max \{ \sqrt{C},(P_i)_{i=1,\ldots,n} \}$. The second algorithm we provide solves a similar recurrence that's within a multiplicative factor of (1+$\epsilon$) and runs in $O(n \log{n} / \epsilon )$. The new technique introduced by our algorithm is the decomposition of the initial problem into a small (logarithmic) number of Monge optimization subproblems which we can speed up using existing techniques. Finally, we demonstrate a biological application of our recurrence where we obtain results superior to leading competitors both on benchmarks and real data.

Joint work with Richard Peng, David Tolliver, Maria Tsiarli, Stanley Shackney, Gary Miller and Russell Schwartz