Friday, October 24, 2014

Book: "Foundations of Data Science" by John Hopcroft and Ravindran Kannan

I just found out about a draft version of "Foundations of Data Science" a new book by John Hopcroft and Ravindran Kannan and very much like chapter 7 through 10 that touches upon themes we talk about often here on Nuit Blanche: Compressive Sensing ( see also The Big Picture in Compressive Sensing) , Advanced Matrix Factorization (see also the Advanced Matrix Factorization Jungle Page), complexity, streaming/sketching issues and Randomized Numerical Linear Algebra, Machine Learning and more. Let us note that compressive sensing is introduced at the very end, much like the other foundation book on signal processing this week (Books: "Foundations of Signal Processing" and "Fourier and Wavelet Signal Processing"). One wonders if a certain clarity would come earlier if the subject was introduced first so that titles such as Compressive Sensing Demystified ( by Frank Bucholtz and Jonathan M. Nichols) would not be needed. Here is the table of content for the last chapters.
7 Algorithms for Massive Data Problems 238
7.1 Frequency Moments of Data Streams . . . . . . . . . . . . . . . . . . . . . 238
7.1.1 Number of Distinct Elements in a Data Stream . . . . . . . . . . . 239
7.1.2 Counting the Number of Occurrences of a Given Element. . . . . . 243
7.1.3 Counting Frequent Elements . . . . . . . . . . . . . . . . . . . . . . 243
7.1.4 The Second Moment . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.2 Matrix Algorithms Using Sampling . . . . . . . . . . . . . . . . . . . . . . 248
7.2.1 Matrix Multiplication Using Sampling . . . . . . . . . . . . . . . . 248
7.2.2 Sketch of a Large Matrix . . . . . . . . . . . . . . . . . . . . . . . . 250
7.3 Sketches of Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8 Clustering 260
8.1 Some Clustering Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
8.2 A k-means Clustering Algorithm . . . . . . . . . . . . . . . . . . . . . . . 263
8.3 A Greedy Algorithm for k-Center Criterion Clustering . . . . . . . . . . . 265
8.4 Spectral Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
8.5 Recursive Clustering Based on Sparse Cuts . . . . . . . . . . . . . . . . . . 273
8.6 Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
8.7 Agglomerative Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
8.8 Dense Submatrices and Communities . . . . . . . . . . . . . . . . . . . . . 278
8.9 Flow Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
8.10 Finding a Local Cluster Without Examining the Whole Graph . . . . . . . 284
8.11 Axioms for Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
8.11.1 An Impossibility Result . . . . . . . . . . . . . . . . . . . . . . . . 289
8.11.2 A Satis able Set of Axioms . . . . . . . . . . . . . . . . . . . . . . 295
8.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9 Topic Models, Hidden Markov Process, Graphical Models, and Belief Propagation 301
9.1 Topic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
9.2 Hidden Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
9.3 Graphical Models, and Belief Propagation . . . . . . . . . . . . . . . . . . 310
9.4 Bayesian or Belief Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 311
9.5 Markov Random Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
9.6 Factor Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
9.7 Tree Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
9.8 Message Passing in general Graphs . . . . . . . . . . . . . . . . . . . . . . 315
9.9 Graphs with a Single Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . 317
9.10 Belief Update in Networks with a Single Loop . . . . . . . . . . . . . . . . 319
9.11 Maximum Weight Matching . . . . . . . . . . . . . . . . . . . . . . . . . . 320
9.12 Warning Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
9.13 Correlation Between Variables . . . . . . . . . . . . . . . . . . . . . . . . . 325
9.14 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

10 Other Topics 332
10.1 Rankings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .332
10.2 Hare System for Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . .334
10.3 Compressed Sensing and Sparse Vectors . . . . . . . . . . . . . . . . . . .335
10.3.1 Unique Reconstruction of a Sparse Vector . . . . . . . . . . . . . .336
10.3.2 The Exact Reconstruction Property . . . . . . . . . . . . . . . . . .339
10.3.3 Restricted Isometry Property . . . . . . . . . . . . . . . . . . . . .340
10.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .342
10.4.1 Sparse Vector in Some Coordinate Basis . . . . . . . . . . . . . . .342
10.4.2 A Representation Cannot be Sparse in Both Time and Frequency Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .342
10.4.3 Biological . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .345
10.4.4 Finding Overlapping Cliques or Communities . . . . . . . . . . . .345
10.4.5 Low Rank Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . .346
10.5 Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .347
10.6 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .348
10.6.1 The Ellipsoid Algorithm . . . . . . . . . . . . . . . . . . . . . . . .350
10.7 Integer Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .351
10.8 Semi-De nite Programming . . . . . . . . . . . . . . . . . . . . . . . . . .352
10.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356



 
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

2 comments:

Anonymous said...

Should the link to the 'draft version' work?

Igor said...

Yes, it should.

Printfriendly