Nuit Blanche: 07/01/2007

Friday, July 20, 2007

In a previous entry, I mentioned the connection between how L1 and maximum entropy considerations. As we now know, in Compressed Sensing, the reconstruction of the signal from incoherent measurements involves a Linear Programming (L1) step featured in the Basis Pursuit approach. We also know that there is a bayesian approach to the reconstruction which obviously involves Laplace priors (because a maxent approach to the problem involving the L1 norm point to Laplace distribution as ideal priors).

In The Price of Privacy and the Limits of LP Decoding by Cynthia Dwork, Frank McSherry and Kunal Talwar show the following:

Our principal result is the discovery of a sharp threshhold ρ∗ ≈ 0.239, so that if ρ < ρ∗ and A is a random m × n encoding matrix of independently chosen standard Gaussians, where m = O(n), then with overwhelming probability over choice of A, for all x ∈ Rn, LP decoding corrects ρm arbitrary errors in the encoding Ax, while decoding can be made to fail if the error rate exceeds ρ∗.

and then in the conclusion

Comparing the results of section 7 with those of section 4, we note that while near-perfect decoding is information theoretically possible for error rates up to a half, LP decoding fails at much lower error rates. It is thus natural to look for other efficient decoding procedures for the case of adversarial error.

We already somehow knew this from previous investigations (David Donoho and Victoria Stodden in Breakdown Point of Model When the Number of Variables Exceeds the Observations ) but we now have a number: 24 percent. Interestingly, Rick Chartrand in Exact reconstruction of sparse signals via nonconvex minimization makes the case that since L1 might not be optimal and that looking at Lp with p less than 1 might lead to better results

. The idea being that asymptotically, we want p to go to zero in order to fulfill the real L0 requirement. From the abstract:

We show that by replacing the L1 norm with the Lp norm with p < 1, exact reconstruction is possible with substantially fewer measurements. We give a theorem in this direction, and many numerical examples, both in one complex dimension, and larger scale examples in two real dimensions.

Another paper by Chartrand, Nonconvex compressed sensing and error correction shows similar results. His computations seem to go very fast compared to BP/LP so this is noteworthy (we could always go for L0 directly but since it is combinatorial, the time spent on the computation is just too enormous). In light of the reconstruction error shown in that article, one cannot but recall the statement made by Aleks Jakulin and Andrew Gelman in this comments section on using log(1+d^2) norm.

As an log(1+d^2) ball does not have (thank you Yaroslav) the same shape as the Lp norm ball for p less than 1, how can we reconcile the findings that Aleks seems to find Cauchy/t-distributions are doing a good job in sparse decomposition (better than L-1)?

For other reasons, I'd like to think that Cauchy distributions are indeed grounded in serious theoretical work (besides the observation that they are not sensitive to outliers). We already know that random projections exist when modeling the primary visual cortex. We may eventually figure that some type of Autism is related to an equivalent phase change between L0 and L1 or L_log(1+d^2). There is a nice parallel between the metabolic constraints on the cortex when doing sparse coding and CPU requirements to do L0 or L1 or Lp.

Resources:
[1] Loss function semantics.
[2] Rice Compressed sensing page.
[3] Object recognition in the cortex

Wednesday, July 18, 2007

SPGL1: a solver for large-scale sparse reconstruction problems

Make that six codes (matlab) enabling reconstruction of sparse signals in Compressed Sensing. Ewout van den Berg and Michael Friedlander just made available SPGL1 : A solver for large-scale sparse reconstruction

which can solve both the Basis Pursuit and the Lasso problem.

SPGL1 is a solver for large-scale sparse reconstruction problems. For a given noise-level it can solve

(BP)minimizex1subject toAx–b2

Alternatively, it can solve the underdetermined Lasso problem

(Lasso)minimizeAx–b2subject tox1

for a given . SPGL1 relies only on matrix-vector operations Ax and ATy and accepts both explicit matrices, and functions that evaluate these products. In addition, SPGL1 supports the complex-variables case, and solves the true complex one-norm regularized problem.

The support for complex variables is handy in the case of noiselets.

Resource: Rice Compressed Sensing Library.

Tuesday, July 17, 2007

How does the Rice one pixel camera work ?

I have been asked this question several times by colleagues and friends, so I decided to use my talents in Microsoft Paint to try to provide an explanation with some beautifully handcrafted pictures.

The problem to solve is the following: Let's say you have only one sensitive pixel/photodetector/radiation detector/Teraherz detector at your disposal but you need to take a 10 Megapixel image like the one you can get from a Best Buy or an Amazon point-and-shoot cameras

how would you go about it ? There are many ways to do this but let us also imagine that you can also put your hands on a DMD chip that is made of 10 million oscillating mirrors (an example include the famous Texas Instrument DMD) like the ones you can find in digital projectors and you can command the action of each and everyone of these tiny (15 micrometer by 15 micrometer) mirrors. In other words, with a proper set-up, every milliseconds, you can decide to shine each of these mirrors on your detector .... or not. There are now two options for obtaining this 10MP image.

First option (the raster mode):

The raster mode is simple. Just shine one mirror at a time onto the detector and let all the other mirros shine elsewhere. Do this once

twice (with another mirror)

thrice (with yet another mirror)

four times (....)

five times (...)

....5 millions times (...)

until you reach the last 10 millionth mirror.

After doing all this, you now have ten million information which put together piece by piece provides you with a 10 MP image. Generally, you then use a small CPU to perform the Discrete Cosine Transform so that eventually you are now the proprietor of a JPEG image (i.e. a compressed version of this 10MP image).

Second option (the Compressive Sensing mode):

You tell the set of mirrors, on that DMD chip, to display a set of random tilings. That way, a random set of mirrors are shining the incoming light unto the detector.

You do this once with an initial random tiling and obtain your first CS measurement

then you do this again with a second random tiling, in order to obtain your second CS measurement

then you do this again with a third random tiling, this is your third CS measurement

and so on.

Compressed sensing tells you that with very high probability, you will get the same result as the raster mode (first method) above but with many fewer CS measurements than the 10 million raster mode measurements obtained in the first method. In fact, instead of taking 10 million raster mode measurements, you are likely to need only 20 percent of that in the form of CS measurements, maybe even less.

The reason this second method works stems from the idea that most natural images are sparse in bases ranging from cosines, wavelets to curvelets (this is also why JPEG does a tremendous job in decreasing the size of most images). Functions that represent random tilings of reflective and non reflective mirrors (0s and 1s) are said to be mathematically "incoherent" with these bases thereby allowing an automatic compression at the detector level (here in the second mode, there is no need for compression with JPEG at the very end since the CS measurements are already compressed version of the image). A computational steps is required to obtain a human viewable image from these CS measurements. That step uses these solvers.

What are the pros and cons of the second option compared to the first one ?
Pros:

The overall sensor requires very low power because there is no CPU/GPU/FPGA trying to perform the compression stage at the very end (JPEG).
The sensor is dedicated to acquiring information. Information processing can be done somewhere else ( think on some other planet)
Compared to raw images (raster mode output), the information to be transmitted is very much compressed albeit not optimally (compared to JPEG).
Instead you can spend all your money designing the very best sensitive pixel you want, it may even act as a spectrometer (looking at many different energy bands), radiation detector and so forth.
Last but not least, the amount of light that goes to the detector is about half of the 10 million mirrors, which is quite high compared to the single mirror exposure in the raster mode (first method). In other words, the signal to noise ratio is pretty high (a good thing) in the CS mode as opposed to the raster mode.

Cons:

Faint signals could be submerged in the CS measurements. You first need to have a high signal to noise ratio signal to detect small signals.
Your sensor has to have a much larger dynamic range than in the raster mode in order for the A/D quantization algorithm to not mess with the CS measurements.
The number of CS measurements will be higher than the number of bits required to store the JPEG of the raster mode (about 4 to 5 times higher). Then again (and this is a pro-argument, the transformation from raster mode to JPEG is the power hungry stage of your camera and the reason you always need to recharge it, memory on the other hand is cheap.)

Terry Tao provides a much clearer mathematical explanation albeit with no beautifully hand crafted images such as the ones found here. All information about this single pixel camera system can be found at Rice University.

[ Update 2013: Inview Corporation develops single pixel cameras using the intellectual property from Rice University ]

Bayesian Compressive Sensing and other items.

Shihao Ji, Lawrence Carin,Ya Xue, David Dunson have built a webpage on Bayesian Compressive Sensing and have also released a version 0.1 of their software as featured on Bayesian Compressive Sensing
SMAI-Thales meeting in October where two talks on Compressed Sensing will be presented in French in Paris.
The Von Neuman symposium has passed and Muthu has some thoughts on what was interesting.
Texas A&M will host a conference on Approximation and Learning in High Dimensions on October 19-21, 2007 (Thanks Cable)

Source: Rice repository on Compressed Sensing.

Synthesis on Compressed Sensing and Dimensionality Reduction/ Imaging a Fractal Romanesco

[ This is an old post featured on 7/17/07, most of the thoughts mentioned here have received some type of closure or are better understood]

I recently did a guest appearance on John Langford's blog on the subject of Machine Learning and Compressed Sensing. Andrew Gelman also talked about Compressed Sensing as a Universal Dimension Reduction technique by Mike Wakin and Rich Baraniuk in his blog (Statistical Modeling, Causal Inference, and Social Science). Overall, the response was very interesting either through personal e-mail or in the comments section. What comes out of this exercise is a list of elements that had been difficult for me to understand when I first looked at it and then communicate:

There is a confusion that the L1 regularization providing L0 capabilities is the main reason as to why compressed sensing works. This is not truly the case. Historically it has enabled it but it is not required. One can also point to the limits of this statement ( Breakdown Point of Model When the Number of Variables Exceeds the Observations)
There is an enormous confusion on the reason why Compressed Sensing is "better" than the Johnson-Lindenstrauss Lemma. In short, Compressed Sensing uses the same techniques as JL but Compressed Sensing provides a tighter bound (A simple proof of the restricted isometry property for random matrices) for specific random projections.
There is the thought that Compressed Sensing is the same as random projections. This is not the case, one could acquire analog signals of Legendre polynomials coefficients and obtain results (provided the Legendre polynomials are incoherent with the image/signal of interest). Also the random projections currently used in the Compressed Sensing literature are coming from a very specific family of distributions. The current random projections used in Machine learning, here for instance, fulfill obviously the JL lemma but not the RIP constraint of Compressed Sensing. Here is a good description by Terry tao on trying to build matrices that have the RIP (or UUP) property.
I am always eyeing a solution to dimensionality reduction in the context of imagery where a large amount of pixels need to be reduced to smaller set of most important features. This is not the type of problem some other people face. In the statistics and social science world however, datasets generally have fewer data dimension in the first place and so a dimensionality reduction does not bring a tremendous advantage as found in the imagery business.
In the Compressed Sensing approach, every new measurement is democratically adding information to the scene of interest. This is different than say, PCA where the first few eigenvectors are supposed to bring the most salient features.
In particular, there are currently no known sparse random projection matrices fulfilling the RIP i.e. taking a random sample of the signal of interest currently require a full sampling, not a sparse sample. The only sparse random projection used in compressed sensing that I know about, eventually uses group testing to get the full information later.
Compressed Sensing with dictionary of bases is well understood but the new concept of using a similar approach to decompose manifolds clearly requires new or additional wording. It would seem to me that it can be better put by saying that the manifold has a sparse decomposition in the basis of eigenfunctions of the Laplacian operator on that manifold/graph (diffusion geometries and wavelets of Mauro Maggioni, Stephane Lafon an Ronald Coifman).

Finally, there was this issue as to why the image reconstructed from the Rice one-pixel camera is still blurry. This is my abbreviated response to this comment:

With regards to the blurring ( as can be seen here). I agree even with the 40% being blurry but the Rice CS camera is not a purely analog camera: i.e the picture is projected on some Haar basis (every pixel shining is a step function over that area). In effect in order to have good reconstruction, one needs to have good incoherence between the target space and the CS signal space. Haar bases are good at decomposing natural images so I would not expect very good incoherence between the Haar basis and the wavelet basis (used for the reconstruction). If the Rice CS camera were to be replicating diracs instead of Haar functions, you’d [probably] get less blurry images. A similar note was made by Terry Tao in one of the comments in his blog.
Also you have similar blurry reconstructions from live images from the Random Lens Imager at MIT that uses the compressed sensing framework.... one of the reason is that we are not using 3d functions to do the reconstruction as the camera is certainly sensitive to depth as well. Please note the need for Machine Learning techniques to do the calibration of these imagers.

Ever since that comment, I found a mention of a basis that can be designed to be incoherent with the Haar Basis: the noiselet basis of Coifman, Geshwind and Meyer [1] for which I can find no public paper describing their constructions other than that of Romberg and Candes. It may be interesting to see how the reconstruction mentioned above fares with this noiselet basis. It would also be judicious to take photos of brownian or chaotic (high dimensional) systems so that there is a better public understanding of the connection between the sampling rate and reconstruction requirements. Similarly, one could play with the compressed sensing imaging of a, rare to find in Texas , Romanesco. Plain Broccolis should do.

One could also try to make the connection between the fact that in order for Compressed sensing to work well there is a need for the distribution of projection coefficients to follow a power law and the fact that some objects (like the romanesco) have features (curvelet-like) that follow a power law. Obviously this could be done without the one-pixel camera but a real experiment always trumps a computer one.

Reference: [1] R. Coifman, F. Geshwind, and Y. Meyer. Noiselets. Appl. Comp. Harmonic Analysis, 10:27–44, 2001

Friday, July 13, 2007

Adding Search and Rescue Capabilities (part II): Modeling what we see and do not see

One of the concern that one has during a search and rescue operation (part I is here) is whether or not, the item of interest was seen or detected. I am not entirely sure for instance that SAROPS includes this, so here is the result of some of the discussions I have had with some friends on this. While the discussions were about the Tenacious, one should keep an eye on how it applies to other types of mishap that may lead to a similar undertaking.

In the search for the Tenacious, there were several sensors used at different times:

Mark One Eyeball from Coast Guards or from some private parties or from onlookers from the coast
sensors used by the Coast Guard in Planes and Boats
sensors (Radar, visual, IR, multispectral) from satellites or high altitude planes
webcams looking at the SF port and bay.

Each and every one of these sensors give some information about their field of view but they are limited by their capabilities. The information from the sensor is dependent on its resolution and other elements. While the issue of resolution is well understood, at least spatially, sensor visibility is dependent on:

cloud cover (high altitude, satellites), haze (low altitude)
the calmness of the sea
the orientation of the sensor (was the object of interest in the sensor cone ?)
the ability of the sensor to discriminate the target of interest from the background (signature of the target)
the size of the target (are we looking for a full boat or debris ?)

And so whenever there is a negative sighting over an area, the statement is really about the inability of the detector to detect the target of interest due the elements listed above. And so the probability of the target of interest not being there is not zero (except in very specific circumstances). In effect, when the data fusion occurs when merging information from all these sensors, it is important to be able to quantify what we don't know as much as what we know. It is also important to realize that different maps are really needed for each scenario. A scenario about searching for debris is different from that of searching for a full size boat. What the detectors/sensors see is different in these two scenarios. While one can expect to have a good signal when searching for a full size boat, most sensors are useless when it comes to detecting minutes debris.

In the aerospace business, some of us use software like STK that provides different modules in order to schedule and understand information about specific satellite trajectories and so forth. It may be a good add-on to the current SAROPS capabilities in terms of quantifying the field of view.

But the main issue is really about building the right probability distribution as the search goes on and how one can add any heterogenous information into a coherent view of the search.

Time is also a variable that becomes more and more important as the search goes. In particular it is important to figure out the ability to do data fusion with time stamped data. One can see in this presentation, that while the search grid is regular, one can see some elements drifting out of the field of view as the search is underway. So the issue is really about quantifying data fusion with sensors input as well as maritime currents and provide a probability of escaping the search grid. SAROPS already does some of this, but I am not sure the timing element of the actual search (made by CG planes, boat) is entered in the software as the search go on. It was difficult for us to get back that timing from the search effort (it was rightfully not their priority) and one simply wonders if this is an input to SAROPS when iterating on the first empty searches. If one thinks along the lines of the 8000 containers scenario, this is important as it has been shown that some of these containers have different lifespan at sea level and right under the surface. In this case, the correlation between time stamped sensor outputs become central as a submerged but within a few feet underwater containers may not be viewable from specific sensors (but would remain dangerous to navigation). Also this is not because we did not see anything on the second path at the same location (provided no current) that the object is not here anymore, rather the sensor did not detect it. In the illustration below one can see the different targets found by the Radarsat/John Hopkins team for the Tenacious. Without time stamp it is nearly impossible to make a correlation between hits on the first and the second satellite path.

The bayesian framework seems to have already been adopted by SAROPS and previous versions. It may need some additional capabilities to take into account most the issues mentioned above (sensor network or EPH). In either case, a challenge of some kind, with real data might be a way to advance the current state of the art.

Thursday, July 12, 2007

Compressed Sensing Hardware Implementations (part deux)

[Update Nov. '08: Please find the Compressed Sensing Hardware page here ]

In a previous entry, I mentioned the most recent Compressed Sensing Hardware implementations I knew about. According to the recent presentation by Justin Romberg at the Short Course on Sparse Representations and High Dimensional Geometry at IPAM, it looks as though I missed two other implementations as described by Romberg in his talk:

* a Hyperspectral imager at Yale where algorithms of Ronald Coifman and Mauro Maggioni are being used against data gathered from a Texas Instrument DMD and a tuned laser light. You can see in the abstract of the article where a feature recognition is doing better when using random projection for hyperspectral imagery:

This leads us to ask how discriminatory spectral features should be selected. The features in previous work on cancer spectroscopy have been chosen according to heuristics. We use the “best basis” algorithm to select a Haar wavelet packet basis which is optimal for the discrimination task at hand. These provide interpretable spectral features consisting of contiguous wavelength bands. However they are outperformed by features which use information from all parts of the spectrum, combined linearly at random.

Hyperspectral imagery has a lot of potential because most of the data gathered is redundant in the first place. As far I can tell, in this example, the large dataset is compressed using random projections and they are then using machine learning techniques (diffusion methods/maps) in order to figure out the neighborhood of a sample given labeled training sets. More information can be found in the presentation of Coifman at IMA.

* an Analog imager at Georgia Tech but I could not find a trace of it for the moment.

The analog route is a fascinating one for two reasons:

one can use random projections but one can also use other bases. It used to be that you could find analog sensor that were decomposing data directly in their fourier spectrum.
analog used to be the way to go before the digital revolution and so there is a tremendous knowledge in acquiring analog signals. Compressed sensing opens the door to using these systems again.

Monday, July 09, 2007

Adding Search and Rescue Capabilities (part I): Using Hyperspectral and Multispectral Cameras to Search for Non-Evading Targets

In the search for the Tenacious, I mentioned the possibility of using Hyperspectral or multispectral cameras on-board current satellites to see if there was a way to locate it. The big challenge resides in the resolution. Most of these cameras have a coarser resolution than, say, either the Tenacious or any medium sailing boat (i.e. one pixel is more than the size of the boat). Hence the generic expectation is that one cannot locate a boat using these. These cameras are also mostly used for other purposes such as environmental studies and the access is rightfully restricted to a small circle of experts. Because of that there is a large amount of convincing to do in order to have access to that imagery. The underlying reasoning as to why we could, in effect, discriminate between something that is interesting and something that is not, can be put in two categories:

the boat and its wake span a large part of the pixel and using a few bands, one can see a large difference between a man made object and the sea. In other words, the underlying scene is very sparse and one in effect detect very rapidly interesting artifacts. This a little bit like superresolution.
In some cameras like Hyperion on EO-1 or Meris (Envisat) there are 250 spectral channels. Even if the spatial resolution is coarse, we are bound to see something different when using 250 bands (as opposed to the traditional three color bands) especially against a very uniform background (sea). Techniques such as the ones developed by Mauro Maggioni and Ronald Coifman should be evaluated for that purpose.

Recall that the idea is to produce a map of what is potentially interesting, not an exact match of where that target is. A second step dealing with data fusion is responsible for eliminating the false positives given information from other sensors. With the help of Lawrence Ong, Sorin Popescu and I sho wed that you could see boats with Landsat 7. This is new but falls into the first category highlighted above. The second category has not been investigated as far as I know: Maybe it should. There are three categories of targets/signatures that should be investigated:

During the Tenacious search, a false positive was detected in the form of a sighting of a green dye. These dye are generally part of "distress kits" and used whenever a boat want to make it clear it has problem. While it was a false positive for other reasons, I had a discussion with the EO-1 folks (at JPL and Goddard) who mentioned that maybe producing ground truth data with green dye and Hyperion could probably lead to having a similar capability than the one we currently have for detecting volcanoes. In other words, produce a calibration formula to be fed to EO-1 so that in the future, its autonomous detection capability can provide data to the ground that this green dye has been detected over a specific area. Since one can schedule imagery on EO-1 and figure out Envisat data gathering cycle, this could be done as a small private endeavor.
Another signature of interest is that of the boat as produced on the camera. If it is a boat or a plane, it is very likely that these have been imaged before by the same camera over the same harbour or airport at some other time. But for the latter, a signature is not really that important per se. A large signal over background noise on some channels should be enough to find that boat/plane. In the case of the plane, the signature may be interesting as the background is generally cluttered.
In order to verify the ability to find current boats at sea, one could try to locate the boats currently involved in much advertized journeys or races. One could find out from the current stock of envisat and eo-1 photos whether boats like the Schooner Anne can be located. That boat is part of a 1000 days at sea journey. They have a map of their location day after day. The boat is a schooner (or about 120 feet large).

Another item that would have sped up the search is the ability to query simultaneously different databases on the availability of hyperspectral or multispectral images from different platforms. Either USGS or the ESA platforms are very nice, but making it into one search would have been a nice time saver. I am also pretty sure that there are other Earth Observation platforms from Asia (India in particular) that could have been used, provided I knew about them. Yet I cannot find anywhere on the web a catalog of civilian hyperspectral or multispectral imagers on current satellites.

Finally, let us recall that doing this can help us locate hard cases like the Tenacious but it may also help us in a totally different endeavor. As one can see from the extraordinary effort of the Coast Guards for the Tenacious, one boat can consume a large amount of man power. Let us imagine a case where you have to do the tracking of 8000 targets lost at sea.

In the search for Cessna N2700Q, the Civil Air Patrol tried the new ARCHER system without su

ccess on that search. And it looks like this is not happening only for this search as some people are doubting its capability for Search and Rescue Operations.
As indicated by Bradley,

CAP forum posts indicate ARCHER
requires a very narrow search area to be of much use. Our problem is that we're not sure where this Cessna pilot went after he dropped
below radar (N34° 48' 7" , W111° 56' 52").

This is the same problem that arises for EO-1, the swath of interest is generally very narrow compared to the size of the problem. We should probably think of a way of integrating Compressed Sensing into current hyperspectral imagery to increase the field of view. Let us recall that one of the reason this would be interesting is that these systems are there to point out major differences from the background, they are not there to produce very nice imagery.

If any of those items are of interest to you please contact me. I am particularly interested in people (from Asia, Europe or the U.S.) that can have direct access to this type of imagery so we can test some of what is said in this entry.

[ Si vous pensez que ce sujet est important et qu'il doit etre etudie, je serais tres heureux de pouvoir vous aider. N'hesitez pas a me contacter ]

Wednesday, July 04, 2007

Is it time for a lessons learned from Jim Gray's search and rescue operation ?

I have been in touch with another search team ever since the disappearance of Jim Gray's boat and it seems to me that there are similar items that probably need to be thought better in terms of communication, data handling and fusion.

Tuesday, July 03, 2007

Influence des Chercheurs Francais: La derniere generation.

Nick Trefethen essaie de faire des statistiques sur les chercheurs qui ont le plus influences les mathematiques appliquees qui sont elles-memes a l'origine du development exponentiel des sciences depuis deux siecles. C'est une presentation de tout les chercheurs tres influents toutes nationalites confondues. Il y a tres peu de francais:

Paul de Faget de Casteljau, Ingenieur chez Citroen, issu de Normale Sup
Pierre Bezier, Ingenieur chez Renault, issu de l'ENSAM
Adrien-Marie Legendre professeur a Polytechnique
Jean Morlet (qui vient de deceder), geophysicien a ELF, issu de Polytechnique,
Yousef Saad, francophone qui est passe par le cycle francais a l'Universite de Grenoble

Il n'y a pas Stephane Mallat ou Emmanuel Candes dans la liste finale (bien que Mallat soit dans le reste de la presentation). Pour Candes, la liste est un peu vieille. Trefethen essaie de tirer des grandes lignes de l'environement de ces chercheurs (toute nationalites confondues) et il nous dit:

"...The inventors were/are almost all academic mathematicians

Most were extremely eminent
Their great discoveries came at all ages
About half had major involvements with government or industry (That’s big industry—AT&T, IBM, Boeing, etc.— and big government labs like Argonne, Harwell, NPL)
Most were seriously involved with applications
It’s hard to disentangle the effects of WWII.."

Il est interessant de voir que des travaux importants ont ete fait par des gens de tout ages. C'est un fait tres notable parce que il y a beaucoup de prix ou autres filieres qui sont en France et a l'etranger restrictive par rapport a l'age (medaille Fields, entree a Normale Sup,...) Pour ce qui est des francais, on peut voir trois generations differentes:

Legendre, il y a deux siecles
Bezier, De CastelJau et Morlet qui ont fait partie de grandes entreprises francaises apres la deuxieme guerre mondiale. Ils sont tous decedes.
Saad n'est pas francais (il est etranger dans le sens ou il n'a pas fait sa scolarite secondaire en France et donc n'a pu etre detecte par le systeme des prepas) mais il est passe par le systeme universitaire francais pour continuer sa carriere aux U.S. Candes fait sa carriere aux U.S. Mallat a evolue aux U.S. et est revenu en france et a cree une entreprise qui utilise la technologie qu'il a develope pendant ses recherches les plus recentes. (pour la petite histoire, Mallat relate son histoire pour le developement de technologies innovates en France dans : Tribune Libre sur la Recherche et l'Innovation, Gazette des Mathematiciens of the SMF, no. 121, July 2009. pdf)

Si l'on s'en tient au critere de la liste de Trefethen a la fin de sa presentation Mallat et Candes ne sont pas la, en partie parce que la liste commence a date. Il n'y donc guere que la generation qui se trouvait au sein des grandes entreprises qui ait pu influencer la science a ses yeux. Il y a un fait troublant:: Jean Morlet en particulier a ete mis a la retraite car, apres le scandale des avions renifleurs, ELF ne voulait plus etre vu comme trop innovant. Yves Meyer en parle dans son tribu a Morlet:

Morlet's scientific vision was decisive for the success of wavelet analysis. But as Pierre Goupillaud is telling us, this had no effect on Morlet's career at the Elf-Aquitaine company. In fact Morlet's only reward for years of perseverance and creativity in producing this extraordinary tool was an early retirement.

Etre innovant cela veut aussi dire que l'on peut se planter. Au vu de la date de l'affaire (1978-1983), on peut se demander si elle n'est pas revelatrice de la fin de tout paris d'innovation dans ces grandes entreprises avec l'arrivee d'un management beaucoup plus frileux et donc la cessation de toute production de chercheurs qui aient une influence mondiale. Le modele de recherche et developement au sein des grandes entreprises nationales ne semble donc plus etre reellement innovant. Si un passage aux U.S. semble etre une condition essentielle pour avoir une influence definitive sur son champs de recherche, il serait peut etre importun de songer a cree une dynamique similaire en France. Bien que Polytechnique fasse son boulot pour identifier les jeunes qui sont les plus de prometteurs, le systeme actuel echoue apres la sortie de l'ecole meme pour ceux-la. Ils sont donc obliges de passer par le systeme americain qui est exceptionel au niveau Bac + 4 et plus. Comme les grandes entreprises francaises ne sont plus la pour assurer la place qu'elles avaient en recherche, le systeme actuel ne fait plus reellement aussi aucune place a ce que l'on appelle aux U.S. des "late bloomers", ceux qui "murrisent" plus tard mais qui ont un impact au moins aussi important.

Random Projection Lapack ?

Tamás Sarlós describes in this paper the use of random projections through the Fast JL transform so that one can reduce the size of very large matrices in order to compute their SVD. A related approach was taken by Thomas Strohmer and Roman Vershynin [1] to build a ranodmized solver with exponential convergence. It sounds like as if there is a potential for a Random Projection LAPACK (RP-LAPACK) but I wonder how these transformations will fare when confronted with Non-normal matrices or discretization of Non-normal operators. In this context, Non-normality has nothing to do with probability distributions but rather a way to describe Non hermitian matrices (or matrices/operators that do not commute with their transpose). In the case of high non-normality, these matrices have very wide spectral portraits. Corollary 4 of Sarlos' paper would point to the potential blow-up in the case of highly non-normal matrices, a situation not unlike the smashed filter case, where noisy data can overcome the tight bound on the projection in the lower dimension random space (theorem 1 of the smashed filter paper). High nonnormality occur whenever there exists a strong coupling between two or more phenomena, giving rise to high spectral instabilities. Examples include linear transport theory for neutrons but other applications where non-normality shows up are listed here. Using these matrices would help in figuring out how bad the problem is.

On a different note, Nick Trefethen pointed out that both some Random matrices and Toeplitz matrices have large pseudo-spectra:

The eigenvalues of finite banded Toeplitz matrices lie on curves in the complex plane that have been characterized by Schmidt and Spitzer [SS60]. In contrast, the spectrum of the corresponding infinite dimensional Toeplitz operator is the set of points in the complex plane that a(T) encloses with non-zero winding number; see Theorem 1.17 of [BS99]. The limit of the spectrum of banded Toeplitz matrices as the dimension goes to infinity is thus typically very different from the spectrum of the limit.

This uncomfortable situation can be resolved by considering the behavior of pseudospectra. Though the eigenvalues of the finite Toeplitz matrices may fall on curves in the complex plane, Landau [Lan75], Reichel and Trefethen [RT92b], and Böttcher [Böt94] have observed that the resolvent norm ||(zI-TN)-1|| grows exponentially in the matrix dimension for all z in the interior of the spectrum of the corresponding infinite dimensional operator. (As a result, it is difficult to accurately compute eigenvalues of non-symmetric banded Toeplitz matrices even for matrices of relatively modest dimensions, and this signals a warning against basing analysis of finite Toeplitz matrices based on eigenvalues alone.) Furthermore the above cited authors also concluded that the pseudospectra of TN converge to the pseudospectra of T.

Could Pseudo-spectra provide directions on successful implementation of compressive sampling using Toeplitz and Random matrices ? Does the fact that Random Matrices replacing Toeplitz matrices in compressed sensing help in thinking how one can devise a Deterministic Constructions of Compressed Sensing Matrices ? This would not be so outlandish since:

The Restricted Isometry Property is a condition on the spectral norm of the matrices A^T = Phi* Phi

[ Update: Terry Tao just wrote to explain the issue with building deterministic UUP matrices ]

On a totally different note, I just found this paper on Identification of Matrices having a Sparse Representation by Holger Rauhut, Gotz Pfander and Jared Tanner. Initially derived for wireless communication inverse problem and sonar, it could readily be applied to neutron transport as well.

Reference:
[1] T. Strohmer, R. Vershynin, A randomized solver for linear systems with exponential convergence, RANDOM 2006 (10th International Workshop on Randomization and Computation), Springer Lecture Notes in Computer Science 4110, 499--507. Journal version: A randomized Kaczmarz algorithm with exponential convergence, Journal of Fourier Analysis and Applications, to appear

Monday, July 02, 2007

The importance of L1

I don't remember it so clearly described but the updated page on Gradient Projection for SparseReconstruction: Application to Compressed Sensing and Other Inverse Problems by Mario Figueiredo, Robert Nowak, Stephen Wright features the new updated paper on the technique and how it compares with other techniques. Besides explaining the algorithm, the authors must be thanked for making clear the connection between compressed sensing and other techniques such as LASSO that are using the L1 norm and for what purpose. I touched upon the reason for using L1 before but it was not framed in this exceptionally clear "background" discussion. Thank you guys.