Oleg Zabluda's blog
Tuesday, October 04, 2016
 
Diane von Furstenberg on Charlie Rose
Diane von Furstenberg on Charlie Rose

https://charlierose.com/videos/28831 (Sep 16, 2016)
https://charlierose.com/videos/23301 (Dec 12, 2014)
https://charlierose.com/videos/14899 (Dec 15, 2012)
https://charlierose.com/videos/21924 (Nov 17, 2011)
https://charlierose.com/videos/14034 (Sep 09, 2004)
https://charlierose.com/videos/7148 (Nov 05, 1998)
https://charlierose.com/videos/8472 (Dec 20, 1991)

Diane von Furstenberg reflects on her mother, who survived Auschwitz, the iconic wrap dress, her marriages, and her legacy as a designer.
https://charlierose.com/videos/28831

Labels:


 
What Regularized Auto-Encoders Learn from the Data Generating Distribution (2014) Guillaume Alain and Yoshua Bengio

What Regularized Auto-Encoders Learn from the Data Generating Distribution (2014) Guillaume Alain and Yoshua Bengio
"""
What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density.
[...]
Machine learning is about capturing aspects of the unknown distribution from which the observed data are sampled (the data-generating distribution). For many learning algorithms and in particular in manifold learning, the focus is on identifying the regions (sets of points) in the space of examples where this distribution concentrates, i.e., which configurations of the observed variables are plausible.

Unsupervised representation-learning algorithms attempt to characterize the data-generating distribution through the discovery of a set of features or latent variables whose variations capture most of the structure of the data-generating distribution. In recent years, a number of unsupervised feature learning algorithms have been proposed that are based on minimizing some form of reconstruction error, such as auto-encoder and sparse coding variants (Olshausen and Field, 1997; Bengio et al., 2007; Ranzato et al., 2007; Jain and Seung, 2008; Ranzato et al., 2008; Vincent et al., 2008; Kavukcuoglu et al., 2009; Rifai et al., 2011b,a; Gregor et al., 2011). An auto-encoder reconstructs the input through two stages, an encoder function f (which outputs a learned representation h = f(x) of an example x) and a decoder function g, such that g(f(x)) ≈ x for most x sampled from the data-generating distribution. These feature learning algorithms can be stacked to form deeper and more abstract representations.
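
A minimal sketch of that encoder/decoder structure in plain NumPy; the toy data, layer sizes, and variable names below are illustrative choices, not taken from the papers cited above:

import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points of dimension d, concentrated near a 1-D curve in R^2.
n, d, h_dim = 256, 2, 16
t = rng.uniform(-1, 1, size=n)
X = np.stack([t, np.sin(3 * t)], axis=1) + 0.01 * rng.normal(size=(n, d))

# Randomly initialized parameters of the encoder f and decoder g.
W_enc = 0.1 * rng.normal(size=(d, h_dim))
b_enc = np.zeros(h_dim)
W_dec = 0.1 * rng.normal(size=(h_dim, d))
b_dec = np.zeros(d)

def f(x):
    # Encoder: h = f(x), the learned representation (code) of x.
    return np.tanh(x @ W_enc + b_enc)

def g(h):
    # Decoder: maps the code h back to input space.
    return h @ W_dec + b_dec

def r(x):
    # Reconstruction function r(x) = g(f(x)); training would adjust the
    # parameters so that r(x) is close to x for x drawn from the data.
    return g(f(x))

# Squared reconstruction error that reconstruction-based methods minimize.
print("reconstruction error:", np.mean(np.sum((r(X) - X) ** 2, axis=1)))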

Deep learning algorithms learn multiple levels of representation, where the number of levels is data-dependent. There are theoretical arguments and much empirical evidence to suggest that when they are well-trained, deep learning algorithms (Hinton et al., 2006; Bengio, 2009; Lee et al., 2009; Salakhutdinov and Hinton, 2009; Bengio and Delalleau, 2011; Bengio et al., 2013b) can perform better than their shallow counterparts, both in terms of learning features for the purpose of classification tasks and for generating higher-quality samples.

Here we restrict ourselves to the case of continuous inputs x ∈ R^d with the data-generating distribution being associated with an unknown target density function, denoted p. Manifold learning algorithms assume that p is concentrated in regions of lower dimension (Cayton, 2005; Narayanan and Mitter, 2010), i.e., the training examples are by definition located very close to these high-density manifolds. In that context, the core objective of manifold learning algorithms is to identify where the density concentrates.

Some important questions remain concerning many of the feature learning algorithms based on reconstruction error. Most importantly, what is their training criterion learning about the input density? Do these algorithms implicitly learn about the whole density or only some aspect? If they capture the essence of the target density, then can we formalize that link and in particular exploit it to sample from the model? The answers may help to establish that these algorithms actually learn implicit density models, which only define a density indirectly, e.g., through the estimation of statistics or through a generative procedure. These are the questions to which this paper contributes.
[...]
Section 3 is the main contribution and regards the following question: when minimizing that criterion, what does an auto-encoder learn about the data generating density? The main answer is that it estimates the score (first derivative of the log-density), i.e., the direction in which density is increasing the most, which also corresponds to the local mean, which is the expected value in a small ball around the current location. It also estimates the Hessian (second derivative of the log-density).
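
Stated informally in symbols (an editorial paraphrase of the paper's result, with sigma denoting the small corruption or regularization level and p the data-generating density), the optimal reconstruction function r satisfies

\[
  \frac{r(x) - x}{\sigma^2} \;\approx\; \frac{\partial \log p(x)}{\partial x},
  \qquad
  \frac{\partial r(x)}{\partial x} \;\approx\; I + \sigma^2 \, \frac{\partial^2 \log p(x)}{\partial x^2},
\]

i.e., the residual r(x) − x points in the direction in which the log-density increases fastest, and the Jacobian of r carries the second-derivative (Hessian) information mentioned above.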
[...]
Regularized auto-encoders capture the structure of the training distribution thanks to the productive opposition between reconstruction error and a regularizer. An auto-encoder maps inputs x to an internal representation (or code) f(x) through the encoder function f, and then maps back f(x) to the input space through a decoding function g. The composition of f and g is called the reconstruction function r, with r(x) = g(f(x)), and a reconstruction loss function ℓ penalizes the error made, with r(x) viewed as a prediction of x. When the auto-encoder is regularized, e.g., via a sparsity regularizer, a contractive regularizer (detailed below), or a denoising form of regularization (that we find below to be very similar to a contractive regularizer), the regularizer basically attempts to make r (or f) as simple as possible, i.e., as constant as possible, as unresponsive to x as possible. It means that f has to throw away some information present in x, or at least represent it with less precision. On the other hand, to make reconstruction error small on the training set, examples that are neighbors on a high-density manifold must be represented with sufficiently different values of f(x) or r(x). Otherwise, it would not be possible to distinguish and hence correctly reconstruct these examples. It means that the derivatives of f(x) or r(x) in the x-directions along the manifold must remain large, while the derivatives (of f or r) in the x-directions orthogonal to the manifold can be made very small. This is illustrated in Figure 1.
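
As a purely illustrative sketch of the denoising form of regularization just described, the following snippet (PyTorch; the data, architecture, and noise level sigma are arbitrary choices for the example) trains a small auto-encoder to reconstruct clean points from Gaussian-corrupted inputs:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 2-D data concentrated near a 1-D curve (a stand-in for a low-dimensional manifold).
t = torch.rand(512, 1) * 2 - 1
X = torch.cat([t, torch.sin(3 * t)], dim=1) + 0.01 * torch.randn(512, 2)

sigma = 0.1  # corruption level; it also sets how strongly r is contracted

# Encoder f and decoder g; the reconstruction function is r(x) = g(f(x)).
f = nn.Sequential(nn.Linear(2, 64), nn.Tanh())
g = nn.Linear(64, 2)
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

for step in range(2000):
    noisy = X + sigma * torch.randn_like(X)      # corrupt the input
    recon = g(f(noisy))                          # r applied to the corrupted point
    loss = ((recon - X) ** 2).sum(dim=1).mean()  # denoising reconstruction error
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, the residual r(x) - x forms a vector field that points from
# points slightly off the manifold back toward the high-density region.
with torch.no_grad():
    probe = X + 0.3 * torch.randn_like(X)
    residual = g(f(probe)) - probe
print(residual[:5])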

In the case of principal components analysis, one constrains the derivative to be exactly 0 in the directions orthogonal to the chosen projection directions, and around 1 in the chosen projection directions. In regularized auto-encoders, f is non-linear, meaning that it is allowed to choose different principal directions (those that are well represented, i.e., ideally the manifold tangent directions) at different x’s, and this allows a regularized auto-encoder with non-linear encoder to capture non-linear manifolds. Figure 2 illustrates the extreme case when the regularization is very strong (r(·) wants to be nearly constant where density is high) in the special case where the distribution is highly concentrated at three points (three training examples). It shows the compromise between obtaining the identity function at the training examples and having a flat r near the training examples, yielding a vector field r(x) − x that points towards the high density points.
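
The claim about derivatives in the PCA case can be checked directly. Assuming a linear reconstruction of the form r(x) = mu + V V^T (x − mu), with the columns of V holding the retained principal directions (an illustrative construction, not code from the paper), the Jacobian of r is the constant projector V V^T, whose eigenvalues are 1 along the retained directions and 0 orthogonal to them:

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3-D points that vary mostly along two directions.
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.05])
mu = X.mean(axis=0)

# PCA: keep the top-2 principal directions as the columns of V.
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = Vt[:2].T                       # shape (3, 2)

def r(x):
    # Linear reconstruction: project onto the principal subspace.
    return mu + (x - mu) @ V @ V.T

# The Jacobian of r is the projector V V^T; its eigenvalues show derivative
# ~1 along the retained directions and ~0 in the discarded direction.
J = V @ V.T
print(np.round(np.linalg.eigvalsh(J), 3))   # approximately [0., 1., 1.]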
"""
https://arxiv.org/pdf/1211.4246.pdf

Labels:


 
Sepp Hochreiter's Fundamental Deep Learning Problem (1991) | Jürgen Schmidhuber, 2013
Sepp Hochreiter's Fundamental Deep Learning Problem (1991) | Jürgen Schmidhuber, 2013
"""
A first milestone of Deep Learning research was the 1991 diploma thesis of Sepp Hochreiter [1], my very first student, who is now a professor in Linz. His work formally showed that deep neural networks are hard to train, because they suffer from the now famous problem of vanishing or exploding gradients: in typical deep or recurrent networks, [...] they decay exponentially in the number of layers, or they explode. All our subsequent Deep Learning research of the 1990s and 2000s was motivated by this insight.
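
A toy numerical caricature of that effect (an editorial illustration, not Hochreiter's analysis): the back-propagated error signal is a product of per-layer Jacobians, here simply the weight matrices of a deep linear stack, so its norm scales roughly like the typical per-layer gain raised to the depth, vanishing for gains below 1 and exploding for gains above 1:

import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 100

def backpropagated_norm(weight_scale):
    # Norm of an error signal after multiplication by `depth` random layer
    # Jacobians (the weight matrices of a deep linear stack).
    grad = np.ones(width) / np.sqrt(width)   # unit error signal at the top layer
    for _ in range(depth):
        W = weight_scale * rng.normal(size=(width, width)) / np.sqrt(width)
        grad = W.T @ grad                    # one step of the chain rule
    return np.linalg.norm(grad)

for scale in (0.8, 1.0, 1.2):
    print(f"per-layer gain {scale}: norm after {depth} layers = "
          f"{backpropagated_norm(scale):.3e}")
# gain 0.8 -> roughly 0.8**50 ~ 1e-5 (vanishing); gain 1.2 -> roughly 1.2**50 ~ 1e4 (exploding)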

The thesis is in German [1]. [...] Ten years later, an additional survey came out in English [2].

We have found four ways of partially overcoming the Fundamental Deep Learning Problem:

I. My first Deep Learner of 1991 overcame it through unsupervised pre-training for a hierarchy of (recurrent) neural networks [4]. This greatly facilitated subsequent supervised credit assignment through back-propagation.

II. LSTM-like networks (since 1997) [5] avoid the problem through a special architecture unaffected by it.

III. Today, a million times faster GPU-based computers allow for propagating errors a few layers further down within reasonable time, even in traditional NN - that's basically what's winning many of the image competitions now, e.g., [6]. (Although this does not really overcome the problem in a fundamental way.)
"""
http://people.idsia.ch/~juergen/fundamentaldeeplearningproblem.html

Labels:


 
Hillary Clinton, who tells dreadful lies (Marc A. Thiessen)
Hillary Clinton, who tells dreadful lies (Marc A. Thiessen)
Most Americans no longer believe a word she says — even if she’s telling the truth.
"""
Hillary Clinton tells us she is recovering from a mild case of pneumonia, but less than half of American voters believe her belated explanation of why she appeared to faint leaving a 9/11 commemoration. [OZ: how does Thiessen know she is telling the truth now?]

Today, it is the American people who have been burned, time and again, by Hillary Clinton’s dreadful lies. Let’s review just a few examples of her serial dishonesty:

She lied repeatedly about her emails. She lied when she said she had “turned over everything I was obligated to turn over” (FBI Director James Comey said the FBI “discovered several thousand work-related e-mails that were not among the group of 30,000 e-mails returned by Secretary Clinton to State in 2014”). She lied when she said there was “no classified material” in her private emails . . . that there was nothing “classified at the time” . . . and that there was nothing “marked classified” in her private emails — all of which the FBI director said were untrue. And, to top it all off, she lied about her lies — declaring on national television that “Director Comey said my answers were truthful, and what I’ve said is consistent with what I have told the American people” — a claim The Post’s Fact Checker gave “Four Pinocchios.”

Clinton lied to the American people about Benghazi. At 10:08 p.m. the night of the attack, she issued a statement that blamed the attack on “inflammatory material posted on the Internet” with no mention of terrorism or al-Qaeda. But an hour later, at 11:12 p.m. she emailed her daughter, Chelsea: “Two of our officers were killed in Benghazi by an Al Queda-like [sic] group.” The next day in a phone call with the Egyptian prime minister, Clinton said: “We know the attack in Libya had nothing to do with the film. It was a planned attack, not a protest.” Yet two days later, as she welcomed the caskets of the fallen in Dover, Del., she blamed that attack on “an awful Internet video that we had nothing to do with.”

She lied about a trip she made to Bosnia, claiming that she and her team arrived “under sniper fire,” skipped the arrival ceremony and “just ran with our heads down to get into the vehicles to get to our base.” In fact, a video shows her being greeted on the tarmac by Bosnian officials and an 8-year-old Muslim girl, Emina Bicakcic, who read a poem in English and told Clinton, “There is peace now.”

She lied about her family history. In 2015, she said she could relate to illegal immigrants because “all my grandparents” immigrated to the United States. When BuzzFeed’s Andrew Kaczynski pointed out that three of Clinton’s four grandparents were born in the United States, a Clinton spokesman said “her grandparents always spoke about the immigrant experience and, as a result she has always thought of them as immigrants.”

And her dishonesty stretches back decades. As the late, great William Safire pointed out in a 1996 New York Times column, she delivered a “blizzard of lies” as first lady — about Whitewater, the firing of White House travel aides, her representation of a criminal enterprise known as the Madison S&L and how she made a 10,000 percent profit in 1979 commodity trading simply by studying the Wall Street Journal. Even back then, Safire concluded, Clinton was “a congenital liar.”

Today, the American people agree. A recent NBC News poll found that just 11 percent of Americans say Clinton is honest and trustworthy. To put that in perspective, 14 percent of American voters believe in Bigfoot.
"""
https://www.washingtonpost.com/opinions/hillary-clinton-who-tells-dreadful-lies/2016/09/19/cd38412e-7e6a-11e6-9070-5c4905bf40dc_story.html

Labels:


 
How Trump’s praise of Putin could cost him the election (Marc A. Thiessen)
How Trump’s praise of Putin could cost him the election (Marc A. Thiessen)
"""
what was Trump doing last week giving a surprise speech to just 100 people in Chicago, Illinois — a state he has zero chance of winning?
[...]
Trump seems to finally realize that his bizarre embrace of Russian President Vladimir Putin, and questioning of the U.S. obligation to defend its NATO allies, has alienated a critical voting bloc he needs to win the White House — Americans of Eastern European descent. So last week, Trump took a break from criticizing a former Miss Universe to give a speech to the Polish American Congress — the nation’s most prominent Polish American organization — where he lavished praise on Poland. The fact that Trump is reassuring Polish American leaders less than 40 days before a close election shows he is worried about losing this voting bloc — and with good reason.

Putin is despised by millions of Polish Americans, as well as Czech, Slovak, Ukrainian, Hungarian, Lithuanian, Latvian and Estonian [OZ: but not Russian?] Americans, who either escaped to this country from behind the Iron Curtain or whose parents or grandparents did. These voters know what it is like to live in a police state. Thus, many were appalled when, at NBC’s Commander-in-Chief Forum last month, Trump stated with apparent admiration how Putin “has very strong control over a country” and declared him “a leader, far more than our president has been.” When host Matt Lauer pointed out that Putin “annexed Crimea, invaded Ukraine, supports Assad in Syria, supports Iran” and asked, “Do you want to be complimented by that former KGB officer?” Trump was unfazed. “I’ll take the compliment, okay?” he replied, pointing out that Putin “does have an 82 percent approval rating.”

Not with Americans of Eastern European heritage, he doesn’t.
[...]
there are some 5,583,223 Americans of Eastern European heritage. [...] concentrated in many of the key swing states
[...]
Ohio has at least 865,204 Eastern European American voters, including 420,149 Polish-Americans, 183,593 Hungarian Americans, 118,975 Slovak Americans, and 40,742 Ukrainian Americans. These are the white, ethnic working-class Reagan Democrats whom Trump is expecting to carry him to victory in Ohio.
[...]
Florida, where Trump trails Clinton by two points. In 2000, George W. Bush won the state by just 537 votes. Florida has 747,243 voters of Eastern European descent, most of whom are not happy with Trump’s embrace of Putin.
[...]
Pennsylvania [...] has 1,481,914 voters of East European descent. Wisconsin [...] has 666,194. Michigan [...] has 1,075,800.
[...]
What is baffling is why Trump has needlessly alienated Eastern European voters. Many are working-class Democrats who are his natural constituency and should be attracted to his protectionist message on trade. They came over to the GOP in the 1980s, inspired by Ronald Reagan’s promise to defeat the “Evil Empire,” and ever since, Republican candidates have worked to keep them in the GOP fold. There is a reason that, in July 2012, Mitt Romney chose to visit Poland just a few months before Election Day. He wanted to win the votes of 3,223,613 Polish Americans.

Trump, by contrast, has seemed intent on driving them into Hillary Clinton’s waiting arms. This is especially maddening, because Clinton should be anathema to Americans of Eastern European descent. She was the mastermind behind the disastrous Russian “reset.” It was on her watch that the Obama administration caved to Putin’s demands that we cancel our missile defense agreement with Poland and the Czech Republic — and did it on the 70th anniversary of the Soviet invasion of Poland.
"""
https://www.washingtonpost.com/opinions/how-trumps-praise-of-putin-could-cost-him-the-election/2016/10/03/16b1ebd0-8978-11e6-b24f-a7f89eb68887_story.html

Labels:


