sci.math #139229
Newsgroups: sci.math
Subject: Re: Why Least Squares ???
From: george.caplan@channel1.com (George Caplan)
Message-ID: <40.63958.2571@channel1.com>
Date: Sat, 25 May 1996 17:19:00 -0640
Organization: Channel 1(R) 617-864-0100 Info

How about simply minimizing the total of the absolute values of each
point from the regression line?  Is this method ever used?  Is it
difficult?  Is it useful?  Thanks.

George Caplan

 * 1st 2.00 #8935 * The difference between doing it and not doing it is doing it


sci.math #139278
Subject: Re: Why Least Squares ???
From: hrubin@b.stat.purdue.edu (Herman Rubin)

In article <4o8me9$2kk@cnj.digex.net>, Chris Long wrote:
>In article <40.63958.2571@channel1.com>, George Caplan wrote:

>>How about simply minimizing the total of the absolute values of each
>>point from the regression line?  Is this method ever used?
>>Is it difficult?  Is it useful?

>Yes, and it is both more difficult and useful in certain situations.

>The L_1 regression line can be found, for example, by expressing the
>problem as a binary-integer program and then using a standard package
>to solve.  For a large number of data points, however, this approach
>is likely to not be feasible, and so other approaches must be used.

>On the other hand, an L_1 regression is more robust than an L_2
>regression, i.e. more resistant to outliers, and so if you are working
>with noisy data this approach can be quite attractive.

On the other hand, an L_2 regression line is more robust in another
sense, since it only requires the predictor variables and the
disturbances to be uncorrelated to give reasonable results.
Conditional expectations are much easier to work with than conditional
medians, and we do not quite need that.

The use of robustness to refer only to outliers, and not to other
specification problems, usually involves strong assumptions of
symmetry.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hrubin@stat.purdue.edu   Phone: (317) 494-6054   FAX: (317) 494-0558


sci.math #139277
Subject: Re: Why Least Squares ???
From: hrubin@b.stat.purdue.edu (Herman Rubin)

In article <40.63958.2571@channel1.com>, George Caplan wrote:
>How about simply minimizing the total of the absolute values of each
>point from the regression line?  Is this method ever used?
>Is it difficult?  Is it useful?

1. It is sometimes used.

2. It is difficult.

3. One cannot use small amounts of summary data to combine samples.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hrubin@stat.purdue.edu   Phone: (317) 494-6054   FAX: (317) 494-0558
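The mathematical-programming formulation mentioned in the quoted reply is easy to make concrete; in fact an ordinary linear program suffices, with no integer variables needed. The following is a minimal sketch, assuming Python with NumPy and SciPy; the function name and the test data are invented for the illustration.

# Sketch: the L_1 (least absolute deviations) line fit posed as a linear
# program.  Assumes NumPy and SciPy are available; the data are made up.
import numpy as np
from scipy.optimize import linprog

def l1_line_fit(x, y):
    """Fit y ~ a + b*x by minimizing sum_i |y_i - a - b*x_i|.

    LP variables are (a, b, t_1, ..., t_n), where each t_i bounds the
    absolute residual at point i via two linear inequalities.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    c = np.concatenate([[0.0, 0.0], np.ones(n)])        # minimize sum of t_i
    #  y_i - a - b*x_i  <= t_i  ->  -a - b*x_i - t_i <= -y_i
    # -(y_i - a - b*x_i) <= t_i  ->   a + b*x_i - t_i <=  y_i
    A_ub = np.vstack([
        np.hstack([-np.ones((n, 1)), -x[:, None], -np.eye(n)]),
        np.hstack([ np.ones((n, 1)),  x[:, None], -np.eye(n)]),
    ])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None), (None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1]                           # intercept, slope

# One gross outlier; the fitted L_1 line stays near y = 1 + 2x.
x = np.arange(6, dtype=float)
y = 1.0 + 2.0 * x
y[3] += 50.0
print(l1_line_fit(x, y))

As the reply notes, building and solving this program becomes expensive for very large data sets, which is one reason specialized algorithms exist.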
sci.math #139793
Subject: Re: Why Least Squares ???
From: D P Dwiggins

george.caplan@channel1.com (George Caplan) wrote:
>How about simply minimizing the total of the absolute values of each
>point from the regression line?  Is this method ever used?
>Is it difficult?  Is it useful?

I didn't have the time yesterday to respond to this, but I simply can't
let the flippant replies of another poster stand alone.  The regression
method you are referring to is usually known as LAD (least absolute
deviation) regression, but I just call it L1 regression (as opposed to
L2 = least squares regression).

I use L1 regression all the time, as a first step in eliminating
outliers.  L1 is said to be more "robust", in that the resulting
regression line stays the same even if you include a datum point which
clearly does not lie on the line.  This is not true for L2 regression;
moreover, it is easier to detect whether a suspected point is truly an
outlier if L1 regression is used.

The equations for the L2 regression coefficients are obtained by using
derivatives to minimize the sum of the squares of the error terms, and
this gives closed-form equations for the coefficients.  This method
works because the square function has a nice derivative.  However, the
absolute value function is not differentiable at zero, and so there are
no such nice closed-form equations for the L1 regression coefficients.
Instead, an iterative technique is used to find the best-fit line
between two of the data points.  In every experience I've had using
actual data, this process converged to a unique solution in less than
half a dozen steps.  (Certainly, however, it is not difficult to
construct a simple example where LAD does not give a unique solution.)

While the LAD algorithm is a bit tedious to perform by hand, the actual
arithmetic involved is mindlessly simple, and there is no difficulty in
constructing a computer program to automate the procedure.  An
excellent text entitled "Alternative Methods of Regression" came out
last year (or the year before); I'm sorry I can't think of the author's
name, but I believe Academic Press published it.

dpd
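The contrast drawn above, closed-form equations for L2 but iteration for L1, can be shown in a few lines of code. The sketch below is only an illustration: it assumes Python with NumPy, and it approximates the LAD fit by iteratively reweighted least squares rather than the two-point pivoting scheme described in the post, but on well-behaved data it likewise settles down in a handful of passes. The function names and test data are invented for the example.

# Sketch: closed-form L_2 coefficients versus an iterative L_1 (LAD) fit.
# Assumes NumPy; the LAD fit uses iteratively reweighted least squares,
# not the pivoting method described in the post.  Test data are made up.
import numpy as np

def l2_line_fit(x, y):
    """Closed-form least-squares intercept and slope for y ~ a + b*x."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b * x.mean(), b

def l1_line_fit_irls(x, y, iters=50, eps=1e-8):
    """Approximate the LAD line by iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]       # start from the L_2 fit
    for _ in range(iters):
        w = np.sqrt(1.0 / np.maximum(np.abs(y - X @ coef), eps))
        new = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
        if np.allclose(new, coef, atol=1e-10):
            break
        coef = new
    return coef[0], coef[1]

x = np.arange(10, dtype=float)
y = 3.0 + 0.5 * x
y[7] += 25.0                                          # one obvious outlier
print("L2:", l2_line_fit(x, y))                       # pulled toward the outlier
print("L1:", l1_line_fit_irls(x, y))                  # stays near y = 3 + 0.5x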
sci.math #139243
Subject: Re: Why Least Squares ???
From: schlafly@bbs.cruzio.com

In article <4o1ln0$9mb@newsbf02.news.aol.com>, tony2back@aol.com (Tony2back) writes:
> In article <4nlbv8$gko@stratus.CAM.ORG>, dsevee@CAM.ORG (Denis Sevee) writes:
> >
> > In many applications, such as finding a line of regression, the best
> > approximation is defined as something that minimizes the sum of the
> > squares.  The usual explanation for why this metric is used is simply
> > that it makes the computations easier.  It is the most convenient.
> >
> > Are there any deeper reasons why this metric is used?
> >
> By using 'least squares' we are in effect assuming minimum variance of the
> observations from some theoretical norm.  This is not a bad assumption to
> use, but it is still an assumption.

Actually there is a lot of literature on why it is a bad assumption.
Easy computation is the biggest plus.  The second plus is the really
misleading theorems that you can prove.

Roger


sci.math #139384
Subject: Re: Why Least Squares ???
From: jpc@a.cs.okstate.edu (John Chandler)

In article <4o27de$16qs@b.stat.purdue.edu>, Herman Rubin wrote:
>
>It also has other properties.  Because of the polynomial nature
>of the function being minimized, it is easy to combine samples, to
>add or delete variables, etc.  I do not know of anyone who has done
>it for fourth powers, or who would consider that a better norm, but
>the arithmetic would be much harder.  In constructing a regression
>fit on n variables using least squares, the summarization of the data
>takes approximately n^2/2 items.  The leading term for fourth powers
>would be n^4/24, and linear algebra would not suffice.

Professor Rubin is comparing "least sum of fourth powers of residuals"
to _linear_ least squares, which is a direct process (not iterative)
using linear algebra.

It should be pointed out that "least sum of fourth powers" can be done
iteratively using any _nonlinear_ least squares package.  Just define
the "residuals" that the package wants to be the squares of the
ordinary residuals.  The package will square them again, giving fourth
powers, and minimize the sum of those fourth powers.

Minimizing the sum of squares, then the sum of fourth powers, then the
sum of eighth powers, gives an approximate minimax fit of a model,
either linear or nonlinear, to data.  Each fit is done to provide a
decent starting point for the next fit.
--
John Chandler
jpc@a.cs.okstate.edu


sci.math #139564
Subject: Re: Why Least Squares ???
From: Eric Gindrup

John Chandler wrote:
...
> Professor Rubin is comparing "least sum of fourth powers of
...

If the "form of solution" of the least squares and least quartics
methods were the same, wouldn't the derived solutions be identical?
My immediate thinking is that (x-y)^2 has zero-derivative coincident
with (x-y)^4.
--
Eric Gindrup ! gindrup@okway.okstate.edu
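Chandler's device is easy to try with any nonlinear least-squares routine. The sketch below assumes Python with SciPy and uses scipy.optimize.least_squares as the "package"; the straight-line model and the data are invented for the illustration. The residual function returns the ordinary residuals raised to half the requested power, so the solver's own squaring yields the L2, L4 and L8 objectives in turn, and each fit starts from the previous one.

# Sketch of the device described above: make a nonlinear least-squares
# routine minimize sums of fourth (and eighth) powers by feeding it powers
# of the ordinary residuals.  Assumes SciPy; model and data are made up.
import numpy as np
from scipy.optimize import least_squares

x = np.linspace(0.0, 1.0, 40)
y = 2.0 + 3.0 * x + 0.05 * np.sin(20.0 * x)   # straight line plus a wiggle

def residuals(params, power):
    a, b = params
    r = y - (a + b * x)
    # power = 2: return r (the solver squares it); power = 4: return r**2,
    # which the solver squares again into fourth powers; power = 8: r**4.
    return r ** (power // 2)

fit = np.array([0.0, 0.0])
for p in (2, 4, 8):                           # each fit starts the next one
    fit = least_squares(residuals, fit, args=(p,)).x
    print("L%d fit:" % p, fit)                # drifts toward the minimax fit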