
Frequently Asked Questions About MARS®

Q1. What is a regression analysis?

Q2. What are the benefits of regression analysis?

Q3. Why is regression modeling not used more frequently?

Q4. How does MARS help data miners use regression modeling?

Q5. How does MARS differ from conventional regression?

Q6. How does MARS construct its models?

Q7. What control over modeling does MARS provide the user?

Q8. How does MARS handle missing values?

Q9. How does MARS ensure that a model will perform as claimed on future data?

Q10. How can MARS models be implemented for predictive purposes?

Q11. What applications is MARS best suited for?

Q12. Why is MARS better than a decision tree for regression?

Q13. How does MARS compare with neural nets?

Q14. How quickly can MARS generate results?

Q15.  How large a problem can MARS handle?


 Q1. What is a regression analysis?

    A1.  Regression analysis is an effective mathematical modeling technique of increasing popularity in the corporate world.  Regression analyses essentially fit straight lines to data, creating simplified but frequently accurate summaries of the relationships between a variable of interest and other variables.

 Q2. What are the benefits of regression analysis?

    A2.  Regression analysis has many technical and practical benefits.  Among the most important to the business community are:

    • Regression models are often simple and quite accurate.
    • Regression analyses can handle a large number of predictive factors simultaneously.
    • It is easy to score a database with a regression model.
    • Modelers can usually read the importance of any factor directly from the model.
    • A wide variety of applications support regression analysis.

    While more common analyses based on tables and charts are limited to a small number of dimensions, a regression model can take into account a virtually unlimited number of factors.  In forecasting sales, for example, a regression could easily adjust for season, pricing, promotions, sales force, competitive factors, delivery delays, and regional, national, and world economic developments, all in a single model.  Regression also provides a precise estimate of the magnitude of the effect of each factor on the variable being modeled.  Thus, a regression forecasting furniture sales could provide a precise estimate of the impact of a 2% increase in home sales while simultaneously adjusting for 30 or 40 other factors.
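    As a rough illustration of reading an effect size from a fitted regression, the sketch below fits an ordinary least-squares model with two factors. The data, the field names, and the helper functions are made up for this example and are not part of MARS.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares fit via the normal equations; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: (home sales growth in %, promotion running: 1 or 0).
rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0), (4.0, 0.0), (2.0, 1.0)]
# Made-up furniture sales, generated as 100 + 4 * growth + 10 * promotion.
furniture_sales = [100.0, 114.0, 108.0, 122.0, 116.0, 118.0]

coefs = fit_ols(rows, furniture_sales)
# Estimated effect of a 2% increase in home sales, holding the promotion fixed.
effect_of_2_percent = 2.0 * coefs[1]
```

    Here the coefficient on home sales growth is the per-point effect after adjusting for the promotion, so doubling it gives the estimated impact of a 2% increase.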


 Q3. Why is regression modeling not used more frequently?

    A3. Developing a reliable regression model requires an expert with both analytical experience and subject matter mastery.  Regression modeling is a painstakingly slow process that becomes exponentially more complex as the number of database fields found in the data warehouse increases.  In data mining the process can soon become overwhelming.


 Q4. How does MARS help data miners use regression modeling?

    A4. The major advantage of MARS is that it automates all those aspects of regression modeling that are difficult and time-consuming to conduct by hand.  These include:

    • selecting which database fields to use,
    • handling missing values,
    • transforming variables to account for non-linear relationships,
    • detecting interactions (i.e., determining when the effect of one factor materially depends on one or more other factors), and
    • self-testing to ensure that the model will perform well on future data.

    The end result is a more accurate and more complete model than could be handcrafted by even the most experienced expert modeler.


 Q5. How does MARS differ from conventional regression?

    A5.   Conventional regression models typically fit straight lines to data.  Although this usually oversimplifies the data structure, the approximation is sometimes good enough for practical purposes.  However, in the frequent situations in which a straight line is inappropriate, an expert modeler must search tediously for transformations to find the right curve.

    MARS approaches model construction more flexibly, allowing for bends, thresholds, and other departures from straight lines from the beginning.  MARS builds its model by piecing together a series of straight lines with each allowed its own slope.  This permits MARS to trace out any pattern detected in the data.  An example of a MARS regression is shown below on the left. The actual data with a conventional regression model superimposed is shown on the right.
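    The idea of piecing together straight lines, each with its own slope, can be sketched with hinge functions of the form max(0, x - knot). The coefficients and the knot location below are hypothetical and chosen only for illustration.

```python
def hinge(x, knot, direction=1):
    """Return max(0, x - knot) for direction=+1, or max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def piecewise_prediction(x):
    # Hypothetical model with a single bend at x = 5:
    # the slope is 1.5 below the bend and 2.0 above it.
    return 10.0 + 2.0 * hinge(x, 5.0, +1) - 1.5 * hinge(x, 5.0, -1)


for x in (3.0, 5.0, 7.0):
    print(x, piecewise_prediction(x))
```

    A conventional regression would have to describe both sides of the bend with one common slope.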


 Q6. How does MARS construct its models?

    A6.  MARS starts from the premise that most relevant variables affect the outcome in a complex way.  Therefore, when MARS considers whether to add a variable to a model it simultaneously searches for appropriate break points (known as "knots").  Models are constructed in a two-phase procedure.  Phase I is a fast search that tests all database fields and potential break points, resulting in a deliberately overfit model.   Phase II refines the model by eliminating redundant factors and components that do not stand up to testing.  The final model retains only the important twists and turns and is also optimal for predicting from new data.
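    The two phases can be sketched for a single predictor as follows. This is a simplified illustration, not the MARS implementation: it assumes one input field, one-sided hinges only, a greedy knot search, and pruning against held-out data. All function names and data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def basis(x, knots):
    """Intercept, linear term, and one hinge max(0, x - t) per knot."""
    return [1.0, x] + [max(0.0, x - t) for t in knots]


def fit(xs, ys, knots):
    """Least-squares coefficients for the given knots (normal equations)."""
    rows = [basis(x, knots) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def sse(xs, ys, knots, coefs):
    """Sum of squared errors of the fitted model on (xs, ys)."""
    return sum((y - sum(c * b for c, b in zip(coefs, basis(x, knots)))) ** 2
               for x, y in zip(xs, ys))


def forward_pass(xs, ys, max_knots):
    """Phase I: greedily add the knot that most reduces training error."""
    knots = []
    candidates = sorted(set(xs))[1:-1]  # interior data values only
    for _ in range(max_knots):
        options = [t for t in candidates if t not in knots]
        if not options:
            break
        best = min(options,
                   key=lambda t: sse(xs, ys, knots + [t], fit(xs, ys, knots + [t])))
        knots.append(best)
    return knots


def backward_pass(xs, ys, test_xs, test_ys, knots, tol=1e-6):
    """Phase II: drop knots whose removal does not hurt held-out error."""
    def holdout_error(ks):
        return sse(test_xs, test_ys, ks, fit(xs, ys, ks))

    knots = list(knots)
    current = holdout_error(knots)
    while knots:
        trial = min(([k for k in knots if k != drop] for drop in knots),
                    key=holdout_error)
        err = holdout_error(trial)
        if err > current + tol:
            break
        knots, current = trial, err
    return sorted(knots)


def truth(x):
    # Made-up relationship with one real bend at x = 5.
    return 2.0 * x - 3.0 * max(0.0, x - 5.0)


train_x = [float(i) for i in range(11)]
train_y = [truth(x) for x in train_x]
test_x = [i + 0.5 for i in range(10)]
test_y = [truth(x) for x in test_x]

overfit_knots = forward_pass(train_x, train_y, max_knots=3)
final_knots = backward_pass(train_x, train_y, test_x, test_y, overfit_knots)
```

    In this toy run the forward pass keeps adding knots up to its limit, and the backward pass removes the ones that contribute nothing on the held-out records, leaving only the real bend.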


 Q7.  What control over modeling does MARS provide the user?

    A7.  MARS offers the user a great deal of control over the model development process. A number of techniques (discussed in detail in the comprehensive documentation) are available to shape and refine this process, including:

    • requiring selected variables to have straight line effects (no knots)
    • forbidding any interactions
    • permitting interactions between select variables only
    • permitting interactions only up to a specified degree of complexity
    • specifying a minimum distance between knots
    • encouraging MARS to produce simpler final models

    MARS automatically sets all control parameters to intelligent defaults so that the modeling process can be easily run by a first-time user.   Experienced modelers, however, may modify the control parameters.


 Q8.  How does MARS handle missing values?

    A8.  MARS deals with the problem of missing values in regression in an entirely new way. First, MARS develops the best model possible using the available data.  In addition, for each variable with missing values, MARS develops a sub-model based on substitute variables.  This sub-model may be based on a single surrogate or on a complex function of several surrogates.  For example, MARS may develop a model based on income data and simultaneously a sub-model based on education and age for use when income is missing.  This surrogate process is far more convenient and reliable than other approaches, which attempt to "fill in" the missing values with imputed values.
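    The income example can be sketched as a main model with a fallback sub-model. The coefficients, field names, and function names below are hypothetical; the sketch only shows the dispatch logic, not how MARS estimates either model.

```python
def main_model(income):
    # Hypothetical model fitted on records where income is present.
    return 50.0 + 0.002 * income


def surrogate_model(education_years, age):
    # Hypothetical sub-model used only when income is missing.
    return 20.0 + 6.0 * education_years + 0.5 * age


def predict(record):
    """Score one record, switching to the surrogate sub-model if income is missing."""
    income = record.get("income")
    if income is not None:
        return main_model(income)
    return surrogate_model(record["education_years"], record["age"])


complete = {"income": 40000.0, "education_years": 16.0, "age": 35.0}
incomplete = {"income": None, "education_years": 16.0, "age": 35.0}
scores = [predict(complete), predict(incomplete)]
```

    No value is imputed for the missing income; the record is simply scored by a different equation.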


 Q9.  How does MARS ensure that a model will perform as claimed on future data?

    A9.  Almost all modern modeling technologies can track training data accurately; in fact, some methods can actually guarantee perfect results.  The problem is that such "overfit" models are useless for predicting outcomes on new data.  The best known of such poor performers are stock market price predictors that frequently work perfectly on yesterday's data but rarely predict tomorrow's prices correctly.  MARS protects users from such misleading results through its two-stage modeling process.  As described above, MARS deliberately overfits its model initially, but then prunes away all components that would not hold up on fresh data.

    All decision makers want the most accurate model possible.  At the same time they need honest assessments of how well any predictive model can be expected to perform.  MARS provides such honest assessments through one of two built-in testing regimens: cross-validation or evaluation on independent test data.  Using these tests, MARS determines the degree of accuracy that can be expected from the best predictive model.
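    Cross-validation in general can be sketched as below. This is a generic k-fold illustration using a plain straight-line fit as the model; the data and function names are made up, and the sketch does not reproduce how MARS runs its own tests.

```python
def fit_line(xs, ys):
    """Closed-form least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def mse(xs, ys, model):
    """Mean squared error of a fitted line on the given records."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx); fold i holds out every k-th record starting at i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def cross_validated_mse(xs, ys, k):
    """Average squared error on records the model did not see while fitting."""
    total = 0.0
    for train, test in k_fold_splits(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    return total / len(xs)


# Made-up data with some scatter around a rising trend.
xs = [float(i) for i in range(10)]
ys = [1.0, 3.5, 4.0, 7.5, 8.0, 11.5, 12.0, 15.5, 16.0, 19.5]

training_mse = mse(xs, ys, fit_line(xs, ys))
cv_mse = cross_validated_mse(xs, ys, k=5)
```

    The training error describes how well the model tracks data it has already seen, while the cross-validated error is the more honest estimate of performance on new records.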


 Q10.  How can MARS models be implemented for predictive purposes?

    A10.  A MARS predictive model can be implemented in two ways.   First, new databases can be scored directly by MARS.   Simply identifying the MARS model to be implemented and the new data to be scored is sufficient to apply the results.  MARS will perform all the required data transformations and calculations automatically and output the predicted scores.  Second, the MARS predictive equation can be exported as ready-to-run C and SAS® source code that can be used without modification in the user's own application framework.  This built-in flexibility allows users to construct their own custom applications incorporating a complete standalone rendition of the model.
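    Both routes can be illustrated with a small sketch: one function scores a record directly, and another emits an equivalent C function as text. The model layout, the field name, and the generated C are assumptions made for this example and are not the format MARS actually produces.

```python
def score(model, record):
    """Apply a hinge-based model to one record (a dict of field values)."""
    total = model["intercept"]
    for coef, field, knot, sign in model["terms"]:
        total += coef * max(0.0, sign * (record[field] - knot))
    return total


def to_c_source(model, name="mars_score"):
    """Render the same equation as a standalone C function (illustrative only)."""
    fields = sorted({term[1] for term in model["terms"]})
    args = ", ".join(f"double {f}" for f in fields)
    lines = [f"double {name}({args}) {{",
             f"    double y = {model['intercept']!r};"]
    for coef, field, knot, sign in model["terms"]:
        arg = f"{field} - {knot!r}" if sign > 0 else f"{knot!r} - {field}"
        lines.append(f"    y += {coef!r} * fmax(0.0, {arg});")
    lines += ["    return y;", "}"]
    return "#include <math.h>\n\n" + "\n".join(lines) + "\n"


# Hypothetical model: each term is (coefficient, field, knot, direction).
model = {
    "intercept": 10.0,
    "terms": [(2.0, "income", 5.0, 1), (-1.5, "income", 5.0, -1)],
}

predicted = score(model, {"income": 7.0})
c_source = to_c_source(model)
```

    Because the exported function contains only arithmetic on the input fields, it can be compiled into an application without any dependency on the modeling tool.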


 Q11.  What applications is MARS best suited for?

    A11.  MARS is ideal for predictive modeling of continuous outcomes such as:

    • How much will a customer spend on his next catalog order?
    • How large a balance will a credit card holder carry?
    • How many minutes will a person use a cell phone this month?
    • What is the expected loss on an insurable risk?

    MARS can also model binary (yes/no) questions by providing a predicted probability of an outcome.  Examples include questions such as:

    • Will a homeowner refinance her mortgage in the next quarter?
    • Will a household respond to a direct mail offer?
    • Will a bank customer sign up for a new credit card?
    • Will a treatment for a medical condition succeed?

    Classification problems are better handled by decision trees, such as CART®.  Examples of problems that are not appropriate for MARS include:

    • Which long distance service (AT&T, MCI, Sprint, Other) will a household use?
    • What type of vehicle (car, van, truck) will a person purchase?

 Q12.   Why is MARS better than a decision tree for regression?

    A12.  While decision trees are excellent classification tools, they can be deficient when it comes to regression.   A decision tree with 30 terminal nodes is capable of making only 30 distinct predictions (one per node); thus, all records landing in a node receive exactly the same prediction.  MARS is capable of predicting with much higher resolution and accuracy, typically producing unique scores for every record in a database.
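    The difference in resolution can be seen in a toy comparison. Both models below are hypothetical: a three-leaf tree stands in for the 30-node tree, and a single-knot piecewise-linear equation stands in for a MARS model.

```python
def tree_prediction(x):
    # Hypothetical regression tree with three terminal nodes:
    # every record landing in a node gets that node's single value.
    if x < 30:
        return 12.0
    if x < 70:
        return 45.0
    return 80.0


def piecewise_linear_prediction(x):
    # Hypothetical piecewise-linear model with one knot at x = 50.
    return 5.0 + 0.5 * x + 1.0 * max(0.0, x - 50.0)


records = list(range(100))
tree_scores = {tree_prediction(x) for x in records}
linear_scores = {piecewise_linear_prediction(x) for x in records}

print(len(tree_scores), "distinct tree predictions")
print(len(linear_scores), "distinct piecewise-linear predictions")
```

    The tree can never produce more distinct scores than it has terminal nodes, whereas the piecewise-linear model varies continuously with the input.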


 Q13.  How does MARS compare with neural nets?

    A13.  MARS is always much faster and more interpretable than a neural net and is often more accurate as well.  See De Veaux, et al., 1993 (Computers chem. Engng, Vol. 17, No. 8, p. 819) for a comparative study in which the authors suggest that MARS could be used instead of neural nets in a wide variety of applications. In addition, unlike neural nets, MARS automatically determines which variables to use, thereby saving considerable analyst time and effort.


 Q14.  How quickly can MARS generate results?

    A14.   Because MARS' highly automated, fast analytical engine generates results much faster than other methods, it can be used to slash the development time of conventional statistical modeling.  Exploratory desktop analyses on a sample of 10,000 records and 10 input variables can be conducted in less than 20 seconds.  More typical problems involving 100,000 records and 30 predictor variables run in approximately 10 minutes on a 400 MHz desktop, while problems with 500,000 records and 100 variables can be analyzed in less than 2 hours on industry standard servers.


 Q15.  How large a problem can MARS handle?

    A15.  MARS can handle up to 8,000 database fields and as many training records as can be loaded into RAM.  Current data mining experience confirms that MARS scales to the largest enterprise servers.


