
Lecture 3

chris wiggins edited this page Apr 2, 2017 · 21 revisions

readings

Tukey's FoDA 1962

  • citation: Tukey, John W. "The future of data analysis." The Annals of Mathematical Statistics 33.1 (1962): 1-67.

  • reading: ONLY SECTIONS 1-11, 33, AND 43-END are required

  • Questions:

    • stats: math vs science
    • role of universities
    • world as is vs should be

We will come back to these questions later with Breiman's "The Two Cultures" and Cleveland's "Data Science", both from 2001.

Diaconis's "Magical Thinking" 1985

  • citation: Diaconis, Persi. "Theories of data analysis: from magical thinking through classical statistics." In Exploring Data Tables, Trends and Shapes. Edited by D. Hoaglin, F. Mosteller, and J. Tukey. 1-36. New York: Wiley, 1985.
  • reading: PAGES 1-9 AND 31-END

Chambers' GLS 1993

optional: EDA '77

  • citation: Tukey, John W. Exploratory Data Analysis. Addison-Wesley, 1977.

  • context: This is Tukey's 1977 textbook, written about six years before PCs became widely available. It's VERY long. Just glance through it and look at some of his suggestions for work to do with pen and paper, and by computer once one is available. Extremely valuable today!

  • readings: please focus on 3 sections:

    1. section 1a, pp1-3. Note:
    • emphasis on subjectivity
    • emphasis on domain expertise. This is a term engineers, particularly software engineers, use to mean knowledge of a specific application area that makes engineering tools more successful. An example is the expertise that helped statisticians decide which aspects of a country would be more or less useful for a king to know about. The role of domain expertise in machine learning is a topic of current debate: successful engineering results in 'deep learning' seem to show that, for some application areas, we need no prior knowledge of the "best features".
    • early distinction between exploratory (w/o "model") and confirmatory (w/model, esp. with p-values)
    2. section 2c, pp39-43.
    • Background: the "5 numbers" JWT mentions are:
      1. high extreme
      2. high hinge (usually interpreted as quartile; defined graphically on p33)
      3. median
      4. low hinge (usually interpreted as quartile; defined graphically on p33)
      5. low extreme
    • Note the emphasis on the mechanics of EDA, even down to the paper he used, in the absence of software tools
    3. section 19B, pp623-625. Note:
    • immediately dispels the idea that the Gaussian is a "law to which data must adhere"
    • pp624-625: tables of calculated numbers were more common before personal computers
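
As a rough illustration of the "5 numbers" from section 2c above, here is a minimal Python sketch. It assumes the depth rule for hinges as we understand it from EDA: the median sits at depth (n + 1) / 2, and the hinges sit at depth (floor(median depth) + 1) / 2, counted in from each end, averaging two neighbors when the depth is a half-integer. The function names are hypothetical, and Tukey of course worked these out by hand, so check the rule against the text before relying on it.

```python
import math


def value_at_depth(sorted_vals, depth, from_low=True):
    """Return the value at a whole or half-integer depth, counted from one end.

    Depth 1 is the extreme itself; a depth of 2.5 averages the 2nd and 3rd
    values in from that end.
    """
    lo = math.floor(depth)
    hi = math.ceil(depth)
    if from_low:
        a, b = sorted_vals[lo - 1], sorted_vals[hi - 1]
    else:
        a, b = sorted_vals[-lo], sorted_vals[-hi]
    return (a + b) / 2


def five_number_summary(values):
    """Five-number summary using the assumed depth rule for hinges."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")

    median_depth = (n + 1) / 2
    # Assumed rule: hinge depth = (floor(median depth) + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2

    return {
        "high_extreme": xs[-1],
        "high_hinge": value_at_depth(xs, hinge_depth, from_low=False),
        "median": value_at_depth(xs, median_depth),
        "low_hinge": value_at_depth(xs, hinge_depth),
        "low_extreme": xs[0],
    }


if __name__ == "__main__":
    for name, value in five_number_summary(range(1, 10)).items():
        print(f"{name:>12}: {value}")
```

For the nine values 1 through 9 this gives hinges of 3 and 7 around a median of 5. Hinges computed this way can differ slightly from the quartiles that statistical software reports by default, which is one reason they are only "usually interpreted as" quartiles.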

Discussion

A couple of things to think about as you read these three pieces:

  1. In a historical and philosophical vein, think about how you would do the sort of work that Desrosières did with the early modern statisticians: what are the different visions of statistical work these authors present? Desrosières uses controversies to help articulate the stakes of different forms of scientific practice. Can you do that here? How do these authors portray those they are arguing against?

  2. In a more instrumentalist vein, please bring in at least one form of data analysis you've gleaned from these papers you might like to do, or to figure out how to do, in class.
