Decision Trees

Story points 8
Tags decision-trees
Hard Prerequisites
  • PROJECTS: Statistical Thinking
  • PROJECTS: [TODO] Data Visualisation Projects
  • PROJECTS: Logistic regression
  • Soft Prerequisites
  • TOPICS: Python self-learning
  • TOPICS: Jupyter notebooks best practices
  • TOPICS: Data Science Methodology

  • Background material

    Assignment

    Use a decision tree model to predict whether mushrooms are poisonous or edible.

    1. Split your data into train and test sets.
    2. Get basic descriptive statistics for the training data and check for missing and incorrect or extreme values. Get scatterplots or heatmaps showing the relationship between the variables.
    3. What are the factors that predict whether a mushroom is poisonous?
    4. Report the accuracy of your model on the training set and on the test set. How successful is the model - what is its precision and recall?
    5. What is the prevalence of poisonous mushrooms in the dataset? How might prevalence affect the positive and negative predictive values of a test/model?

    Find the data here.

    More information on decision trees

    Coursera: Decision Trees

    Description of the dataset

    This dataset includes descriptions of 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as either edible or poisonous.

    Variable Description
    classes edible=e, poisonous=p
    cap-shape bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
    cap-surface fibrous=f, grooves=g, scaly=y, smooth=s
    cap-color brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
    bruises? bruises=t, no=f
    odor almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
    gill-attachment attached=a, descending=d, free=f, notched=n
    gill-spacing close=c, crowded=w, distant=d
    gill-size broad=b, narrow=n
    gill-color black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
    stalk-shape enlarging=e, tapering=t
    stalk-root bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
    stalk-surface-above-ring fibrous=f, scaly=y, silky=k, smooth=s
    stalk-surface-below-ring fibrous=f, scaly=y, silky=k, smooth=s
    stalk-color-above-ring brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
    stalk-color-below-ring brown=n, buff=b, cinnamon=c, gray=g, orange=o, ink=p, red=e, white=w, yellow=y
    veil-type partial=p, universal=u
    veil-color brown=n, orange=o, white=w, yellow=y
    ring-number none=n, one=o, two=t
    ring-type cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
    spore-print-color black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
    population abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
    habitat grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d

    RAW CONTENT URL