Readings
~ 1 Hour 45 Minutes
Optional Media.
If you want to learn more about some of the topics discussed in the video above, and you have some free time, you might enjoy the following.
- Hill for the data scientist: an xkcd story. So if correlation isn't causation and rejecting the null hypothesis isn't enough for us to prove the alternative hypothesis, how can we ever know anything? Well, we can use common sense and some rules of thumb. Hill's Criteria are a set of guidelines for evaluating causation, and this resource explains them using xkcd comics.
- If you want to learn more about the replication of scientific results and what has come to be known as the replication crisis, you may enjoy these podcast episodes from Hi Phi Nation: Hackademics I and Hackademics II.
- If you would like to explore the idea of significance tests a little more, this Khan Academy lesson is a nice distillation—The idea of significance tests.
- Conference Diversity Distribution Calculator. "This calculator models the probability distribution for male/female speaker balance assuming random selection, which roughly follows a binomial distribution. It was inspired by the work of Dave Wilkinson and Paul Battley, who made similar models and found that the likelihood of an unbiased selection process yielding a line-up with no women at all is far lower than intuition might suggest, and – depending on the numbers you plug in – can often be far lower than the likelihood of their over-representation. That is to say: in an unbiased selection, you’re significantly more likely to see more than the expected number of women than none at all."
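To make the calculator's claim concrete, here is a minimal sketch of the underlying binomial arithmetic. It is not the calculator's own code. The function names and the example inputs (20 speakers drawn from a pool that is 20% women) are assumptions chosen purely for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws, each with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def speaker_balance(n_speakers, share_women):
    """Return (P(no women), P(more women than expected)) under random selection.

    Assumes each speaker slot is filled independently, so the number of
    women follows a binomial distribution (an approximation, as the
    quoted description notes).
    """
    expected = n_speakers * share_women
    p_none = binomial_pmf(0, n_speakers, share_women)
    p_over = sum(
        binomial_pmf(k, n_speakers, share_women)
        for k in range(n_speakers + 1)
        if k > expected
    )
    return p_none, p_over


# Hypothetical inputs: 20 speakers, pool is 20% women.
p_none, p_over = speaker_balance(20, 0.2)
print(f"P(no women at all):        {p_none:.4f}")
print(f"P(more women than expected): {p_over:.4f}")
```

With these illustrative inputs, an all-male line-up is much less likely than a line-up with more women than expected, which is the pattern the quote describes. Different inputs will shift the balance, so try a few.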
Knowledge Base
Everyone comes to this adventure with a different background. So this section is designed to be a menu of sorts. If you already know a topic well, you can skip the relevant material. Just answer the questions below, and section(s) will disappear accordingly. That being said, if a section doesn't disappear, you should do it. Any time you save skipping a topic, however, should be spent working on your final project or reading ahead in either Weapons of Math Destruction or How Not to Be Wrong. FYI, we will be reading all of Weapons of Math Destruction and all but parts III and V of How Not to Be Wrong.
All of that being said, let's see if we can pare things down.
Are you proficient with QnA Markup?
You've gained roughly 30 minutes by dropping a video introduction to QnA Markup. FWIW, you're going to be asked to create an interview in QnA Markup at the end of this Level. If you find yourself with questions, change this answer to unhide the QnA introduction.
Do you have a good text editor? I'm not asking about a word processor; there's a difference.
You've gained roughly 10 minutes by dropping a section on installing a text editor.
Do you have a GitHub account, and do you know how to use it?
You've gained roughly 20 minutes by dropping a GitHub exercise that walks you through creating a repo, making a pull request, and so on.
Your Mission: Machine Learning In Production with Google Sheets
~8-15 Minutes
This discussion builds on the school closing example introduced back in Level 4 when we talked about success metrics. If you're unfamiliar with Google Sheets, you can learn more on the Google Sheets website.
Your Final Project
2+ Hours
We're entering the home stretch. Remember to ask questions in Teams if you're stuck, and when we next meet, we'll do rounds—checking in with everyone to see where they are at. See The Final Project Rubric.
Self-Reflection and Logging Your Work
~20 min
As we do at the end of every level, we ask that you take a few minutes to reflect on how things are going. I've also included a set of reading questions to cue things up for our synchronous discussion. Your answers will be shared with me, which lets me know that I can look for any project work you may have posted. That being said, you've almost completed Level 9. Tell me how it's going by completing the form linked below.
Synchronous Meet Up, AKA our Class Time
October 30, 2023 @ 4pm Eastern
If you're an enrolled student, we'll be meeting in Sargent Hall Room 305 on Monday, October 30th at 4pm. Our remote backup is to meet via Zoom at this link. You should have received the password from me earlier. If you don't have the password, and you are a registered student, DM me on Teams, and I can give you the password. If you're not an enrolled student, I'm afraid you can't join us.
We will use this time to: (1) troubleshoot any issues folks might have had working through Your Mission; and (2) discuss the readings.
† Time estimates are just that: estimates. The assumptions used to calculate reading time are as follows: 48 pages is assumed to take roughly an hour to read. When working with non-paginated texts, it is assumed that a page is roughly equal to 250 words. Videos assume both 2X and 1X viewing. Estimates for coding are based on past experience. Each level should include about 6 hours and 40 min of work.
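If you want to apply the reading-time assumptions to your own reading list, the arithmetic can be sketched as below. The function and constant names are hypothetical. Only the two rates (48 pages per hour, 250 words per page) come from the note above.

```python
PAGES_PER_HOUR = 48   # stated assumption: 48 pages take roughly an hour
WORDS_PER_PAGE = 250  # stated assumption for non-paginated texts


def reading_minutes(pages=0, words=0):
    """Estimate reading time in minutes from a page count, a word count, or both."""
    total_pages = pages + words / WORDS_PER_PAGE
    return total_pages / PAGES_PER_HOUR * 60


# A 24-page chapter plus a 3,000-word web article (illustrative numbers).
print(reading_minutes(pages=24, words=3000))
```

Like the estimates themselves, this is only a rough guide. Dense material will take longer than the flat rate suggests.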