## Your Mission (Continued)

~1 Hour 20 Minutes

Build on our work from class to formalize your play from the last level. Once you've merged the data from Dewey, Cheetham, and Howe, created some potentially useful features, and turned everything into numbers, upload a CSV of your data to GitHub with the name *dewey-cheetham-howe.csv*. E.g.,

`https://[your username].github.io/ctl/dewey-cheetham-howe.csv`
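If it helps to see the merge, feature, and "turn everything into numbers" steps in one place, here is a minimal sketch using pandas. The three little tables, their column names (`hours`, `practice_area`, `won`), and the `long_matter` feature are all made up for illustration; your real data from class will look different, and it may call for a join rather than a simple stack.

```python
import pandas as pd

# Hypothetical stand-ins for the three firms' data; the real files and
# column names from class will differ.
dewey = pd.DataFrame({"hours": [10, 25], "practice_area": ["tax", "ip"], "won": ["yes", "no"]})
cheetham = pd.DataFrame({"hours": [40], "practice_area": ["tax"], "won": ["yes"]})
howe = pd.DataFrame({"hours": [5, 60], "practice_area": ["ip", "labor"], "won": ["no", "yes"]})

# Stack the three tables, tagging each row with the firm it came from.
frames = {"dewey": dewey, "cheetham": cheetham, "howe": howe}
merged = pd.concat(
    [df.assign(firm=name) for name, df in frames.items()],
    ignore_index=True,
)

# A simple engineered feature (an assumption about what might be useful).
merged["long_matter"] = (merged["hours"] > 20).astype(int)

# Turn everything into numbers: map the yes/no label to 1/0 and
# one-hot encode the remaining text columns.
merged["won"] = merged["won"].map({"yes": 1, "no": 0})
numeric = pd.get_dummies(merged, columns=["practice_area", "firm"], dtype=int)

# Save without the index so the CSV holds only your features and label.
numeric.to_csv("dewey-cheetham-howe.csv", index=False)
```

One-hot encoding is only one way to turn text into numbers. It works well for categories with no natural order, which is why it is used here for the practice area and firm columns.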

## What the Heck is Word2Vec? Neural Nets for Lawyers

17-33 min. Protip: You can watch YouTube videos at more than 1X speed.^{†}

FWIW, preparing instructional material is an exercise in compression, and it's not lossless. The hope is that you now have a very high-level sense of how things work. For example, I glossed over how the activation function behaves with regard to the word2vec "hidden"/projection layer. Spoiler: it's not a sigmoid! Actually, there is no activation function. We just pass on the weights. That being said, I didn't explain weights very deeply. So again, as it says in the title: oversimplification. ;)
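To make "we just pass on the weights" a little more concrete, here is a toy sketch in plain Python. The vocabulary and the numbers in the weight matrix are invented, and the helper names (`one_hot`, `project`) are just for this example. The point is that when the input is a one-hot vector and no activation function is applied, the projection layer's output is simply that word's row of weights.

```python
# Toy vocabulary and a made-up 4x3 input weight matrix (one row per word).
vocab = ["court", "judge", "pizza", "lawyer"]
W_in = [
    [0.2, -0.1, 0.7],   # court
    [0.3, -0.2, 0.6],   # judge
    [-0.9, 0.8, 0.1],   # pizza
    [0.4, -0.3, 0.5],   # lawyer
]


def one_hot(word):
    """Return a vector with a 1 in the word's slot and 0 everywhere else."""
    return [1 if w == word else 0 for w in vocab]


def project(x, W):
    """Multiply the input vector by the weight matrix; no activation is applied."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


hidden = project(one_hot("judge"), W_in)

# The projection layer's output is exactly the "judge" row of the weights.
print(hidden == W_in[vocab.index("judge")])
```

That row of weights is what people mean by the word's embedding once training is done.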

**Optional Media.**
If you want to learn more about some of the topics discussed in the video above, and you have some free time, you might enjoy the following.

- Parallel Search. You'll read more about this tool in the reading. Why not take it for a spin first?
- Bring your own doodles linear regression
- But what is a Neural Network? | Deep learning, chapter 1. This is the first in four videos from 3Blue1Brown on Neural Networks. If you don't know 3Blue1Brown, you're in for a treat. They make great visual explainers.
- wevi: Word Embedding Visual Inspector. This is a fun tool that allows you to train your own word2vec model, kind of like the TensorFlow Playground listed below. For more on how it works, you can check out its internal documentation or this piece from its author: Word Embedding Explained and Visualized - word2vec and wevi.
- Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. This article explains the work behind the Datasaurus.
- TensorFlow Playground. This tool lets you create and play with your own neural network.
- And if you want to learn more about activation functions, check out Understanding Activation Functions and Hidden Layers in Neural Networks.

## Readings

~ 1 Hour and 10 Minutes

- Weapons of Math Destruction: *Collateral Damage* (Chapter 8) (20 pages)
- How Not to Be Wrong: *Are You There, God? It's Me, Bayesian Inference* (Chapter 10) and *The Triumph of Mediocrity* (Chapter 14) (43 pages)
- The Machine Learning Technology Behind Parallel Search. (~6 pages)

### Training Your Algos: Linear Regressions

7-14 Minutes.

Note: none of your missions will ask you to perform a linear regression. So don't worry too much about the details discussed above. Given how much we've talked about regressions, I figured we should build at least one together as a class.

We are working with the notebook file training.ipynb (pre-loaded for those of you using Pythonanywhere).
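If you'd like a feel for what a linear regression is doing under the hood, here is a minimal sketch in plain Python. It is not the code from training.ipynb, and the data points are made up. It fits a line to one feature using ordinary least squares and then computes R-squared.

```python
# Made-up data: x might be hours billed, y some outcome score.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a single feature: slope = Sxy / Sxx.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R-squared: the share of the variation in y that the line accounts for.
predictions = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print("y = {:.2f} + {:.2f}x, R-squared = {:.2f}".format(intercept, slope, r_squared))
# prints: y = 2.20 + 0.60x, R-squared = 0.60
```

In practice you would let a library do this arithmetic for you, but the result is the same idea: the line that makes the squared misses as small as possible.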

**Optional:** If you want to learn more about some of the topics discussed in the video above, and you have some free time, you might enjoy the following: Guess the Correlation (a video game where you guess the correlation coefficient, r, from a scatter plot).

### Training Your Algos: Binary Classifiers

13-26 Minutes.

Again, we are working with the notebook named training.ipynb (pre-loaded for those of you using Pythonanywhere).

**Optional:** If you want to learn more about some of the topics discussed in the video above, and you have some free time, you might enjoy the following.

- Understanding ROC curves. This is a really great interactive that gives you a feel for what a ROC Curve actually is. You can see it used in this larger explainer: ROC curves and Area Under the Curve explained (video).
- How to Use ROC Curves and Precision-Recall Curves for Classification in Python.
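For those who like to see the mechanics, here is a small sketch of how the points on a ROC curve can be computed by hand. The labels and scores are made up, and the function names (`roc_points`, `auc`) are just for this example. Each threshold turns the scores into yes/no predictions, which yields one false positive rate and one true positive rate.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs,
    one per distinct threshold, from strictest to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve, using the trapezoid rule between neighboring points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example: true labels and a classifier's predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(labels, scores)
print(points)
print("AUC:", auc(points))
```

An AUC of 0.5 is what you'd expect from coin flipping, and 1.0 means the classifier ranks every positive above every negative.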

## Your Mission (Continued)

~1 Hour

Take the data you prepared above (i.e., `https://[your username].github.io/ctl/dewey-cheetham-howe.csv`) and use it to train the classifiers found in training.ipynb under the *Classifiers Section*. Remember, I've only loaded libraries for the Python 3.5 kernel. So if you make a new notebook, be sure it's using 3.5.

**Stretch Goal:** Add a new classification algo from scikit-learn to your notebook. Take a screenshot of the evaluation screen for this algo, and upload it to GitHub with the name *new_algo*. E.g.,

`https://[your username].github.io/ctl/new_algo.png`

Note you should follow the pattern of the algos shown above. That is, you can just copy one of the example cells, swap out the first two lines and edit variable names accordingly. You might find some inspiration here.
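As a rough illustration of that pattern, here is a sketch of what such a cell might look like. It is a guess at the general shape, not a copy of the cells in training.ipynb. The data is synthetic (generated with `make_classification`) so the cell runs on its own, and `RandomForestClassifier` is just one example of an algo you could swap in.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in data so this sketch runs by itself;
# in your notebook you would use the features and labels from your CSV.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The two lines you would swap out to try a different algorithm:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Everything below stays the same no matter which classifier you pick.
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions)

print("accuracy:", accuracy)
print(matrix)
```

Because scikit-learn classifiers share the same `fit` and `predict` methods, trying a new algo usually means changing only the import and the line that creates `clf`.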

## Your Final Project

Enrolled students will be presenting on their final project in one week. Take whatever time you have left to work on your project, even if it's just planning or skills acquisition (e.g., working through the optional docassemble training from level 3). See The Final Project Rubric.

## Self-Reflection and Logging Your Work

~20 min

As we do at the end of every level, we ask that you take a few minutes to reflect on how things are going. I've also included a set of reading questions to cue things up for our synchronous discussion. Your answers will be shared with me, which lets me know that I can look for any project work you may have posted. That being said, you've almost completed Level 7. Tell me how it's going by completing the form linked below.

## Synchronous Meet Up, AKA our Class Time

1 Hour and 30 Minutes | October 13, 2020 @ 4pm Eastern

If you're an enrolled student, we'll be meeting at this link on **TUESDAY** October 13th at 4pm via Zoom. If you don't have the password, and you are a registered student, DM me on Slack, and I can give you the password. If you're not an enrolled student, I'm afraid you can't join us.

We will use this time to: (1) troubleshoot any issues folks might have had working through your mission; and (2) discuss the readings.

^{†}Time estimates are just that: estimates. The assumptions used to calculate reading time are as follows: 48 pages is assumed to take roughly an hour to read. When working with non-paginated texts, it is assumed that a page is roughly equal to 250 words. Videos assume both 2X and 1X viewing. Estimates for coding are based on past experience. Each level should include about 6 hours and 40 min of work.