See Spot; See Spot Run: Using ML to Spot Fact Patterns
~23-45 Minutes. Protip: You can watch YouTube videos at more than 1X speed.†
If you want to learn more about some of the topics discussed in the video above, and you have some free time, you might enjoy the following.
- Spot builds upon data from the Learned Hands online game, a partnership between the LIT Lab and Stanford's Legal Design Lab. Learned Hands aims to crowdsource the labeling of laypeople's legal questions for the training of machine learning (ML) classifiers/issue spotters. Currently, this labeling is limited to publicly available historic questions from the r/legaladvice forum on Reddit. See Stanford and Suffolk Create Game to Help Drive Access to Justice.
- Legal Issues Taxonomy (LIST). This taxonomy is what Learned Hands uses to label training data for Spot. It's worth noting that adoption of LIST, formerly NSMIv2, is one of the primary goals of Spot. As you may have gleaned from our discussion of data standards, it can be hard to get folks to adopt a standard. It's a chicken-and-egg problem. Folks want to use the standard that everyone else is using because a standard's value is a function of its community. Unfortunately, when there is no pre-existing community, it can be hard to get the ball rolling. Spot is our attempt to get it rolling: we're building a shiny new AI tool that folks want to use, and it just so happens that you have to label things in LIST for it to be useful. It's a Trojan Horse. ;)
~1 Hour 45 Minutes
Using the Spot API
For those of you not working in Pythonanywhere, here is the notebook: spotAPI.ipynb. Visit the Spot website to create an account and get your API token. If you want to jump straight to the documentation, here's the link.
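If you'd like to see the shape of an API call outside the notebook, here is a minimal sketch using only Python's standard library. It assumes a JSON-over-HTTPS endpoint with bearer-token authentication and a request body carrying just a `text` field. The endpoint URL is a hypothetical placeholder, and the payload and auth scheme are assumptions too; take the real values from the Spot documentation (spotAPI.ipynb shows the course's own version).

```python
import json
import urllib.request

# HYPOTHETICAL placeholder: replace with the endpoint given in the Spot API documentation.
SPOT_ENDPOINT = "https://example.invalid/spot-endpoint"


def build_spot_request(text: str, token: str, endpoint: str = SPOT_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Spot to label `text`.

    The JSON body and the bearer-token header are ASSUMED formats; check the docs.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def spot_issues(text: str, token: str, endpoint: str = SPOT_ENDPOINT, timeout: float = 30.0):
    """Send the request and return the decoded JSON response (needs network access)."""
    request = build_spot_request(text, token, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes it easy to inspect what you're about to send (headers, body) before spending a real API call.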
We are working with the notebook file training.ipynb (pre-loaded for those of you using Pythonanywhere).
~1 Hour 50 Minutes
Clean and feed the following data (i.e., challenge_calls.csv and challenge_people.csv) into the best-performing classifier you trained on the Dewey, Cheetham, and Howe data. That is, use your best model to predict whether each call in this data is or isn't a take. Then produce a list of the calls that are takes. You will be asked to share the call IDs for these calls as part of your work log below (e.g., [175, 234, 327]).
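One way to structure the clean, predict, and list steps is sketched below in plain Python. This is a minimal sketch under stated assumptions: the column names `call_id`, `person_id`, and `duration` and the helper names are hypothetical (check the actual headers in challenge_calls.csv and challenge_people.csv), and `model` stands in for your best classifier, meaning anything with a scikit-learn-style `predict` method. The only "cleaning" shown is an inner join that drops calls with no matching person. Whatever cleaning you did to the Dewey, Cheetham, and Howe data must be repeated here so the model sees the same columns, in the same order, with the same encoding it was trained on.

```python
import csv
import io
from typing import Dict, Iterable, List, Protocol, Sequence


class Classifier(Protocol):
    """Stand-in for your trained model: anything with a scikit-learn-style predict()."""

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[int]: ...


def join_calls_people(calls_csv: Iterable[str], people_csv: Iterable[str],
                      key: str = "person_id") -> List[Dict[str, str]]:
    """Inner-join call rows to person rows on `key` (a HYPOTHETICAL column name)."""
    people = {row[key]: row for row in csv.DictReader(people_csv)}
    joined = []
    for call in csv.DictReader(calls_csv):
        person = people.get(call[key])
        if person is None:
            continue  # one possible cleaning choice: drop calls with no matching person
        joined.append({**person, **call})  # call fields win on any name clash
    return joined


def predicted_take_ids(rows: List[Dict[str, str]], features: List[str],
                       model: Classifier, id_col: str = "call_id") -> List[int]:
    """Return the IDs of calls the model labels as takes (prediction == 1)."""
    # Features must match the training data: same columns, order, and encoding.
    X = [[float(row[f]) for f in features] for row in rows]
    predictions = model.predict(X)
    return [int(row[id_col]) for row, p in zip(rows, predictions) if p == 1]


if __name__ == "__main__":
    # Tiny made-up inputs, only to show the call pattern.
    calls = io.StringIO("call_id,person_id,duration\n175,1,30\n176,2,5\n")
    people = io.StringIO("person_id,age\n1,40\n2,25\n")

    class DemoModel:
        def predict(self, X):
            return [1 if x[0] > 10 else 0 for x in X]  # toy rule, not a real classifier

    rows = join_calls_people(calls, people)
    print(predicted_take_ids(rows, ["duration"], DemoModel()))
```

With your real data you would open the two CSV files instead of the `io.StringIO` stand-ins and pass in your trained model; the printed list is the format the work log asks for.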
NOTE: If you are using the class's Pythonanywhere accounts, the two csv files mentioned above should already be in the same directory as your notebooks. Also, a reminder: you can ask for help on our Slack channel if you're not sure what your next step should be. This mission asks you to tie together a lot of prior work and make some connections, so it's understandable if you have questions.
Update: Based on several conversations I’ve had this week, I want to provide you all with this notebook (Level 9 Notebook.ipynb) to help you through this mission. If you read through this notebook, run it, and turn in its output, that will meet this level's expectations. Of course, my hope is that you will do more than this, but it’s important for you to know that you don’t have to unless you want to exceed expectations. So I’m attaching a stretch goal as an incentive to go a little further. Stretch Goal: create a model that gets an F1 score above 0.7 on the challenge data. Good luck!
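For the stretch goal, recall that F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes all three from 0/1 labels (1 = take) in plain Python. You can only compute it where you have true labels, for example on a held-out split of the Dewey, Cheetham, and Howe data. The function name is illustrative.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for binary labels, treating 1 ("take") as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # takes we caught
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # takes we missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, `precision_recall_f1([1, 1, 1, 0, 0, 1], [1, 1, 0, 1, 0, 1])` finds 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 0.75. If scikit-learn is installed, `sklearn.metrics.f1_score(y_true, y_pred)` gives the same F1 for 0/1 labels. Because F1 ignores true negatives, a model that labels every call "not a take" scores 0, which is why it's a better yardstick than plain accuracy when takes are rare.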
Self-Reflection and Logging Your Work
As we do at the end of every level, we ask that you take a few minutes to reflect on how things are going. I've also included a set of reading questions to queue things up for our synchronous discussion. Your answers will be shared with me, and they will let me know to look for any project work you may have posted. You've almost completed Level 9, so tell me how it's going by completing the form linked below.
† Time estimates are just that: estimates. The assumptions used to calculate reading time are as follows: 48 pages is assumed to take roughly an hour to read. For non-paginated texts, a page is assumed to be roughly 250 words. Video estimates span 2X and 1X viewing speeds. Estimates for coding are based on past experience. Each level should include about 6 hours and 40 minutes of work.
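The arithmetic behind these estimates, written out as a small Python helper (the function names are illustrative, not something the course uses): 48 pages per hour at 250 words per page works out to 12,000 words per hour, or 200 words per minute.

```python
from typing import Optional, Tuple

PAGES_PER_HOUR = 48   # stated assumption: 48 pages take roughly an hour
WORDS_PER_PAGE = 250  # stated assumption for non-paginated texts


def reading_minutes(pages: Optional[float] = None, words: Optional[int] = None) -> float:
    """Estimated reading time in minutes from a page count or a word count."""
    if pages is None:
        if words is None:
            raise ValueError("give either pages or words")
        pages = words / WORDS_PER_PAGE  # convert words to equivalent pages
    return pages / PAGES_PER_HOUR * 60


def video_minutes(runtime_minutes: float) -> Tuple[float, float]:
    """(2X, 1X) viewing time: the two ends of a video estimate's range."""
    return runtime_minutes / 2, float(runtime_minutes)
```

So a 6,000-word article counts as 24 pages, or 30 minutes, and a video with a 45-minute runtime would be listed as roughly 23 to 45 minutes.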
Synchronous Meet Up, AKA our Class Time
1 hour | October 26, 2020 @ 4pm Eastern
If you're an enrolled student, we'll be meeting at this link on Monday, October 26th at 4pm via Zoom. If you don't have the password and you are a registered student, DM me on Slack and I can give you the password. If you're not an enrolled student, I'm afraid you can't join us.
We will use this time to: (1) troubleshoot any issues folks might have had working through your mission; and (2) discuss the readings.