Coding the Law: Data Standards and Jupyter Notebooks (Level 6)

Data Standards
~2-7 min. Protip: You can watch YouTube videos at more than 1X speed.

Source: Standards from xkcd.

Optional. If all of this talk about data makes you scream, "show me the data," the following is for you. I have collected several places you might want to look to find curated data sets to give you an idea of what data looks like when it's collected in nice structured forms.

Measures for Justice. An attempt to collect state-level criminal justice data.
Data.gov. The federal government's open data portal.
USA Facts. A private initiative to address the gaps in access to data needed by governments to make policy decisions across the US.
Google's Data Set Search. A tool for searching across a number of publicly available data sets.

Readings
~ 1 Hour 17 Minutes

Weapons of Math Destruction: Sweating Bullets (Chapter 7) (18 pages)
How Not to Be Wrong: Reductio Ad Unlikely (chapter 8) and The International Journal of Haruspicy (chapter 9) (31 pages)
Speaking the Same Language: Data Standards and Disruptive Technologies in the Administration of Justice. In this paper Erika Rickard and I flesh out the arguments from the above video to make the case that courts should work to adopt open data standards. (~28 pages)
Optional: If you want to learn more about the replication of scientific results and what has come to be known as the replication crisis, you may enjoy these podcast episodes from Hi Phi Nation: Hackademics I and Hackademics II.

What the Heck is Word2Vec? Neural Nets for Lawyers
11-33 min. Protip: You can watch YouTube videos at more than 1X speed.^†

FWIW, preparing instructional material is an exercise in compression, and it's not lossless. The hope is that you now have a very high-level sense of how things work. For example, I glossed over how the activation function behaves with regard to the word2vec "hidden"/projection layer. Spoiler: it's not a sigmoid! Actually, there is no activation function. We just pass on the weights. That being said, I didn't explain weights very deeply. So again, as it says in the title—oversimplification. ;)
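To make the "no activation function" point concrete, here is a minimal sketch in plain Python. The three-word vocabulary, the weight values, and the function names are all made up for illustration; nothing here comes from a trained model or from the video. It shows that multiplying a one-hot input by the weight matrix just picks out one row of weights, and nothing gets squashed by a sigmoid along the way.

```python
# Toy illustration: word2vec's projection ("hidden") layer is linear.
# The vocabulary and weights below are invented for this example.

vocab = ["court", "judge", "banana"]

# One row of weights per vocabulary word, two dimensions per word.
weights = [
    [0.9, 0.1],  # court
    [0.8, 0.2],  # judge
    [0.1, 0.7],  # banana
]


def one_hot(word):
    """Return a list with 1 at the word's index and 0 everywhere else."""
    return [1 if w == word else 0 for w in vocab]


def project(word):
    """Multiply the one-hot vector by the weight matrix, with no activation."""
    x = one_hot(word)
    dims = len(weights[0])
    return [
        sum(x[i] * weights[i][d] for i in range(len(vocab)))
        for d in range(dims)
    ]


hidden = project("judge")
print(hidden)                # the projection for "judge"
print(hidden == weights[1])  # True: it is just the "judge" row of weights
```

In practice this is why implementations can skip the multiplication entirely and treat the layer as a lookup table of word vectors.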

Source: Autodesk.

Optional Media. If you want to learn more about some of the topics discussed in the video above, and you have some free time, you might enjoy the following.

Parallel Search. You'll read more about this tool in the reading. Why not take it for a spin first?
Bring your own doodles linear regression
But what is a Neural Network? | Deep learning, chapter 1. This is the first in four videos from 3Blue1Brown on Neural Networks. If you don't know 3Blue1Brown, you're in for a treat. They make great visual explainers.
wevi: Word Embedding Visual Inspector. This is a fun tool that allows you to train your own word2vec model, kind of like the Tensor Flow Playground listed below. For more on how it works, you can check out its internal documentation or this piece from its author: Word Embedding Explained and Visualized - word2vec and wevi.
Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. This article explains the work behind the Datasaurus.
Tensor Flow Playground. This tool lets you create and play with your own neural network.
And if you want to learn more about activation functions, check out Understanding Activation Functions and Hidden Layers in Neural Networks.

See Spot; See Spot Run: Using ML to Spot Fact Patterns
~15-45 Minutes. Protip: You can watch YouTube videos at more than 1X speed.^†

Trojan Horse, Canakkale, Turkey. Photo by Peter Reed.

Optional Media. If you want to learn more about some of the topics discussed in the video above, and you have some free time, you might enjoy the following.

Spot builds upon data from the Learned Hands online game, a partnership between the LIT Lab and Stanford's Legal Design Lab. Learned Hands aims to crowdsource the labeling of laypeople's legal questions for the training of machine learning (ML) classifiers/issue spotters. Currently, this labeling is limited to publicly available historic questions from the r/legaladvice forum on Reddit. See Stanford and Suffolk Create Game to Help Drive Access to Justice.
Legal Issues Taxonomy (LIST). This taxonomy is what Learned Hands uses to label training data for Spot. It's worth noting that adoption of LIST, formerly NSMIv2, is one of the primary goals of Spot. As you may have gleaned from our discussion of data standards, it can be hard to get folks to adopt a standard. It's a chicken and egg problem. Folks want to use the standard that everyone else is using because a standard's value is a function of its community. Unfortunately, when there is no pre-existing community, it can be hard to get the ball rolling. Spot is an attempt to do this. We're building a shiny new AI tool that folks want to use. It just so happens that you have to label things in LIST for it to be useful. It's a Trojan Horse. ;)

Get Ready to Use the Spot API
~5 Minutes

Visit the Spot website, create an account, and get your API token. FWIW, if you want to read ahead, you can skim the documentation at this link.
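Once you have a token, a request might be put together along the lines of the sketch below. Treat every specific here as an assumption to check against the Spot documentation: the endpoint URL, the "text" field name, the Bearer authorization scheme, and the SPOT_API_TOKEN environment variable name are my guesses for illustration, not confirmed details. The sketch only builds the request; it does not send anything.

```python
import json
import os
import urllib.request

# Assumed endpoint; confirm it against the Spot documentation before use.
SPOT_URL = "https://spot.suffolklitlab.org/v0/entities-nested/"


def build_spot_request(text, token):
    """Build (but do not send) a POST request carrying the token and text."""
    # The "text" field name is an assumption; check the documentation.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        SPOT_URL,
        data=payload,
        headers={
            # Assumed Bearer scheme for the API token.
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Read the token from an environment variable (hypothetical name) so it
# never sits in a notebook you might share.
token = os.environ.get("SPOT_API_TOKEN", "YOUR_TOKEN_HERE")
request = build_spot_request("My landlord is trying to evict me.", token)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the request to urllib.request.urlopen. That step is left out so the sketch runs without a network connection or a real token.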

Self-Reflection and Logging Your Work
~20 min

As we do at the end of every level, we ask that you take a few minutes to reflect on how things are going. I've also included a set of reading questions to cue things up for our synchronous discussion. Your answers will be shared with me, which will let me know that I can look for any project work you may have posted. That being said, you've almost completed Level 6. Tell me how it's going by completing the form linked below.

Log and reflect on your work

Synchronous Meet Up, AKA our Class Time
October 10, 2023 @ 4pm Eastern

If you're an enrolled student, we'll be meeting in Sargent Hall Room 305 on Tuesday, October 10th at 4pm. Our remote backup is to meet via Zoom at this link. You should have received the password from me earlier. If you don't have the password, and you are a registered student, DM me on Teams, and I can give you the password. If you're not an enrolled student, I'm afraid you can't join us.

We will use this time to: (1) troubleshoot any issues folks might have had working through the knowledge base; (2) look at and talk about your mission; and (3) discuss the readings.

^† Time estimates are just that: estimates. The assumptions used to calculate reading time are as follows: 48 pages is assumed to take roughly an hour to read. When working with non-paginated texts, it is assumed that a page is roughly equal to 250 words. Video ranges assume viewing at both 3X and 1X speed. Estimates for coding are based on past experience. Each level should include about 6 hours and 40 min of work.
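If you'd like to run the reading-time arithmetic yourself, here is a minimal sketch. The two constants come straight from the footnote; the function names and the sample inputs (24 pages, 3,000 words) are made up for illustration and are not tied to any particular reading.

```python
# Sketch of the reading-time arithmetic described in the footnote.
PAGES_PER_HOUR = 48   # from the footnote
WORDS_PER_PAGE = 250  # from the footnote, for non-paginated texts


def pages_from_words(word_count):
    """Convert a word count into an equivalent number of pages."""
    return word_count / WORDS_PER_PAGE


def reading_minutes(pages):
    """Estimate reading time in minutes for a given number of pages."""
    return pages / PAGES_PER_HOUR * 60


print(reading_minutes(24))                      # 30.0
print(reading_minutes(pages_from_words(3000)))  # 15.0
```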

Coding the Law Suffolk Law School: Fall 2023 by @Colarusso
