DIY Toolkit: How to Plan a Database

database-scrabble
source, Flickr: Christophe BENOIT

 

“Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.
— E.F. Codd,   A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Vol. 13 Issue 6, June 1970

 

Data Modeling and the relational model

What does it mean to model data? When we model data, we think about how information will be organized, which means that we also think about how data will be split up and stored.

In this lesson, we are going to create a database on paper. It will contain information about students at an imaginary high school. We’ll be inventing characters for the imaginary students who attend.

 

What to keep in mind as you read this toolkit
For a reader trained in the humanities… adaptions to the habitual reading style have to be made to get something useful out of [a technical article]…language, here…follows a convention consisting of laying out a space of thinking through a series of very precise definitions, which often attribute quite specific significations to words taken from everyday language. Miss one of these definitions and the whole pyramid crumbles.
— Bernhard Rieder,  The Politics of Systems : Thoughts on Software, Power, and Digital Method 

 

Key vocabulary

  • Database – An organized collection of data
  • Table – A collection of related data within a database, which is usually but not always digital.  Tables contains rows and columns, and can be imagined as a grid. Each column (vertical) will contain the same kind of data, which is roughly analogous to a category of information. Each row (horizontal) should contain information belonging to a single record
  • Relational model – Derived from the mathematical use of the term relation as used in set theory, contrasts with graph and network models. The relational model allows users to deal, not with relations which are domain-ordered, but with relationships which are their domain-unordered counterparts
  • Relational database – A term dating to 1970. Relational databases  follow the relational model, and contain tables which are related to each other
  • RDBMS (relational database management system) – A database is considered relational if it follows a set of thirteen rules, known as Codd’s Twelve Commandments
  • Field – A single component of a database record
  • Tuples – Ordered lists of elements containing values. For simplicity, think or tuples as being similar to table rows. Rows in one table may relate to rows in one or more other tables. The power of this model lies in the ability to pull information from multiple tables
  • Queries –  First-order logic statements. An example of a simple query might be “SELECT size, color, designer FROM wardrobe WHERE star = ‘Mary Foster'”;. This imaginary query for a film studio’s costume department will select from the database table “wardrobe” all the field data for size, color and designer from rows which solely concern the actor Mary Foster
  • Field Name – The name of a field
  • Fault ToleranceThe ability to continue critical operations to some degree in the face of breakdowns. An example of fault tolerance is an emergency brake in a car, which allows a car to stop even if the main brakes stop functioning
  • RobustnessThe degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions

Background: How do you make tables?

A table is a collection of related data held in a structured format within a database. When data is collected and stored, it is searchable, either mechanically or manually.

Tables are made up of rows and columns. Each column (vertical) will contain the same kind of data, which is roughly analogous to a category of information. Each row (horizontal) should contain information belonging to a single record. (Fields are restricted by data type as well; we will get to that later on, after we have performed the initial analysis.)

When data is collected and stored, it is searchable, meaning we’ll ultimately be able to use it for other purposes such as generating reports, charting trends, updating record sets, making predictions, and displaying information that can be diced in endless different ways.

KEY QUESTION: Even though you can run an automatic search on a database, it’s possible to search it manually (by hand). How would you go about searching a database by hand? What advantages or drawbacks does a manual search offer? What are some real-life examples for hunting for data manually?

Tables help us organize data in a way that makes it easy to retrieve and analyze. It’s kind of like how a person might choose to store their clothes in a dresser versus a giant pile on the floor. (in this metaphor, the clothes=information and the dresser=table)

Note that when putting your clothes into dresser drawers that:

  1. the organization is up to you (want to store socks and shirts together? Go ahead!)
  2. a tidy system is easier to search when you’re in hurry, and leaves behind less of a mess.

17106882372_41f77cb702_z

By categorizing and storing the clothing into set places, you can more easily retrieve the items you want when you need them. Tables allow us to do something similar with abstract items like information. Think of tables as collections of related data. Later on, we’ll let the tables interact (relate) with each other — but for now let’s just take small steps.

KEY POINT: some databases have just one table. However multiple tables are frequently preferable to having one large table that resembles a spreadsheet. We’ll go into why that is later.

 

ACTIVITY – Creating a Student Database for “Amazing High”


We are going to make up a story about Amazing High (AH.) We want to know what happens on the day report cards come out.

Step 1: Generating Your Data

Take a paper and a pencil. Make up a name for your character, and give them a personality.

  • What are their interests?
  • What grade are they in?
  • Do they get good/average/poor grades?
  • What is their favorite class?
  • Least favorite class?
  • What class are they taking just because they have to?

After five minutes, tell the students to look at their character on paper and set the details in their minds. Then let everybody hand their forms to the person on the left.

Take a look at the paper you now have. (If there is any info missing, get your classmate to help fill the gaps.)

 

Questions for Discussion:

  • How does the character your classmate made compare to the one you created?
  • Do you think this character be friends with your character? Why or why not?

 

Step 2: Building a Table

Now we will use the information you created about your characters to build a table.

The tables we are going to draw look like a grid when empty. Start by assigning your table a name. This name should be styled as singular: (ex. attendee or manufacturer, not attendees or manufacturers.) Each of the polygon of the grid is a “field.” Fields are distinct elements (data structures) which can contain different kinds of entries.

 

idnumber firstname

lastname

areacode phonenumber
0896771 Calvin Ruiz 424 5551111
0032910 Pooja Harris-Burke 212 5552222

Fields in the first row of this example contain field names. Subsequent rows contain field data.

 

Using the paper you currently have, start jotting down field names that reflect your character. Field names should not contain any spaces, though you can use underscores (_). They should also refer to characteristics that apply to all students.

For Example, fieldname:class_standing NOT fieldname:sophomore.

We’ll use some field names right away, and others can be saved for the next round. If you feel stuck, go back to the original questions you used to generate your data in the first place.

 

Step 3: Checking if your field names make sense for your data set

Have students regroup and share what they have come up with. Ask students whether the representation a student has come up with is robust enough to accommodate both the character they have designed, plus the character their classmate designed. If the representation is not able to accommodate both characters, think about ways to adjust.

Reinforce to the students that we are essentially painting a portrait of a real-world scenario. If they are feeling a bit lost, hint that a key way to think about the table is in terms of reusability. Could the field names they created be used for both their own character and the ones their classmates created?

TEACHER’S NOTE: Common issues you’ll encounter will be: combining different pieces of data into one field; multiple values in a single field; creation of unnecessary fields (bloat).

You will sometimes see that students have modeled data in different ways, which all work.

When this happens, you will have a great opportunity to discuss which approach is more robust and/or elegant. Elegant, when used in computing is a specific term, and it always refers to simplicity, clarity, and functionality that includes fault tolerance (known as “graceful degradation”)

Step 4: Entering Your Data

OK, let’s see what we’ve got. Let’s enter in some data using the structure we’ve got so far.

  • Take each field name and create a column header with it. In the rows beneath, have students enter the character info for the entire class.
  • Ask the group: “Does it seem to anyone that having one large table is unwieldy in a way? Do we find ourselves entering some of the same information over and over again? (ex: class_standing)? Wouldn’t it be nicer to have a shortcut to do that?”


In future toolkits, we will be: streamlining our schema; creating related tables to address some of the concerns your class may have taken note of; and learning how to normalize, which is a way to increase data integrity, speed up the database, and enforce consistency.


Further Learning

The data modeling work we’ve been doing so far is leading us toward the creation of a database schema. Remember that computerized databases as we are discussing here are made up of one or more tables. A schema is the underlying organization of an entire database including the relations between its tables. Think of a schema like a skeleton, with relations being tendons or connective tissue, and data as the overlying flesh.

 

Listen Now Podcast