Data Modeling

We’ve frequently pointed out that Java classes can help you work with data. By creating your own Java types you can model real-world entities, allowing you to both create new data and work with existing data sets. But we haven’t given you a chance to actually do that yet. Until now.

Today’s lab gives you a chance to work with data in the way that you are likely to find it in the wild: stored as text in a file. You’ll write code to load the data into a Java class that you define, and write some methods to process it the same way you would if you were performing an actual data analysis or investigation. Finally, we’ve set aside time for you to get started on MP3, which is due next Friday!

1. CSV (60 Minutes Total)

Both today’s lab and MP3 introduce you to different ways of representing Java objects as text, one instance of a broader set of techniques known as serialization. MP3 introduces you to JSON, a powerful way to convert Java objects to strings while preserving much of the object’s structure.

But in lab today you’ll work with the CSV (comma-separated value) format. While it is far more limited than JSON, it is fairly ubiquitous, and many of the interesting data sets that you can find online you can access in CSV format.

What is the CSV format? Imagine that I have data about pets stored in our usual Pet class:

public class Pet {
  public String name;
  public int age;
  public String type;
  public Pet(String setName, int setAge, String setType) {
    name = setName;
    age = setAge;
    type = setType;
  }
}

Now let’s say that I want to save that data to a file. Maybe I want to make sure it is saved so that the next time my program runs I still have it. Or maybe I need to send it to a friend, or want to work with it using a spreadsheet tool like Google Sheets. Or maybe I want to do some work in another programming language—like Python, or JavaScript, or Go. Regardless: I need some way to save my data about pets so that I can read it back in later.

CSV is one way to do that. A CSV file consists of a series of records, one on each line. Each record contains multiple fields, separated by commas. To save a Pet record I need to convert each of its fields to a String and write them to a file as a single line, with the fields separated by commas.

So, for example, imagine that I had the following three Pets in my program:

Pet chuchu = new Pet("Chuchu", 14, "dog");
Pet xyz = new Pet("Xyz", 4, "cat");
Pet balou = new Pet("Balou", 15, "dog");

Converted to CSV format, those three objects could look like this:

Chuchu,14,dog
Xyz,4,cat
Balou,15,dog

However, they could also look like this:

dog,Chuchu,14
cat,Xyz,4
dog,Balou,15

Same fields, different order—confusing! As a result, we usually add a header specifying the name for each field, to eliminate ambiguity and make sure that we don’t forget how we saved our data:

Name,Age,Type
Chuchu,14,dog
Xyz,4,cat
Balou,15,dog
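
As a rough illustration of the steps above, here is a minimal sketch that turns Pet objects into CSV text with a header. PetCsvWriter and its methods are hypothetical names invented for this example, and the sketch assumes that no field contains a comma or a newline.

```java
// Same fields as the Pet class above.
class Pet {
  public String name;
  public int age;
  public String type;

  Pet(String setName, int setAge, String setType) {
    name = setName;
    age = setAge;
    type = setType;
  }
}

// Hypothetical helper for this example; not part of the lab code.
class PetCsvWriter {
  static final String HEADER = "Name,Age,Type";

  // Convert one Pet to a single CSV record.
  // Assumes no field contains a comma or a newline.
  static String toRecord(Pet pet) {
    return pet.name + "," + pet.age + "," + pet.type;
  }

  // Build the full CSV text: the header first, then one record per line.
  static String toCsv(Pet[] pets) {
    StringBuilder csv = new StringBuilder(HEADER).append("\n");
    for (Pet pet : pets) {
      csv.append(toRecord(pet)).append("\n");
    }
    return csv.toString();
  }
}
```

Calling PetCsvWriter.toCsv on an array holding the three Pets above should produce the four lines shown in the header example.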

1.1. Data Modeling Using CSV Data

So now you’ve seen how CSV files are generated using a simple Java class. But how would we do the reverse? Imagine we had a file containing some CSV data and we wanted to load and work with it in Java. How would we do that?

Imagine that we have a CSV of geocache locations as part of a game we’re playing. Each location is worth a certain number of points. Here’s the CSV header and two example records:

Latitude,Longitude,Points
40.482979,-88.993390,100
40.197184,-88.366315,-100

Designing a Java class to model this data requires making a few decisions.

  • What should we call the class? Usually this requires some information about what it is being used for.

  • What should the names of the instance variables be? Just using the same names as the header is a pretty good convention, although we’ll want to use camel case to avoid variables that start with an uppercase letter.

  • What should the types of the instance variables be? Here we want to examine a bit of the data itself.

Here’s an example class based on the CSV shown above:

public class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;
  // Getters and setters not shown
}

When modeling data these decisions are yours to make. But adhering to a few simple conventions will help make your code easier to read and maintain.

1.2. Converting Records to Objects

Next we have to figure out how, given a line of text like:

40.482979,-88.993390,100

we end up with a GeocacheLocation object with latitude == 40.482979, longitude == -88.993390, and points == 100. There are libraries that can do this for you, but it’s not hard to do yourself. Remember String.split and String.trim? Those methods come in handy here. And as an additional reminder, Integer.valueOf and Double.valueOf will convert a trimmed String into an Integer or Double, which Java unboxes to an int or double automatically (Integer.parseInt and Double.parseDouble return the primitives directly). That’s a good bit of what you need to know to get started!
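
One way the conversion could look, as a minimal sketch rather than the lab solution. The constructor, the getters, and the GeocacheParser class are assumptions made for this example; the class shown earlier leaves its getters and setters out.

```java
class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;

  // Constructor and getters are assumed for this example.
  GeocacheLocation(double setLatitude, double setLongitude, int setPoints) {
    latitude = setLatitude;
    longitude = setLongitude;
    points = setPoints;
  }

  double getLatitude() { return latitude; }
  double getLongitude() { return longitude; }
  int getPoints() { return points; }
}

// Hypothetical helper for this example.
class GeocacheParser {
  // Convert one CSV record such as "40.482979,-88.993390,100" to an object.
  static GeocacheLocation parseRecord(String line) {
    String[] fields = line.split(",");
    if (fields.length != 3) {
      throw new IllegalArgumentException("Expected 3 fields: " + line);
    }
    // trim removes stray whitespace around each field before conversion.
    double latitude = Double.valueOf(fields[0].trim());
    double longitude = Double.valueOf(fields[1].trim());
    int points = Integer.valueOf(fields[2].trim());
    return new GeocacheLocation(latitude, longitude, points);
  }
}
```

A malformed number makes Integer.valueOf or Double.valueOf throw a NumberFormatException, which this sketch simply lets propagate to the caller.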

1.3. Loading Entire Files

Once you can convert a single line containing a record to an object of the appropriate type it’s easy to extend this idea to load an entire file containing multiple records. You end up with an array of objects. So if we loaded:

40.482979,-88.993390,100
40.197184,-88.366315,-100

We’d end up with a GeocacheLocation array of size 2.
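
Extending the single-record idea to many records might look like the sketch below. It assumes the file contents have already been read into one String, and GeocacheLoader, the constructor, and the getters are hypothetical names for this example. Skipping any line that starts with "Latitude" is a simplification for handling the header.

```java
import java.util.ArrayList;
import java.util.List;

class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;

  // Constructor and getters are assumed for this example.
  GeocacheLocation(double setLatitude, double setLongitude, int setPoints) {
    latitude = setLatitude;
    longitude = setLongitude;
    points = setPoints;
  }

  double getLatitude() { return latitude; }
  double getLongitude() { return longitude; }
  int getPoints() { return points; }
}

// Hypothetical helper for this example.
class GeocacheLoader {
  // Convert CSV text containing one record per line to an array of objects.
  static GeocacheLocation[] loadAll(String csv) {
    List<GeocacheLocation> locations = new ArrayList<>();
    // "\\R" matches any line break, so both \n and \r\n files work.
    for (String line : csv.split("\\R")) {
      String trimmed = line.trim();
      // Skip blank lines and the header row, if present.
      if (trimmed.isEmpty() || trimmed.startsWith("Latitude")) {
        continue;
      }
      String[] fields = trimmed.split(",");
      locations.add(new GeocacheLocation(
          Double.valueOf(fields[0].trim()),
          Double.valueOf(fields[1].trim()),
          Integer.valueOf(fields[2].trim())));
    }
    return locations.toArray(new GeocacheLocation[0]);
  }
}
```

Collecting into a List first avoids having to count the records before allocating the array.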

1.4. Data Processing

Once we have an array of GeocacheLocation objects we can work with them like any other Java objects. For example, if we knew our current location we could use it to determine which GeocacheLocation object was closest to us. Or we could find the one worth the most points. At this point we’re just writing Java code: the CSV parts are behind us!
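
Both ideas can be sketched as simple loops over the array. GeocacheAnalysis, the constructor, and the getters are hypothetical names for this example, and the haversine great-circle formula is one possible choice for measuring distance, not something the lab prescribes.

```java
class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;

  // Constructor and getters are assumed for this example.
  GeocacheLocation(double setLatitude, double setLongitude, int setPoints) {
    latitude = setLatitude;
    longitude = setLongitude;
    points = setPoints;
  }

  double getLatitude() { return latitude; }
  double getLongitude() { return longitude; }
  int getPoints() { return points; }
}

// Hypothetical helper for this example.
class GeocacheAnalysis {
  static final double EARTH_RADIUS_KM = 6371.0; // approximate mean radius

  // Return the location worth the most points, or null for an empty array.
  static GeocacheLocation mostPoints(GeocacheLocation[] locations) {
    GeocacheLocation best = null;
    for (GeocacheLocation location : locations) {
      if (best == null || location.getPoints() > best.getPoints()) {
        best = location;
      }
    }
    return best;
  }

  // Haversine distance in kilometers between two points given in degrees.
  static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
    double dLat = Math.toRadians(lat2 - lat1);
    double dLon = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
        + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
        * Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
  }

  // Return the location closest to the given point, or null if there is none.
  static GeocacheLocation closest(
      GeocacheLocation[] locations, double latitude, double longitude) {
    GeocacheLocation best = null;
    double bestDistance = Double.MAX_VALUE;
    for (GeocacheLocation location : locations) {
      double distance = distanceKm(
          latitude, longitude, location.getLatitude(), location.getLongitude());
      if (distance < bestDistance) {
        bestDistance = distance;
        best = location;
      }
    }
    return best;
  }
}
```

Both methods follow the same pattern: keep track of the best candidate seen so far and replace it whenever a better one turns up.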

1.5. Practice With CSVs and Data Processing

Today’s lab homework gives you practice working with data in the same way as described above. You don’t actually get to read the data from the file, but you get to do everything else. This is great practice with object design, and a review of String processing and basic algorithms as we prepare to begin talking about algorithms and data structures next week.

Good luck, and, as always, have fun! Hopefully this will help demystify the process of working with data in Java.

2. Starting MP3 (Remaining Time)

We’ve set aside the remainder of lab for you to get started on MP3. Definitely try to get the initial Cognitive Services API setup done. It’s a bit tricky, and if you do it in lab we can help you out.

3. Before You Leave

Don’t leave lab until:

  1. You’ve completed our in-lab testing homework problems.

  2. You’ve made some more progress on MP3…

  3. And so has everyone else in your lab!

If you need more help completing the tasks above please come to office hours or post on the forum.


Created 10/24/2021
Updated 10/24/2021