Warning: Outdated Content
This lab is from a previous version of CS 125. Click here to access the latest version.
Trace Points
In this lab we focus our attention on the art and craft—the techne—of programming. We’ll do some code reading and talk style; continue developing our algorithmic chops; and get practice with variable assignment, loops, and counters. We’ve also set aside some time to talk MP2.
1. Written Exercises (40 Minutes)
We begin with a set of code reading and writing exercises.
Complete this part of the lab in pairs using Google Docs. Create a copy of our document template, and then edit it to record your answers and your partner’s. Note that you must open this document using your @illinois.edu Google Apps account. We will not grant access to non-Illinois users 1.
Have a course staff member check your answers as you go. When you are done, move on to the next section.
2. Simple Web Scraping (40 Minutes)
For the second part of this lab we’re going to do a quick code writing exercise. Our goal is to show how easy it is to fetch and process content from the web—the most impressive library of text the world has ever created.
2.1. Creating a New IntelliJ Project
In this lab we’re going to show you how to create a new IntelliJ project, rather than using one that we’ve already set up for you. Follow the screencast above to learn how to do that—or just look it up online. There are plenty of good instructions out there.
2.2. Getting to the main Point
Working with our Lab3 project, create a new class. You can call it anything you like. Make sure that it has a main method—we’ll be using it throughout the lab. IntelliJ has a way to create a command line app that will add this function for you, or you can add it later. Refer to the screencast above for guidance about how to create a new class in IntelliJ. And make sure to add version control to your project as well, also as shown in the screencast above. Ask a neighbor or a staff member for help if you need it.
2.3. A Bit of Help
Our goal is to give you the chance to process web content as strings. You could very easily find out how to do this online, probably discovering the same advice that our approach is based on.
But, to speed the plow a bit, we’re going to provide you with a helper function.
Cut and paste the following function into the class you created above.
Don’t put it inside the main method—make it a separate function.
Also note that the import statements have to go at the top of your file, not at the same level as the new function.
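A minimal sketch of such a helper, reconstructed from the explanation in the next section and not a verbatim copy of the course listing. The class name WebHelper, the Javadoc wording, the "UTF-8" charset argument, and the small main driver are all assumptions added so that the example compiles on its own. The layout is arranged so that the line references in the explanation line up with it. When pasting into your own class, take only the import statements and the urlToString method.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class WebHelper {
    /**
     * Retrieve contents from a URL and return them as a string.
     * @param url the URL to retrieve contents from
     * @return the contents as a string, or an empty string on error
     */
    public static String urlToString(final String url) {
        Scanner urlScanner;
        try {
            urlScanner = new Scanner(new URL(url).openStream(), "UTF-8");
        } catch (IOException e) {
            return "";
        }
        String contents = urlScanner.useDelimiter("\\A").next();
        urlScanner.close();
        return contents;
    }

    // Hypothetical driver (not part of the helper): prints how many
    // characters came back for the URL passed as the first argument.
    public static void main(final String[] args) {
        if (args.length > 0) {
            System.out.println(urlToString(args[0]).length());
        }
    }
}
```

Two caveats about this sketch: if the URL returns no content at all, next() throws a NoSuchElementException, and newer JDKs flag the URL(String) constructor as deprecated, although it still compiles.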
2.3.1. Nothing is magic
You don’t need to understand this code—just be able to use it. But here’s an explanation for the curious 2:
- Lines 1–3: import the parts of the Java standard library that we need.
- Lines 12–17: we allocate a new Scanner object (Line 12) and then initialize it (Line 14).
- Line 14: URL.openStream can throw several kinds of exceptions. Normally we’d let the caller handle them, but in this case we’ll suppress them and return an empty string on error (Line 16).
- Line 18: now we have to convert the contents to a string. This post explains a bit of the "\\A" part, which I don’t fully understand but it seems to work.
- Lines 19–20: then we close our urlScanner to avoid a resource leak and return the string contents.
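For the curious, the "\\A" trick can be tried on an ordinary string without touching the network. The sketch below is an illustration only: the class name ReadAllDemo and the method readAll are hypothetical, and it uses try-with-resources plus a hasNext check, which differs slightly from the steps described above.

```java
import java.util.Scanner;

public class ReadAllDemo {
    /** Return everything the scanner can read as one string ("" if the input is empty). */
    public static String readAll(final String input) {
        // try-with-resources closes the scanner even if something goes wrong.
        try (Scanner scanner = new Scanner(input)) {
            // "\\A" is the regex anchor for "beginning of input". It can match
            // only at position 0, so the first token runs to the end of the input.
            scanner.useDelimiter("\\A");
            // hasNext() is false for empty input, where next() alone would throw.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(final String[] args) {
        System.out.println(readAll("first line\nsecond line"));
    }
}
```

Because the delimiter never matches again after the start of the input, a single call to next() hands back the whole text, newlines included.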
2.4. Word Counting
Once you have our urlToString function integrated into your code, test it out using System.out.println.
Here are some URLs that may be interesting to try.
Note that they all return raw text, rather than the HTML that you are used to seeing online.
But you should also experiment with some HTML pages, like this one.
Now, for each of the pages above, compute a word count. Your class should compute the total number of words on the page.
We are intentionally not giving you a lot of help with this part of the lab. But don’t get discouraged! Look around for help online, ask your neighbor, and be sure to ask your TA and doyen for help as well.
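If you get stuck, here is one possible starting point, not the only way to do it. It assumes that a "word" is any run of characters separated by whitespace, which is a simplification: punctuation stays attached to words, and HTML tags count as words too. The class name WordCount and the method countWords are hypothetical.

```java
public class WordCount {
    /** Count whitespace-separated tokens in text. */
    public static int countWords(final String text) {
        // Trim first so leading whitespace does not produce an empty first token.
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // splitting an empty string would yield one empty token
        }
        // "\\s+" matches one or more whitespace characters (spaces, tabs, newlines).
        return trimmed.split("\\s+").length;
    }

    public static void main(final String[] args) {
        System.out.println(countWords("this is a string is a string")); // prints 7
    }
}
```

In your own class you would pass the result of urlToString to a method like this one in place of the fixed string.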
2.4.1. Counting One Word
Now modify your code above so that it looks for and counts occurrences of a specific word, which you can define as a constant in your code. For example, how many times does the word "Prince" appear in Hamlet? Can you make your new function case-insensitive, so that "Prince" and "prince" are counted as the same word?
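One way this could look is sketched below, under a couple of assumptions: words are split on runs of non-word characters with the regex "\\W+", which by default only treats ASCII letters, digits, and underscores as word characters, and only exact matches count, so "princes" does not match "prince". The names SingleWordCount, TARGET, and countWord are hypothetical.

```java
public class SingleWordCount {
    // The word to look for, defined as a constant.
    private static final String TARGET = "prince";

    /** Count case-insensitive occurrences of target as a whole word in text. */
    public static int countWord(final String text, final String target) {
        int count = 0;
        // Splitting on "\\W+" strips punctuation, so "prince." becomes "prince".
        for (String token : text.split("\\W+")) {
            if (token.equalsIgnoreCase(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(final String[] args) {
        System.out.println(countWord("The Prince met a prince.", TARGET)); // prints 2
    }
}
```

Using equalsIgnoreCase avoids having to lowercase the whole text first, though lowercasing both sides would work as well.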
2.4.2. Challenge: Unique Word Counting
If you get your word counting done with time to spare, try changing it so it counts the number of unique words in each file. For example, the number of words in "this is a string is a string" is 7, but the number of unique words is 4.
Completing this part of the lab will probably require you to explore advanced Java data structures that you will not see for a while in this course. But give it a shot if you get here with time to spare.
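As a hint at what such a data structure might look like, the sketch below uses a HashSet from the Java standard library, which silently ignores duplicates when you add to it. It assumes words are separated by whitespace and that case should be ignored; both choices are assumptions, and the names UniqueWordCount and countUniqueWords are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class UniqueWordCount {
    /** Count distinct whitespace-separated words in text, ignoring case. */
    public static int countUniqueWords(final String text) {
        // A set stores each distinct value once, so repeated words collapse.
        Set<String> seen = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            // Empty input yields a single empty token; skip it.
            if (!token.isEmpty()) {
                seen.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return seen.size();
    }

    public static void main(final String[] args) {
        System.out.println(countUniqueWords("this is a string is a string")); // prints 4
    }
}
```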
3. Help with MP2 (20 Minutes)
Use any remaining time in your lab section to get help with MP2. If you are done or making good progress, please help others, but help them learn; don’t just give them the answers. And if you are behind, please reach out to the course staff for help.
4. Before You Leave
Don’t leave lab until:
- You’ve completed the entire handout
- You’ve finished the web scraping exercise
- You’ve pushed your work to GitHub and shown it to a TA or CA
- You’ve considered sticking around for a few minutes to help others, either with the lab or with MP2