On this page:
✔️ Motivation
✔️ Background Info
✔️ Your Task
✔️ Requirements
✔️ Handing in
✔️ Grade Breakdown
Motivation (Why are we doing this?)
The goal of this (extended) wet toe is to provide you an opportunity to practice with using sets and maps
with real data. Namely, you’ll be:
✔️ Utilizing classes from the STL (Standard Template Library) such as fstream, set and map
✔️ Becoming familiar with use cases of set and map, as well as their operations
Background Info
Sets
The reason for being called a “set” comes directly from set theory, a branch of mathematics, in which a set is a collection of distinct objects. In other words, each element that could be in the set is either in the set or not in the set, there is no “amount” associated with a given element. There are a myriad of operations that you can perform on a set. Look at the set reference page to familiarize yourself with the basic operations.
#include <set>
#include <string>
int main() {
std::set<std::string> animals; // creating a set of strings
myset.insert("cat");
myset.insert("dog");
myset.insert("horse");
// ...
}
Maps
Maps are similar to sets, in that each contains a number of unique elements that are in order. The key difference between sets and maps in general is that sets store elements all by themselves, whereas maps store key-value pairs. A key-value pair is a pair of two elements where the first element, the key, is used to index the map, and the second element, the value, is what is stored/returned. Similar to the set reference page, there exists a map reference page for you to explore as well.
Here is an example of maps in action:
#include <iostream>
#include <map>
int main() {
std::map<std::string, int> mymap;
mymap["dog"] = 7;
mymap.emplace("cat", 4); // This is the same as mymap["cat"] = 4
mymap.insert({"fish", 11}); // This is the same as mymap["fish"] = 11
mymap["cat"]++;
std::cout << mymap["dog"] << std::endl; // Prints 7
std::cout << mymap["cat"] << std::endl; // Prints 5
std::cout << mymap["fish"] << std::endl; // Prints 11
}
Note that when you use
emplace
orinsert
with a key that is already in the map, the value will not be replaced.
Reading & Parsing Data Files
By now, you’re all familiar with reading and writing to files. Now that we’ll be working with data files formatted as CSVs, there’s an additional step that’s required: parsing.
The steps required for handling an input file, as you’ve experienced from previous deep dives:
- Create an instance of ifstream.
- Open the file. (Check for failure to open.)
- Read from the file.
- Close the file.
Parsing a file would happen in step #3.
When working with a real-world datasets, most of them are stored as CSVs: comma-separated values. A CSV has “columns” separated by commas and “rows” separated by newline characters. For example, if we downloaded the “Umbrella topics” from the Assignment/Topic Matrix our columns would be in the following format:
Number,Title,Non-Negotiable,Notes,Assignments,HighestPossibleLevel
To handle the commas, we can use an overloaded version of std::getline()
. If you read the documentation for the function, you’ll see that there is a version of the function that accepts a delimiter. By passing an argument for the delimeter, we specify how we want the string to be split up. Every call to std::getline()
will read the data up to the next symbol set as the delimiter. You can place this in a loop to parse a delimited string:
#include <iostream>
#include <fstream>
#include <sstream>
int main() {
std::string line;
std::string entry;
//Step 1: Create an instance of ifstream.
std::ifstream table;
//Step 2: Open the file.
table.open("tabledata");
//Step 2 (part 2): (Check for failure to open.)
if(table.fail()){
std::cerr << "Can't open tabledata\n";
return 1;
}
//Step 3: Read from the file. Specifically get a line of data from table, store in 'line'
while(std::getline(table, line)) {
std::stringstream streamline(line);
// Step 3 (parsing): Loop through each "column" in 'line' and store it into 'entry'
while(std::getline(streamline, entry, ',')) {
std::cout << entry << std::endl;
}
}
//Step 4: Close the file
table.close();
}
To see an example of CSV reading in action, check out this repl.it where I number each row and column for each entry of a small CSV.
Your Task
You will be working with a real-world dataset. Find a dataset you’re interested in and download its CSV. Below are sources of datasets, but you can use anything as long as it’s a CSV:
- Identify what the data is
- Identify how the data is stored
- Come up with 1 question you can ask about it that can be answered by using a set
- Come up with 1 question you can ask about it that can be answered by using a map
- Add only the necessary data to answer your questions to your map and set
- Use the built-in functions for maps and sets to answer your questions
You will be given wt13.cpp. Fill in the required functions, write your own test cases, and then use the built-in functions for maps and sets to answer your questions.
Write the question you’re asking, as a comment, above the function call that gives you the answer to it and then comment what the answer is. Some examples types of questions:
- How many X have Y?
- How many Y does a specific X have?
- Which X has the most Y?
Some hints:
- You can remove the first row of your CSV file if it’s a header row (just make sure to keep track of column titles on your own!)
- If you only care about, say column 1 and 3, you can keep track of which column you’re on via a counter.
- An example of a question that could be answered via a set is: “How many states had an average of at least 8in of rainfall?” or “How many states had no rainfall?”. In that case, as I’m reading my CSV, I’d check the condition (>= 8in of rainfall for the first question, ==0 for the second one) and add the state to the set if the condition is true.
- An example of a question that could be answered via a map is: “How much rain did RI get in 2019?” or “Which of the states in New England got the most amount of rain in February?”
Requirements
- inSet() and inMap() are correctly implemented
- Data is added correctly to the set and map, based on the questions being asked of the data
- Questions being asked are answered via appropriate use of built-in set and map functions
- Write the question you’re asking, as a comment, above the function call that gives you the answer to it and then comment what the answer is.
Handing in
Please get this optional lab checked off at hours or via a private Piazza post.
Grade Breakdown
This assignment covers the non-negotiable topic of sets & maps
and your level of knowledge on them will be assessed as follows:
- To demonstrate an
awareness
of these topics, you must:- Successfully meet requirements 1
- To demonstrate an
understanding
of these topics, you must:- Successfully meet requirements 1 through 2
- To demonstrate
competence
of these topics, you must:- Successfully meet requirements 1 through 3
To receive any credit at all, you must abide by our Collaboration and Academic Honesty Policy. Failure to do so may result in a failing grade in the class and/or further disciplinary action.
Original assignment by Dr. Marco Alvarez, used and modified with permission.