On this page:
✔️ Pre-Req Info
✔️ Motivation
✔️ Background Info
✔️ Your Task
✔️ Requirements
✔️ Handing in
✔️ Grade Breakdown
Pre-Req Info
Before jumping into this assignment, below are the topics you’re expected to be familiar with. If you need some review, check out our resource/prep page.
✔️ Command line arguments (tutorial, video)
✔️ Reading and writing to files (tutorial, reference page)
✔️ Classes (OOP material)
Motivation (Why are we doing this?)
The goal of this deep dive is to provide a review of basic C++ tasks and OOP principles, such as:
✔️ processing command line arguments
✔️ reading/writing files
✔️ manipulating data with arrays and matrices
✔️ using an object oriented design
Background Info
For this deep dive, you’ll be writing a program that can be used as a command line tool to perform image binarization. Your program will even give the user some options to customize their image binarization!
Slow down, what’s a command line? A command line interface (aka CLI) is a device for text input and output from programs. Portrayed in media as a black screen with green text that hackers use. It’s not as daunting as it seems and you should get familiar with it, because it simplifies a lot of tasks. To learn more, check out: MIT’s Missing Semester Shell Unit, Intro to the Terminal (for Macs), or Intro to PowerShell (for Windows).
What is image binarization?
In computer vision, image binarization, a.k.a. thresholding is the process of taking a grayscale image and converting it into a black and white image. In grayscale images, every pixel represents an intensity value ranging from 0 (black) to 255 (white). In black and white images, every pixel is either 0 or 255. Intensity refers to the brightness of a color, white is the brightest and therefore the most intense, black is the darkest and the least intense. The figure below shows an example of image binarization:
Thresholding IRL
A popular application of (color) thresholding is recreating the effects from Andy Warhol’s Pop Art or Obama’s 2008 Hope poster by Shepard Fairey. While these posters were done by hand, using color thresholding, we can create an algorithm that does the same. Nowadays, there are many online converters that will take an image and convert it to the likes of these iconic pieces. For simplicity’s sake, we’ll be doing the more simple black and white thresholding but you’re welcome to explore color thresholding on your own!
Global Thresholding
It’s important to note that pixel values for an image are stored in a matrix. Matrices don’t have to be perfect squares (i.e., 2x2, 3x3, 10x10), and while all most of our examples below are perfect squares, your program should work on images of any dimensions (perfect squares or not).
Slow down, what’s a matrix? You can visualize a matrix as a grid. 2D matrices (aka 2D arrays) are composed of some number of arrays inside another array. To access the element stored at row 2 column 3, you’d say
myMatrix[2][3]
. Want more info on matrices before continuing? Check out this Multidimensional Arrays Tutorial.
The process of image binarization can be done by comparing each pixel within the input image against a predefined threshold T
, and deciding whether each individual pixel gets turned white or black. Pixels whose intensity is less than T
become black, all others become white.
The algorithm below shows how to binarize an image.
input: Image A (grayscale)
output: Image B (black and white)
calculate global T
for each pixel A[i][j] in A do
if A[i][j] < T then
B[i][j] = 0
else
B[i][j] = 255
endif
endfor
The value of T
can be automatically calculated by using a function. For example, T
can be either 1) the average intensity or 2) the median of all pixels. There are hundreds of different ways for calculating T
, often involving statistical measures.
Local Thresholding
While global thresholding uses a constant threshold T
for transforming each pixel in an image, the idea behind local thresholding is to transform pixels by only considering the surrounding area, i.e., the local neighborhood of each pixel. The neighborhood of a pixel p
is a small matrix of dimensions d x d
centered at p
. In this case, for each pixel in the image, we are basically calculating a different T
. The algorithm below shows how to binarize an image using local thresholding.
input: Image A (grayscale)
output: Image B (black and white)
for each pixel A[i][j] in A do
calculate T[i][j] using local neighborhood
if A[i][j] < T[i][j] then
B[i][j] = 0
else
B[i][j] = 255
endif
endfor
As an illustration, the figure below shows an example of applying local thresholding. In this specific example, the median
is what’s as the metric for deciding the new value of a pixel:
Note that when calculating the neighborhood of pixels at the edges, where the neighborhood is not a perfect fit, we ignore all pixels that fall outside the boundaries of the image. In the illustration above, this is the case of the pixel at position
[0,0]
highlighted in orange.
Thresholding gone wrong
If you’re paying attention and understanding the concept of thresholding so far, you’ll recognize that the choice of T
has strong implications on the quality of the final image. Not all methods of tresholding will work well with all images and understanding what methods work best for your image (or set of images) is crucial.
Below is an example of thresholding gone wrong. Notice how in the last image, two of the objects and part of the third completely disappear:
The implications of these types of miscalculations in computer science are far reaching and can lead to issues such as:
- Infra-red light from a soap dispense not recognizing dark-skinned hands (read more).
- Facial recognition software not recognizing dark-skinned faces (quick video).
- Self-driving cars being more likely to hit folks with dark skin (read more).
The code that you write as computer scientists, (software) engineers, and programmers has real-life consequences that can unintentionally harm people. Keep this in mind as you code because code isn’t objective or siloed.
Your Task
Your goal in this assignment is to develop a command line tool that performs image binarization, given some options (described below) provided by the user. Your program will do both global and local thresholding, depending on the user’s arguments.
- For global thresholding, your program should use the
median
of all pixels for the value ofT
. - For local thresholding, your program should use
adib
for value ofT[i,j]
, as calculated by the following formula:T[i,j] = m(i,j) * (1 + k * ((s(i,j)/r)-1))
where:m
is the mean of the local neighborhood centered at pixel (i,j)s
is the standard deviation of the local neighborhood centered at pixel (i,j)k
is 0.2R
is0.5 * (pmax-pmin)
pmax
is the maximum pixel value in the entire imagepmin
is the minimum pixel value in the entire image
Below is an image of the formula with better formatting. This formula is from the paper Adaptive document image binarization by Sauvola and PietikaKinen, 2000.
The options for the user will be provided via the following command line arguments:
<type> either 'local' or 'global'
<in_fname> name of the input file
<out_fname> name of the output file
[<size>] size of the neighborhood
The last argument is optional, and must be provided only when <type>
is 'local'
. For example, see below for a few examples of how to use your tool. Note that the correct order of command line arguments is very important.
$ ./prog global cover.img cover_glo.img
$ ./prog local cover.img cover_loc_5.img 5
$ ./prog local cover.img cover_loc_7.img 7
$ ./prog local cover.img cover_loc_15.img 15
If you haven’t created C++ programs with command line arguments or need a refresher on how to do so, please read this tutorial and/or watch this video.
The images users will give you and that you will generate will be in the img
format, using the .img
extension and formatted as follows: Each image is encoded as a matrix of pixel values where each pixel value is a grayscale intensity, an integer ranging from 0 to 255.
When loading or saving images, each img
file must be a text file where pixels values are separated by a single whitespace, and organized in n
rows and m
columns (the image dimensions). For example, here is one file with 10 rows and 8 columns:
121 24 149 1 173 251 10 38
97 137 153 92 40 93 9 149
138 136 128 18 66 109 16 138
185 218 67 3 194 155 186 255
131 50 188 128 120 193 104 144
39 109 228 155 131 42 133 93
75 148 197 137 26 198 0 226
43 85 167 158 28 207 17 165
14 150 49 205 79 86 216 8
88 78 159 41 66 227 84 80
Note that every pixel value is separated by a single whitespace. There should no trailing whitespaces.
Implementation Note: Within your program, you can represent an image either as a bidimensional array, or as an unidimensional array and design your algorithms accordingly.
We prepared a few conversion scripts that can help you test your program with real-world examples, please refer to this repl.it and feel free to fork it or download it to test your own images.
Requirements
Your submission will be tested and graded by an autograder, for this reason it cannot be stressed enough that your program must exactly follow the assignment specifications:
- Use
double
as the default datatype for the pixels and the terms in all formulas - Use
main.cpp
as the single filename for your program. - You should receive no warning or error messages upon compilation with the exact following command
$ g++ -std=c++11 -Wall main.cpp -o prog
- Your command line program should use the following arguments, in this exact order:
<type> either 'local' or 'global' <in_fname> name of the input file <out_fname> name of the output file [<size>] size of the neighborhood
- Use functions and classes to abstract your program, per OOP guidelines, but have everything in the same file. Name that file
main.cpp
- For global thresholding, use the
median
of all pixels for the value ofT
. - For local thresholding, use
adib
formula for value ofT[i,j]
, as defined in the previous section.
BEFORE HANDING IN: Test that your program works by compiling your program with the command in Requirement #3. Successful execution of this command should create an exectuable file named prog
, which you should be able to execute using the arguments as outlined in Requirement #4 (example uses below).
$ ./prog global cover.img cover_glo.img
$ ./prog local cover.img cover_loc_5.img 5
$ ./prog local cover.img cover_loc_7.img 7
$ ./prog local cover.img cover_loc_15.img 15
Once you’ve read through the assignment, I encourage you to check out the breakdown video associated with it.
Handing in
To submit your solution to Gradescope, simply select main.cpp
and use the drag and drop option.
Grade Breakdown
This assignment covers the pre-requisite topics of C++ and OOP basics and your level of knowledge on them will be assessed as follows:
- To demonstrate an
awareness
of these topics, you must:- Successfully meet requirements 1 thorugh 3
- To demonstrate an
understanding
of these topics, you must:- Successfully meet requirements 1 through 6
- To demonstrate
competence
of these topics, you must:- Successfully meet requirements 1 through 7
To receive any credit at all, you must abide by our Collaboration and Academic Honesty Policy. Failure to do so may result in a failing grade in the class and/or further disciplinary action.
Original assignment by Dr. Marco Alvarez, used and modified with permission.