data_file = open('uci_mldb_forestfires.csv', 'r').readlines()
Homework
Name:
Instructions
In your private course repo on Rivanna, write a Jupyter notebook running Python that performs the numbered tasks below.
For each task, create one or more code cells to perform the task.
Save your notebook in the M04
directory as hw04.ipynb
.
Add and commit these files to your repo.
Then push your commits to your repo on GitHub.
Be sure to fill out the Student Info block above.
To submit your homework, save the notebook as a PDF and upload it to GradeScope, following the instructions.
TOTAL POINTS: 14
Overview
In this homework, you will work with the Forest Fires Data Set from UCI.
There is a local copy of these data as a CSV file in the HW
directory for this module in the course repo.
You will create a group of related functions to process these data.
This notebook will set the table for you by importing and structuring the data first.
Setting Up
First, we read in our local copy of the dataset and save it as a list of lines.
Then, we inspect the first ten lines, replacing commas with tabs for readability.
for row in data_file[:10]:
    row = row.replace(',', '\t')
    print(row, end='')
X Y month day FFMC DMC DC ISI temp RH wind rain area
7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0
7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0
7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0
8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0.0
8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0
8 6 aug sun 92.3 85.3 488.0 14.7 22.2 29 5.4 0.0 0.0
8 6 aug mon 92.3 88.9 495.6 8.5 24.1 27 3.1 0.0 0.0
8 6 aug mon 91.5 145.4 608.2 10.7 8.0 86 2.2 0.0 0.0
8 6 sep tue 91.0 129.5 692.6 7.0 13.1 63 5.4 0.0 0.0
Convert CSV into Dataframe-like Data Structure
We use a helper function to convert the data into the form of a dataframe-like dictionary.
That is, we convert a list of rows into a dictionary of columns, each cast to the appropriate data type.
Later, we will use Pandas and R dataframes to do this work.
First, we define the data types by inspecting the data and creating a dictionary of lambda functions to do our casting.
dtypes = ['i', 'i', 's', 's', 'f', 'f', 'f', 'f', 'f', 'i', 'f', 'f', 'f']
# dtypes = list("iissfffffifff") # We could have done it this way, too
caster = {
    'i': lambda x: int(x),
    's': lambda x: str(x),
    'f': lambda x: float(x)
}
Next, we grab the column names from the first row of the list.
Note that .strip()
is a string function that removes extra whitespace from before and after a string.
cols = data_file[0].strip().split(',')
Finally, we iterate through the list of rows and flip them into a dictionary of columns.
The key of each dictionary element is the column name, and the value is a list of values with a common data type.
# Get the rows, but not the first, and convert them into lists
rows = [line.strip().split(',') for line in data_file[1:]]

# Initialize the dataframe by defining a dictionary of lists, with each column name as a key
firedata = {col: [] for col in cols}

# Iterate through the rows and convert them to columns
for row in rows:
    for j, col in enumerate(row):
        firedata[cols[j]].append(caster[dtypes[j]](col))
Test to see if it worked …
firedata['Y'][:5]
[5, 4, 4, 6, 6]
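As a preview of the pandas approach mentioned above, the rows-to-columns conversion with per-column type casting can be done in a single call. A minimal sketch on a tiny inline CSV (assuming pandas is installed; `io.StringIO` stands in for the real file):

```python
import io

import pandas as pd  # assumed available; not part of the standard library

# A tiny CSV sample shaped like the forest-fires file
csv_text = "X,Y,month\n7,5,mar\n7,4,oct\n"

# read_csv parses the header row and infers a dtype per column,
# doing the work of our cols/dtypes/caster machinery in one step
df = pd.read_csv(io.StringIO(csv_text))

print(df['Y'].tolist())  # → [5, 4]
```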
Working with spatial coordinates X
, Y
For the first tasks, we grab the first two columns of our table, which define the spatial coordinates within the Montesinho park map.
X, Y = firedata['X'], firedata['Y']
X[:10], Y[:10]
([7, 7, 7, 8, 8, 8, 8, 8, 8, 7], [5, 4, 4, 6, 6, 6, 6, 6, 6, 5])
Task 1
(2 PTS)
Write a function called coord_builder()
with these requirements:
- Takes two lists, X and Y, as inputs. X and Y must be of equal length.
- Returns a list of tuples [(x1,y1), (x2,y2), ..., (xn,yn)], where (xi,yi) are the ordered pairs from X and Y.
- Uses the zip() function to create the returned list.
- Uses a list comprehension to actually build the returned list.
- Contains a docstring with a short description of the function.
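As a hint (not the graded solution itself), here is the zip-plus-comprehension pattern on two toy lists; the names `a` and `b` are illustrative:

```python
a = [1, 2, 3]
b = [10, 20, 30]

# zip() pairs elements positionally; the comprehension builds the tuple list
pairs = [(x, y) for x, y in zip(a, b)]

print(pairs)  # → [(1, 10), (2, 20), (3, 30)]
```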
# CODE HERE
Task 2
(1 PT)
Call your coord_builder()
function, passing in X
and Y
.
Then print the first ten tuples.
# CODE HERE
Working with AREA
Next, we work with the area column of our data.
area = firedata['area']
area[-10:]
[0.0, 0.0, 2.17, 0.43, 0.0, 6.44, 54.29, 11.16, 0.0, 0.0]
Task 3
(1 PT)
Write code to print the minimum area and maximum area in a tuple (min_value, max_value)
.
Save min_value
and max_value
as floats.
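For reference, the built-ins min() and max() do the heavy lifting here; a sketch on toy values:

```python
vals = [0.0, 2.17, 54.29, 11.16]

# min()/max() on a list of floats already return floats;
# the float() calls are an explicit safeguard per the task
extremes = (float(min(vals)), float(max(vals)))

print(extremes)  # → (0.0, 54.29)
```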
# CODE HERE
Task 4
(2 PTS)
Write a lambda function that applies the following function to \(x\):
\(\log_{10}(1 + x)\)
The function should return the value rounded to \(2\) decimal places.
Assign the function to the variable mylog10
.
Then call the lambda function on area
and print the last 10 values.
Hints:
- Use the log10 function from Python’s math module. You’ll need to import it.
- Use a list comprehension to make the lambda function a one-liner.
- To get the last members of a list, use negative offset slicing. See the Python documentation on lists for a refresher on slicing.
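The hints above combine into a short pattern; a sketch on toy values (the name here is illustrative, not the graded answer):

```python
from math import log10

# One-liner lambda: apply log10(1 + x) to each element, rounded to 2 decimals
log_transform = lambda values: [round(log10(1 + x), 2) for x in values]

print(log_transform([0.0, 9.0, 99.0]))  # → [0.0, 1.0, 2.0]
```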
# CODE HERE
Working with MONTH
The month column contains months of the year in abbreviated form — jan
to dec
.
month = firedata['month']
month[:10]
['mar', 'oct', 'oct', 'mar', 'mar', 'aug', 'aug', 'aug', 'sep', 'sep']
Task 5
(1 PT)
Create a function called get_uniques() that extracts the unique values from a list.
- Do not use set(); instead, use a dictionary comprehension to capture the unique names.
- Hint: The keys in a dictionary are unique.
- Hint: You do not need to count how many times a name appears in the source list.
The function should optionally return the list sorted in ascending order.
Then apply it to the month column of our data with sorting turned on.
Then print the unique months.
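The dictionary-comprehension trick looks like this on a toy list (a sketch, not the full function):

```python
names = ['mar', 'oct', 'mar', 'aug', 'oct']

# Dictionary keys are unique, so building a dict and taking its keys
# de-duplicates the list; the values are irrelevant, hence None
uniques = list({name: None for name in names})

print(sorted(uniques))  # → ['aug', 'mar', 'oct']
```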
# CODE HERE
Task 6
(1 PT)
Write a lambda function called get_month_for_letter that uses a list comprehension to select all months starting with a given letter from the list of unique month names you just created.
The function should assume that the list of unique month names exists in the global context.
The returned list should contain uppercase strings.
Run the function and print the result with 'a' as the parameter.
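The filter-and-uppercase pattern, sketched on a hardcoded toy list (the real function should read your unique-months list from the global context instead):

```python
toy_months = ['apr', 'aug', 'dec', 'mar']

# Keep months starting with the given letter, uppercased via str.upper()
starts_with = lambda letter: [m.upper() for m in toy_months if m.startswith(letter)]

print(starts_with('a'))  # → ['APR', 'AUG']
```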
# CODE HERE
Working with DMC
DMC - DMC index from the FWI system: 1.1 to 291.3
dmc = firedata['DMC']
dmc[:10]
[26.2, 35.4, 43.7, 33.3, 51.3, 85.3, 88.9, 145.4, 129.5, 88.0]
Task 7
(2 PTS)
Write a function called bandpass_filter()
with these requirements:
- Takes three inputs:
  - A list of numbers, num_list.
  - An integer serving as a lower bound, lower_bound.
  - An integer serving as an upper bound, upper_bound.
- Returns a new list containing only the values from the original list which are greater than lower_bound and less than upper_bound.
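The core of such a filter is a single comprehension with a chained comparison; a sketch on toy numbers:

```python
nums = [10, 26, 30, 35, 40]

# Keep values strictly greater than the lower bound and
# strictly less than the upper bound
kept = [n for n in nums if 25 < n < 35]

print(kept)  # → [26, 30]
```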
# CODE HERE
Task 8
(1 PT)
Call bandpass_filter()
passing dmc
as the list, with lower_bound=25
and upper_bound=35
.
Then print the result.
# CODE HERE
Working with FFMC
FFMC - FFMC index from the FWI system: 18.7 to 96.20
ffmc = firedata['FFMC']
ffmc[:10]
[86.2, 90.6, 90.6, 91.7, 89.3, 92.3, 92.3, 91.5, 91.0, 92.5]
Task 9
(2 PTS)
Write a lambda function get_mean that computes the mean \(\mu\) of a list of numbers.
- The mean is just the sum of a list of numeric values divided by the length of that list.

Write another lambda function get_ssd that computes the squared deviation of a number.
- The function takes two arguments: a number from a given list and the mean of the numbers in that list.
- The function is meant to be used in a for-loop that iterates through a list.
- The squared deviation of a list element \(x_i\) is \((x_i - \mu)^2\).

Then write get_sum_sq_err() with these requirements:
- Takes a numeric list as input.
- Computes the mean \(\mu\) of the list using get_mean.
- Computes the sum of squared deviations for the list using a list comprehension that applies get_ssd.
- Returns the sum of squared deviations.
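The three pieces compose like this on toy values (a sketch with illustrative names):

```python
vals = [1.0, 2.0, 3.0]

# Mean: sum of the values divided by their count
mu = sum(vals) / len(vals)  # 2.0

# Sum of squared deviations, built with a list comprehension
sse = sum([(x - mu) ** 2 for x in vals])

print(sse)  # → 2.0
```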
# CODE HERE
Task 10
(1 PT)
Call get_sum_sq_err()
passing ffmc
as the list and print the result.
# CODE HERE