FUNCTIONS FUNDAMENTAL (BIG DATA & BUSINESS INTELLIGENCE)
1- Functions
So far, we've used several commands: print(), sum(), len(), min(), max(). These commands are more often known as functions. Generally, a function displays this pattern:
It takes in an input.
It does something to that input.
It gives back an output.
For instance, we can observe this pattern if we use the len() function to find the length of a list. Below, the len() function:
Takes list_1 as an input
Does something to the input (list_1) to count its length
Returns the output 5 — the length of list_1
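As a minimal sketch of this pattern, assuming list_1 holds five made-up numbers (the text only tells us it has five elements):

```python
# Hypothetical values; only the number of elements (five) comes from the text.
list_1 = [21, 0, 72, 1, 5]

# len() takes list_1 as input and gives back its length as output.
length_of_list_1 = len(list_1)
print(length_of_list_1)  # 5
```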
Although steps one and three above are fairly straightforward, the second step is more of a black box — we don't have a clear idea how the len() function manages to count the length of list_1.
However, we could use what we've learned to write code that manages to perform the same task as len() does. This should give us some insight into how the len() function counts the length of a list. In the following code example, we:
Initialize a variable length with a value of 0.
Loop through the list list_1 and increment the length variable by 1 for each iteration. Because list_1 has five elements, length will be incremented by 1 five times, ending up with a value of 5, which is equivalent to the length of list_1.
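The two steps can be sketched like this, again assuming made-up values for list_1:

```python
list_1 = [21, 0, 72, 1, 5]  # hypothetical values, five elements

length = 0
for element in list_1:
    # One increment per element, regardless of the element's value.
    length += 1

print(length)       # 5
print(len(list_1))  # 5
```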
Let's now try to guess how the sum() function might work behind the scenes.
Instructions
Compute the sum of a_list (already defined in the code editor) without using sum().
Initialize a variable named sum_manual with a value of 0.
Loop through a_list, and for each iteration add the current number to sum_manual.
Print sum_manual and sum(a_list) to check whether the values are the same.
Answer:
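One possible solution, sketched with a stand-in a_list because the values from the code editor are not shown here:

```python
# Hypothetical stand-in for the a_list defined in the code editor.
a_list = [4, 8, 15, 16, 23, 42]

sum_manual = 0
for number in a_list:
    sum_manual += number

print(sum_manual)   # 108
print(sum(a_list))  # 108
```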
2- Built-in Functions
Functions help us work faster and simplify our code. Whenever we have a repetitive task, we can considerably speed up our workflow by using a function each time we do that task.
In the diagram below, for instance, our repetitive task is calculating the sum of a list — we need to perform this task three times since we have three lists (list_1, list_2, and list_3). We can see that using sum() makes our code simpler, cleaner, and helps us work faster by enabling us to write less code:
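The comparison can be sketched roughly as follows. The list values are made up for illustration.

```python
# Hypothetical lists; the original values are not given in the text.
list_1 = [1, 2, 3]
list_2 = [10, 20]
list_3 = [5, 5, 5, 5]

# Without a function, every list needs its own loop.
sum_1 = 0
for value in list_1:
    sum_1 += value

sum_2 = 0
for value in list_2:
    sum_2 += value

sum_3 = 0
for value in list_3:
    sum_3 += value

# With sum(), each of the three tasks is a single call.
print(sum(list_1), sum(list_2), sum(list_3))  # 6 30 20
```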
We've seen that Python has a couple of ready-made functions like sum(), len(), min(), and max(). These functions are already built into Python and are available for immediate use. Because they are already built-in, they are called built-in functions.
Python, however, doesn't have built-in functions for absolutely every task we might want to do. For instance, in the previous mission, we generated many frequency tables for our iOS apps data set, and we might want to use a function to accomplish that repetitive task. The function should:
Take the iOS apps data set in as input
Generate the frequency table for the column we want
Return the frequency table (in the form of a dictionary) as output
Python doesn't have a built-in function to accomplish this. However, Python allows us to write our own functions, which means we can create a function that generates frequency tables.
Starting with the next screen, we'll discuss the details around creating functions. For now, let's remind ourselves of the workflow for generating a frequency table by doing the exercises below.
Instructions
Generate a frequency table for the ratings list, which is already initialized in the code editor.
Start by creating an empty dictionary named content_ratings.
Loop through the ratings list. For each iteration:
If the rating is already in content_ratings, then increment the frequency of that rating by 1.
Else, initialize the rating with a value of 1 inside the content_ratings dictionary.
Print content_ratings.
Answer:
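A possible solution, assuming a short stand-in for the ratings list that the code editor provides:

```python
# Hypothetical stand-in for the ratings list from the code editor.
ratings = ['4+', '4+', '12+', '9+', '9+', '12+', '17+', '4+']

content_ratings = {}
for rating in ratings:
    if rating in content_ratings:
        content_ratings[rating] += 1
    else:
        content_ratings[rating] = 1

print(content_ratings)  # {'4+': 3, '12+': 2, '9+': 2, '17+': 1}
```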
3- Creating our own Functions
On the previous screen, we mentioned that our goal is to create a function that generates a frequency table. This process, however, is a bit more complex, so we'll start with simpler examples and build up from there toward that goal.
Let's say we want to create a function named square() that takes in a number as input and returns its square as output. To find the square of a number, we need to multiply that number by itself. For instance, to find the square of 6, we need to multiply 6 by itself: 6 × 6, which equals 36 — so the square of 6 is 36.
This is how we can create the square() function:
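The listing itself is not reproduced here, so the following is a sketch rebuilt from the step-by-step description that follows:

```python
def square(a_number):
    # Multiply the input by itself and store the result.
    squared_number = a_number * a_number
    # Give the stored result back as the output.
    return squared_number
```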
To create the square() function above, we:
Started with the def statement, where we:
Specified the name of the function (square)
Specified the name of the variable (a_number) that will serve as input
Surrounded the input variable a_number within parentheses
Ended the line of code with a colon (:)
Specified what we want to do with the input a_number (in the code below the def statement)
We multiplied a_number by itself: a_number * a_number
Then we assigned the result of a_number * a_number to a variable named squared_number
Ended with the return statement, where we specified what we want returned as the output.
The output is the variable squared_number, which stores the result of a_number * a_number.
After we define the square() function, we can use it to compute the square of a number. Below, we use the function three times to compute the square of the numbers 6, 4, and 9:
To compute the square of 6, we use the code square(a_number=6).
To compute the square of 4, we use the code square(a_number=4).
To compute the square of 9, we use the code square(a_number=9).
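A minimal sketch of these three calls, with square() repeated so the block runs on its own:

```python
def square(a_number):
    squared_number = a_number * a_number
    return squared_number

print(square(a_number=6))  # 36
print(square(a_number=4))  # 16
print(square(a_number=9))  # 81
```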
a_number is the input variable, and we can see that it can take various values. This enables us to use the square() function for any number we want.
Now let's practice creating functions before resuming the explanations in the next screen.
Instructions
Recreate the square() function above and compute the square for numbers 10 and 16.
Assign the square of 10 to a variable named squared_10.
Assign the square of 16 to a variable named squared_16.
Answer:
4- The Structure of a Function
On the previous screen, we created and used a function named square() to compute the square of 6, 4, and 9. You probably noticed that the value we assigned to a_number changed for every number (a_number=6, a_number=4, and a_number=9):
To compute the square of 6, we used the code square(a_number=6).
To compute the square of 4, we used the code square(a_number=4).
To compute the square of 9, we used the code square(a_number=9).
To understand what happens when we change the value we assign to a_number, you should try to imagine a_number being replaced with that specific value inside the definition of the function:
Structurally, the function above is composed of a header (which contains the def statement), a body, and a return statement. Together, these three elements make up the function's definition. We'll often use the phrase "inside the function's definition" to refer to the function's body.
Notice we indented the body and the return statement four spaces to the right. Recall that we did the same for the bodies of for loops and if statements. Technically, we only need to indent at least one space character to the right, but the convention in the Python community is to use four space characters instead. This helps with readability: other people who follow this convention will be able to read your code more easily, and you'll be able to read their code more easily.
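To make the three parts concrete, here is the same square() function with each part labeled in a comment. The labels are only annotations added for illustration.

```python
def square(a_number):                     # header: the def statement
    squared_number = a_number * a_number  # body, indented four spaces
    return squared_number                 # return statement

# Calling square(a_number=6) behaves as if a_number were replaced by 6
# inside the definition: squared_number = 6 * 6.
result = square(a_number=6)
print(result)  # 36
```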
Now let's practice more by creating a new function.
Instructions
Create a function named add_10() that:
Takes a number as the input (name the input variable as you wish).
Adds the integer 10 to that number.
Returns the result of the addition.
Use the add_10() function to:
Add 10 to the number 30. Assign the result to a variable named add_30.
Add 10 to the number 90. Assign the result to a variable named add_90.
Answer:
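One way to solve this exercise. The parameter name number is an arbitrary choice, since the exercise leaves the name open.

```python
def add_10(number):
    result = number + 10
    return result

add_30 = add_10(number=30)
add_90 = add_10(number=90)

print(add_30)  # 40
print(add_90)  # 100
```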
5- Parameters & Arguments
Based on how we've used functions in the previous missions, it might seem a bit odd to you that we used square(a_number=6) instead of square(6). For instance, if we wanted to use len() to find the length of a list list_1, we'd use len(list_1), not len(some_variable_name=list_1).
However, we can actually use square(6) directly if we wanted to:
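A short sketch showing that both forms give the same result:

```python
def square(a_number):
    squared_number = a_number * a_number
    return squared_number

print(square(6))           # 36
print(square(a_number=6))  # 36
```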
The a_number variable in def square(a_number) is the input variable, and whether we type it out or not when we use the function is optional. When we use square(6) instead of square(a_number=6), Python will automatically assign 6 to the a_number variable behind the scenes. This means that square(6) is essentially the same thing as square(a_number=6).
Input variables like a_number are more often known as parameters. So, a_number is a parameter of the square() function. When the parameter a_number takes a value (like 6 in square(a_number=6)), that value is called an argument.
For square(a_number=6), we'd say the a_number parameter took in 6 as an argument. For square(a_number=100), the parameter a_number took in 100 as an argument. For square(a_number=555), the parameter a_number took in 555 as an argument, and so on.
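Note that you'll often see people using the terms "parameter" and "argument" interchangeably. Strictly speaking, that is imprecise: a parameter is the variable named in the function's definition, while an argument is the value that parameter takes when the function is used.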
We'll now focus on the return statement. If you look at the square() function above, you can see that we returned the squared_number variable. However, you can couple the return statement with an entire expression rather than just a single variable.
This means we can directly return the result of the expression a_number * a_number and omit the variable assignment step:
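A sketch of the shortened definition, used here without typing out the parameter name:

```python
def square(a_number):
    # The expression is evaluated first, and its result is returned directly.
    return a_number * a_number

squared_6 = square(6)
squared_11 = square(11)

print(squared_6)   # 36
print(squared_11)  # 121
```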
Skipping a variable assignment step or two to simplify our code is generally a good idea, as long as our code doesn't become hard to read. If you end up with something like return ((a_number**2 + a_number)**a_number * 2**2.71828) / (a_number**a_number)**5, then that's a sign you skipped too many steps.
Now let's practice what we learned in the mission.
Instructions
Recreate the square() function by omitting the variable assignment step inside the function's body.
Without typing out the name of the parameter, use the new square() function to compute the square of the numbers 6 and 11.
Assign the square of 6 to a variable named squared_6.
Assign the square of 11 to a variable named squared_11.
6- Extract Values from any Column
Now that we've learned more about functions and how to create them, let's get back to our initial goal: creating a function that generates frequency tables for any column we want in our iOS apps data set.
As a reminder, our data set is structured as a list of lists. To generate a frequency table for a certain column, we could:
Extract the values of the column in a separate list.
Generate a frequency table for the elements of that list.
One thing we can try is to create a separate function for each of these two tasks:
A function that extracts the values for any column we want in a separate list; and
A function that generates a frequency table for a list.
Using the first function, we can extract the values for any column we want in a separate list. Then, we can pass the resulting list as an argument to the second function, which will output a frequency table for that list. We'll create the first function in this screen, and the second function in the next screen.
To extract the values from any column we want from our apps_data data set, we need to:
Create an empty list.
Loop through the apps_data data set (excluding the header row), and for each iteration:
Store the value from the column we want in a variable.
Append that value to the empty list we created outside the for loop.
Below, we see how to extract the values for the cont_rating column:
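The steps above can be sketched as follows. As an assumption, apps_data is represented by a header row plus three rows taken from the sample table, truncated to the first 12 columns and stored as strings; the real data set has about 7,000 rows.

```python
# Stand-in for apps_data: header row plus three sample rows (first 12 columns).
apps_data = [
    ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot',
     'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
     'prime_genre'],
    ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676',
     '212', '3.5', '3.5', '95.0', '4+', 'Social Networking'],
    ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558',
     '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video'],
    ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805',
     '579', '4.5', '4.5', '9.24.12', '9+', 'Games'],
]

content_ratings = []
for row in apps_data[1:]:        # skip the header row
    content_rating = row[10]     # cont_rating is at index 10
    content_ratings.append(content_rating)

print(content_ratings)  # ['4+', '12+', '9+']
```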
Now let's create a function that extracts the values from any column we want. We'll once again work with the iOS apps data set in the following exercise. Below, you can see a few of its rows in case you need to recollect its structure:
| id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic
0 | 284882215 | Facebook | 389879808 | USD | 0.0 | 2974676 | 212 | 3.5 | 3.5 | 95.0 | 4+ | Social Networking | 37 | 1 | 29 | 1
1 | 389801252 | Instagram | 113954816 | USD | 0.0 | 2161558 | 1289 | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video | 37 | 0 | 29 | 1
2 | 529479190 | Clash of Clans | 116476928 | USD | 0.0 | 2130805 | 579 | 4.5 | 4.5 | 9.24.12 | 9+ | Games | 38 | 5 | 18 | 1
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
7196 | 977965019 | みんなのお弁当 by クックパッド お弁当をレシピ付きで記録・共有 | 51174400 | USD | 0.0 | 0 | 0 | 0.0 | 0.0 | 1.4.0 | 4+ | Food & Drink | 37 | 0 | 1 | 1
Instructions
Write a function named extract() that can extract any column you want from the apps_data data set.
The function should take in the index number of a column as input (name the parameter as you want).
Inside the function's definition:
Create an empty list.
Loop through the apps_data data set (excluding the header). Extract only the value you want by using the parameter (which is expected to be an index number).
Append that value to the empty list.
Return the list containing the values of the column.
Use the extract() function to extract the values in the prime_genre column. Store them in a variable named genres. The index number of this column is 11.
Answer:
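A possible solution, sketched on a small stand-in for apps_data (header plus three sample rows, first 12 columns only, values as strings). The names index and column are arbitrary choices.

```python
# Stand-in for apps_data; the real data set is much larger.
apps_data = [
    ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot',
     'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
     'prime_genre'],
    ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676',
     '212', '3.5', '3.5', '95.0', '4+', 'Social Networking'],
    ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558',
     '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video'],
    ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805',
     '579', '4.5', '4.5', '9.24.12', '9+', 'Games'],
]

def extract(index):
    column = []
    for row in apps_data[1:]:   # skip the header row
        value = row[index]
        column.append(value)
    return column

genres = extract(11)
print(genres)  # ['Social Networking', 'Photo & Video', 'Games']
```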
7- Creating Frequency Tables
In the previous exercise, we created the extract() function, which we can use to extract the values for any column we want from our apps_data data set. Remember that we want to create two functions:
A function that extracts the values for any column we want in a separate list (we already created this function — it's the extract() function).
A function that generates a frequency table for a list.
In the following exercise, we'll create the second function. Remember that to create a frequency table for the elements of a list, we need to:
Create an empty dictionary.
Loop through that list and check for each iteration whether the iteration variable exists as a key in the dictionary created.
If it exists, then increment by 1 the dictionary value at that key.
Else (if it doesn't exist), create a new key-value pair in the dictionary, where the dictionary key is the iteration variable, and the dictionary value is 1.
Below, you can see a few rows of the apps_data data set in case you need to recollect its structure:
| id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic
0 | 284882215 | Facebook | 389879808 | USD | 0.0 | 2974676 | 212 | 3.5 | 3.5 | 95.0 | 4+ | Social Networking | 37 | 1 | 29 | 1
1 | 389801252 | Instagram | 113954816 | USD | 0.0 | 2161558 | 1289 | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video | 37 | 0 | 29 | 1
2 | 529479190 | Clash of Clans | 116476928 | USD | 0.0 | 2130805 | 579 | 4.5 | 4.5 | 9.24.12 | 9+ | Games | 38 | 5 | 18 | 1
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
7196 | 977965019 | みんなのお弁当 by クックパッド お弁当をレシピ付きで記録・共有 | 51174400 | USD | 0.0 | 0 | 0 | 0.0 | 0.0 | 1.4.0 | 4+ | Food & Drink | 37 | 0 | 1 | 1
Instructions
Write a function named freq_table() that generates a frequency table for any list.
The function should take in a list as input.
Inside the function's body, write code that generates a frequency table for that list and stores the table in a dictionary.
Return the frequency table as a dictionary.
Use the freq_table() function on the genres list (already defined from the previous screen) to generate the frequency table for the prime_genre column. Store the frequency table in a variable named genres_ft.
Feel free to experiment with the extract() and freq_table() functions to easily create frequency tables for any column you want.
Answer:
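A possible solution. The genres list below is a short made-up stand-in for the list extracted on the previous screen, and the names column and frequency_table are arbitrary.

```python
def freq_table(column):
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table

# Hypothetical stand-in for the genres list from the previous screen.
genres = ['Games', 'Social Networking', 'Games', 'Photo & Video', 'Games']
genres_ft = freq_table(genres)

print(genres_ft)  # {'Games': 3, 'Social Networking': 1, 'Photo & Video': 1}
```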
8- Writing a Single Function
In the last exercise, we used two functions to create our frequency table:
The extract() function, which extracts the values of any column we want in a separate list.
The freq_table() function, which generates a frequency table for a list.
So the entire process is composed of two big steps:
Extract the column we want as a list.
Generate a frequency table for that list.
If you recall from the last mission, however, we learned to generate frequency tables without having to extract the columns as a separate step. For instance, this is how we generated the frequency table for the content ratings column:
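A sketch of that single-loop approach, assuming a small stand-in for apps_data (header plus three sample rows, first 12 columns only, values as strings):

```python
# Stand-in for apps_data; the real data set is much larger.
apps_data = [
    ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot',
     'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
     'prime_genre'],
    ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676',
     '212', '3.5', '3.5', '95.0', '4+', 'Social Networking'],
    ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558',
     '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video'],
    ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805',
     '579', '4.5', '4.5', '9.24.12', '9+', 'Games'],
]

content_ratings = {}
for row in apps_data[1:]:     # skip the header row
    c_rating = row[10]        # read the value directly from the row
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    else:
        content_ratings[c_rating] = 1

print(content_ratings)  # {'4+': 1, '12+': 1, '9+': 1}
```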
This means we can write a single function to generate the frequency tables for any column we want. Let's try that in the following exercise.
Instructions
Write a function named freq_table() that generates a frequency table for any column in our iOS apps data set.
The function should take the index number of a column in as an input (name the parameter as you want).
Inside the function's body:
Loop through the apps_data data set (don't include the header row) and extract the value you want by using the parameter (which is expected to be an index number).
Build the frequency table as a dictionary.
The function should return the frequency table as a dictionary.
Use the freq_table() function to generate a frequency table for the user_rating column (the index number of this column is 7).
Store the table in a variable named ratings_ft.
Answer:
9- Reusability & Multiple Parameters
Recall from the previous exercise that we iterated directly over the apps_data data set inside the function's definition:
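A sketch of that version of the function. The apps_data variable here is a small stand-in (header plus three sample rows, first eight columns only), and the internal names are arbitrary.

```python
# Stand-in for apps_data, truncated to the first eight columns.
apps_data = [
    ['id', 'track_name', 'size_bytes', 'currency', 'price',
     'rating_count_tot', 'rating_count_ver', 'user_rating'],
    ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5'],
    ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5'],
    ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5'],
]

def freq_table(index):
    frequency_table = {}
    # The loop is tied to the apps_data variable defined outside the function.
    for row in apps_data[1:]:
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table

ratings_ft = freq_table(7)
print(ratings_ft)  # {'3.5': 1, '4.5': 2}
```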
Iterating over apps_data inside the function's definition constrains us to use freq_table() only for the apps_data data set. If we had another data set or wanted to save the function to use it later for another project, freq_table() would be useless because it's designed to iterate only over the apps_data variable.
One of the key aspects that make functions great is reusability. We generally use functions to speed up repetitive tasks, so a function is no good if it's not reusable for the various instances of a certain kind of task.
Our freq_table() function is currently reusable only with respect to the columns we may want to generate the frequency tables for. We need to enlarge its degree of reusability by making it reusable with respect to both columns and data sets.
To do that, we'd need to change our function to not only take an index value as the input, but also a data set. So instead of having freq_table(index), we'd need something like freq_table(index, data_set) — that is, we'd need to create the function with two parameters.
Fortunately, Python allows us to use multiple parameters for the functions we create. For instance, consider the add() function below, which has two parameters, a and b, and returns their sum:
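A minimal sketch of such a function. The internal variable name a_sum is an arbitrary choice.

```python
def add(a, b):
    a_sum = a + b
    return a_sum

print(add(a=9, b=11))  # 20
```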
Let's add one more parameter to our freq_table() function and increase its degree of reusability.
Instructions
Update the current freq_table() function to make it more reusable.
The function should take in two inputs this time: a data set and the index of a column (name the parameters as you want).
Inside the function's body:
Loop through the data set by using the parameter that is expected to be a data set (a list of lists). For each iteration, select the value you want by using the parameter that is expected to be an index number.
Build the frequency table as a dictionary.
The function should return the frequency table as a dictionary.
Use the updated freq_table() function to generate a frequency table for the user_rating column (the index number of this column is 7). Store the table in a variable named ratings_ft.
Answer:
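A possible solution, tried on a small stand-in for apps_data (header plus three sample rows, first eight columns only). It assumes, as in the earlier exercises, that the first row of any data set passed in is a header.

```python
def freq_table(data_set, index):
    frequency_table = {}
    # Nothing here refers to apps_data: any list of lists can be passed in.
    for row in data_set[1:]:   # assumes the first row is a header
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table

# Stand-in for apps_data, truncated to the first eight columns.
apps_data = [
    ['id', 'track_name', 'size_bytes', 'currency', 'price',
     'rating_count_tot', 'rating_count_ver', 'user_rating'],
    ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5'],
    ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5'],
    ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5'],
]

ratings_ft = freq_table(apps_data, 7)
print(ratings_ft)  # {'3.5': 1, '4.5': 2}
```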
10- Keyword & Positional Arguments
When a function has multiple parameters, there's more than one way to pass in arguments. Consider, for instance, a function named subtract(a, b), which takes a and b as inputs, and returns the result of the subtraction a - b.
Let's say we want to perform the subtraction 10 - 7. This means that we'll need to pass 10 and 7 in as arguments to the subtract() function. There's more than one way to pass in these two arguments and get the result we want (which is 3 — the result of 10 - 7).
When we use the syntax subtract(a=10, b=7) or subtract(b=7, a=10), we pass in the arguments 10 and 7 using the variable names a and b. For this reason, they are called named arguments, or more commonly, keyword arguments.
When we use keyword arguments, the order we use to pass in the arguments doesn't make any difference. In our example, Python knows exactly that the argument 10 corresponds to the parameter a, and the argument 7 to the parameter b, regardless of whether we use subtract(a=10, b=7) or subtract(b=7, a=10).
This is because when we specify a=10 and b=7, we're crystal clear about what arguments correspond to what parameters, and the order we use to make these specifications doesn't matter anymore.
However, when we use subtract(10, 7) or subtract(7, 10), we're not explicit about what arguments correspond to what parameters. To resolve this ambiguity, Python maps arguments to parameters by position: the first argument is mapped to the first parameter, and the second argument is mapped to the second parameter.
Arguments that are passed by position are called positional arguments. The order we use to pass in positional arguments makes a clear difference in terms of argument-parameter mapping and leads to different results: subtract(10, 7) returns 3, while subtract(7, 10) returns -3.
Positional arguments are often preferred because they involve less typing, and they can speed up our workflow. However, we need to pay extra attention to the order we use to avoid incorrect mappings that can lead to logical errors.
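Both ways of passing arguments can be compared in a short sketch of subtract():

```python
def subtract(a, b):
    return a - b

# Keyword arguments: the order does not matter.
print(subtract(a=10, b=7))  # 3
print(subtract(b=7, a=10))  # 3

# Positional arguments: the order decides the mapping.
print(subtract(10, 7))      # 3
print(subtract(7, 10))      # -3
```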
Now let's get a bit of practice with keyword and positional arguments.
Instructions
Use the freq_table() function to generate frequency tables for the cont_rating, user_rating, and prime_genre columns.
Use positional arguments when you generate the table for the cont_rating column (index number 10). Assign the table to a variable named content_ratings_ft.
Use keyword arguments for the user_rating column (index number 7) following the order (data_set, index). Assign the table to a variable named ratings_ft.
Use keyword arguments for the prime_genre column (index number 11) following the order (index, data_set). Assign the table to a variable named genres_ft.
Answer:
Result (User Ratings Frequency Table):
Result (Prime Genre Frequency Table):
11- Combining Functions
When we write a function, it's common to use other functions inside the body of the function we're creating.
Let's say we want to write a function named mean(), that takes in a list of numbers and returns the mean of that list. To get the mean of a list, we first need to compute its sum, and then divide the result by the length of that list. Let's start with writing two functions, one for computing the sum of a list, and the other for computing the length.
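A sketch of the two helper functions. The parameter name a_list is an assumption, since the text does not name it.

```python
def find_sum(a_list):
    a_sum = 0
    for element in a_list:
        a_sum += element
    return a_sum

def find_length(a_list):
    length = 0
    for element in a_list:
        length += 1
    return length

print(find_sum([10, 5, 15, 7, 23]))     # 60
print(find_length([10, 5, 15, 7, 23]))  # 5
```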
Now we can use the find_length() and find_sum() functions to create the mean() function:
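A sketch of mean() built on the two helpers, which are repeated so the block runs on its own. The helper parameter name a_list is an assumption.

```python
def find_sum(a_list):
    a_sum = 0
    for element in a_list:
        a_sum += element
    return a_sum

def find_length(a_list):
    length = 0
    for element in a_list:
        length += 1
    return length

def mean(a_list_of_numbers):
    # Both helpers receive the same list that mean() was given.
    return find_sum(a_list_of_numbers) / find_length(a_list_of_numbers)

print(mean([10, 5, 15, 7, 23]))  # 12.0
```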
Notice that we used the find_sum() and the find_length() functions inside the body of the mean() function. We can see that a_list_of_numbers, which is the parameter of the mean() function, becomes an argument for the find_sum() and find_length() functions. This aspect should become clearer when we use keyword arguments:
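With keyword arguments, the hand-off from parameter to argument is spelled out. In this sketch the helpers' parameter name a_list is an assumption.

```python
def find_sum(a_list):
    a_sum = 0
    for element in a_list:
        a_sum += element
    return a_sum

def find_length(a_list):
    length = 0
    for element in a_list:
        length += 1
    return length

def mean(a_list_of_numbers):
    # The parameter of mean() is passed on as the argument for a_list.
    total = find_sum(a_list=a_list_of_numbers)
    count = find_length(a_list=a_list_of_numbers)
    return total / count

print(mean(a_list_of_numbers=[2, 4, 6]))  # 4.0
```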
Being able to reuse functions inside other functions is important because it saves us from having to constantly write the same pieces of code. For instance, if we weren't able to reuse the find_sum() and find_length() functions inside the mean() function, we'd have to again write the code we need to find the length and the sum:
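Without the helpers, mean() would have to contain both loops itself, roughly like this:

```python
def mean(a_list_of_numbers):
    # Code that find_sum() would otherwise provide.
    a_sum = 0
    for element in a_list_of_numbers:
        a_sum += element

    # Code that find_length() would otherwise provide.
    length = 0
    for element in a_list_of_numbers:
        length += 1

    return a_sum / length

print(mean([10, 5, 15, 7, 23]))  # 12.0
```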
Reusing functions inside other functions enables us to elegantly build complex functions by abstracting away function definitions:
Now let's get a bit of practice with reusing functions inside another function's definition. In the code editor on the right, you can see that we've already defined three functions:
extract(): Extracts any column we want from a data set (notice that it has two parameters: data_set and index)
find_sum(): Computes the sum of the elements of a list
find_length(): Computes the length of a list
Instructions
Write a function named mean() that computes the mean for any column we want from a data set.
The function should take in two inputs: a data set and an index value.
Inside the body of the mean() function, use the extract() function to extract the values of a column into a separate list, and then compute the mean of the values in that list using find_sum() and find_length().
The function should return the mean of the column.
Use the mean() function to compute the mean of the price column (index number 4). Assign the result to a variable named avg_price.
Answer:
Result:
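A possible solution, under several stated assumptions: the three helper functions are reconstructed here because their listings are not shown, find_sum() converts each element with float() because the values are stored as strings, and apps_data is a tiny made-up data set with only the first five columns.

```python
def extract(data_set, index):
    column = []
    for row in data_set[1:]:   # skip the header row
        column.append(row[index])
    return column

def find_sum(a_list):
    a_sum = 0
    for element in a_list:
        a_sum += float(element)  # assumption: values arrive as strings
    return a_sum

def find_length(a_list):
    length = 0
    for element in a_list:
        length += 1
    return length

def mean(data_set, index):
    column = extract(data_set, index)
    return find_sum(column) / find_length(column)

# Hypothetical rows; the prices are invented for illustration.
apps_data = [
    ['id', 'track_name', 'size_bytes', 'currency', 'price'],
    ['1', 'App A', '1000', 'USD', '0.0'],
    ['2', 'App B', '2000', 'USD', '1.5'],
    ['3', 'App C', '3000', 'USD', '4.5'],
]

avg_price = mean(apps_data, 4)
print(avg_price)  # 2.0
```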
12- Debugging Functions
When we write more complex functions, it's quite common to run into errors. As the functions we write get more complex, it becomes a bit more difficult to understand where errors come from and what we actually need to fix in our code.
Consider the mean() function we wrote in the previous exercise. When we call (use) the mean() function, mean() in turn calls three other functions:
extract()
find_sum()
find_length()
There's a lot of code running in the background when we run a single line of code like mean(apps_data, 4). Running mean(apps_data, 4) will first call the mean() function, which in turn will need to call the extract(), find_sum(), and find_length() functions in order to work properly.
Suppose we ran mean(apps_data, 4) in the code editor and got this error message:
On the last line of the error message, we can see a general description of the error: TypeError: unsupported operand type(s) for +=: 'int' and 'str'. On the left side of the image, we can see three ---> symbols, which are meant to represent arrows pointing toward certain lines of code. These arrows point toward the lines of code that lead to the error.
The last arrow (from top to bottom) shows us where the error really happened. Notice that it points toward the a_sum += element line, which is part of the find_sum() function.
The error description is TypeError: unsupported operand type(s) for +=: 'int' and 'str', so we can deduce that a_sum += element raises an error because we're trying to add an integer (int) with a string (str). We defined a_sum = 0, so element must be the variable storing a string. To fix the error, we'll need to convert element to an integer or a float.
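The error and its fix can be reproduced in a small sketch. The prices list is made up, and find_sum_fixed() is a hypothetical name used only to keep both versions side by side.

```python
def find_sum(a_list):
    a_sum = 0
    for element in a_list:
        a_sum += element         # fails when element is a string
    return a_sum

def find_sum_fixed(a_list):
    a_sum = 0
    for element in a_list:
        a_sum += float(element)  # convert the string before adding
    return a_sum

prices = ['0.0', '1.5', '4.5']   # hypothetical column values stored as strings

try:
    find_sum(prices)
except TypeError as error:
    error_message = str(error)
    print(error_message)  # unsupported operand type(s) for +=: 'int' and 'str'

print(find_sum_fixed(prices))  # 6.0
```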
Notice that although a_sum += element was the line of code causing the problem, the arrows also point toward mean(apps_data, 4) and return find_sum(column) / find_length(column). That's because the error is traced back from the initial function call to the latest function call:
We initially called mean(apps_data, 4), which in turn called the mean() function. When the mean() function was called, there were problems with running return find_sum(column) / find_length(column). These problems were traced back to a_sum += element, which is part of the find_sum() function (the most recent function that was called before the error was raised). Because the error is traced back from the initial call to the latest call, the error message is called a traceback message, or simply a traceback.
In programming, errors are known as bugs. The process of fixing an error is commonly known as debugging.
Instructions
The code we provided in the code editor has several bugs (errors). Run the code, and then use the information provided in the tracebacks to debug the code.
Select all lines of code and press ctrl + / (PC) or ⌘ + / (Mac) to uncomment it so you can modify it.
To understand what the bug is, read the general description of the error.
To understand where the bug is, follow the red arrows.
You'll see the arrows are represented as ---> or ^.
Note that there's more than one bug in the code we wrote. Once you debug an error, you'll get another error. This doesn't mean you're not making progress. On the contrary, you're closer to debugging the code completely.
Answer: