DICTIONARIES AND FREQUENCY TABLES (BIG DATA & BUSINESS INTELLIGENCE)
1. STORING DATA
In the last mission, we worked with a data set that stores information for 7,197 mobile apps:
id track_name size_bytes currency price rating_count_tot rating_count_ver user_rating user_rating_ver ver cont_rating prime_genre sup_devices.num ipadSc_urls.num lang.num vpp_lic
0 284882215 Facebook 389879808 USD 0.0 2974676 212 3.5 3.5 95.0 4+ Social Networking 37 1 29 1
1 389801252 Instagram 113954816 USD 0.0 2161558 1289 4.5 4.0 10.23 12+ Photo & Video 37 0 29 1
2 529479190 Clash of Clans 116476928 USD 0.0 2130805 579 4.5 4.5 9.24.12 9+ Games 38 5 18 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7196 977965019 みんなのお弁当 by クックパッド お弁当をレシピ付きで記録・共有 51174400 USD 0.0 0 0 0.0 0.0 1.4.0 4+ Food & Drink 37 0 1 1
The cont_rating column offers information about the content rating of each app. The content rating of an app (also known as the maturity rating) represents the age required to use that app. The table below shows the unique content ratings in our data set, along with the number of apps specific to each rating:
Content rating Number of apps
4+ 4,433
9+ 987
12+ 1,155
17+ 622
From the table above, we can see that:
Most apps (4,433) have a content rating of 4+ (only people aged four or older are allowed to use these apps).
Apps with a content rating of 17+ are the fewest (622 apps).
In the middle, we have the 9+ and 12+ apps: 987 apps have a content rating of 9+, and 1,155 apps have a rating of 12+.
If we wanted to save the data from the table above, we could use two lists or maybe a list of lists. We'll try this in the following exercise, and on the next screen we'll learn about dictionaries and explore a more efficient solution for storing the data above.
Instructions
Store the data in the table above using two different lists.
Assign the list ['4+', '9+', '12+', '17+'] to a variable named content_ratings.
Assign the list [4433, 987, 1155, 622] to a variable named numbers.
Store the data in the table above using a list of lists. Assign the list [['4+', '9+', '12+', '17+'], [4433, 987, 1155, 622]] to a variable named content_rating_numbers.
Answer & Result:
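Here is a minimal sketch of one possible answer, assuming Python 3 (the course's own answer checker is not reproduced here). It stores the table both ways. It also shows that to read one row of the table, we have to use the same position in both inner lists:

```python
# Option 1: two separate lists; position i in one list matches position i in the other
content_ratings = ['4+', '9+', '12+', '17+']
numbers = [4433, 987, 1155, 622]

# Option 2: one list of lists holding both columns of the table
content_rating_numbers = [['4+', '9+', '12+', '17+'], [4433, 987, 1155, 622]]

# Reading the 12+ row means pairing up index 2 of both inner lists
print(content_rating_numbers[0][2], content_rating_numbers[1][2])  # prints: 12+ 1155
```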
2. DICTIONARIES
In the previous screen, we saw a table that shows the unique content ratings in our data set, along with the number of apps specific to each rating:
Content rating Number of apps
4+ 4,433
9+ 987
12+ 1,155
17+ 622
We stored the data above in two ways:
Using two separate lists
Using a single list of lists
Looking at the lists above, it may not be immediately clear which content rating corresponds to which number — especially for someone who doesn't have enough context. We need to find a better way to map a content rating to its corresponding number.
Remember that each list element has an index number. Let's consider the numbers list:
What if we could transform the index numbers to content rating values? This way, the mapping between content ratings and their corresponding numbers should become much clearer.
Fortunately, we can do this using a dictionary:
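Here is a minimal sketch, assuming Python 3.7 or later (where dictionaries keep insertion order). The list and the dictionary hold the same numbers, but only the dictionary tells us which rating each number belongs to:

```python
numbers = [4433, 987, 1155, 622]
print(numbers[0])  # prints: 4433 (index 0 says nothing about the rating)

# The keys '4+', '9+', '12+', '17+' take the place of the positional indices 0 to 3
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
print(content_ratings['4+'])  # prints: 4433
print(content_ratings)        # prints: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
```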
To create the dictionary above, we:
Mapped each content rating to its corresponding number by following an index:value pattern. For instance, to map a rating of '4+' to the number 4,433, we typed '4+': 4433 (notice the colon between '4+' and 4433). To map '9+' to 987, we typed '9+': 987, and so on.
Typed the entire sequence of index:value pairs, and separated each pair with a comma: '4+': 4433, '9+': 987, '12+': 1155, '17+': 622.
Surrounded the sequence with curly braces: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}.
The animation below should help you better understand the transition from table to dictionary.
Now, let's try to create the content_ratings dictionary ourselves. In the next screen, we'll learn more interesting things about dictionaries.
Instructions
Map content ratings to their corresponding numbers by recreating the dictionary above: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}. Assign the dictionary to a variable named content_ratings.
Print content_ratings and examine the output carefully. Has the order we used to create the dictionary been preserved? In other words, is the output identical to {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}? We'll discuss more about this on the next screen.
Answer & Result:
3. INDEXING
Using a dictionary allowed us to change the index numbers of a list to content rating values. That way, the mapping between content ratings and their corresponding numbers became much clearer.
To retrieve the individual values of the content_ratings dictionary, we can use the new indices. The way we retrieve individual dictionary values is identical to the way we retrieve individual list elements — we follow a variable_name[index] pattern:
In the previous exercise, you may have noticed that when we create a dictionary, the order we use to arrange the dictionary elements is not necessarily preserved. Note below that we create the dictionary {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}, but the order is not exactly preserved, as we can see from the output {'4+': 4433, '9+': 987, '17+': 622, '12+': 1155}.
This is contrary to what we've seen with lists, where the order is always preserved. In lists, there's a direct connection between the index of a value and the position of that value in the list. For instance, the index value 0 always retrieves the list element that's positioned first in a list. If order weren't preserved and list elements were constantly swapped, then the index value 0 would retrieve different list elements at different times, which is something we strongly want to avoid.
With dictionaries, there's no longer a connection between the index of a value and the position of that value in the dictionary, so the order becomes unimportant. For instance, the index value '4+' will retrieve the value 4433 no matter its position. 4433 could be the first element in the dictionary, the second, the fourth — it doesn't matter.
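Whether or not order is preserved within dictionaries also depends on the version of Python we use. Since Python 3.7, dictionaries are guaranteed to keep keys in the order they were inserted (CPython 3.6 already did so as an implementation detail), whereas in earlier versions the printed order could differ from the order we typed. We'll discuss versions later on in this course. Either way, we retrieve dictionary values by key, not by position. Now, let's practice retrieving a few dictionary values.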
Instructions
Retrieve values from the content_ratings dictionary.
Assign the value at index '9+' to a variable named over_9.
Assign the value at index '17+' to a variable named over_17.
Print over_9 and over_17.
Answer & Result:
4. ALTERNATIVE WAY OF CREATING A DICTIONARY
Previously, we learned that in order to create a dictionary, we need to:
Map each index to its corresponding value by following an index:value pattern (e.g. '4+': 4433).
Type the entire sequence of index:value pairs, and separate each pair with a comma (e.g. '4+': 4433, '9+': 987, '12+': 1155, '17+': 622).
Surround the sequence with curly braces (e.g. {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}).
Alternatively, we can create a dictionary and populate it with values by following these steps:
We create an empty dictionary.
We add values one by one to that empty dictionary.
Adding a value to a dictionary follows the pattern dictionary_name[index] = value. To add a value 4433 with an index '4+' to a dictionary named content_ratings, we need to use the code content_ratings['4+'] = 4433.
We can keep adding values using the same approach:
At a high level, this approach is identical to populating an empty list by using the list_name.append() command. The syntax is different, but fundamentally we take the same steps:
We create an empty dictionary (or list).
We add values using the dictionary_name[index] = value technique (or the list_name.append() command in the case of a list).
In the next exercise, we'll focus on practicing this new technique.
Instructions
Use the new technique we learned to map content ratings to their corresponding numbers inside a dictionary.
Create an empty dictionary named content_ratings.
Add the index:value pairs one by one using the dictionary_name[index] = value technique. This should be the final form of the dictionary: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}.
Retrieve the value at index '12+' from the content_ratings dictionary. Assign it to a variable named over_12_n_apps.
Answer & Result:
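Here is a sketch of one possible answer, assuming the four ratings are added in table order. The last lines show the matching list steps (empty list, then append) for comparison:

```python
content_ratings = {}              # step 1: an empty dictionary
content_ratings['4+'] = 4433      # step 2: dictionary_name[index] = value
content_ratings['9+'] = 987
content_ratings['12+'] = 1155
content_ratings['17+'] = 622

over_12_n_apps = content_ratings['12+']
print(over_12_n_apps)  # prints: 1155

# The list counterpart: start empty, then append one value at a time
numbers = []
numbers.append(4433)
numbers.append(987)
print(numbers)  # prints: [4433, 987]
```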
5. KEY VALUE PAIR
The index of a dictionary value is called a key. In '4+': 4433, the dictionary key is '4+', and the dictionary value is 4433. As a whole, '4+': 4433 is a key-value pair.
Dictionary values can be of any data type: strings, integers, floats, Booleans, lists, and even dictionaries.
Dictionary keys can be of almost any data type we've learned so far, except lists and dictionaries. If we use lists or dictionaries as dictionary keys, Python raises a TypeError:
(In the spirit of explaining what happens behind the curtains, we're going to explain below why this error is raised. Understanding this, however, is not important for moving forward in this mission, so feel free to jump straight to the exercises.)
To understand the error messages above, we have to take a brief look at what Python does behind the scenes. When we populate a dictionary, Python computes an integer for each dictionary key in the background, whatever the key's data type. This integer is called a hash value, and Python uses it to locate the key quickly; the key itself is not changed. Python computes it using the hash() command:
For reasons we'll be able to understand later, the hash() command can't compute a hash value for lists and dictionaries, and raises an error instead. Notice the error messages are identical to the ones we got when we tried to use lists or dictionaries as keys.
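Here is a short sketch of hash() at work, assuming CPython 3. Integers and strings produce integer hash values. String hashes change between interpreter runs, so no exact value is shown for '4+'. A list or a dictionary raises a TypeError. The exact wording of the message can differ between Python versions, so the comments only show its core:

```python
# Hashable values: hash() returns an integer
print(hash(4))           # prints: 4 (small integers hash to themselves)
print(type(hash('4+')))  # prints: <class 'int'> (the value itself varies between runs)

# Unhashable values: hash() raises TypeError
messages = []
for bad_key in ([1, 2, 3], {10: 'ten'}):
    try:
        hash(bad_key)
    except TypeError as err:
        messages.append(str(err))
print(messages)  # e.g. ["unhashable type: 'list'", "unhashable type: 'dict'"]

# Using the same values as dictionary keys fails for the same reason
try:
    broken = {[1, 2, 3]: 'a list'}
except TypeError as err:
    print('TypeError:', err)
```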
When we populate a dictionary, we also need to make sure each key in that dictionary is unique. If we use an identical key for two or more different values, Python keeps only the last key-value pair in the dictionary and removes the others — this means that we'll lose data. We illustrate this in the diagram below, where we highlighted the identical keys with a distinct color:
An odd "gotcha" arises when we mix integers with Booleans as dictionary keys. The hash() command converts the Boolean True to 1 and the Boolean False to 0, and Python also considers True equal to 1 and False equal to 0. This means True conflicts with the integer 1, and False conflicts with the integer 0. The dictionary keys won't be unique anymore, and Python will only keep the last key-value pair in cases like that.
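Here is a minimal sketch of both pitfalls. Strictly speaking, when a key repeats, Python keeps the key object from its first appearance and the value from its last. The pair therefore stays in the first position, but the earlier values are still lost:

```python
# Repeated key: the earlier value 4433 is silently overwritten
ratings = {'4+': 4433, '9+': 987, '4+': 0}
print(ratings)  # prints: {'4+': 0, '9+': 987}

# Booleans collide with 1 and 0 because True == 1 and False == 0
print(hash(True), hash(False))  # prints: 1 0
mixed = {1: 'one', True: 'True', 0: 'zero', False: 'False'}
print(mixed)       # prints: {1: 'True', 0: 'False'}
print(len(mixed))  # prints: 2 (four pairs typed, two survive)
```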
Instructions
1. Create the following dictionary and assign it to a variable named d_1:
{'key_1': 'first_value',
'key_2': 2,
'key_3': 3.14,
'key_4': True,
'key_5': [4,2,1],
'key_6': {'inner_key' : 6}
}
2. Examine the code below and determine whether it'll raise an error or not. If you think it'll raise an error, then assign the Boolean True to a variable named error; otherwise, assign False.
{4: 'four',
1.5: 'one point five',
'string_key': 'string_value',
True: 'True',
[1,2,3]: 'a list',
{10: 'ten'}: 'a dictionary'}
Answer & Result:
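Here is a sketch of one possible answer. Rather than only reasoning about the second dictionary, the code tries to build it inside a try block and records whether a TypeError was raised. The name d_2 is hypothetical and not part of the exercise. The [1,2,3] and {10: 'ten'} keys are unhashable, so the error is raised:

```python
d_1 = {'key_1': 'first_value',
       'key_2': 2,
       'key_3': 3.14,
       'key_4': True,
       'key_5': [4, 2, 1],
       'key_6': {'inner_key': 6}
       }

# Any value type is allowed as a dictionary VALUE, but not as a KEY
try:
    d_2 = {4: 'four',
           1.5: 'one point five',
           'string_key': 'string_value',
           True: 'True',
           [1, 2, 3]: 'a list',
           {10: 'ten'}: 'a dictionary'}
    error = False
except TypeError:
    error = True

print(error)  # prints: True
```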
6. CHECKING FOR MEMBERSHIP
Previously, we worked with a small table showing the four unique content ratings in our data set, along with the number of apps corresponding to each rating.
Content rating Number of apps
4+ 4,433
9+ 987
12+ 1,155
17+ 622
You might have wondered how we managed to count the number of apps for each unique content rating. How did we find out there are 4,433 apps with a 4+ content rating, or 622 apps with a 17+ rating? Part of the answer is that we used a technique that makes use of the special properties of dictionaries. The full answer is a bit lengthier: we'll learn how to count the number of apps for each unique content rating on these next two screens.
Once we've created a dictionary, we can check whether a certain value exists in the dictionary as a key. We can check, for instance, whether the value '12+' exists as a key in the dictionary {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}. To do that, we use the in operator.
The '12+' in content_ratings expression returned the Boolean True. This is because the string '12+' exists in the dictionary content_ratings as a key.
If we use in with a certain value that doesn't exist among a dictionary's keys, False is returned. For instance, checking whether the string '10+' exists in the dictionary content_ratings returns False because there's no dictionary key '10+' in content_ratings.
Checking whether 4433 or 987 exists in content_ratings also returns False because the search is done only over the dictionary's keys (4433 and 987 exist as dictionary values in content_ratings).
An expression of the form a_value in a_dictionary always returns a Boolean value:
True is returned if a_value exists in a_dictionary as a dictionary key.
False is returned if a_value doesn't exist in a_dictionary as a dictionary key.
Now let's practice using the in operator.
Instructions
Using the in operator, check whether the following values exist as dictionary keys in the content_ratings dictionary:
The string '9+'. Assign the output of the expression to a variable named is_in_dictionary_1.
The integer 987. Assign the output of the expression to a variable named is_in_dictionary_2.
Combine the output of an expression containing in with an if statement. If the string '17+' exists as a dictionary key in content_ratings, then:
Assign the string "It exists" to a variable named result.
Print the result variable.
Answer & Result:
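Here is a sketch of one possible answer. The last line goes slightly beyond the exercise. It shows that searching the values requires asking the dictionary's values() view explicitly, because in on its own checks only the keys:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

is_in_dictionary_1 = '9+' in content_ratings  # True: '9+' is a key
is_in_dictionary_2 = 987 in content_ratings   # False: 987 is only a value

if '17+' in content_ratings:
    result = "It exists"
    print(result)  # prints: It exists

# Searching the values instead of the keys
print(987 in content_ratings.values())  # prints: True
```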
7. COUNTING WITH DICTIONARIES
Once we've created and populated a dictionary, we can update (change) the dictionary values. To update a dictionary value, we need to reference it by its corresponding dictionary key and then perform the updating operation we want. In the code example below, we:
Change the value corresponding to the dictionary key '4+' from 4433 to 0.
Add 13 to the value corresponding to the dictionary key '9+'.
Subtract 1155 from the value corresponding to the dictionary key '12+'.
Change the value corresponding to the dictionary key '17+' from 622 (integer) to '622' (string).
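The four updates listed above can be sketched as follows. Each one references a value by its key and then assigns to it or modifies it in place:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

content_ratings['4+'] = 0        # replace 4433 with 0
content_ratings['9+'] += 13      # 987 + 13 = 1000
content_ratings['12+'] -= 1155   # 1155 - 1155 = 0
content_ratings['17+'] = '622'   # the integer 622 becomes the string '622'

print(content_ratings)  # prints: {'4+': 0, '9+': 1000, '12+': 0, '17+': '622'}
```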
We can combine updating dictionary values with what we already know to count how many times each unique content rating occurs in our data set. Let's start by considering the list ['4+', '4+', '4+', '9+', '9+', '12+', '17+'], which stores a few content ratings. To use code for counting how many times each rating occurs in this short list, we could:
Create a dictionary where the keys are the unique content ratings and the values are all 0: {'4+': 0, '9+': 0, '12+': 0, '17+': 0}.
Loop through the list ['4+', '4+', '4+', '9+', '9+', '12+', '17+'], and for each iteration:
Check whether the iteration variable exists as a key in the previously created dictionary.
If it exists, then increment the dictionary value at that key by 1.
To get a better understanding of how this works, we'll print the content_ratings dictionary inside the for loop to see how it changes with every iteration:
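Here is a minimal sketch of the loop described above. The list name ratings_list is hypothetical. The print inside the loop shows the dictionary after every iteration:

```python
ratings_list = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

# Step 1: every expected rating starts at 0
content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

# Step 2: add 1 to the matching key on every iteration
for c_rating in ratings_list:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1
    print(content_ratings)

print('Final:', content_ratings)  # prints: Final: {'4+': 3, '9+': 2, '12+': 1, '17+': 1}
```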
Now let's read in our AppleStore.csv data set, and use the technique above to count the number of times each unique content rating occurs. We should arrive at the same numbers we've been using in the table:
Content rating Number of apps
4+ 4,433
9+ 987
12+ 1,155
17+ 622
Instructions
Count the number of times each unique content rating occurs in the data set.
Create a dictionary named content_ratings where the keys are the unique content ratings and the values are all 0 (the values of 0 are temporary at this point, and they'll be updated).
Loop through the apps_data list of lists. Make sure you don't include the header row. For each iteration of the loop:
Assign the content rating value to a variable named c_rating. The content rating is at index number 10 in each row.
Check whether c_rating exists as a key in content_ratings. If it exists, then increment the dictionary value at that key by 1 (the key is equivalent to the value stored in c_rating).
Outside the loop, print content_ratings to check whether the counting worked as expected.
Answer & Result:
8. FINDING THE UNIQUE VALUES
Previously, we created the dictionary {'4+': 0, '9+': 0, '12+': 0, '17+': 0} before we looped over the data set to count the occurrence of each content rating. Unfortunately, this approach requires us to know beforehand the unique values we want to count.
Let's say we didn't know what the unique content ratings are. This means that we don't have enough information to create the dictionary {'4+': 0, '9+': 0, '12+': 0, '17+': 0}. We need to devise a way to extract this information.
Our data set has 7,197 rows, and it's impractical to go over each row and figure out what the unique content ratings are. As a workaround, we can modify the logic of the code we used in the previous screen to find the unique values automatically.
Let's consider again the count we did for the list ['4+', '4+', '4+', '9+', '9+', '12+', '17+']. To perform the count while finding the unique values automatically, we will:
Create an empty dictionary named content_ratings.
Loop through the list ['4+', '4+', '4+', '9+', '9+', '12+', '17+'], and for every iteration check whether the iteration variable (c_rating) exists as a key in content_ratings.
If it exists, then increment the dictionary value at that key by 1.
Else (if it doesn't exist), create a new key-value pair in the content_ratings dictionary, where the dictionary key is the iteration variable (c_rating) and the dictionary value is 1.
You might wonder why we initialized (created) each dictionary key with a dictionary value of 1 instead of 0. When we encounter a content rating, we need to count it, whether or not it already exists as a dictionary key. When a rating that is not yet in the dictionary comes in, we need to both initialize it and count it. We need to initialize it with a value of 1 to mark the fact that this rating has already occurred once. If we initialized the dictionary key with a value of 0, we'd succeed in doing the initializing part, but fail to do the counting part.
To get a better understanding of what we did above, we'll print the content_ratings dictionary inside the for loop to see how it changes with every iteration:
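Here is the same kind of sketch, but starting from an empty dictionary (ratings_list is again a hypothetical name). The else branch is where a new rating gets both initialized and counted:

```python
ratings_list = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

content_ratings = {}  # no need to know the unique ratings in advance
for c_rating in ratings_list:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1  # seen before: count one more
    else:
        content_ratings[c_rating] = 1   # first occurrence: create the key and count it
    print(content_ratings)

print('Final:', content_ratings)  # prints: Final: {'4+': 3, '9+': 2, '12+': 1, '17+': 1}
```

On the full data set, the loop would instead run over apps_data[1:] (skipping the header row), with c_rating taken from index 10 of each row.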
Now let's try this technique on our data set.
Instructions
Count the number of times each unique content rating occurs in the data set while finding the unique values automatically.
Create an empty dictionary named content_ratings.
Loop through the apps_data list of lists (make sure you don't include the header row). For each iteration of the loop:
Assign the content rating value to a variable named c_rating. The content rating is at index number 10.
Check whether c_rating exists as a key in content_ratings.
If it exists, then increment the dictionary value at that key by 1 (the key is equivalent to the value stored in c_rating).
Else, create a new key-value pair in the dictionary, where the dictionary key is c_rating and the dictionary value is 1.
Outside the loop, print content_ratings to check whether the counting worked as expected.
Answer & Result:
9. PROPORTIONS AND PERCENTAGE
The number of times a unique value occurs is also called frequency. For this reason, tables like the one below are called frequency tables.
Content rating Number of apps (frequency)
4+ 4,433
9+ 987
12+ 1,155
17+ 622
4+ occurs 4,433 times, so it has a frequency of 4,433. 12+ has a frequency of 1,155. 9+ has a frequency of 987. 17+ has the lowest frequency: 622.
When we're analyzing frequencies, we might be interested in answering questions about proportions and percentages:
What proportion of apps have a content rating of 4+?
What percentage of apps have a content rating of 17+?
What percentage of apps can a 15-year-old download?
The proportion of apps with a content rating of 4+ quantifies the number of 4+ apps relative to the total number of apps. There are 4,433 apps with a content rating of 4+ and 7,197 apps in total, so the proportion of 4+ apps in our data set is 4,433/7,197.
Rather than using a fraction, it's more common to express the proportion as a decimal between 0 and 1. So we'd say that 0.62 (the result of 4,433/7,197, rounded to two decimal places) of the apps have a content rating of 4+.
To get percentages, we only need to multiply the proportions by 100. Since the proportion of 4+ is 0.62, the percentage is 62% (the result of 0.62 × 100).
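Here is the arithmetic above as a small Python sketch. The 62% figure comes from rounding the proportion first; the unrounded percentage is about 61.6%:

```python
total_apps = 7197
n_4_plus = 4433

proportion_4_plus = n_4_plus / total_apps      # a decimal between 0 and 1
percentage_4_plus = proportion_4_plus * 100    # the same quantity out of 100

print(round(proportion_4_plus, 2))  # prints: 0.62
print(round(percentage_4_plus, 1))  # prints: 61.6
```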
On the next screen, we'll learn how to compute proportions and percentages using dictionaries. For now, let's practice more of what we've learned by creating a frequency table for the prime_genre column.
Instructions
Count the number of times each unique genre occurs.
Create an empty dictionary named genre_counting.
Loop through the apps_data list of lists (make sure you don't include the header row). For each iteration of the loop:
Assign the genre to a variable named genre. The genre comes as a string and has the index number 11.
Check whether genre exists as a key in genre_counting.
If it exists, then increment the dictionary value at that key by 1 (the key is equivalent to the value stored in genre).
Else, create a new key-value pair in the dictionary, where the dictionary key is genre and the dictionary value is 1.
Outside the loop, print genre_counting and try to determine the most common app genre in our data set.
Answer & Result:
10. LOOPING OVER DICTIONARIES
To transform frequencies to proportions or percentages, we can update the dictionary values individually by performing the required arithmetical operations. Below, we divide each dictionary value by the total number of apps to get from frequencies to proportions.
Updating each individual dictionary value can get more and more cumbersome as the dictionary length increases. For a dictionary with 20 key-value pairs, we'd have to manually update 20 dictionary values. Fortunately, we can speed up the process using a for loop.
When we iterate over a dictionary with a for loop, the looping is done by default over the dictionary keys:
We can use the dictionary keys to access the dictionary values within the loop:
This allows us to update the dictionary values within the loop. This is how we could transform frequencies to proportions from within the loop:
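Here is a minimal sketch of the three steps above, assuming the frequency dictionary from earlier and a total of 7,197 apps. Iterating over a dictionary yields its keys, and each key is then used to read and overwrite its value:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

# Looping over a dictionary gives the keys, which give access to the values
for iteration_variable in content_ratings:
    print(iteration_variable, content_ratings[iteration_variable])

# Overwrite each frequency with its proportion
for iteration_variable in content_ratings:
    content_ratings[iteration_variable] /= total_number_of_apps

print(content_ratings)
```

Changing values while looping over the keys is safe here because no keys are added or removed during the loop.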
Let's practice this technique by doing a few exercises. Start by answering the two questions we left unanswered in the previous screen:
What percentage of apps has a content rating of '17+'?
What percentage of apps can a 15-year-old download?
Instructions
Loop over the content_ratings dictionary and transform the frequencies to percentages. For every iteration of the loop:
Transform the dictionary value (the frequency) to a proportion by dividing it by the total number of apps.
Transform the updated dictionary value (the proportion) to a percentage by multiplying it by 100.
Find out the percentage of apps that have a content rating of '17+'. Assign your answer to a variable named percentage_17_plus.
Find out the percentage of apps that can be downloaded by a 15-year-old. Assign your answer to a variable named percentage_15_allowed.
Answer & Result:
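Here is a sketch of one possible answer. It assumes that a 15-year-old may download every app rated 4+, 9+, or 12+, which is every app except the 17+ ones:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

for key in content_ratings:
    content_ratings[key] /= total_number_of_apps  # frequency -> proportion
    content_ratings[key] *= 100                   # proportion -> percentage

percentage_17_plus = content_ratings['17+']

# Everything below 17+ is allowed for a 15-year-old
percentage_15_allowed = (content_ratings['4+'] + content_ratings['9+']
                         + content_ratings['12+'])

print(round(percentage_17_plus, 2))     # prints: 8.64
print(round(percentage_15_allowed, 2))  # prints: 91.36
```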
11. KEEPING THE DICTIONARIES SEPARATE
Previously, we transformed frequencies to proportions or percentages by overwriting the initial dictionary values. However, we'll often need to keep the dictionaries separate for later analysis. For instance, we might want to have three separate dictionaries: one storing frequencies, another storing proportions, and another storing percentages.
When we transform frequencies to proportions, we can create a new dictionary instead of overwriting the values in the initial dictionary. To do that, we can create a new empty dictionary and populate it within the loop:
To get a better understanding of how this works, we'll print the iteration variable, the proportion, and the new dictionary for every iteration:
Let's now practice this technique by doing a few exercises.
Instructions
Transform the frequencies inside content_ratings to proportions and percentages while creating separate dictionaries for each.
Assign the dictionary storing proportions to a variable named c_ratings_proportions.
Assign the dictionary storing percentages to a variable named c_ratings_percentages.
Optional challenge: try to solve this exercise using a single for loop (solution to this challenge provided).
Answer & Result:
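Here is a sketch of the optional single-loop approach. It is one possible solution, not necessarily the course's own. Both new dictionaries are filled inside the same loop, so content_ratings keeps its original frequencies:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
total_number_of_apps = 7197

c_ratings_proportions = {}
c_ratings_percentages = {}
for key in content_ratings:
    proportion = content_ratings[key] / total_number_of_apps
    c_ratings_proportions[key] = proportion        # new dictionary 1
    c_ratings_percentages[key] = proportion * 100  # new dictionary 2

print(content_ratings)        # still the original frequencies
print(c_ratings_proportions)
print(c_ratings_percentages)
```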
12. FREQUENCY TABLES FOR NUMERICAL COLUMNS
Creating frequency tables for certain columns may result in creating lengthy dictionaries because of the large number of unique values. For example, consider the size_bytes column (the third column in the table below), which describes the data size of an app:
id track_name size_bytes currency price rating_count_tot rating_count_ver user_rating user_rating_ver ver cont_rating prime_genre sup_devices.num ipadSc_urls.num lang.num vpp_lic
0 284882215 Facebook 389879808 USD 0.0 2974676 212 3.5 3.5 95.0 4+ Social Networking 37 1 29 1
1 389801252 Instagram 113954816 USD 0.0 2161558 1289 4.5 4.0 10.23 12+ Photo & Video 37 0 29 1
2 529479190 Clash of Clans 116476928 USD 0.0 2130805 579 4.5 4.5 9.24.12 9+ Games 38 5 18 1
Below, we generate a frequency table for the size_bytes column. We see there are 7,107 key-value pairs in the resulting dictionary.
A lengthy frequency table is difficult to analyze. The lengthier the table, the harder it becomes to see any patterns. As a workaround, we can create well-defined intervals and count the frequency for those intervals instead. For instance, we may want to create five intervals for the size_bytes column, and then count the number of apps specific to each interval.
Data size (bytes) Frequency
0 - 10,000,000 (0 - 10 MB) 285
10,000,000 - 50,000,000 (10 - 50 MB) 1,639
50,000,000 - 100,000,000 (50 - 100 MB) 1,778
100,000,000 - 500,000,000 (100 - 500 MB) 2,894
500,000,000+ (500+ MB) 601
Using intervals helps us segment the data into groups, which eases analysis. Looking at the table above, we can easily see that most apps are between 100 and 500 MB, the fewest apps are under 10 MB, etc.
Choosing intervals is not always straightforward. Above, we chose the intervals mostly based on our knowledge of common data sizes for phone apps. But if we lacked this knowledge, we'd have to rely on something else to come up with sensible intervals.
When we're trying to come up with some reasonable intervals, it often helps to know the minimum and the maximum values of a column. This will help us determine where the intervals should start and where they should end.
To find out the minimum and the maximum values of a column, we can use the min() and the max() commands. These two commands will find out the minimum and the maximum values for any list of integers or floats.
To find the minimum and maximum app data size, we need to extract the values in the size_bytes column as floats or integers in a separate list, and then use the min() and the max() commands on that list. We'll do this in the exercise below, and resume this discussion in the next screen.
Instructions
Extract the values in the size_bytes column in a separate list.
Create an empty list named data_sizes.
Loop through apps_data (make sure you don't include the header row) and for every iteration:
Store the data size as a float in a variable named size (the index number for the data size is 2).
Append size to the data_sizes list.
Find out the minimum and the maximum app data size.
Assign the minimum value to a variable named min_size.
Assign the maximum value to a variable named max_size.
Answer & Result:
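Here is a sketch of one possible answer. The real apps_data is read from AppleStore.csv, so the code substitutes a hypothetical stand-in. It has a header row plus the four sample rows shown earlier, cut down to their first three columns and stored as strings (as csv.reader would return them). The resulting minimum and maximum describe this sample only, not the full data set:

```python
# Hypothetical stand-in for apps_data (header + four sample rows, first three columns)
apps_data = [
    ['id', 'track_name', 'size_bytes'],
    ['284882215', 'Facebook', '389879808'],
    ['389801252', 'Instagram', '113954816'],
    ['529479190', 'Clash of Clans', '116476928'],
    ['977965019', 'みんなのお弁当 by クックパッド お弁当をレシピ付きで記録・共有', '51174400'],
]

data_sizes = []
for row in apps_data[1:]:   # skip the header row
    size = float(row[2])    # size_bytes is at index 2
    data_sizes.append(size)

min_size = min(data_sizes)
max_size = max(data_sizes)
print(min_size, max_size)  # prints: 51174400.0 389879808.0
```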
13. FILTERING FOR THE INTERVALS
Based on the values we found in the previous exercise, let's say we've settled on these intervals for the size_bytes column (note that the minimum and maximum values only informed our choice — they don't necessarily have to be the starting and ending point of the intervals):
Data size (bytes)
0 - 10,000,000 (0 - 10 MB)
10,000,000 - 50,000,000 (10 - 50 MB)
50,000,000 - 100,000,000 (50 - 100 MB)
100,000,000 - 500,000,000 (100 - 500 MB)
500,000,000+ (500+ MB)
Once we have a clear idea on the intervals we want, we can continue with computing the frequency for each interval.
Data size (bytes) Frequency
0 - 10,000,000 (0 - 10 MB) ?
10,000,000 - 50,000,000 (10 - 50 MB) ?
50,000,000 - 100,000,000 (50 - 100 MB) ?
100,000,000 - 500,000,000 (100 - 500 MB) ?
500,000,000+ (500+ MB) ?
We want to store the frequency table as a dictionary. We begin by creating a dictionary with the intervals as dictionary keys and frequencies as dictionary values (we initialize all frequencies with zero):
Next, we loop through our data set. For each iteration:
We store the data size as a float in a variable named data_size.
We check which interval the data size belongs to by using an if statement followed by a series of elif clauses (remember from the previous mission that we use elif to avoid redundant computations).
We increment the frequency in our dictionary by 1 depending on the interval the data size belongs to.
Note that we use code like 10000000 < data_size <= 50000000, which is equivalent to 10000000 < data_size and data_size <= 50000000.
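Here is a minimal sketch of the interval count. In place of the full data set, it uses a hypothetical list of four data sizes taken from the sample rows shown earlier, stored as strings the way the CSV would provide them. The interval key names are also assumptions. Each elif's lower bound is redundant after the earlier branches, but it mirrors the 10000000 < data_size <= 50000000 form described above:

```python
raw_sizes = ['389879808', '113954816', '116476928', '51174400']  # hypothetical sample

data_sizes_freq = {'0 - 10 MB': 0, '10 - 50 MB': 0, '50 - 100 MB': 0,
                   '100 - 500 MB': 0, '500 MB +': 0}

for raw_size in raw_sizes:
    data_size = float(raw_size)
    if data_size <= 10000000:
        data_sizes_freq['0 - 10 MB'] += 1
    elif 10000000 < data_size <= 50000000:
        data_sizes_freq['10 - 50 MB'] += 1
    elif 50000000 < data_size <= 100000000:
        data_sizes_freq['50 - 100 MB'] += 1
    elif 100000000 < data_size <= 500000000:
        data_sizes_freq['100 - 500 MB'] += 1
    elif data_size > 500000000:
        data_sizes_freq['500 MB +'] += 1

print(data_sizes_freq)
# prints: {'0 - 10 MB': 0, '10 - 50 MB': 0, '50 - 100 MB': 1, '100 - 500 MB': 3, '500 MB +': 0}
```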
Now let's practice this technique by creating a frequency table for the rating_count_tot column, which describes the total number of user ratings an app has received.
The exercise below has a bit less guidance compared to previous ones. If you get stuck at any point, begin by reviewing the relevant parts of the material we've covered so far. Getting stuck is part of the process, so don't get discouraged, and focus instead on finding a solution.
Instructions
Begin by finding the minimum and maximum value in the rating_count_tot column.
Extract the values in the rating_count_tot column (index number 5) in a separate list (don't forget to convert to integer or float).
Find out the minimum and maximum value of that list using the min() and the max() commands.
Based on the minimum and maximum value you've found, choose a few intervals (try to choose five intervals or fewer).
We've disabled answer checking for this exercise to give you the freedom to choose the intervals you find suitable (there's not a fixed solution for this exercise). You can see the intervals we chose in the solution.
Once you've chosen the intervals, compute the frequency of apps for each interval. Store the frequency table in a dictionary.
Create a dictionary with intervals as dictionary keys and zeros as dictionary values.
Loop through the apps_data data set. Count the frequency of each interval using an if statement followed by a series of elif clauses.
Inspect the frequency table and analyze the results.
Answer & Result:

