Context Our world population is expected to grow from 7.3 billion today to 9.7 billion in the year 2050. Finding solutions for feeding the growing world population has become a hot topic for food and agriculture organizations, entrepreneurs and philanthropists. These solutions range from changing the way we grow our food to changing the way we eat. To make things harder, the world's climate is changing and it is both affecting and affected by the way we grow our food – agriculture. This dataset provides an insight on our worldwide food production - focusing on a comparison between food produced for human consumption and feed produced for animals.
Content The Food and Agriculture Organization of the United Nations provides free access to food and agriculture data for over 245 countries and territories, from the year 1961 to the most recent update (depends on the dataset). One dataset from the FAO's database is the Food Balance Sheets. It presents a comprehensive picture of the pattern of a country's food supply during a specified reference period, the last time an update was loaded to the FAO database was in 2013. The food balance sheet shows for each food item the sources of supply and its utilization. This chunk of the dataset is focused on two utilizations of each food item available:
Food - refers to the total amount of the food item available as human food during the reference period. Feed - refers to the quantity of the food item available for feeding to the livestock and poultry during the reference period. Dataset's attributes:
Area code - Country name abbreviation Area - County name Item - Food item Element - Food or Feed Latitude - geographic coordinate that specifies the north–south position of a point on the Earth's surface Longitude - geographic coordinate that specifies the east-west position of a point on the Earth's surface Production per year - Amount of food item produced in 1000 tonnes
This is a simple exploratory notebook that heavily expolits pandas and seaborn
# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# importing data
df = pd.read_csv("FAO.csv", encoding = "ISO-8859-1")
pd.options.mode.chained_assignment = None
from sklearn.linear_model import LinearRegression
df
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AFG | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
1 | AFG | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AFG | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
3 | AFG | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
4 | AFG | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
5 | AFG | 2 | Afghanistan | 2514 | Maize and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 231.0 | 67.0 | 82.0 | 67.0 | 69.0 | 71.0 | 82.0 | 73.0 | 77 | 76 |
6 | AFG | 2 | Afghanistan | 2517 | Millet and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 15.0 | 21.0 | 11.0 | 19.0 | 21.0 | 18.0 | 14.0 | 14.0 | 14 | 12 |
7 | AFG | 2 | Afghanistan | 2520 | Cereals, Other | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
8 | AFG | 2 | Afghanistan | 2531 | Potatoes and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 276.0 | 294.0 | 294.0 | 260.0 | 242.0 | 250.0 | 192.0 | 169.0 | 196 | 230 |
9 | AFG | 2 | Afghanistan | 2536 | Sugar cane | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 50.0 | 29.0 | 61.0 | 65.0 | 54.0 | 114.0 | 83.0 | 83.0 | 69 | 81 |
10 | AFG | 2 | Afghanistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
11 | AFG | 2 | Afghanistan | 2542 | Sugar (Raw Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 124.0 | 152.0 | 169.0 | 192.0 | 217.0 | 231.0 | 240.0 | 240.0 | 250 | 255 |
12 | AFG | 2 | Afghanistan | 2543 | Sweeteners, Other | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 9.0 | 15.0 | 12.0 | 6.0 | 11.0 | 2.0 | 9.0 | 21.0 | 24 | 16 |
13 | AFG | 2 | Afghanistan | 2745 | Honey | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 2.0 | 2 | 2 |
14 | AFG | 2 | Afghanistan | 2549 | Pulses, Other and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 3.0 | 2.0 | 3.0 | 3.0 | 3.0 | 5.0 | 4.0 | 5.0 | 4 | 4 |
15 | AFG | 2 | Afghanistan | 2549 | Pulses, Other and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 17.0 | 35.0 | 37.0 | 40.0 | 54.0 | 80.0 | 66.0 | 81.0 | 63 | 74 |
16 | AFG | 2 | Afghanistan | 2551 | Nuts and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 11.0 | 13.0 | 24.0 | 34.0 | 42.0 | 28.0 | 66.0 | 71.0 | 70 | 44 |
17 | AFG | 2 | Afghanistan | 2560 | Coconuts - Incl Copra | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
18 | AFG | 2 | Afghanistan | 2561 | Sesame seed | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 16.0 | 16.0 | 13.0 | 16.0 | 16.0 | 16.0 | 19.0 | 17.0 | 16 | 16 |
19 | AFG | 2 | Afghanistan | 2563 | Olives (including preserved) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 3.0 | 2.0 | 2.0 | 2 | 2 |
20 | AFG | 2 | Afghanistan | 2571 | Soyabean Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 6.0 | 35.0 | 18.0 | 21.0 | 11.0 | 6.0 | 15.0 | 16.0 | 16 | 16 |
21 | AFG | 2 | Afghanistan | 2572 | Groundnut Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
22 | AFG | 2 | Afghanistan | 2573 | Sunflowerseed Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 4.0 | 6.0 | 5.0 | 9.0 | 3.0 | 8.0 | 15.0 | 16.0 | 17 | 23 |
23 | AFG | 2 | Afghanistan | 2574 | Rape and Mustard Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 1.0 | 3.0 | 5.0 | 6.0 | 6.0 | 1.0 | 2.0 | 2 | 2 |
24 | AFG | 2 | Afghanistan | 2575 | Cottonseed Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2.0 | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 3.0 | 3.0 | 3 | 4 |
25 | AFG | 2 | Afghanistan | 2577 | Palm Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 71.0 | 69.0 | 56.0 | 51.0 | 36.0 | 53.0 | 59.0 | 51.0 | 61 | 64 |
26 | AFG | 2 | Afghanistan | 2579 | Sesameseed Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 1 | 1 |
27 | AFG | 2 | Afghanistan | 2580 | Olive Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1 | 1 |
28 | AFG | 2 | Afghanistan | 2586 | Oilcrops Oil, Other | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 1.0 | 2.0 | 2.0 | 2 | 2 |
29 | AFG | 2 | Afghanistan | 2601 | Tomatoes and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2.0 | 2.0 | 8.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
21447 | ZWE | 181 | Zimbabwe | 2765 | Crustaceans | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21448 | ZWE | 181 | Zimbabwe | 2766 | Cephalopods | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21449 | ZWE | 181 | Zimbabwe | 2767 | Molluscs, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 |
21450 | ZWE | 181 | Zimbabwe | 2775 | Aquatic Plants | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21451 | ZWE | 181 | Zimbabwe | 2680 | Infant food | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21452 | ZWE | 181 | Zimbabwe | 2905 | Cereals - Excluding Beer | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 75.0 | 54.0 | 75.0 | 55.0 | 63.0 | 62.0 | 55.0 | 55.0 | 55 | 55 |
21453 | ZWE | 181 | Zimbabwe | 2905 | Cereals - Excluding Beer | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 1844.0 | 1842.0 | 1944.0 | 1962.0 | 1918.0 | 1980.0 | 2011.0 | 2094.0 | 2071 | 2016 |
21454 | ZWE | 181 | Zimbabwe | 2907 | Starchy Roots | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 223.0 | 236.0 | 238.0 | 228.0 | 245.0 | 258.0 | 258.0 | 269.0 | 272 | 276 |
21455 | ZWE | 181 | Zimbabwe | 2908 | Sugar Crops | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21456 | ZWE | 181 | Zimbabwe | 2909 | Sugar & Sweeteners | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 335.0 | 313.0 | 339.0 | 302.0 | 285.0 | 287.0 | 314.0 | 336.0 | 396 | 416 |
21457 | ZWE | 181 | Zimbabwe | 2911 | Pulses | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 63.0 | 59.0 | 61.0 | 57.0 | 69.0 | 78.0 | 68.0 | 56.0 | 52 | 55 |
21458 | ZWE | 181 | Zimbabwe | 2912 | Treenuts | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 1.0 | 2.0 | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 | 2.0 | 4 | 3 |
21459 | ZWE | 181 | Zimbabwe | 2913 | Oilcrops | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 36.0 | 46.0 | 41.0 | 33.0 | 31.0 | 19.0 | 24.0 | 17.0 | 27 | 30 |
21460 | ZWE | 181 | Zimbabwe | 2913 | Oilcrops | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 60.0 | 59.0 | 61.0 | 62.0 | 48.0 | 44.0 | 41.0 | 40.0 | 38 | 38 |
21461 | ZWE | 181 | Zimbabwe | 2914 | Vegetable Oils | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 111.0 | 114.0 | 112.0 | 114.0 | 134.0 | 135.0 | 137.0 | 147.0 | 159 | 160 |
21462 | ZWE | 181 | Zimbabwe | 2918 | Vegetables | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 161.0 | 166.0 | 208.0 | 185.0 | 137.0 | 179.0 | 215.0 | 217.0 | 227 | 227 |
21463 | ZWE | 181 | Zimbabwe | 2919 | Fruits - Excluding Wine | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 191.0 | 134.0 | 167.0 | 177.0 | 185.0 | 184.0 | 211.0 | 230.0 | 246 | 217 |
21464 | ZWE | 181 | Zimbabwe | 2922 | Stimulants | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 7.0 | 21.0 | 14.0 | 24.0 | 16.0 | 11.0 | 23.0 | 11.0 | 10 | 10 |
21465 | ZWE | 181 | Zimbabwe | 2923 | Spices | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 7.0 | 11.0 | 7.0 | 12.0 | 16.0 | 16.0 | 14.0 | 11.0 | 12 | 12 |
21466 | ZWE | 181 | Zimbabwe | 2924 | Alcoholic Beverages | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 294.0 | 290.0 | 316.0 | 355.0 | 398.0 | 437.0 | 448.0 | 476.0 | 525 | 516 |
21467 | ZWE | 181 | Zimbabwe | 2943 | Meat | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 222.0 | 228.0 | 233.0 | 238.0 | 242.0 | 265.0 | 262.0 | 277.0 | 280 | 258 |
21468 | ZWE | 181 | Zimbabwe | 2945 | Offals | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 20.0 | 20.0 | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 22 | 22 |
21469 | ZWE | 181 | Zimbabwe | 2946 | Animal fats | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 26.0 | 26.0 | 29.0 | 29.0 | 27.0 | 31.0 | 30.0 | 25.0 | 26 | 20 |
21470 | ZWE | 181 | Zimbabwe | 2949 | Eggs | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 15.0 | 18.0 | 18.0 | 21.0 | 22.0 | 27.0 | 27.0 | 24.0 | 24 | 25 |
21471 | ZWE | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 23.0 | 25.0 | 25.0 | 30 | 31 |
21472 | ZWE | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
21473 | ZWE | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
21474 | ZWE | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
21475 | ZWE | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21476 | ZWE | 181 | Zimbabwe | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21477 rows × 63 columns
Let's see what the data looks like...
df
Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AFG | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
1 | AFG | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
2 | AFG | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
3 | AFG | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
4 | AFG | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
5 | AFG | 2 | Afghanistan | 2514 | Maize and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 231.0 | 67.0 | 82.0 | 67.0 | 69.0 | 71.0 | 82.0 | 73.0 | 77 | 76 |
6 | AFG | 2 | Afghanistan | 2517 | Millet and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 15.0 | 21.0 | 11.0 | 19.0 | 21.0 | 18.0 | 14.0 | 14.0 | 14 | 12 |
7 | AFG | 2 | Afghanistan | 2520 | Cereals, Other | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
8 | AFG | 2 | Afghanistan | 2531 | Potatoes and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 276.0 | 294.0 | 294.0 | 260.0 | 242.0 | 250.0 | 192.0 | 169.0 | 196 | 230 |
9 | AFG | 2 | Afghanistan | 2536 | Sugar cane | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 50.0 | 29.0 | 61.0 | 65.0 | 54.0 | 114.0 | 83.0 | 83.0 | 69 | 81 |
10 | AFG | 2 | Afghanistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
11 | AFG | 2 | Afghanistan | 2542 | Sugar (Raw Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 124.0 | 152.0 | 169.0 | 192.0 | 217.0 | 231.0 | 240.0 | 240.0 | 250 | 255 |
12 | AFG | 2 | Afghanistan | 2543 | Sweeteners, Other | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 9.0 | 15.0 | 12.0 | 6.0 | 11.0 | 2.0 | 9.0 | 21.0 | 24 | 16 |
13 | AFG | 2 | Afghanistan | 2745 | Honey | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 2.0 | 2 | 2 |
14 | AFG | 2 | Afghanistan | 2549 | Pulses, Other and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 3.0 | 2.0 | 3.0 | 3.0 | 3.0 | 5.0 | 4.0 | 5.0 | 4 | 4 |
15 | AFG | 2 | Afghanistan | 2549 | Pulses, Other and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 17.0 | 35.0 | 37.0 | 40.0 | 54.0 | 80.0 | 66.0 | 81.0 | 63 | 74 |
16 | AFG | 2 | Afghanistan | 2551 | Nuts and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 11.0 | 13.0 | 24.0 | 34.0 | 42.0 | 28.0 | 66.0 | 71.0 | 70 | 44 |
17 | AFG | 2 | Afghanistan | 2560 | Coconuts - Incl Copra | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
18 | AFG | 2 | Afghanistan | 2561 | Sesame seed | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 16.0 | 16.0 | 13.0 | 16.0 | 16.0 | 16.0 | 19.0 | 17.0 | 16 | 16 |
19 | AFG | 2 | Afghanistan | 2563 | Olives (including preserved) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 3.0 | 2.0 | 2.0 | 2 | 2 |
20 | AFG | 2 | Afghanistan | 2571 | Soyabean Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 6.0 | 35.0 | 18.0 | 21.0 | 11.0 | 6.0 | 15.0 | 16.0 | 16 | 16 |
21 | AFG | 2 | Afghanistan | 2572 | Groundnut Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
22 | AFG | 2 | Afghanistan | 2573 | Sunflowerseed Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 4.0 | 6.0 | 5.0 | 9.0 | 3.0 | 8.0 | 15.0 | 16.0 | 17 | 23 |
23 | AFG | 2 | Afghanistan | 2574 | Rape and Mustard Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 1.0 | 3.0 | 5.0 | 6.0 | 6.0 | 1.0 | 2.0 | 2 | 2 |
24 | AFG | 2 | Afghanistan | 2575 | Cottonseed Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2.0 | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 3.0 | 3.0 | 3 | 4 |
25 | AFG | 2 | Afghanistan | 2577 | Palm Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 71.0 | 69.0 | 56.0 | 51.0 | 36.0 | 53.0 | 59.0 | 51.0 | 61 | 64 |
26 | AFG | 2 | Afghanistan | 2579 | Sesameseed Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 1 | 1 |
27 | AFG | 2 | Afghanistan | 2580 | Olive Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1 | 1 |
28 | AFG | 2 | Afghanistan | 2586 | Oilcrops Oil, Other | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 1.0 | 2.0 | 2.0 | 2 | 2 |
29 | AFG | 2 | Afghanistan | 2601 | Tomatoes and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2.0 | 2.0 | 8.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
21447 | ZWE | 181 | Zimbabwe | 2765 | Crustaceans | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21448 | ZWE | 181 | Zimbabwe | 2766 | Cephalopods | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21449 | ZWE | 181 | Zimbabwe | 2767 | Molluscs, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0 | 0 |
21450 | ZWE | 181 | Zimbabwe | 2775 | Aquatic Plants | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21451 | ZWE | 181 | Zimbabwe | 2680 | Infant food | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21452 | ZWE | 181 | Zimbabwe | 2905 | Cereals - Excluding Beer | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 75.0 | 54.0 | 75.0 | 55.0 | 63.0 | 62.0 | 55.0 | 55.0 | 55 | 55 |
21453 | ZWE | 181 | Zimbabwe | 2905 | Cereals - Excluding Beer | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 1844.0 | 1842.0 | 1944.0 | 1962.0 | 1918.0 | 1980.0 | 2011.0 | 2094.0 | 2071 | 2016 |
21454 | ZWE | 181 | Zimbabwe | 2907 | Starchy Roots | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 223.0 | 236.0 | 238.0 | 228.0 | 245.0 | 258.0 | 258.0 | 269.0 | 272 | 276 |
21455 | ZWE | 181 | Zimbabwe | 2908 | Sugar Crops | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21456 | ZWE | 181 | Zimbabwe | 2909 | Sugar & Sweeteners | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 335.0 | 313.0 | 339.0 | 302.0 | 285.0 | 287.0 | 314.0 | 336.0 | 396 | 416 |
21457 | ZWE | 181 | Zimbabwe | 2911 | Pulses | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 63.0 | 59.0 | 61.0 | 57.0 | 69.0 | 78.0 | 68.0 | 56.0 | 52 | 55 |
21458 | ZWE | 181 | Zimbabwe | 2912 | Treenuts | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 1.0 | 2.0 | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 | 2.0 | 4 | 3 |
21459 | ZWE | 181 | Zimbabwe | 2913 | Oilcrops | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 36.0 | 46.0 | 41.0 | 33.0 | 31.0 | 19.0 | 24.0 | 17.0 | 27 | 30 |
21460 | ZWE | 181 | Zimbabwe | 2913 | Oilcrops | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 60.0 | 59.0 | 61.0 | 62.0 | 48.0 | 44.0 | 41.0 | 40.0 | 38 | 38 |
21461 | ZWE | 181 | Zimbabwe | 2914 | Vegetable Oils | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 111.0 | 114.0 | 112.0 | 114.0 | 134.0 | 135.0 | 137.0 | 147.0 | 159 | 160 |
21462 | ZWE | 181 | Zimbabwe | 2918 | Vegetables | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 161.0 | 166.0 | 208.0 | 185.0 | 137.0 | 179.0 | 215.0 | 217.0 | 227 | 227 |
21463 | ZWE | 181 | Zimbabwe | 2919 | Fruits - Excluding Wine | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 191.0 | 134.0 | 167.0 | 177.0 | 185.0 | 184.0 | 211.0 | 230.0 | 246 | 217 |
21464 | ZWE | 181 | Zimbabwe | 2922 | Stimulants | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 7.0 | 21.0 | 14.0 | 24.0 | 16.0 | 11.0 | 23.0 | 11.0 | 10 | 10 |
21465 | ZWE | 181 | Zimbabwe | 2923 | Spices | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 7.0 | 11.0 | 7.0 | 12.0 | 16.0 | 16.0 | 14.0 | 11.0 | 12 | 12 |
21466 | ZWE | 181 | Zimbabwe | 2924 | Alcoholic Beverages | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 294.0 | 290.0 | 316.0 | 355.0 | 398.0 | 437.0 | 448.0 | 476.0 | 525 | 516 |
21467 | ZWE | 181 | Zimbabwe | 2943 | Meat | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 222.0 | 228.0 | 233.0 | 238.0 | 242.0 | 265.0 | 262.0 | 277.0 | 280 | 258 |
21468 | ZWE | 181 | Zimbabwe | 2945 | Offals | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 20.0 | 20.0 | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 22 | 22 |
21469 | ZWE | 181 | Zimbabwe | 2946 | Animal fats | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 26.0 | 26.0 | 29.0 | 29.0 | 27.0 | 31.0 | 30.0 | 25.0 | 26 | 20 |
21470 | ZWE | 181 | Zimbabwe | 2949 | Eggs | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 15.0 | 18.0 | 18.0 | 21.0 | 22.0 | 27.0 | 27.0 | 24.0 | 24 | 25 |
21471 | ZWE | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 23.0 | 25.0 | 25.0 | 30 | 31 |
21472 | ZWE | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
21473 | ZWE | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
21474 | ZWE | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
21475 | ZWE | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21476 | ZWE | 181 | Zimbabwe | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
21477 rows × 63 columns
area_list = list(df['Area'].unique())
year_list = list(df.iloc[:,10:].columns)
plt.figure(figsize=(24,12))
for ar in area_list:
yearly_produce = []
for yr in year_list:
yearly_produce.append(df[yr][df['Area'] == ar].sum())
plt.plot(yearly_produce, label=ar)
plt.xticks(np.arange(53), tuple(year_list), rotation=60)
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3, ncol=8, mode="expand", borderaxespad=0.)
plt.savefig('p.png')
plt.show()
plt.figure(figsize=(24,12))
<Figure size 1728x864 with 0 Axes>
<Figure size 1728x864 with 0 Axes>
Clearly, China, India and US stand out here. So, these are the countries with most food and feed production.
Now, let's have a close look at their food and feed data
sns.factorplot("Element", data=df, kind="count")
plt.show()
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
warnings.warn(msg)
So, there is a huge difference in food and feed production. Now, we have obvious assumptions about the following plots after looking at this huge difference.
sns.factorplot("Area", data=df[(df['Area'] == "India") | (df['Area'] == "China, mainland") | (df['Area'] == "United States of America")], kind="count", hue="Element", size=8, aspect=.8)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
warnings.warn(msg)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)
<seaborn.axisgrid.FacetGrid at 0x1a218d2550>
Though, there is a huge difference between feed and food production, these countries' total production and their ranks depend on feed production.
Now, we create a dataframe with countries as index and their annual produce as columns from 1961 to 2013.
new_df_dict = {}
for ar in area_list:
yearly_produce = []
for yr in year_list:
yearly_produce.append(df[yr][df['Area']==ar].sum())
new_df_dict[ar] = yearly_produce
new_df = pd.DataFrame(new_df_dict)
new_df.head()
Afghanistan | Albania | Algeria | Angola | Antigua and Barbuda | Argentina | Armenia | Australia | Austria | Azerbaijan | ... | United Republic of Tanzania | United States of America | Uruguay | Uzbekistan | Vanuatu | Venezuela (Bolivarian Republic of) | Viet Nam | Yemen | Zambia | Zimbabwe | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9481.0 | 1706.0 | 7488.0 | 4834.0 | 92.0 | 43402.0 | 0.0 | 25795.0 | 22542.0 | 0.0 | ... | 12367.0 | 559347.0 | 4631.0 | 0.0 | 97.0 | 9523.0 | 23856.0 | 2982.0 | 2976.0 | 3260.0 |
1 | 9414.0 | 1749.0 | 7235.0 | 4775.0 | 94.0 | 40784.0 | 0.0 | 27618.0 | 22627.0 | 0.0 | ... | 12810.0 | 556319.0 | 4448.0 | 0.0 | 101.0 | 9369.0 | 25220.0 | 3038.0 | 3057.0 | 3503.0 |
2 | 9194.0 | 1767.0 | 6861.0 | 5240.0 | 105.0 | 40219.0 | 0.0 | 28902.0 | 23637.0 | 0.0 | ... | 13109.0 | 552630.0 | 4682.0 | 0.0 | 103.0 | 9788.0 | 26053.0 | 3147.0 | 3069.0 | 3479.0 |
3 | 10170.0 | 1889.0 | 7255.0 | 5286.0 | 95.0 | 41638.0 | 0.0 | 29107.0 | 24099.0 | 0.0 | ... | 12965.0 | 555677.0 | 4723.0 | 0.0 | 102.0 | 10539.0 | 26377.0 | 3224.0 | 3121.0 | 3738.0 |
4 | 10473.0 | 1884.0 | 7509.0 | 5527.0 | 84.0 | 44936.0 | 0.0 | 28961.0 | 22664.0 | 0.0 | ... | 13742.0 | 589288.0 | 4581.0 | 0.0 | 107.0 | 10641.0 | 26961.0 | 3328.0 | 3236.0 | 3940.0 |
5 rows × 174 columns
Now, this is not perfect so we transpose this dataframe and add column names.
new_df = pd.DataFrame.transpose(new_df)
new_df.columns = year_list
new_df.head()
Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | Y1967 | Y1968 | Y1969 | Y1970 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Afghanistan | 9481.0 | 9414.0 | 9194.0 | 10170.0 | 10473.0 | 10169.0 | 11289.0 | 11508.0 | 11815.0 | 10454.0 | ... | 16542.0 | 17658.0 | 18317.0 | 19248.0 | 19381.0 | 20661.0 | 21030.0 | 21100.0 | 22706.0 | 23007.0 |
Albania | 1706.0 | 1749.0 | 1767.0 | 1889.0 | 1884.0 | 1995.0 | 2046.0 | 2169.0 | 2230.0 | 2395.0 | ... | 6637.0 | 6719.0 | 6911.0 | 6744.0 | 7168.0 | 7316.0 | 7907.0 | 8114.0 | 8221.0 | 8271.0 |
Algeria | 7488.0 | 7235.0 | 6861.0 | 7255.0 | 7509.0 | 7536.0 | 7986.0 | 8839.0 | 9003.0 | 9355.0 | ... | 48619.0 | 49562.0 | 51067.0 | 49933.0 | 50916.0 | 57505.0 | 60071.0 | 65852.0 | 69365.0 | 72161.0 |
Angola | 4834.0 | 4775.0 | 5240.0 | 5286.0 | 5527.0 | 5677.0 | 5833.0 | 5685.0 | 6219.0 | 6460.0 | ... | 25541.0 | 26696.0 | 28247.0 | 29877.0 | 32053.0 | 36985.0 | 38400.0 | 40573.0 | 38064.0 | 48639.0 |
Antigua and Barbuda | 92.0 | 94.0 | 105.0 | 95.0 | 84.0 | 73.0 | 64.0 | 59.0 | 68.0 | 77.0 | ... | 92.0 | 115.0 | 110.0 | 122.0 | 115.0 | 114.0 | 115.0 | 118.0 | 113.0 | 119.0 |
5 rows × 53 columns
Perfect! Now, we will do some feature engineering.
mean_produce = []
for i in range(174):
mean_produce.append(new_df.iloc[i,:].values.mean())
new_df['Mean_Produce'] = mean_produce
new_df['Rank'] = new_df['Mean_Produce'].rank(ascending=False)
new_df.head()
Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | Y1967 | Y1968 | Y1969 | Y1970 | ... | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | Mean_Produce | Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Afghanistan | 9481.0 | 9414.0 | 9194.0 | 10170.0 | 10473.0 | 10169.0 | 11289.0 | 11508.0 | 11815.0 | 10454.0 | ... | 18317.0 | 19248.0 | 19381.0 | 20661.0 | 21030.0 | 21100.0 | 22706.0 | 23007.0 | 13003.056604 | 69.0 |
Albania | 1706.0 | 1749.0 | 1767.0 | 1889.0 | 1884.0 | 1995.0 | 2046.0 | 2169.0 | 2230.0 | 2395.0 | ... | 6911.0 | 6744.0 | 7168.0 | 7316.0 | 7907.0 | 8114.0 | 8221.0 | 8271.0 | 4475.509434 | 104.0 |
Algeria | 7488.0 | 7235.0 | 6861.0 | 7255.0 | 7509.0 | 7536.0 | 7986.0 | 8839.0 | 9003.0 | 9355.0 | ... | 51067.0 | 49933.0 | 50916.0 | 57505.0 | 60071.0 | 65852.0 | 69365.0 | 72161.0 | 28879.490566 | 38.0 |
Angola | 4834.0 | 4775.0 | 5240.0 | 5286.0 | 5527.0 | 5677.0 | 5833.0 | 5685.0 | 6219.0 | 6460.0 | ... | 28247.0 | 29877.0 | 32053.0 | 36985.0 | 38400.0 | 40573.0 | 38064.0 | 48639.0 | 13321.056604 | 68.0 |
Antigua and Barbuda | 92.0 | 94.0 | 105.0 | 95.0 | 84.0 | 73.0 | 64.0 | 59.0 | 68.0 | 77.0 | ... | 110.0 | 122.0 | 115.0 | 114.0 | 115.0 | 118.0 | 113.0 | 119.0 | 83.886792 | 172.0 |
5 rows × 55 columns
Now, we create another dataframe with items and their total production each year from 1961 to 2013
item_list = list(df['Item'].unique())
item_df = pd.DataFrame()
item_df['Item_Name'] = item_list
for yr in year_list:
item_produce = []
for it in item_list:
item_produce.append(df[yr][df['Item']==it].sum())
item_df[yr] = item_produce
item_df.head()
Item_Name | Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | Y1967 | Y1968 | Y1969 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Wheat and products | 138829.0 | 144643.0 | 147325.0 | 156273.0 | 168822.0 | 169832.0 | 171469.0 | 179530.0 | 189658.0 | ... | 527394.0 | 532263.0 | 537279.0 | 529271.0 | 562239.0 | 557245.0 | 549926.0 | 578179.0 | 576597 | 587492 |
1 | Rice (Milled Equivalent) | 122700.0 | 131842.0 | 139507.0 | 148304.0 | 150056.0 | 155583.0 | 158587.0 | 164614.0 | 167922.0 | ... | 361107.0 | 366025.0 | 372629.0 | 378698.0 | 389708.0 | 394221.0 | 398559.0 | 404152.0 | 406787 | 410880 |
2 | Barley and products | 46180.0 | 48915.0 | 51642.0 | 54184.0 | 54945.0 | 55463.0 | 56424.0 | 60455.0 | 65501.0 | ... | 102055.0 | 97185.0 | 100981.0 | 93310.0 | 98209.0 | 99135.0 | 92563.0 | 92570.0 | 88766 | 99452 |
3 | Maize and products | 168039.0 | 168305.0 | 172905.0 | 175468.0 | 190304.0 | 200860.0 | 213050.0 | 215613.0 | 221953.0 | ... | 545024.0 | 549036.0 | 543280.0 | 573892.0 | 592231.0 | 557940.0 | 584337.0 | 603297.0 | 608730 | 671300 |
4 | Millet and products | 19075.0 | 19019.0 | 19740.0 | 20353.0 | 18377.0 | 20860.0 | 22997.0 | 21785.0 | 23966.0 | ... | 25789.0 | 25496.0 | 25997.0 | 26750.0 | 26373.0 | 24575.0 | 27039.0 | 25740.0 | 26105 | 26346 |
5 rows × 54 columns
This time, we will use the new features to get some good conclusions.
sum_col = []
for i in range(115):
sum_col.append(item_df.iloc[i,1:].values.sum())
item_df['Sum'] = sum_col
item_df['Production_Rank'] = item_df['Sum'].rank(ascending=False)
item_df.head()
Item_Name | Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | Y1967 | Y1968 | Y1969 | ... | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | Sum | Production_Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Wheat and products | 138829.0 | 144643.0 | 147325.0 | 156273.0 | 168822.0 | 169832.0 | 171469.0 | 179530.0 | 189658.0 | ... | 537279.0 | 529271.0 | 562239.0 | 557245.0 | 549926.0 | 578179.0 | 576597 | 587492 | 19194671.0 | 6.0 |
1 | Rice (Milled Equivalent) | 122700.0 | 131842.0 | 139507.0 | 148304.0 | 150056.0 | 155583.0 | 158587.0 | 164614.0 | 167922.0 | ... | 372629.0 | 378698.0 | 389708.0 | 394221.0 | 398559.0 | 404152.0 | 406787 | 410880 | 14475448.0 | 8.0 |
2 | Barley and products | 46180.0 | 48915.0 | 51642.0 | 54184.0 | 54945.0 | 55463.0 | 56424.0 | 60455.0 | 65501.0 | ... | 100981.0 | 93310.0 | 98209.0 | 99135.0 | 92563.0 | 92570.0 | 88766 | 99452 | 4442742.0 | 20.0 |
3 | Maize and products | 168039.0 | 168305.0 | 172905.0 | 175468.0 | 190304.0 | 200860.0 | 213050.0 | 215613.0 | 221953.0 | ... | 543280.0 | 573892.0 | 592231.0 | 557940.0 | 584337.0 | 603297.0 | 608730 | 671300 | 19960640.0 | 5.0 |
4 | Millet and products | 19075.0 | 19019.0 | 19740.0 | 20353.0 | 18377.0 | 20860.0 | 22997.0 | 21785.0 | 23966.0 | ... | 25997.0 | 26750.0 | 26373.0 | 24575.0 | 27039.0 | 25740.0 | 26105 | 26346 | 1225400.0 | 38.0 |
5 rows × 56 columns
item_df['Item_Name'][item_df['Production_Rank'] < 11.0].sort_values()
56 Cereals - Excluding Beer 65 Fruits - Excluding Wine 3 Maize and products 53 Milk - Excluding Butter 6 Potatoes and products 1 Rice (Milled Equivalent) 57 Starchy Roots 64 Vegetables 27 Vegetables, Other 0 Wheat and products Name: Item_Name, dtype: object
So, cereals, fruits and maize are the most produced items in the last 50 years
sns.factorplot("Item", data=df[(df['Item']=='Wheat and products') | (df['Item']=='Rice (Milled Equivalent)') | (df['Item']=='Maize and products') | (df['Item']=='Potatoes and products') | (df['Item']=='Vegetables, Other') | (df['Item']=='Milk - Excluding Butter') | (df['Item']=='Cereals - Excluding Beer') | (df['Item']=='Starchy Roots') | (df['Item']=='Vegetables') | (df['Item']=='Fruits - Excluding Wine')], kind="count", hue="Element", size=20, aspect=.8)
plt.show()
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
warnings.warn(msg)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)
year_df = df.iloc[:,10:]
fig, ax = plt.subplots(figsize=(16,10))
sns.heatmap(year_df.corr(), ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a23b4b128>
So, we gather that a given year's production is more similar to its immediate previous and immediate following years.
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex='col', sharey='row', figsize=(10,10))
ax1.set(xlabel='Y1968', ylabel='Y1961')
ax2.set(xlabel='Y1968', ylabel='Y1963')
ax3.set(xlabel='Y1968', ylabel='Y1986')
ax4.set(xlabel='Y1968', ylabel='Y2013')
sns.jointplot(x="Y1968", y="Y1961", data=df, kind="reg", ax=ax1)
sns.jointplot(x="Y1968", y="Y1963", data=df, kind="reg", ax=ax2)
sns.jointplot(x="Y1968", y="Y1986", data=df, kind="reg", ax=ax3)
sns.jointplot(x="Y1968", y="Y2013", data=df, kind="reg", ax=ax4)
plt.close(2)
plt.close(3)
plt.close(4)
plt.close(5)
/anaconda3/lib/python3.7/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
This will detect the items whose production has drastically increased over the years
new_item_df = item_df.drop(["Item_Name","Sum","Production_Rank"], axis = 1)
fig, ax = plt.subplots(figsize=(12,24))
sns.heatmap(new_item_df,ax=ax)
ax.set_yticklabels(item_df.Item_Name.values[::-1])
plt.show()
There is considerable growth in production of Palmkernel oil, Meat/Aquatic animals, ricebran oil, cottonseed, seafood, offals, roots, poultry meat, mutton, bear, cocoa, coffee and soyabean oil. There has been exceptional growth in production of onions, cream, sugar crops, treenuts, butter/ghee and to some extent starchy roots.
Now, we look at clustering.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
This is the data we will use.
new_df.head()
Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | Y1967 | Y1968 | Y1969 | Y1970 | ... | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 | Mean_Produce | Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Afghanistan | 9481.0 | 9414.0 | 9194.0 | 10170.0 | 10473.0 | 10169.0 | 11289.0 | 11508.0 | 11815.0 | 10454.0 | ... | 18317.0 | 19248.0 | 19381.0 | 20661.0 | 21030.0 | 21100.0 | 22706.0 | 23007.0 | 13003.056604 | 69.0 |
Albania | 1706.0 | 1749.0 | 1767.0 | 1889.0 | 1884.0 | 1995.0 | 2046.0 | 2169.0 | 2230.0 | 2395.0 | ... | 6911.0 | 6744.0 | 7168.0 | 7316.0 | 7907.0 | 8114.0 | 8221.0 | 8271.0 | 4475.509434 | 104.0 |
Algeria | 7488.0 | 7235.0 | 6861.0 | 7255.0 | 7509.0 | 7536.0 | 7986.0 | 8839.0 | 9003.0 | 9355.0 | ... | 51067.0 | 49933.0 | 50916.0 | 57505.0 | 60071.0 | 65852.0 | 69365.0 | 72161.0 | 28879.490566 | 38.0 |
Angola | 4834.0 | 4775.0 | 5240.0 | 5286.0 | 5527.0 | 5677.0 | 5833.0 | 5685.0 | 6219.0 | 6460.0 | ... | 28247.0 | 29877.0 | 32053.0 | 36985.0 | 38400.0 | 40573.0 | 38064.0 | 48639.0 | 13321.056604 | 68.0 |
Antigua and Barbuda | 92.0 | 94.0 | 105.0 | 95.0 | 84.0 | 73.0 | 64.0 | 59.0 | 68.0 | 77.0 | ... | 110.0 | 122.0 | 115.0 | 114.0 | 115.0 | 118.0 | 113.0 | 119.0 | 83.886792 | 172.0 |
5 rows × 55 columns
X = new_df.iloc[:,:-2].values
X = pd.DataFrame(X)
X = X.convert_objects(convert_numeric=True)
X.columns = year_list
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: convert_objects is deprecated. To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
after removing the cwd from sys.path.
This method looks at the percentage of variance explained as a function of the number of clusters: One should choose a number of clusters so that adding another cluster doesn't give much better modeling of the data. More precisely, if one plots the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph. The number of clusters is chosen at this point, hence the "elbow criterion". This "elbow" cannot always be unambiguously identified. Percentage of variance explained is the ratio of the between-group variance to the total variance, also known as an F-test. A slight variation of this method plots the curvature of the within group variance.
from sklearn.cluster import KMeans
wcss = []
for i in range(1,11):
kmeans = KMeans(n_clusters=i,init='k-means++',max_iter=300,n_init=10,random_state=0)
kmeans.fit(X)
wcss.append(kmeans.inertia_)
plt.plot(range(1,11),wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
As the elbow corner coincides with x=2, we will have to form 2 clusters. Personally, I would have liked to select 3 to 4 clusters. But trust me, only selecting 2 clusters can lead to best results. Now, we apply k-means algorithm.
kmeans = KMeans(n_clusters=2,init='k-means++',max_iter=300,n_init=10,random_state=0)
y_kmeans = kmeans.fit_predict(X)
X = X.as_matrix(columns=None)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
after removing the cwd from sys.path.
Now, let's visualize the results.
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0,1],s=100,c='red',label='Others')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1,1],s=100,c='blue',label='China(mainland),USA,India')
plt.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster_centers_[:,1],s=300,c='yellow',label='Centroids')
plt.title('Clusters of countries by Productivity')
plt.legend()
plt.show()
So, the blue cluster represents China(Mainland), USA and India while the red cluster represents all the other countries. This result was highly probable. Just take a look at the plot of cell 3 above. See how China, USA and India stand out. That has been observed here in clustering too.
You should try this algorithm for 3 or 4 clusters. Looking at the distribution, you will realise why 2 clusters is the best choice for the given data
This is not the end! More is yet to come.
Now, lets try to predict the production using regression for 2020. We will predict the production for USA,India and Pakistan.
india_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
x=df[(df.Area=='India') & (df.Element=='Food')][i].mean()
india_list.append(x)
reset=[]
for i in year_list:
reset.append(int(i[1:]))
reset=np.array(reset)
reset=reset.reshape(-1,1)
india_list=np.array(india_list)
india_list=india_list.reshape(-1,1)
reg = LinearRegression()
reg.fit(reset,india_list)
predictions = reg.predict(reset)
plt.title("India")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,india_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))
df[(df.Area=='India') & (df.Element=='Food')]['Y1961'].mean()
df[(df.Area=='Pakistan') & (df.Element=='Food')]
Pak_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
yx=df[(df.Area=='Pakistan') & (df.Element=='Food')][i].mean()
Pak_list.append(yx)
Pak_list=np.array(Pak_list)
Pak_list=Pak_list.reshape(-1,1)
Pak_list
reg = LinearRegression()
reg.fit(reset,Pak_list)
predictions = reg.predict(reset)
plt.title("Pakistan")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,Pak_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))
usa_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
xu=df[(df.Area=='United States of America') & (df.Element=='Food')][i].mean()
usa_list.append(xu)
usa_list=np.array(usa_list)
usa_list=india_list.reshape(-1,1)
reg = LinearRegression()
reg.fit(reset,usa_list)
predictions = reg.predict(reset)
plt.title("USA")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,usa_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))
[0;31m---------------------------------------------------------------------------[0m
[0;31mValueError[0m Traceback (most recent call last)
[0;32m<ipython-input-24-da7cfa1c86d1>[0m in [0;36m<module>[0;34m[0m
[1;32m 27[0m [0mplt[0m[0;34m.[0m[0mplot[0m[0;34m([0m[0mreset[0m[0;34m,[0m[0mpredictions[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[1;32m 28[0m [0mplt[0m[0;34m.[0m[0mshow[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m---> 29[0;31m [0mprint[0m[0;34m([0m[0mreg[0m[0;34m.[0m[0mpredict[0m[0;34m([0m[0;36m2020[0m[0;34m)[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 30[0m [0;34m[0m[0m
[1;32m 31[0m [0mdf[0m[0;34m[[0m[0;34m([0m[0mdf[0m[0;34m.[0m[0mArea[0m[0;34m==[0m[0;34m'India'[0m[0;34m)[0m [0;34m&[0m [0;34m([0m[0mdf[0m[0;34m.[0m[0mElement[0m[0;34m==[0m[0;34m'Food'[0m[0;34m)[0m[0;34m][0m[0;34m[[0m[0;34m'Y1961'[0m[0;34m][0m[0;34m.[0m[0mmean[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py[0m in [0;36mpredict[0;34m(self, X)[0m
[1;32m 211[0m [0mReturns[0m [0mpredicted[0m [0mvalues[0m[0;34m.[0m[0;34m[0m[0;34m[0m[0m
[1;32m 212[0m """
[0;32m--> 213[0;31m [0;32mreturn[0m [0mself[0m[0;34m.[0m[0m_decision_function[0m[0;34m([0m[0mX[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 214[0m [0;34m[0m[0m
[1;32m 215[0m [0m_preprocess_data[0m [0;34m=[0m [0mstaticmethod[0m[0;34m([0m[0m_preprocess_data[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py[0m in [0;36m_decision_function[0;34m(self, X)[0m
[1;32m 194[0m [0mcheck_is_fitted[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m"coef_"[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[1;32m 195[0m [0;34m[0m[0m
[0;32m--> 196[0;31m [0mX[0m [0;34m=[0m [0mcheck_array[0m[0;34m([0m[0mX[0m[0;34m,[0m [0maccept_sparse[0m[0;34m=[0m[0;34m[[0m[0;34m'csr'[0m[0;34m,[0m [0;34m'csc'[0m[0;34m,[0m [0;34m'coo'[0m[0;34m][0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 197[0m return safe_sparse_dot(X, self.coef_.T,
[1;32m 198[0m dense_output=True) + self.intercept_
[0;32m/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py[0m in [0;36mcheck_array[0;34m(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)[0m
[1;32m 543[0m [0;34m"Reshape your data either using array.reshape(-1, 1) if "[0m[0;34m[0m[0;34m[0m[0m
[1;32m 544[0m [0;34m"your data has a single feature or array.reshape(1, -1) "[0m[0;34m[0m[0;34m[0m[0m
[0;32m--> 545[0;31m "if it contains a single sample.".format(array))
[0m[1;32m 546[0m [0;31m# If input is 1D raise error[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[1;32m 547[0m [0;32mif[0m [0marray[0m[0;34m.[0m[0mndim[0m [0;34m==[0m [0;36m1[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;31mValueError[0m: Expected 2D array, got scalar array instead:
array=2020.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.