About the dataset

Context Our world population is expected to grow from 7.3 billion today to 9.7 billion in the year 2050. Finding solutions for feeding the growing world population has become a hot topic for food and agriculture organizations, entrepreneurs and philanthropists. These solutions range from changing the way we grow our food to changing the way we eat. To make things harder, the world's climate is changing and it is both affecting and affected by the way we grow our food – agriculture. This dataset provides an insight on our worldwide food production - focusing on a comparison between food produced for human consumption and feed produced for animals.

Content The Food and Agriculture Organization of the United Nations provides free access to food and agriculture data for over 245 countries and territories, from the year 1961 to the most recent update (depends on the dataset). One dataset from the FAO's database is the Food Balance Sheets. It presents a comprehensive picture of the pattern of a country's food supply during a specified reference period, the last time an update was loaded to the FAO database was in 2013. The food balance sheet shows for each food item the sources of supply and its utilization. This chunk of the dataset is focused on two utilizations of each food item available:

Food - refers to the total amount of the food item available as human food during the reference period. Feed - refers to the quantity of the food item available for feeding to the livestock and poultry during the reference period. Dataset's attributes:

Area code - Country name abbreviation Area - County name Item - Food item Element - Food or Feed Latitude - geographic coordinate that specifies the north–south position of a point on the Earth's surface Longitude - geographic coordinate that specifies the east-west position of a point on the Earth's surface Production per year - Amount of food item produced in 1000 tonnes

This is a simple exploratory notebook that heavily expolits pandas and seaborn

# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# importing data
df = pd.read_csv("FAO.csv",  encoding = "ISO-8859-1")
pd.options.mode.chained_assignment = None
from sklearn.linear_model import LinearRegression

df

	Area Abbreviation	Area Code	Area	Item Code	Item	Element Code	Element	Unit	latitude	longitude	...	Y2004	Y2005	Y2006	Y2007	Y2008	Y2009	Y2010	Y2011	Y2012	Y2013
0	AFG	2	Afghanistan	2511	Wheat and products	5142	Food	1000 tonnes	33.94	67.71	...	3249.0	3486.0	3704.0	4164.0	4252.0	4538.0	4605.0	4711.0	4810	4895
1	AFG	2	Afghanistan	2805	Rice (Milled Equivalent)	5142	Food	1000 tonnes	33.94	67.71	...	419.0	445.0	546.0	455.0	490.0	415.0	442.0	476.0	425	422
2	AFG	2	Afghanistan	2513	Barley and products	5521	Feed	1000 tonnes	33.94	67.71	...	58.0	236.0	262.0	263.0	230.0	379.0	315.0	203.0	367	360
3	AFG	2	Afghanistan	2513	Barley and products	5142	Food	1000 tonnes	33.94	67.71	...	185.0	43.0	44.0	48.0	62.0	55.0	60.0	72.0	78	89
4	AFG	2	Afghanistan	2514	Maize and products	5521	Feed	1000 tonnes	33.94	67.71	...	120.0	208.0	233.0	249.0	247.0	195.0	178.0	191.0	200	200
5	AFG	2	Afghanistan	2514	Maize and products	5142	Food	1000 tonnes	33.94	67.71	...	231.0	67.0	82.0	67.0	69.0	71.0	82.0	73.0	77	76
6	AFG	2	Afghanistan	2517	Millet and products	5142	Food	1000 tonnes	33.94	67.71	...	15.0	21.0	11.0	19.0	21.0	18.0	14.0	14.0	14	12
7	AFG	2	Afghanistan	2520	Cereals, Other	5142	Food	1000 tonnes	33.94	67.71	...	2.0	1.0	1.0	0.0	0.0	0.0	0.0	0.0	0	0
8	AFG	2	Afghanistan	2531	Potatoes and products	5142	Food	1000 tonnes	33.94	67.71	...	276.0	294.0	294.0	260.0	242.0	250.0	192.0	169.0	196	230
9	AFG	2	Afghanistan	2536	Sugar cane	5521	Feed	1000 tonnes	33.94	67.71	...	50.0	29.0	61.0	65.0	54.0	114.0	83.0	83.0	69	81
10	AFG	2	Afghanistan	2537	Sugar beet	5521	Feed	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
11	AFG	2	Afghanistan	2542	Sugar (Raw Equivalent)	5142	Food	1000 tonnes	33.94	67.71	...	124.0	152.0	169.0	192.0	217.0	231.0	240.0	240.0	250	255
12	AFG	2	Afghanistan	2543	Sweeteners, Other	5142	Food	1000 tonnes	33.94	67.71	...	9.0	15.0	12.0	6.0	11.0	2.0	9.0	21.0	24	16
13	AFG	2	Afghanistan	2745	Honey	5142	Food	1000 tonnes	33.94	67.71	...	3.0	3.0	3.0	3.0	3.0	3.0	3.0	2.0	2	2
14	AFG	2	Afghanistan	2549	Pulses, Other and products	5521	Feed	1000 tonnes	33.94	67.71	...	3.0	2.0	3.0	3.0	3.0	5.0	4.0	5.0	4	4
15	AFG	2	Afghanistan	2549	Pulses, Other and products	5142	Food	1000 tonnes	33.94	67.71	...	17.0	35.0	37.0	40.0	54.0	80.0	66.0	81.0	63	74
16	AFG	2	Afghanistan	2551	Nuts and products	5142	Food	1000 tonnes	33.94	67.71	...	11.0	13.0	24.0	34.0	42.0	28.0	66.0	71.0	70	44
17	AFG	2	Afghanistan	2560	Coconuts - Incl Copra	5142	Food	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
18	AFG	2	Afghanistan	2561	Sesame seed	5142	Food	1000 tonnes	33.94	67.71	...	16.0	16.0	13.0	16.0	16.0	16.0	19.0	17.0	16	16
19	AFG	2	Afghanistan	2563	Olives (including preserved)	5142	Food	1000 tonnes	33.94	67.71	...	1.0	1.0	0.0	0.0	2.0	3.0	2.0	2.0	2	2
20	AFG	2	Afghanistan	2571	Soyabean Oil	5142	Food	1000 tonnes	33.94	67.71	...	6.0	35.0	18.0	21.0	11.0	6.0	15.0	16.0	16	16
21	AFG	2	Afghanistan	2572	Groundnut Oil	5142	Food	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
22	AFG	2	Afghanistan	2573	Sunflowerseed Oil	5142	Food	1000 tonnes	33.94	67.71	...	4.0	6.0	5.0	9.0	3.0	8.0	15.0	16.0	17	23
23	AFG	2	Afghanistan	2574	Rape and Mustard Oil	5142	Food	1000 tonnes	33.94	67.71	...	0.0	1.0	3.0	5.0	6.0	6.0	1.0	2.0	2	2
24	AFG	2	Afghanistan	2575	Cottonseed Oil	5142	Food	1000 tonnes	33.94	67.71	...	2.0	3.0	3.0	3.0	3.0	4.0	3.0	3.0	3	4
25	AFG	2	Afghanistan	2577	Palm Oil	5142	Food	1000 tonnes	33.94	67.71	...	71.0	69.0	56.0	51.0	36.0	53.0	59.0	51.0	61	64
26	AFG	2	Afghanistan	2579	Sesameseed Oil	5142	Food	1000 tonnes	33.94	67.71	...	1.0	1.0	1.0	2.0	2.0	1.0	1.0	2.0	1	1
27	AFG	2	Afghanistan	2580	Olive Oil	5142	Food	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	1.0	1.0	1.0	1	1
28	AFG	2	Afghanistan	2586	Oilcrops Oil, Other	5142	Food	1000 tonnes	33.94	67.71	...	0.0	1.0	0.0	0.0	3.0	1.0	2.0	2.0	2	2
29	AFG	2	Afghanistan	2601	Tomatoes and products	5142	Food	1000 tonnes	33.94	67.71	...	2.0	2.0	8.0	1.0	0.0	0.0	0.0	0.0	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
21447	ZWE	181	Zimbabwe	2765	Crustaceans	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21448	ZWE	181	Zimbabwe	2766	Cephalopods	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21449	ZWE	181	Zimbabwe	2767	Molluscs, Other	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	1.0	0.0	0	0
21450	ZWE	181	Zimbabwe	2775	Aquatic Plants	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21451	ZWE	181	Zimbabwe	2680	Infant food	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21452	ZWE	181	Zimbabwe	2905	Cereals - Excluding Beer	5521	Feed	1000 tonnes	-19.02	29.15	...	75.0	54.0	75.0	55.0	63.0	62.0	55.0	55.0	55	55
21453	ZWE	181	Zimbabwe	2905	Cereals - Excluding Beer	5142	Food	1000 tonnes	-19.02	29.15	...	1844.0	1842.0	1944.0	1962.0	1918.0	1980.0	2011.0	2094.0	2071	2016
21454	ZWE	181	Zimbabwe	2907	Starchy Roots	5142	Food	1000 tonnes	-19.02	29.15	...	223.0	236.0	238.0	228.0	245.0	258.0	258.0	269.0	272	276
21455	ZWE	181	Zimbabwe	2908	Sugar Crops	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21456	ZWE	181	Zimbabwe	2909	Sugar & Sweeteners	5142	Food	1000 tonnes	-19.02	29.15	...	335.0	313.0	339.0	302.0	285.0	287.0	314.0	336.0	396	416
21457	ZWE	181	Zimbabwe	2911	Pulses	5142	Food	1000 tonnes	-19.02	29.15	...	63.0	59.0	61.0	57.0	69.0	78.0	68.0	56.0	52	55
21458	ZWE	181	Zimbabwe	2912	Treenuts	5142	Food	1000 tonnes	-19.02	29.15	...	1.0	2.0	1.0	2.0	2.0	3.0	4.0	2.0	4	3
21459	ZWE	181	Zimbabwe	2913	Oilcrops	5521	Feed	1000 tonnes	-19.02	29.15	...	36.0	46.0	41.0	33.0	31.0	19.0	24.0	17.0	27	30
21460	ZWE	181	Zimbabwe	2913	Oilcrops	5142	Food	1000 tonnes	-19.02	29.15	...	60.0	59.0	61.0	62.0	48.0	44.0	41.0	40.0	38	38
21461	ZWE	181	Zimbabwe	2914	Vegetable Oils	5142	Food	1000 tonnes	-19.02	29.15	...	111.0	114.0	112.0	114.0	134.0	135.0	137.0	147.0	159	160
21462	ZWE	181	Zimbabwe	2918	Vegetables	5142	Food	1000 tonnes	-19.02	29.15	...	161.0	166.0	208.0	185.0	137.0	179.0	215.0	217.0	227	227
21463	ZWE	181	Zimbabwe	2919	Fruits - Excluding Wine	5142	Food	1000 tonnes	-19.02	29.15	...	191.0	134.0	167.0	177.0	185.0	184.0	211.0	230.0	246	217
21464	ZWE	181	Zimbabwe	2922	Stimulants	5142	Food	1000 tonnes	-19.02	29.15	...	7.0	21.0	14.0	24.0	16.0	11.0	23.0	11.0	10	10
21465	ZWE	181	Zimbabwe	2923	Spices	5142	Food	1000 tonnes	-19.02	29.15	...	7.0	11.0	7.0	12.0	16.0	16.0	14.0	11.0	12	12
21466	ZWE	181	Zimbabwe	2924	Alcoholic Beverages	5142	Food	1000 tonnes	-19.02	29.15	...	294.0	290.0	316.0	355.0	398.0	437.0	448.0	476.0	525	516
21467	ZWE	181	Zimbabwe	2943	Meat	5142	Food	1000 tonnes	-19.02	29.15	...	222.0	228.0	233.0	238.0	242.0	265.0	262.0	277.0	280	258
21468	ZWE	181	Zimbabwe	2945	Offals	5142	Food	1000 tonnes	-19.02	29.15	...	20.0	20.0	21.0	21.0	21.0	21.0	21.0	21.0	22	22
21469	ZWE	181	Zimbabwe	2946	Animal fats	5142	Food	1000 tonnes	-19.02	29.15	...	26.0	26.0	29.0	29.0	27.0	31.0	30.0	25.0	26	20
21470	ZWE	181	Zimbabwe	2949	Eggs	5142	Food	1000 tonnes	-19.02	29.15	...	15.0	18.0	18.0	21.0	22.0	27.0	27.0	24.0	24	25
21471	ZWE	181	Zimbabwe	2948	Milk - Excluding Butter	5521	Feed	1000 tonnes	-19.02	29.15	...	21.0	21.0	21.0	21.0	21.0	23.0	25.0	25.0	30	31
21472	ZWE	181	Zimbabwe	2948	Milk - Excluding Butter	5142	Food	1000 tonnes	-19.02	29.15	...	373.0	357.0	359.0	356.0	341.0	385.0	418.0	457.0	426	451
21473	ZWE	181	Zimbabwe	2960	Fish, Seafood	5521	Feed	1000 tonnes	-19.02	29.15	...	5.0	4.0	9.0	6.0	9.0	5.0	15.0	15.0	15	15
21474	ZWE	181	Zimbabwe	2960	Fish, Seafood	5142	Food	1000 tonnes	-19.02	29.15	...	18.0	14.0	17.0	14.0	15.0	18.0	29.0	40.0	40	40
21475	ZWE	181	Zimbabwe	2961	Aquatic Products, Other	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21476	ZWE	181	Zimbabwe	2928	Miscellaneous	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0

21477 rows × 63 columns

Let's see what the data looks like...

Plot for annual produce of different countries with quantity in y-axis and years in x-axis

df

	Area Abbreviation	Area Code	Area	Item Code	Item	Element Code	Element	Unit	latitude	longitude	...	Y2004	Y2005	Y2006	Y2007	Y2008	Y2009	Y2010	Y2011	Y2012	Y2013
0	AFG	2	Afghanistan	2511	Wheat and products	5142	Food	1000 tonnes	33.94	67.71	...	3249.0	3486.0	3704.0	4164.0	4252.0	4538.0	4605.0	4711.0	4810	4895
1	AFG	2	Afghanistan	2805	Rice (Milled Equivalent)	5142	Food	1000 tonnes	33.94	67.71	...	419.0	445.0	546.0	455.0	490.0	415.0	442.0	476.0	425	422
2	AFG	2	Afghanistan	2513	Barley and products	5521	Feed	1000 tonnes	33.94	67.71	...	58.0	236.0	262.0	263.0	230.0	379.0	315.0	203.0	367	360
3	AFG	2	Afghanistan	2513	Barley and products	5142	Food	1000 tonnes	33.94	67.71	...	185.0	43.0	44.0	48.0	62.0	55.0	60.0	72.0	78	89
4	AFG	2	Afghanistan	2514	Maize and products	5521	Feed	1000 tonnes	33.94	67.71	...	120.0	208.0	233.0	249.0	247.0	195.0	178.0	191.0	200	200
5	AFG	2	Afghanistan	2514	Maize and products	5142	Food	1000 tonnes	33.94	67.71	...	231.0	67.0	82.0	67.0	69.0	71.0	82.0	73.0	77	76
6	AFG	2	Afghanistan	2517	Millet and products	5142	Food	1000 tonnes	33.94	67.71	...	15.0	21.0	11.0	19.0	21.0	18.0	14.0	14.0	14	12
7	AFG	2	Afghanistan	2520	Cereals, Other	5142	Food	1000 tonnes	33.94	67.71	...	2.0	1.0	1.0	0.0	0.0	0.0	0.0	0.0	0	0
8	AFG	2	Afghanistan	2531	Potatoes and products	5142	Food	1000 tonnes	33.94	67.71	...	276.0	294.0	294.0	260.0	242.0	250.0	192.0	169.0	196	230
9	AFG	2	Afghanistan	2536	Sugar cane	5521	Feed	1000 tonnes	33.94	67.71	...	50.0	29.0	61.0	65.0	54.0	114.0	83.0	83.0	69	81
10	AFG	2	Afghanistan	2537	Sugar beet	5521	Feed	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
11	AFG	2	Afghanistan	2542	Sugar (Raw Equivalent)	5142	Food	1000 tonnes	33.94	67.71	...	124.0	152.0	169.0	192.0	217.0	231.0	240.0	240.0	250	255
12	AFG	2	Afghanistan	2543	Sweeteners, Other	5142	Food	1000 tonnes	33.94	67.71	...	9.0	15.0	12.0	6.0	11.0	2.0	9.0	21.0	24	16
13	AFG	2	Afghanistan	2745	Honey	5142	Food	1000 tonnes	33.94	67.71	...	3.0	3.0	3.0	3.0	3.0	3.0	3.0	2.0	2	2
14	AFG	2	Afghanistan	2549	Pulses, Other and products	5521	Feed	1000 tonnes	33.94	67.71	...	3.0	2.0	3.0	3.0	3.0	5.0	4.0	5.0	4	4
15	AFG	2	Afghanistan	2549	Pulses, Other and products	5142	Food	1000 tonnes	33.94	67.71	...	17.0	35.0	37.0	40.0	54.0	80.0	66.0	81.0	63	74
16	AFG	2	Afghanistan	2551	Nuts and products	5142	Food	1000 tonnes	33.94	67.71	...	11.0	13.0	24.0	34.0	42.0	28.0	66.0	71.0	70	44
17	AFG	2	Afghanistan	2560	Coconuts - Incl Copra	5142	Food	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
18	AFG	2	Afghanistan	2561	Sesame seed	5142	Food	1000 tonnes	33.94	67.71	...	16.0	16.0	13.0	16.0	16.0	16.0	19.0	17.0	16	16
19	AFG	2	Afghanistan	2563	Olives (including preserved)	5142	Food	1000 tonnes	33.94	67.71	...	1.0	1.0	0.0	0.0	2.0	3.0	2.0	2.0	2	2
20	AFG	2	Afghanistan	2571	Soyabean Oil	5142	Food	1000 tonnes	33.94	67.71	...	6.0	35.0	18.0	21.0	11.0	6.0	15.0	16.0	16	16
21	AFG	2	Afghanistan	2572	Groundnut Oil	5142	Food	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
22	AFG	2	Afghanistan	2573	Sunflowerseed Oil	5142	Food	1000 tonnes	33.94	67.71	...	4.0	6.0	5.0	9.0	3.0	8.0	15.0	16.0	17	23
23	AFG	2	Afghanistan	2574	Rape and Mustard Oil	5142	Food	1000 tonnes	33.94	67.71	...	0.0	1.0	3.0	5.0	6.0	6.0	1.0	2.0	2	2
24	AFG	2	Afghanistan	2575	Cottonseed Oil	5142	Food	1000 tonnes	33.94	67.71	...	2.0	3.0	3.0	3.0	3.0	4.0	3.0	3.0	3	4
25	AFG	2	Afghanistan	2577	Palm Oil	5142	Food	1000 tonnes	33.94	67.71	...	71.0	69.0	56.0	51.0	36.0	53.0	59.0	51.0	61	64
26	AFG	2	Afghanistan	2579	Sesameseed Oil	5142	Food	1000 tonnes	33.94	67.71	...	1.0	1.0	1.0	2.0	2.0	1.0	1.0	2.0	1	1
27	AFG	2	Afghanistan	2580	Olive Oil	5142	Food	1000 tonnes	33.94	67.71	...	0.0	0.0	0.0	0.0	0.0	1.0	1.0	1.0	1	1
28	AFG	2	Afghanistan	2586	Oilcrops Oil, Other	5142	Food	1000 tonnes	33.94	67.71	...	0.0	1.0	0.0	0.0	3.0	1.0	2.0	2.0	2	2
29	AFG	2	Afghanistan	2601	Tomatoes and products	5142	Food	1000 tonnes	33.94	67.71	...	2.0	2.0	8.0	1.0	0.0	0.0	0.0	0.0	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
21447	ZWE	181	Zimbabwe	2765	Crustaceans	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21448	ZWE	181	Zimbabwe	2766	Cephalopods	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21449	ZWE	181	Zimbabwe	2767	Molluscs, Other	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	1.0	0.0	0	0
21450	ZWE	181	Zimbabwe	2775	Aquatic Plants	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21451	ZWE	181	Zimbabwe	2680	Infant food	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21452	ZWE	181	Zimbabwe	2905	Cereals - Excluding Beer	5521	Feed	1000 tonnes	-19.02	29.15	...	75.0	54.0	75.0	55.0	63.0	62.0	55.0	55.0	55	55
21453	ZWE	181	Zimbabwe	2905	Cereals - Excluding Beer	5142	Food	1000 tonnes	-19.02	29.15	...	1844.0	1842.0	1944.0	1962.0	1918.0	1980.0	2011.0	2094.0	2071	2016
21454	ZWE	181	Zimbabwe	2907	Starchy Roots	5142	Food	1000 tonnes	-19.02	29.15	...	223.0	236.0	238.0	228.0	245.0	258.0	258.0	269.0	272	276
21455	ZWE	181	Zimbabwe	2908	Sugar Crops	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21456	ZWE	181	Zimbabwe	2909	Sugar & Sweeteners	5142	Food	1000 tonnes	-19.02	29.15	...	335.0	313.0	339.0	302.0	285.0	287.0	314.0	336.0	396	416
21457	ZWE	181	Zimbabwe	2911	Pulses	5142	Food	1000 tonnes	-19.02	29.15	...	63.0	59.0	61.0	57.0	69.0	78.0	68.0	56.0	52	55
21458	ZWE	181	Zimbabwe	2912	Treenuts	5142	Food	1000 tonnes	-19.02	29.15	...	1.0	2.0	1.0	2.0	2.0	3.0	4.0	2.0	4	3
21459	ZWE	181	Zimbabwe	2913	Oilcrops	5521	Feed	1000 tonnes	-19.02	29.15	...	36.0	46.0	41.0	33.0	31.0	19.0	24.0	17.0	27	30
21460	ZWE	181	Zimbabwe	2913	Oilcrops	5142	Food	1000 tonnes	-19.02	29.15	...	60.0	59.0	61.0	62.0	48.0	44.0	41.0	40.0	38	38
21461	ZWE	181	Zimbabwe	2914	Vegetable Oils	5142	Food	1000 tonnes	-19.02	29.15	...	111.0	114.0	112.0	114.0	134.0	135.0	137.0	147.0	159	160
21462	ZWE	181	Zimbabwe	2918	Vegetables	5142	Food	1000 tonnes	-19.02	29.15	...	161.0	166.0	208.0	185.0	137.0	179.0	215.0	217.0	227	227
21463	ZWE	181	Zimbabwe	2919	Fruits - Excluding Wine	5142	Food	1000 tonnes	-19.02	29.15	...	191.0	134.0	167.0	177.0	185.0	184.0	211.0	230.0	246	217
21464	ZWE	181	Zimbabwe	2922	Stimulants	5142	Food	1000 tonnes	-19.02	29.15	...	7.0	21.0	14.0	24.0	16.0	11.0	23.0	11.0	10	10
21465	ZWE	181	Zimbabwe	2923	Spices	5142	Food	1000 tonnes	-19.02	29.15	...	7.0	11.0	7.0	12.0	16.0	16.0	14.0	11.0	12	12
21466	ZWE	181	Zimbabwe	2924	Alcoholic Beverages	5142	Food	1000 tonnes	-19.02	29.15	...	294.0	290.0	316.0	355.0	398.0	437.0	448.0	476.0	525	516
21467	ZWE	181	Zimbabwe	2943	Meat	5142	Food	1000 tonnes	-19.02	29.15	...	222.0	228.0	233.0	238.0	242.0	265.0	262.0	277.0	280	258
21468	ZWE	181	Zimbabwe	2945	Offals	5142	Food	1000 tonnes	-19.02	29.15	...	20.0	20.0	21.0	21.0	21.0	21.0	21.0	21.0	22	22
21469	ZWE	181	Zimbabwe	2946	Animal fats	5142	Food	1000 tonnes	-19.02	29.15	...	26.0	26.0	29.0	29.0	27.0	31.0	30.0	25.0	26	20
21470	ZWE	181	Zimbabwe	2949	Eggs	5142	Food	1000 tonnes	-19.02	29.15	...	15.0	18.0	18.0	21.0	22.0	27.0	27.0	24.0	24	25
21471	ZWE	181	Zimbabwe	2948	Milk - Excluding Butter	5521	Feed	1000 tonnes	-19.02	29.15	...	21.0	21.0	21.0	21.0	21.0	23.0	25.0	25.0	30	31
21472	ZWE	181	Zimbabwe	2948	Milk - Excluding Butter	5142	Food	1000 tonnes	-19.02	29.15	...	373.0	357.0	359.0	356.0	341.0	385.0	418.0	457.0	426	451
21473	ZWE	181	Zimbabwe	2960	Fish, Seafood	5521	Feed	1000 tonnes	-19.02	29.15	...	5.0	4.0	9.0	6.0	9.0	5.0	15.0	15.0	15	15
21474	ZWE	181	Zimbabwe	2960	Fish, Seafood	5142	Food	1000 tonnes	-19.02	29.15	...	18.0	14.0	17.0	14.0	15.0	18.0	29.0	40.0	40	40
21475	ZWE	181	Zimbabwe	2961	Aquatic Products, Other	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0
21476	ZWE	181	Zimbabwe	2928	Miscellaneous	5142	Food	1000 tonnes	-19.02	29.15	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0	0

21477 rows × 63 columns

area_list = list(df['Area'].unique())
year_list = list(df.iloc[:,10:].columns)

plt.figure(figsize=(24,12))
for ar in area_list:
    yearly_produce = []
    for yr in year_list:
        yearly_produce.append(df[yr][df['Area'] == ar].sum())
    plt.plot(yearly_produce, label=ar)
plt.xticks(np.arange(53), tuple(year_list), rotation=60)
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3, ncol=8, mode="expand", borderaxespad=0.)
plt.savefig('p.png')
plt.show()

plt.figure(figsize=(24,12))

<Figure size 1728x864 with 0 Axes>

<Figure size 1728x864 with 0 Axes>

Clearly, China, India and US stand out here. So, these are the countries with most food and feed production.

Now, let's have a close look at their food and feed data

Food and feed plot for the whole dataset

sns.factorplot("Element", data=df, kind="count")
plt.show()

/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)

So, there is a huge difference in food and feed production. Now, we have obvious assumptions about the following plots after looking at this huge difference.

Food and feed plot for the largest producers(India, USA, China)

sns.factorplot("Area", data=df[(df['Area'] == "India") | (df['Area'] == "China, mainland") | (df['Area'] == "United States of America")], kind="count", hue="Element", size=8, aspect=.8)

/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
  warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid at 0x1a218d2550>

Though, there is a huge difference between feed and food production, these countries' total production and their ranks depend on feed production.

Now, we create a dataframe with countries as index and their annual produce as columns from 1961 to 2013.

new_df_dict = {}
for ar in area_list:
    yearly_produce = []
    for yr in year_list:
        yearly_produce.append(df[yr][df['Area']==ar].sum())
    new_df_dict[ar] = yearly_produce
new_df = pd.DataFrame(new_df_dict)

new_df.head()

	Afghanistan	Albania	Algeria	Angola	Antigua and Barbuda	Argentina	Australia	Austria	...	United Republic of Tanzania	United States of America	Uruguay	Vanuatu	Venezuela (Bolivarian Republic of)	Viet Nam	Yemen	Zambia	Zimbabwe
0	9481.0	1706.0	7488.0	4834.0	92.0	43402.0	25795.0	22542.0	...	12367.0	559347.0	4631.0	97.0	9523.0	23856.0	2982.0	2976.0	3260.0
1	9414.0	1749.0	7235.0	4775.0	94.0	40784.0	27618.0	22627.0	...	12810.0	556319.0	4448.0	101.0	9369.0	25220.0	3038.0	3057.0	3503.0
2	9194.0	1767.0	6861.0	5240.0	105.0	40219.0	28902.0	23637.0	...	13109.0	552630.0	4682.0	103.0	9788.0	26053.0	3147.0	3069.0	3479.0
3	10170.0	1889.0	7255.0	5286.0	95.0	41638.0	29107.0	24099.0	...	12965.0	555677.0	4723.0	102.0	10539.0	26377.0	3224.0	3121.0	3738.0
4	10473.0	1884.0	7509.0	5527.0	84.0	44936.0	28961.0	22664.0	...	13742.0	589288.0	4581.0	107.0	10641.0	26961.0	3328.0	3236.0	3940.0

5 rows × 174 columns

Now, this is not perfect so we transpose this dataframe and add column names.

new_df = pd.DataFrame.transpose(new_df)
new_df.columns = year_list

new_df.head()

	Y1961	Y1962	Y1963	Y1964	Y1965	Y1966	Y1967	Y1968	Y1969	Y1970	...	Y2004	Y2005	Y2006	Y2007	Y2008	Y2009	Y2010	Y2011	Y2012	Y2013
Afghanistan	9481.0	9414.0	9194.0	10170.0	10473.0	10169.0	11289.0	11508.0	11815.0	10454.0	...	16542.0	17658.0	18317.0	19248.0	19381.0	20661.0	21030.0	21100.0	22706.0	23007.0
Albania	1706.0	1749.0	1767.0	1889.0	1884.0	1995.0	2046.0	2169.0	2230.0	2395.0	...	6637.0	6719.0	6911.0	6744.0	7168.0	7316.0	7907.0	8114.0	8221.0	8271.0
Algeria	7488.0	7235.0	6861.0	7255.0	7509.0	7536.0	7986.0	8839.0	9003.0	9355.0	...	48619.0	49562.0	51067.0	49933.0	50916.0	57505.0	60071.0	65852.0	69365.0	72161.0
Angola	4834.0	4775.0	5240.0	5286.0	5527.0	5677.0	5833.0	5685.0	6219.0	6460.0	...	25541.0	26696.0	28247.0	29877.0	32053.0	36985.0	38400.0	40573.0	38064.0	48639.0
Antigua and Barbuda	92.0	94.0	105.0	95.0	84.0	73.0	64.0	59.0	68.0	77.0	...	92.0	115.0	110.0	122.0	115.0	114.0	115.0	118.0	113.0	119.0

5 rows × 53 columns

Perfect! Now, we will do some feature engineering.

First, a new column which indicates mean produce of each state over the given years. Second, a ranking column which ranks countries on the basis of mean produce.

mean_produce = []
for i in range(174):
    mean_produce.append(new_df.iloc[i,:].values.mean())
new_df['Mean_Produce'] = mean_produce

new_df['Rank'] = new_df['Mean_Produce'].rank(ascending=False)

new_df.head()

	Y1961	Y1962	Y1963	Y1964	Y1965	Y1966	Y1967	Y1968	Y1969	Y1970	...	Y2006	Y2007	Y2008	Y2009	Y2010	Y2011	Y2012	Y2013	Mean_Produce	Rank
Afghanistan	9481.0	9414.0	9194.0	10170.0	10473.0	10169.0	11289.0	11508.0	11815.0	10454.0	...	18317.0	19248.0	19381.0	20661.0	21030.0	21100.0	22706.0	23007.0	13003.056604	69.0
Albania	1706.0	1749.0	1767.0	1889.0	1884.0	1995.0	2046.0	2169.0	2230.0	2395.0	...	6911.0	6744.0	7168.0	7316.0	7907.0	8114.0	8221.0	8271.0	4475.509434	104.0
Algeria	7488.0	7235.0	6861.0	7255.0	7509.0	7536.0	7986.0	8839.0	9003.0	9355.0	...	51067.0	49933.0	50916.0	57505.0	60071.0	65852.0	69365.0	72161.0	28879.490566	38.0
Angola	4834.0	4775.0	5240.0	5286.0	5527.0	5677.0	5833.0	5685.0	6219.0	6460.0	...	28247.0	29877.0	32053.0	36985.0	38400.0	40573.0	38064.0	48639.0	13321.056604	68.0
Antigua and Barbuda	92.0	94.0	105.0	95.0	84.0	73.0	64.0	59.0	68.0	77.0	...	110.0	122.0	115.0	114.0	115.0	118.0	113.0	119.0	83.886792	172.0

5 rows × 55 columns

Now, we create another dataframe with items and their total production each year from 1961 to 2013

item_list = list(df['Item'].unique())

item_df = pd.DataFrame()
item_df['Item_Name'] = item_list

for yr in year_list:
    item_produce = []
    for it in item_list:
        item_produce.append(df[yr][df['Item']==it].sum())
    item_df[yr] = item_produce

item_df.head()

	Item_Name	Y1961	Y1962	Y1963	Y1964	Y1965	Y1966	Y1967	Y1968	Y1969	...	Y2004	Y2005	Y2006	Y2007	Y2008	Y2009	Y2010	Y2011	Y2012	Y2013
0	Wheat and products	138829.0	144643.0	147325.0	156273.0	168822.0	169832.0	171469.0	179530.0	189658.0	...	527394.0	532263.0	537279.0	529271.0	562239.0	557245.0	549926.0	578179.0	576597	587492
1	Rice (Milled Equivalent)	122700.0	131842.0	139507.0	148304.0	150056.0	155583.0	158587.0	164614.0	167922.0	...	361107.0	366025.0	372629.0	378698.0	389708.0	394221.0	398559.0	404152.0	406787	410880
2	Barley and products	46180.0	48915.0	51642.0	54184.0	54945.0	55463.0	56424.0	60455.0	65501.0	...	102055.0	97185.0	100981.0	93310.0	98209.0	99135.0	92563.0	92570.0	88766	99452
3	Maize and products	168039.0	168305.0	172905.0	175468.0	190304.0	200860.0	213050.0	215613.0	221953.0	...	545024.0	549036.0	543280.0	573892.0	592231.0	557940.0	584337.0	603297.0	608730	671300
4	Millet and products	19075.0	19019.0	19740.0	20353.0	18377.0	20860.0	22997.0	21785.0	23966.0	...	25789.0	25496.0	25997.0	26750.0	26373.0	24575.0	27039.0	25740.0	26105	26346

5 rows × 54 columns

Some more feature engineering

This time, we will use the new features to get some good conclusions.

1. Total amount of item produced from 1961 to 2013

2. Providing a rank to the items to know the most produced item

sum_col = []
for i in range(115):
    sum_col.append(item_df.iloc[i,1:].values.sum())
item_df['Sum'] = sum_col
item_df['Production_Rank'] = item_df['Sum'].rank(ascending=False)

item_df.head()

	Item_Name	Y1961	Y1962	Y1963	Y1964	Y1965	Y1966	Y1967	Y1968	Y1969	...	Y2006	Y2007	Y2008	Y2009	Y2010	Y2011	Y2012	Y2013	Sum	Production_Rank
0	Wheat and products	138829.0	144643.0	147325.0	156273.0	168822.0	169832.0	171469.0	179530.0	189658.0	...	537279.0	529271.0	562239.0	557245.0	549926.0	578179.0	576597	587492	19194671.0	6.0
1	Rice (Milled Equivalent)	122700.0	131842.0	139507.0	148304.0	150056.0	155583.0	158587.0	164614.0	167922.0	...	372629.0	378698.0	389708.0	394221.0	398559.0	404152.0	406787	410880	14475448.0	8.0
2	Barley and products	46180.0	48915.0	51642.0	54184.0	54945.0	55463.0	56424.0	60455.0	65501.0	...	100981.0	93310.0	98209.0	99135.0	92563.0	92570.0	88766	99452	4442742.0	20.0
3	Maize and products	168039.0	168305.0	172905.0	175468.0	190304.0	200860.0	213050.0	215613.0	221953.0	...	543280.0	573892.0	592231.0	557940.0	584337.0	603297.0	608730	671300	19960640.0	5.0
4	Millet and products	19075.0	19019.0	19740.0	20353.0	18377.0	20860.0	22997.0	21785.0	23966.0	...	25997.0	26750.0	26373.0	24575.0	27039.0	25740.0	26105	26346	1225400.0	38.0

5 rows × 56 columns

Now, we find the most produced food items in the last half-century

item_df['Item_Name'][item_df['Production_Rank'] &lt; 11.0].sort_values()

56    Cereals - Excluding Beer
65     Fruits - Excluding Wine
3           Maize and products
53     Milk - Excluding Butter
6        Potatoes and products
1     Rice (Milled Equivalent)
57               Starchy Roots
64                  Vegetables
27           Vegetables, Other
0           Wheat and products
Name: Item_Name, dtype: object

So, cereals, fruits and maize are the most produced items in the last 50 years

Food and feed plot for most produced items

sns.factorplot("Item", data=df[(df['Item']=='Wheat and products') | (df['Item']=='Rice (Milled Equivalent)') | (df['Item']=='Maize and products') | (df['Item']=='Potatoes and products') | (df['Item']=='Vegetables, Other') | (df['Item']=='Milk - Excluding Butter') | (df['Item']=='Cereals - Excluding Beer') | (df['Item']=='Starchy Roots') | (df['Item']=='Vegetables') | (df['Item']=='Fruits - Excluding Wine')], kind="count", hue="Element", size=20, aspect=.8)
plt.show()

/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)
/anaconda3/lib/python3.7/site-packages/seaborn/categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
  warnings.warn(msg, UserWarning)

Now, we plot a heatmap of correlation of produce in difference years

year_df = df.iloc[:,10:]
fig, ax = plt.subplots(figsize=(16,10))
sns.heatmap(year_df.corr(), ax=ax)

<matplotlib.axes._subplots.AxesSubplot at 0x1a23b4b128>

So, we gather that a given year's production is more similar to its immediate previous and immediate following years.

f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex='col', sharey='row', figsize=(10,10))
ax1.set(xlabel='Y1968', ylabel='Y1961')
ax2.set(xlabel='Y1968', ylabel='Y1963')
ax3.set(xlabel='Y1968', ylabel='Y1986')
ax4.set(xlabel='Y1968', ylabel='Y2013')
sns.jointplot(x="Y1968", y="Y1961", data=df, kind="reg", ax=ax1)
sns.jointplot(x="Y1968", y="Y1963", data=df, kind="reg", ax=ax2)
sns.jointplot(x="Y1968", y="Y1986", data=df, kind="reg", ax=ax3)
sns.jointplot(x="Y1968", y="Y2013", data=df, kind="reg", ax=ax4)
plt.close(2)
plt.close(3)
plt.close(4)
plt.close(5)

/anaconda3/lib/python3.7/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Heatmap of production of food items over years

This will detect the items whose production has drastically increased over the years

new_item_df = item_df.drop(["Item_Name","Sum","Production_Rank"], axis = 1)
fig, ax = plt.subplots(figsize=(12,24))
sns.heatmap(new_item_df,ax=ax)
ax.set_yticklabels(item_df.Item_Name.values[::-1])
plt.show()

There is considerable growth in production of Palmkernel oil, Meat/Aquatic animals, ricebran oil, cottonseed, seafood, offals, roots, poultry meat, mutton, bear, cocoa, coffee and soyabean oil. There has been exceptional growth in production of onions, cream, sugar crops, treenuts, butter/ghee and to some extent starchy roots.

Now, we look at clustering.

What is clustering?

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Today, we will form clusters to classify countries based on productivity scale

For this, we will use k-means clustering algorithm.

K-means clustering

(Source Wikipedia )

This is the data we will use.

new_df.head()

	Y1961	Y1962	Y1963	Y1964	Y1965	Y1966	Y1967	Y1968	Y1969	Y1970	...	Y2006	Y2007	Y2008	Y2009	Y2010	Y2011	Y2012	Y2013	Mean_Produce	Rank
Afghanistan	9481.0	9414.0	9194.0	10170.0	10473.0	10169.0	11289.0	11508.0	11815.0	10454.0	...	18317.0	19248.0	19381.0	20661.0	21030.0	21100.0	22706.0	23007.0	13003.056604	69.0
Albania	1706.0	1749.0	1767.0	1889.0	1884.0	1995.0	2046.0	2169.0	2230.0	2395.0	...	6911.0	6744.0	7168.0	7316.0	7907.0	8114.0	8221.0	8271.0	4475.509434	104.0
Algeria	7488.0	7235.0	6861.0	7255.0	7509.0	7536.0	7986.0	8839.0	9003.0	9355.0	...	51067.0	49933.0	50916.0	57505.0	60071.0	65852.0	69365.0	72161.0	28879.490566	38.0
Angola	4834.0	4775.0	5240.0	5286.0	5527.0	5677.0	5833.0	5685.0	6219.0	6460.0	...	28247.0	29877.0	32053.0	36985.0	38400.0	40573.0	38064.0	48639.0	13321.056604	68.0
Antigua and Barbuda	92.0	94.0	105.0	95.0	84.0	73.0	64.0	59.0	68.0	77.0	...	110.0	122.0	115.0	114.0	115.0	118.0	113.0	119.0	83.886792	172.0

5 rows × 55 columns

X = new_df.iloc[:,:-2].values

X = pd.DataFrame(X)
X = X.convert_objects(convert_numeric=True)
X.columns = year_list

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  after removing the cwd from sys.path.

Elbow method to select number of clusters

This method looks at the percentage of variance explained as a function of the number of clusters: One should choose a number of clusters so that adding another cluster doesn't give much better modeling of the data. More precisely, if one plots the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph. The number of clusters is chosen at this point, hence the "elbow criterion". This "elbow" cannot always be unambiguously identified. Percentage of variance explained is the ratio of the between-group variance to the total variance, also known as an F-test. A slight variation of this method plots the curvature of the within group variance.

Basically, number of clusters = the x-axis value of the point that is the corner of the "elbow"(the plot looks often looks like an elbow)

from sklearn.cluster import KMeans
wcss = []
for i in range(1,11):
    kmeans = KMeans(n_clusters=i,init='k-means++',max_iter=300,n_init=10,random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1,11),wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

As the elbow corner coincides with x=2, we will have to form 2 clusters. Personally, I would have liked to select 3 to 4 clusters. But trust me, only selecting 2 clusters can lead to best results. Now, we apply k-means algorithm.

kmeans = KMeans(n_clusters=2,init='k-means++',max_iter=300,n_init=10,random_state=0) 
y_kmeans = kmeans.fit_predict(X)

X = X.as_matrix(columns=None)

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  after removing the cwd from sys.path.

Now, let's visualize the results.

plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0,1],s=100,c='red',label='Others')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1,1],s=100,c='blue',label='China(mainland),USA,India')
plt.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster_centers_[:,1],s=300,c='yellow',label='Centroids')
plt.title('Clusters of countries by Productivity')
plt.legend()
plt.show()

So, the blue cluster represents China(Mainland), USA and India while the red cluster represents all the other countries. This result was highly probable. Just take a look at the plot of cell 3 above. See how China, USA and India stand out. That has been observed here in clustering too.

You should try this algorithm for 3 or 4 clusters. Looking at the distribution, you will realise why 2 clusters is the best choice for the given data

This is not the end! More is yet to come.

Now, lets try to predict the production using regression for 2020. We will predict the production for USA,India and Pakistan.

india_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
    x=df[(df.Area=='India') & (df.Element=='Food')][i].mean()
    india_list.append(x)    

reset=[]
for i in year_list:
    reset.append(int(i[1:]))


reset=np.array(reset)
reset=reset.reshape(-1,1)


india_list=np.array(india_list)
india_list=india_list.reshape(-1,1)


reg = LinearRegression()
reg.fit(reset,india_list)
predictions = reg.predict(reset)
plt.title("India")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,india_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))

df[(df.Area=='India') & (df.Element=='Food')]['Y1961'].mean()

df[(df.Area=='Pakistan') & (df.Element=='Food')]

Pak_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
    yx=df[(df.Area=='Pakistan') & (df.Element=='Food')][i].mean()
    Pak_list.append(yx)   

Pak_list=np.array(Pak_list)
Pak_list=Pak_list.reshape(-1,1)
Pak_list
reg = LinearRegression()
reg.fit(reset,Pak_list)
predictions = reg.predict(reset)
plt.title("Pakistan")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,Pak_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))



usa_list=[]
year_list = list(df.iloc[:,10:].columns)
for i in year_list:
    xu=df[(df.Area=='United States of America') & (df.Element=='Food')][i].mean()
    usa_list.append(xu)

usa_list=np.array(usa_list)
usa_list=india_list.reshape(-1,1)


reg = LinearRegression()
reg.fit(reset,usa_list)
predictions = reg.predict(reset)
plt.title("USA")
plt.xlabel("Year")
plt.ylabel("Production")
plt.scatter(reset,usa_list)
plt.plot(reset,predictions)
plt.show()
print(reg.predict(2020))

[0;31m---------------------------------------------------------------------------[0m
[0;31mValueError[0m                                Traceback (most recent call last)
[0;32m&lt;ipython-input-24-da7cfa1c86d1&gt;[0m in [0;36m&lt;module&gt;[0;34m[0m
[1;32m     27[0m [0mplt[0m[0;34m.[0m[0mplot[0m[0;34m([0m[0mreset[0m[0;34m,[0m[0mpredictions[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[1;32m     28[0m [0mplt[0m[0;34m.[0m[0mshow[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m---&gt; 29[0;31m [0mprint[0m[0;34m([0m[0mreg[0m[0;34m.[0m[0mpredict[0m[0;34m([0m[0;36m2020[0m[0;34m)[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m     30[0m [0;34m[0m[0m
[1;32m     31[0m [0mdf[0m[0;34m[[0m[0;34m([0m[0mdf[0m[0;34m.[0m[0mArea[0m[0;34m==[0m[0;34m'India'[0m[0;34m)[0m [0;34m&[0m [0;34m([0m[0mdf[0m[0;34m.[0m[0mElement[0m[0;34m==[0m[0;34m'Food'[0m[0;34m)[0m[0;34m][0m[0;34m[[0m[0;34m'Y1961'[0m[0;34m][0m[0;34m.[0m[0mmean[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m

[0;32m/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py[0m in [0;36mpredict[0;34m(self, X)[0m
[1;32m    211[0m             [0mReturns[0m [0mpredicted[0m [0mvalues[0m[0;34m.[0m[0;34m[0m[0;34m[0m[0m
[1;32m    212[0m         """
[0;32m--&gt; 213[0;31m         [0;32mreturn[0m [0mself[0m[0;34m.[0m[0m_decision_function[0m[0;34m([0m[0mX[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m    214[0m [0;34m[0m[0m
[1;32m    215[0m     [0m_preprocess_data[0m [0;34m=[0m [0mstaticmethod[0m[0;34m([0m[0m_preprocess_data[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m

[0;32m/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/base.py[0m in [0;36m_decision_function[0;34m(self, X)[0m
[1;32m    194[0m         [0mcheck_is_fitted[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m"coef_"[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[1;32m    195[0m [0;34m[0m[0m
[0;32m--&gt; 196[0;31m         [0mX[0m [0;34m=[0m [0mcheck_array[0m[0;34m([0m[0mX[0m[0;34m,[0m [0maccept_sparse[0m[0;34m=[0m[0;34m[[0m[0;34m'csr'[0m[0;34m,[0m [0;34m'csc'[0m[0;34m,[0m [0;34m'coo'[0m[0;34m][0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m    197[0m         return safe_sparse_dot(X, self.coef_.T,
[1;32m    198[0m                                dense_output=True) + self.intercept_

[0;32m/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py[0m in [0;36mcheck_array[0;34m(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)[0m
[1;32m    543[0m                     [0;34m"Reshape your data either using array.reshape(-1, 1) if "[0m[0;34m[0m[0;34m[0m[0m
[1;32m    544[0m                     [0;34m"your data has a single feature or array.reshape(1, -1) "[0m[0;34m[0m[0;34m[0m[0m
[0;32m--&gt; 545[0;31m                     "if it contains a single sample.".format(array))
[0m[1;32m    546[0m             [0;31m# If input is 1D raise error[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[1;32m    547[0m             [0;32mif[0m [0marray[0m[0;34m.[0m[0mndim[0m [0;34m==[0m [0;36m1[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m

[0;31mValueError[0m: Expected 2D array, got scalar array instead:
array=2020.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The Algorithms

Food Wastage Analysis From 1961-2013 Fao