# The usual preamble

%matplotlib inline

import pandas as pd

import matplotlib.pyplot as plt



# Make the graphs a bit prettier, and bigger

plt.style.use('ggplot')



# This is necessary to show lots of columns in pandas 0.12. 

# Not necessary in pandas 0.13.

pd.set_option('display.width', 5000) 

pd.set_option('display.max_columns', 60)



plt.rcParams['figure.figsize'] = (15, 5)

We're going to use a new dataset here, to demonstrate how to deal with larger datasets. This is a subset of the of 311 service requests from NYC Open Data.

data_url = "https://sciencedata.dk/public/6e3ed434c0fa43df906ce2b6d1ba9fc6/pandas-cookbook/data/311-service-requests.csv"

# because of mixed types we specify dtype to prevent any errors

complaints = pd.read_csv(data_url, dtype='unicode')

Depending on your pandas version, you might see an error like "DtypeWarning: Columns (8) have mixed types". This means that it's encountered a problem reading in our data. In this case it almost certainly means that it has columns where some of the entries are strings and some are integers.

For now we're going to ignore it and hope we don't run into a problem, but in the long run we'd need to investigate this warning.

2.1 What's even in it? (the summary)¶

When you print a large dataframe, it will only show you the first few rows.

If you don't see this, don't panic! The default behavior for large dataframes changed between pandas 0.12 and 0.13. Previous to 0.13 it would show you a summary of the dataframe. This includes all the columns, and how many non-null values there are in each column.

complaints

2.2 Selecting columns and rows¶

To select a column, we index with the name of the column, like this:

complaints['Complaint Type']

0         Noise - Street/Sidewalk
1                 Illegal Parking
2              Noise - Commercial
3                 Noise - Vehicle
4                          Rodent
                   ...           
111064    Maintenance or Facility
111065            Illegal Parking
111066    Noise - Street/Sidewalk
111067         Noise - Commercial
111068           Blocked Driveway
Name: Complaint Type, Length: 111069, dtype: object

To get the first 5 rows of a dataframe, we can use a slice: df[:5].

This is a great way to get a sense for what kind of information is in the dataframe -- take a minute to look at the contents and get a feel for this dataset.

complaints[:5]

We can combine these to get the first 5 rows of a column:

complaints['Complaint Type'][:5]

0    Noise - Street/Sidewalk
1            Illegal Parking
2         Noise - Commercial
3            Noise - Vehicle
4                     Rodent
Name: Complaint Type, dtype: object

and it doesn't matter which direction we do it in:

complaints[:5]['Complaint Type']

0    Noise - Street/Sidewalk
1            Illegal Parking
2         Noise - Commercial
3            Noise - Vehicle
4                     Rodent
Name: Complaint Type, dtype: object

2.3 Selecting multiple columns¶

What if we just want to know the complaint type and the borough, but not the rest of the information? Pandas makes it really easy to select a subset of the columns: just index with list of columns you want.

complaints[['Complaint Type', 'Borough']]

That showed us a summary, and then we can look at the first 10 rows:

complaints[['Complaint Type', 'Borough']][:10]

2.4 What's the most common complaint type?¶

This is a really easy question to answer! There's a .value_counts() method that we can use:

complaints['Complaint Type'].value_counts()

HEATING                       14200
GENERAL CONSTRUCTION           7471
Street Light Condition         7117
DOF Literature Request         5797
PLUMBING                       5373
                              ...  
Trans Fat                         1
Highway Sign - Damaged            1
Municipal Parking Facility        1
DWD                               1
Tunnel Condition                  1
Name: Complaint Type, Length: 165, dtype: int64

If we just wanted the top 10 most common complaints, we can do this:

complaint_counts = complaints['Complaint Type'].value_counts()

complaint_counts[:10]

HEATING                   14200
GENERAL CONSTRUCTION       7471
Street Light Condition     7117
DOF Literature Request     5797
PLUMBING                   5373
PAINT - PLASTER            5149
Blocked Driveway           4590
NONCONST                   3998
Street Condition           3473
Illegal Parking            3343
Name: Complaint Type, dtype: int64

But it gets better! We can plot them!

complaint_counts[:10].plot(kind='bar')

<AxesSubplot:>

	Unique Key	Created Date	Closed Date	Agency	Agency Name	Complaint Type	Descriptor	Location Type	Incident Zip	Incident Address	Street Name	Cross Street 1	Cross Street 2	Intersection Street 1	Intersection Street 2	Address Type	City	Landmark	Facility Type	Status	Due Date	Resolution Action Updated Date	Community Board	Borough	X Coordinate (State Plane)	Y Coordinate (State Plane)	Park Facility Name	Park Borough	School Name	School Number	School Region	School Code	School Phone Number	School Address	School City	School State	School Zip	School Not Found	School or Citywide Complaint	Vehicle Type	Taxi Company Borough	Taxi Pick Up Location	Bridge Highway Name	Bridge Highway Direction	Road Ramp	Bridge Highway Segment	Garage Lot Name	Ferry Direction	Ferry Terminal Name	Latitude	Longitude	Location
0	26589651	10/31/2013 02:08:41 AM	NaN	NYPD	New York City Police Department	Noise - Street/Sidewalk	Loud Talking	Street/Sidewalk	11432	90-03 169 STREET	169 STREET	90 AVENUE	91 AVENUE	NaN	NaN	ADDRESS	JAMAICA	NaN	Precinct	Assigned	10/31/2013 10:08:41 AM	10/31/2013 02:35:17 AM	12 QUEENS	QUEENS	1042027	197389	Unspecified	QUEENS	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.70827532593202	-73.79160395779721	(40.70827532593202, -73.79160395779721)
1	26593698	10/31/2013 02:01:04 AM	NaN	NYPD	New York City Police Department	Illegal Parking	Commercial Overnight Parking	Street/Sidewalk	11378	58 AVENUE	58 AVENUE	58 PLACE	59 STREET	NaN	NaN	BLOCKFACE	MASPETH	NaN	Precinct	Open	10/31/2013 10:01:04 AM	NaN	05 QUEENS	QUEENS	1009349	201984	Unspecified	QUEENS	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.721040535628305	-73.90945306791765	(40.721040535628305, -73.90945306791765)
2	26594139	10/31/2013 02:00:24 AM	10/31/2013 02:40:32 AM	NYPD	New York City Police Department	Noise - Commercial	Loud Music/Party	Club/Bar/Restaurant	10032	4060 BROADWAY	BROADWAY	WEST 171 STREET	WEST 172 STREET	NaN	NaN	ADDRESS	NEW YORK	NaN	Precinct	Closed	10/31/2013 10:00:24 AM	10/31/2013 02:39:42 AM	12 MANHATTAN	MANHATTAN	1001088	246531	Unspecified	MANHATTAN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.84332975466513	-73.93914371913482	(40.84332975466513, -73.93914371913482)
3	26595721	10/31/2013 01:56:23 AM	10/31/2013 02:21:48 AM	NYPD	New York City Police Department	Noise - Vehicle	Car/Truck Horn	Street/Sidewalk	10023	WEST 72 STREET	WEST 72 STREET	COLUMBUS AVENUE	AMSTERDAM AVENUE	NaN	NaN	BLOCKFACE	NEW YORK	NaN	Precinct	Closed	10/31/2013 09:56:23 AM	10/31/2013 02:21:10 AM	07 MANHATTAN	MANHATTAN	989730	222727	Unspecified	MANHATTAN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.7780087446372	-73.98021349023975	(40.7780087446372, -73.98021349023975)
4	26590930	10/31/2013 01:53:44 AM	NaN	DOHMH	Department of Health and Mental Hygiene	Rodent	Condition Attracting Rodents	Vacant Lot	10027	WEST 124 STREET	WEST 124 STREET	LENOX AVENUE	ADAM CLAYTON POWELL JR BOULEVARD	NaN	NaN	BLOCKFACE	NEW YORK	NaN	NaN	Pending	11/30/2013 01:53:44 AM	10/31/2013 01:59:54 AM	10 MANHATTAN	MANHATTAN	998815	233545	Unspecified	MANHATTAN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.80769092704951	-73.94738703491433	(40.80769092704951, -73.94738703491433)
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
111064	26426013	10/04/2013 12:01:13 AM	10/07/2013 04:07:16 PM	DPR	Department of Parks and Recreation	Maintenance or Facility	Structure - Outdoors	Park	11213	NaN	NaN	NaN	NaN	NaN	NaN	NaN	BROOKLYN	NaN	NaN	Closed	10/18/2013 12:01:13 AM	10/07/2013 04:07:16 PM	08 BROOKLYN	BROOKLYN	NaN	NaN	Brower Park	BROOKLYN	Brower Park	B012	NaN	NaN	7189658900	Brooklyn, St. Marks, Kingston Avenues, Park Place	BROOKLYN	NY	11213	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
111065	26428083	10/04/2013 12:01:05 AM	10/04/2013 02:13:50 AM	NYPD	New York City Police Department	Illegal Parking	Posted Parking Sign Violation	Street/Sidewalk	11434	NaN	NaN	NaN	NaN	GUY R BREWER BOULEVARD	ROCKAWAY BOULEVARD	INTERSECTION	JAMAICA	NaN	Precinct	Closed	10/04/2013 08:01:05 AM	10/04/2013 02:13:50 AM	13 QUEENS	QUEENS	1048801	178419	Unspecified	QUEENS	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.656160351546845	-73.76735262738222	(40.656160351546845, -73.76735262738222)
111066	26428987	10/04/2013 12:00:45 AM	10/04/2013 01:25:01 AM	NYPD	New York City Police Department	Noise - Street/Sidewalk	Loud Talking	Street/Sidewalk	10016	344 EAST 28 STREET	EAST 28 STREET	MOUNT CARMEL PLACE	1 AVENUE	NaN	NaN	ADDRESS	NEW YORK	NaN	Precinct	Closed	10/04/2013 08:00:45 AM	10/04/2013 01:25:01 AM	06 MANHATTAN	MANHATTAN	990637	208987	Unspecified	MANHATTAN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.740295354643706	-73.97695165980414	(40.740295354643706, -73.97695165980414)
111067	26426115	10/04/2013 12:00:28 AM	10/04/2013 04:17:32 AM	NYPD	New York City Police Department	Noise - Commercial	Loud Talking	Club/Bar/Restaurant	11226	1233 FLATBUSH AVENUE	FLATBUSH AVENUE	AVENUE D	NEWKIRK AVENUE	NaN	NaN	ADDRESS	BROOKLYN	NaN	Precinct	Closed	10/04/2013 08:00:28 AM	10/04/2013 04:17:32 AM	14 BROOKLYN	BROOKLYN	996654	172515	Unspecified	BROOKLYN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.64018174662485	-73.95530566958138	(40.64018174662485, -73.95530566958138)
111068	26428033	10/04/2013 12:00:10 AM	10/04/2013 01:20:52 AM	NYPD	New York City Police Department	Blocked Driveway	Partial Access	Street/Sidewalk	11236	1259 EAST 94 STREET	EAST 94 STREET	AVENUE J	AVENUE K	NaN	NaN	ADDRESS	BROOKLYN	NaN	Precinct	Closed	10/04/2013 08:00:10 AM	10/04/2013 01:20:52 AM	18 BROOKLYN	BROOKLYN	1011804	172470	Unspecified	BROOKLYN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.640024057399216	-73.90071711703163	(40.640024057399216, -73.90071711703163)

	Unique Key	Created Date	Closed Date	Agency	Agency Name	Complaint Type	Descriptor	Location Type	Incident Zip	Incident Address	Street Name	Cross Street 1	Cross Street 2	Intersection Street 1	Intersection Street 2	Address Type	City	Landmark	Facility Type	Status	Due Date	Resolution Action Updated Date	Community Board	Borough	X Coordinate (State Plane)	Y Coordinate (State Plane)	Park Facility Name	Park Borough	School Name	School Number	School Region	School Code	School Phone Number	School Address	School City	School State	School Zip	School Not Found	School or Citywide Complaint	Vehicle Type	Taxi Company Borough	Taxi Pick Up Location	Bridge Highway Name	Bridge Highway Direction	Road Ramp	Bridge Highway Segment	Garage Lot Name	Ferry Direction	Ferry Terminal Name	Latitude	Longitude	Location
0	26589651	10/31/2013 02:08:41 AM	NaN	NYPD	New York City Police Department	Noise - Street/Sidewalk	Loud Talking	Street/Sidewalk	11432	90-03 169 STREET	169 STREET	90 AVENUE	91 AVENUE	NaN	NaN	ADDRESS	JAMAICA	NaN	Precinct	Assigned	10/31/2013 10:08:41 AM	10/31/2013 02:35:17 AM	12 QUEENS	QUEENS	1042027	197389	Unspecified	QUEENS	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.70827532593202	-73.79160395779721	(40.70827532593202, -73.79160395779721)
1	26593698	10/31/2013 02:01:04 AM	NaN	NYPD	New York City Police Department	Illegal Parking	Commercial Overnight Parking	Street/Sidewalk	11378	58 AVENUE	58 AVENUE	58 PLACE	59 STREET	NaN	NaN	BLOCKFACE	MASPETH	NaN	Precinct	Open	10/31/2013 10:01:04 AM	NaN	05 QUEENS	QUEENS	1009349	201984	Unspecified	QUEENS	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.721040535628305	-73.90945306791765	(40.721040535628305, -73.90945306791765)
2	26594139	10/31/2013 02:00:24 AM	10/31/2013 02:40:32 AM	NYPD	New York City Police Department	Noise - Commercial	Loud Music/Party	Club/Bar/Restaurant	10032	4060 BROADWAY	BROADWAY	WEST 171 STREET	WEST 172 STREET	NaN	NaN	ADDRESS	NEW YORK	NaN	Precinct	Closed	10/31/2013 10:00:24 AM	10/31/2013 02:39:42 AM	12 MANHATTAN	MANHATTAN	1001088	246531	Unspecified	MANHATTAN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.84332975466513	-73.93914371913482	(40.84332975466513, -73.93914371913482)
3	26595721	10/31/2013 01:56:23 AM	10/31/2013 02:21:48 AM	NYPD	New York City Police Department	Noise - Vehicle	Car/Truck Horn	Street/Sidewalk	10023	WEST 72 STREET	WEST 72 STREET	COLUMBUS AVENUE	AMSTERDAM AVENUE	NaN	NaN	BLOCKFACE	NEW YORK	NaN	Precinct	Closed	10/31/2013 09:56:23 AM	10/31/2013 02:21:10 AM	07 MANHATTAN	MANHATTAN	989730	222727	Unspecified	MANHATTAN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.7780087446372	-73.98021349023975	(40.7780087446372, -73.98021349023975)
4	26590930	10/31/2013 01:53:44 AM	NaN	DOHMH	Department of Health and Mental Hygiene	Rodent	Condition Attracting Rodents	Vacant Lot	10027	WEST 124 STREET	WEST 124 STREET	LENOX AVENUE	ADAM CLAYTON POWELL JR BOULEVARD	NaN	NaN	BLOCKFACE	NEW YORK	NaN	NaN	Pending	11/30/2013 01:53:44 AM	10/31/2013 01:59:54 AM	10 MANHATTAN	MANHATTAN	998815	233545	Unspecified	MANHATTAN	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	Unspecified	N	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	40.80769092704951	-73.94738703491433	(40.80769092704951, -73.94738703491433)