Intro to the Socrata API with the NYC Dog Licensing Dataset & Python¶

# importing libraries
import pandas as pd
import numpy as np
from sodapy import Socrata
import requests
import plotly.express as px
from urllib.request import urlopen
import json
pd.options.display.max_rows = 100
pd.options.display.max_columns = 50

What is an API?¶

The Socrata API follows the REST (REpresentational State Transfer) design pattern. The term originally had a more abstract meaning, but it has become shorthand for web sites that act a bit like Python functions: they take values for certain parameters as inputs and produce output in the form of a long text string.

API stands for Application Programming Interface. An API specifies how an external program (an application program) can request that a program perform certain computations.

Putting the two together, a REST API specifies how external programs can make HTTP requests to a web site in order to request that some computation be carried out and data returned as output. When a website is designed to accept requests generated by other computer programs, and produce outputs to be consumed by other programs, it is sometimes called a web service, as opposed to a web site which produces output meant for humans to consume in a web browser.
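To make the function analogy concrete, here is a minimal sketch of a web service call wrapped in a Python function. The name `call_rest_api` and the `opener` argument are illustrative assumptions, not part of any library; `opener` exists only so the function can be exercised without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def call_rest_api(base_url, params=None, opener=urlopen):
    """Send a GET request and return the decoded JSON response.

    The inputs are the parameter values; the output is the text the
    service sends back, parsed from JSON into Python objects.
    """
    url = base_url
    if params:
        # safe="$" keeps a leading "$" in parameter names readable
        url = f"{base_url}?{urlencode(params, safe='$')}"
    with opener(url) as response:
        return json.load(response)
```

Assuming the service is reachable, `call_rest_api("https://data.cityofnewyork.us/resource/nu7n-tubp.json", {"$limit": 5})` would return a list of dictionaries, one per row.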

Anatomy of a URL with the Socrata API¶

For this demonstration we will make requests to the Socrata API for the NYC Dog License Dataset.

Components of a request:

headers (required sometimes, but not always)
endpoints
parameters
API key

In a REST API, the client or application program makes an HTTP request that includes information about what kind of request it is making. Each web site is free to define how its requests must be formatted.

In the Socrata format, the URL has a standard structure:

the base URL: https://data.cityofnewyork.us/resource/nu7n-tubp.json
a ? character
one or more key-value pairs (parameters), formatted as key=value and separated by the & character

For example, consider the following URL requests to the NYC Dog License Dataset with the Socrata API:

Try copying that URL into a browser, or just clicking on it. Depending on your browser, it may put the contents into a file attachment that you have to open up to see the contents, or it may just show the contents in a browser window.
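The structure listed above can also be assembled and taken apart with Python's standard library. This is a small sketch: the base URL is the one given above, `$limit` and `$where` are SoQL query parameters, and the particular values are only illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://data.cityofnewyork.us/resource/nu7n-tubp.json"

# Illustrative parameter values
params = {"$limit": 5, "$where": "breedname = 'Akita'"}

# base URL + "?" + key=value pairs joined by "&"
url = f"{BASE_URL}?{urlencode(params, safe='$')}"

# Splitting the URL recovers the same pieces
parts = urlsplit(url)
decoded = parse_qs(parts.query)
print(parts.netloc)  # host name
print(parts.path)    # endpoint path
print(decoded)       # parameters, decoded back to plain text
```

Letting `urlencode` do the escaping avoids hand-encoding the spaces and quotes in the filter value.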

API Documentation¶

The API documentation for the NYC Dog License Dataset contains all of the parts of the URL that we need. The fields in the documentation describe the parameters we can use to filter the data in the URL request. It is important to read the API documentation for every dataset you use in NYC Open Data, as each dataset has unique features that need to be considered when making requests.

Application Tokens and API Keys¶

The Socrata Open Data API has two concepts around API access: authentication and application tokens. You only need to authenticate if you wish to add, delete, or modify data that is attached to your account, or if you wish to read data you own that you have marked as private. Read-only requests only require the application token. See the SODA Producer API documentation for more details on changing data. See the API docs for more information on app tokens and API keys. For this example we are requesting data from the API for analysis, so only the application token is needed.

The Socrata Open Data API uses application tokens for two purposes:

Using an application token allows the API to throttle by application, rather than via IP address, which gives you a higher throttling limit
Authentication using OAuth (OAuth 2.0 is the preferred option for cases where you are building a web or mobile application that needs to perform actions on behalf of the user, like accessing data, and the interaction model allows you to present the user with a form to obtain their permission for the app to do so.)

See Generating App Tokens and API Keys for instructions on generating your own app token.
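As a sketch of how a token can accompany a read-only request made without a client library: Socrata accepts the application token in an `X-App-Token` HTTP header. The function name `build_request` and the environment variable name `SOCRATA_APP_TOKEN` are assumptions chosen for this example; reading the token from the environment keeps it out of the source code.

```python
import os
from urllib.request import Request

BASE_URL = "https://data.cityofnewyork.us/resource/nu7n-tubp.json"


def build_request(url, app_token=None):
    """Return a GET request, attaching the app token when one is given."""
    headers = {}
    if app_token:
        headers["X-App-Token"] = app_token
    return Request(url, headers=headers)


# SOCRATA_APP_TOKEN is a hypothetical variable name; if it is unset,
# os.environ.get returns None and the request carries no token.
request = build_request(BASE_URL, os.environ.get("SOCRATA_APP_TOKEN"))
```

The resulting object can be passed to `urllib.request.urlopen(request)` to send the request.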

Using the Socrata client to make requests¶

We use the Python sodapy library to make our request. The client sends our request, and the API returns the data in JSON format; the pandas library then formats the results into a DataFrame. The get() method takes parameters that filter the data in our request. These filters are SoQL (Socrata Query Language) statements, which are based on SQL and have similar syntax. Check out the SoQL examples in the sodapy GitHub repository for more info.
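Because the filters are plain strings, they can be built with ordinary string handling. The helpers below are a hypothetical sketch, not part of sodapy; they assume that the query language, like SQL, escapes a single quote inside a string literal by doubling it.

```python
def soql_equals(column, value):
    """Build an equality condition such as: breedname = 'Akita'."""
    # Assumption: a single quote inside a literal is escaped by doubling it
    escaped = str(value).replace("'", "''")
    return f"{column} = '{escaped}'"


def soql_and(*conditions):
    """Join conditions so that all of them must hold."""
    return " AND ".join(conditions)


where_clause = soql_and(
    soql_equals("extract_year", 2017),
    soql_equals("breedname", "Boxer"),
)
print(where_clause)
```

The resulting string can be passed as the `where` argument of `client.get()`.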

# The second argument is the application token. An unauthenticated client
# (None in place of the token, and no username or password) also works
# with public data sets, but is subject to stricter throttling.
client = Socrata("data.cityofnewyork.us", 'tpwmd1DMwYMlz8q2BgPqJ48Ky')

# Example authenticated client (needed for non-public datasets):
# client = Socrata('data.cityofnewyork.us',
#                  'appTOKEN',
#                   username='username',
#                   password='password')

# Results returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("nu7n-tubp", 
                     limit=5000,
                     #where = "extract_year = '2017' AND breedname = 'Boxer'",
                     where = "breedname = 'Akita'",
                     #where = "animalname = 'PAIGE'",
                     select = "animalname, breedname, zipcode, extract_year",
                     order = "zipcode")

# Convert to pandas DataFrame
doggy_data = pd.DataFrame.from_records(results)
doggy_data['count'] = 1

doggy_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 491 entries, 0 to 490
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   animalname    491 non-null    object
 1   breedname     491 non-null    object
 2   zipcode       491 non-null    object
 3   extract_year  358 non-null    object
 4   count         491 non-null    int64 
dtypes: int64(1), object(4)
memory usage: 19.3+ KB

doggy_data.head(20)

	animalname	breedname	zipcode	extract_year	count
0	TARA	Akita	10009	2016	1
1	LINK	Akita	10009	2016	1
2	TARA	Akita	10009	2018	1
3	TARA	Akita	10009	NaN	1
4	LINK	Akita	10009	2018	1
5	LINK	Akita	10009	2017	1
6	LINK	Akita	10009	NaN	1
7	TARA	Akita	10009	2017	1
8	SAMA	Akita	10011	2017	1
9	KUMA	Akita	10011	2018	1
10	SAMA	Akita	10011	2018	1
11	SAKURA	Akita	10012	NaN	1
12	SHIRO	Akita	10013	2018	1
13	KOTI	Akita	10013	2017	1
14	KODO	Akita	10013	NaN	1
15	KODO	Akita	10013	NaN	1
16	SHIRO	Akita	10013	2017	1
17	SHIRO	Akita	10013	2016	1
18	KOTI	Akita	10013	2018	1
19	DOLCE	Akita	10014	2018	1

doggy_data['breedname'].unique()

array(['Akita'], dtype=object)

doggy_data['extract_year'].unique()

array(['2016', '2018', nan, '2017'], dtype=object)
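The output above shows that `extract_year` arrives as strings, with some values missing. One way to handle this is to convert the column to a nullable integer type and count the gaps. The sketch below uses three rows copied by hand from the table above rather than a live request.

```python
import pandas as pd

# Three rows copied from the output above; the second has no extract_year
sample = pd.DataFrame.from_records([
    {"animalname": "TARA", "breedname": "Akita", "zipcode": "10009", "extract_year": "2016"},
    {"animalname": "TARA", "breedname": "Akita", "zipcode": "10009"},
    {"animalname": "SAMA", "breedname": "Akita", "zipcode": "10011", "extract_year": "2017"},
])

# Strings become numbers; missing values become <NA> in a nullable integer column
sample["extract_year"] = pd.to_numeric(
    sample["extract_year"], errors="coerce"
).astype("Int64")

missing_years = int(sample["extract_year"].isna().sum())
rows_per_year = sample["extract_year"].value_counts()  # excludes missing values
print(missing_years)
print(rows_per_year)
```

Keeping the missing values as `<NA>` instead of dropping the rows leaves the per-ZIP-code counts unchanged.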

Visualize the data with plotly express¶

# create grouped dataframe by count of dog registration by zipcode
doggy_zips_grouped = doggy_data[['zipcode', 'count']].groupby(by = 'zipcode').sum().reset_index()

# use GeoJSON file from NYC opendata which contains GIS data for zip code boundaries in NYC
# web page with info on data set url here: 
# https://data.cityofnewyork.us/Health/Modified-Zip-Code-Tabulation-Areas-MODZCTA-/pri4-ifjk
with urlopen('https://data.cityofnewyork.us/resource/pri4-ifjk.geojson') as response:
    zip_codes = json.load(response)


fig = px.choropleth_mapbox(doggy_zips_grouped, geojson=zip_codes, locations='zipcode', color='count',
                           featureidkey='properties.modzcta',
                           color_continuous_scale="Viridis",
                           range_color=(0, max(doggy_zips_grouped['count'])),
                           mapbox_style="carto-positron",
                           zoom=9.25, center = {"lat": 40.743, "lon": -73.988},
                           opacity=0.5,
                           labels={},
                           title="Number of Dog License Registrations in NYC by Zipcode"
                          ).update(layout=dict(title=dict(x=0.5)))
fig.update_layout(margin={"r":0,"t":30,"l":0,"b":0})
fig.show()
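`featureidkey` names the GeoJSON property that plotly matches against the `locations` column, so a ZIP code with no matching feature would not be expected to appear on the map. The sketch below lists such unmatched values. It uses a tiny made-up GeoJSON object in place of the real download, and the function name is hypothetical.

```python
def unmatched_locations(geojson, locations, property_name):
    """Return the locations that have no feature with a matching property."""
    feature_ids = {
        str(feature["properties"].get(property_name))
        for feature in geojson["features"]
    }
    return sorted(set(map(str, locations)) - feature_ids)


# Made-up stand-in for the MODZCTA GeoJSON; geometry omitted for brevity
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"modzcta": "10009"}, "geometry": None},
        {"type": "Feature", "properties": {"modzcta": "10011"}, "geometry": None},
    ],
}

missing = unmatched_locations(toy_geojson, ["10009", "10011", "10012"], "modzcta")
print(missing)
```

With the real data the call would be `unmatched_locations(zip_codes, doggy_zips_grouped['zipcode'], 'modzcta')`.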

	id	alias	name	image_url	is_closed	url	review_count	categories	rating	coordinates	transactions	price	location	phone	display_phone	distance
0	Q3fmTHT7zilDWtfzLK9lMA	truedan-new-york-3	Truedan	https://s3-media2.fl.yelpcdn.com/bphoto/BmUZXO...	False	https://www.yelp.com/biz/truedan-new-york-3?ad...	99	[{'alias': 'bubbletea', 'title': 'Bubble Tea'}]	4.5	{'latitude': 40.7191742, 'longitude': -73.9962...	[delivery, pickup]	$$	{'address1': '208 Grand St', 'address2': None,...	+16465592886	(646) 559-2886	1535.241614
1	bVJQEeRNi34-3XN_F1AZEg	xing-fu-tang-new-york	Xing Fu Tang	https://s3-media2.fl.yelpcdn.com/bphoto/cyFHFE...	False	https://www.yelp.com/biz/xing-fu-tang-new-york...	91	[{'alias': 'bubbletea', 'title': 'Bubble Tea'}]	4.5	{'latitude': 40.7288, 'longitude': -73.98775}	[delivery, pickup]	NaN	{'address1': '133 2nd Ave', 'address2': '', 'a...			2656.450427
2	7-bx74TooPuZKZDNW4WFcQ	chun-yang-tea-new-york	Chun Yang Tea	https://s3-media1.fl.yelpcdn.com/bphoto/JrGwrZ...	False	https://www.yelp.com/biz/chun-yang-tea-new-yor...	91	[{'alias': 'bubbletea', 'title': 'Bubble Tea'}]	4.5	{'latitude': 40.71617, 'longitude': -73.9971}	[delivery, pickup]	$$	{'address1': '26B Elizabeth St', 'address2': '...	+12124200123	(212) 420-0123	1212.877370
3	3aypSFXLfkAL4dhHVFobKg	lazy-sundaes-new-york-6	Lazy Sundaes	https://s3-media3.fl.yelpcdn.com/bphoto/kEfCYC...	False	https://www.yelp.com/biz/lazy-sundaes-new-york...	32	[{'alias': 'bubbletea', 'title': 'Bubble Tea'}...	4.5	{'latitude': 40.720597, 'longitude': -73.984539}	[delivery, pickup]	$$	{'address1': '23 Clinton St', 'address2': None...			1877.016204
4	zupVwJAFYkHDwrsQd2ktXA	fiftylan-union-square-new-york-3	FIFTYLAN Union Square	https://s3-media3.fl.yelpcdn.com/bphoto/UIwR20...	False	https://www.yelp.com/biz/fiftylan-union-square...	60	[{'alias': 'bubbletea', 'title': 'Bubble Tea'}]	4.0	{'latitude': 40.735516, 'longitude': -73.989191}	[delivery, pickup]	NaN	{'address1': '32 Union Square E', 'address2': ...	+16467670085	(646) 767-0085	3367.900728
