There are two ways to combine datasets in geopandas – attribute joins and spatial joins.
In an attribute join, a GeoSeries or GeoDataFrame is combined with a regular pandas Series or DataFrame based on a common variable. This is analogous to normal merging or joining in pandas.
In a spatial join, observations from two GeoSeries or GeoDataFrames are combined based on their spatial relationship to one another.
In the following examples, we use these datasets:
In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

In [2]: cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))

# For attribute join
In [3]: country_shapes = world[['geometry', 'iso_a3']]

In [4]: country_names = world[['name', 'iso_a3']]

# For spatial join
In [5]: countries = world[['geometry', 'name']]

In [6]: countries = countries.rename(columns={'name':'country'})
Appending GeoDataFrames and GeoSeries uses the pandas append methods. Keep in mind that the appended geometry columns need to have the same CRS.
# Appending GeoSeries
In [7]: joined = world.geometry.append(cities.geometry)

# Appending GeoDataFrames
In [8]: europe = world[world.continent == 'Europe']

In [9]: asia = world[world.continent == 'Asia']

In [10]: eurasia = europe.append(asia)
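The CRS requirement can be illustrated with a small sketch. It makes two assumptions that are not part of the examples above: the data are hand-made points rather than the Natural Earth datasets, and pandas.concat is used in place of the append method, since append is no longer available in pandas 2.0 and later. The frames are reprojected to a shared CRS before being combined.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical toy data (not the Natural Earth datasets used above).
europe_pts = gpd.GeoDataFrame(
    {"name": ["Rome", "Vienna"]},
    geometry=[Point(12.48, 41.90), Point(16.36, 48.20)],
    crs="EPSG:4326",
)
asia_pts = gpd.GeoDataFrame(
    {"name": ["Tokyo"]},
    geometry=[Point(139.75, 35.69)],
    crs="EPSG:4326",
).to_crs("EPSG:3857")  # deliberately stored in a different CRS

# Reproject so that both frames share one CRS before combining them.
asia_pts = asia_pts.to_crs(europe_pts.crs)

# pandas.concat stacks the rows; the result is still a GeoDataFrame.
eurasia_pts = pd.concat([europe_pts, asia_pts], ignore_index=True)
print(type(eurasia_pts).__name__, len(eurasia_pts))
```

Reprojecting first, rather than combining and fixing afterwards, keeps every geometry in the combined column in the same coordinate system.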
Attribute joins are accomplished using the merge method. In general, it is recommended to use the merge method called from the spatial dataset. With that said, the stand-alone merge function will work if the GeoDataFrame is in the left argument; if a DataFrame is in the left argument and a GeoDataFrame is in the right position, the result will no longer be a GeoDataFrame.
For example, consider the following merge that adds full names to a GeoDataFrame that initially has only ISO codes for each country by merging it with a pandas DataFrame.
# `country_shapes` is GeoDataFrame with country shapes and iso codes
In [11]: country_shapes.head()
Out[11]: 
                                            geometry iso_a3
0  MULTIPOLYGON (((180.000000000 -16.067132664, 1...    FJI
1  POLYGON ((33.903711197 -0.950000000, 34.072620...    TZA
2  POLYGON ((-8.665589565 27.656425890, -8.665124...    ESH
3  MULTIPOLYGON (((-122.840000000 49.000000000, -...    CAN
4  MULTIPOLYGON (((-122.840000000 49.000000000, -...    USA

# `country_names` is DataFrame with country names and iso codes
In [12]: country_names.head()
Out[12]: 
                       name iso_a3
0                      Fiji    FJI
1                  Tanzania    TZA
2                 W. Sahara    ESH
3                    Canada    CAN
4  United States of America    USA

# Merge with `merge` method on shared variable (iso codes):
In [13]: country_shapes = country_shapes.merge(country_names, on='iso_a3')

In [14]: country_shapes.head()
Out[14]: 
                                            geometry  ...                      name
0  MULTIPOLYGON (((180.000000000 -16.067132664, 1...  ...                      Fiji
1  POLYGON ((33.903711197 -0.950000000, 34.072620...  ...                  Tanzania
2  POLYGON ((-8.665589565 27.656425890, -8.665124...  ...                 W. Sahara
3  MULTIPOLYGON (((-122.840000000 49.000000000, -...  ...                    Canada
4  MULTIPOLYGON (((-122.840000000 49.000000000, -...  ...  United States of America

[5 rows x 3 columns]
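The effect of argument order on the stand-alone merge function can be checked with a short sketch. The two frames below are hypothetical stand-ins for country_shapes and country_names (placeholder points instead of real borders), and the final step, rebuilding a GeoDataFrame from the plain result, is one possible recovery rather than a recommended workflow.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical stand-ins for `country_shapes` and `country_names`.
shapes = gpd.GeoDataFrame(
    {"iso_a3": ["FJI", "TZA"]},
    geometry=[Point(178.0, -17.0), Point(35.0, -6.0)],  # placeholder points
    crs="EPSG:4326",
)
names = pd.DataFrame({"iso_a3": ["FJI", "TZA"], "name": ["Fiji", "Tanzania"]})

# Method called from the spatial dataset: the result stays a GeoDataFrame.
via_method = shapes.merge(names, on="iso_a3")

# Stand-alone function, GeoDataFrame on the left: still a GeoDataFrame.
via_function_left = pd.merge(shapes, names, on="iso_a3")

# Stand-alone function, DataFrame on the left: a plain DataFrame comes back.
via_function_right = pd.merge(names, shapes, on="iso_a3")

# One way to recover spatial behaviour is to wrap the result again.
recovered = gpd.GeoDataFrame(via_function_right, geometry="geometry")

for label, frame in [
    ("method", via_method),
    ("function, GeoDataFrame left", via_function_left),
    ("function, DataFrame left", via_function_right),
]:
    print(label, "->", type(frame).__name__)
```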
In a spatial join, two geometry objects are merged based on their spatial relationship to one another.
# One GeoDataFrame of countries, one of Cities.
# Want to merge so we can get each city's country.
In [15]: countries.head()
Out[15]: 
                                            geometry                   country
0  MULTIPOLYGON (((180.000000000 -16.067132664, 1...                      Fiji
1  POLYGON ((33.903711197 -0.950000000, 34.072620...                  Tanzania
2  POLYGON ((-8.665589565 27.656425890, -8.665124...                 W. Sahara
3  MULTIPOLYGON (((-122.840000000 49.000000000, -...                    Canada
4  MULTIPOLYGON (((-122.840000000 49.000000000, -...  United States of America

In [16]: cities.head()
Out[16]: 
           name                           geometry
0  Vatican City  POINT (12.453386545 41.903282180)
1    San Marino  POINT (12.441770158 43.936095835)
2         Vaduz   POINT (9.516669473 47.133723774)
3    Luxembourg   POINT (6.130002806 49.611660379)
4       Palikir   POINT (158.149974324 6.916643696)

# Execute spatial join
In [17]: cities_with_country = geopandas.sjoin(cities, countries, how="inner", op='intersects')

In [18]: cities_with_country.head()
Out[18]: 
             name                           geometry  index_right  country
0    Vatican City  POINT (12.453386545 41.903282180)          141    Italy
1      San Marino  POINT (12.441770158 43.936095835)          141    Italy
192          Rome  POINT (12.481312563 41.897901485)          141    Italy
2           Vaduz   POINT (9.516669473 47.133723774)          114  Austria
184        Vienna  POINT (16.364693097 48.201961137)          114  Austria
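For readers without the Natural Earth files at hand, the same city-to-country lookup can be sketched on hand-made data. The country and city names below are invented. The call relies on the default join predicate (intersects) instead of passing op explicitly, because the keyword is version-dependent: to the best of our knowledge, GeoPandas 0.10 and later spell it predicate.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Invented rectangular "countries" and point "cities".
countries_toy = gpd.GeoDataFrame(
    {"country": ["Westland", "Eastland"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:4326",
)
cities_toy = gpd.GeoDataFrame(
    {"name": ["Alpha", "Beta", "Gamma"]},
    geometry=[Point(2, 3), Point(15, 5), Point(50, 50)],
    crs="EPSG:4326",
)

# Inner join with the default predicate: each city that falls in a
# country picks up that country's attributes; "Gamma" lies outside both
# rectangles and is dropped.
joined = gpd.sjoin(cities_toy, countries_toy, how="inner")

lookup = dict(zip(joined["name"], joined["country"]))
print(lookup)
```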
sjoin() has two core arguments: how and op.
The op argument specifies how geopandas decides whether or not to join the attributes of one object to another, based on their geometric relationship.
The values for op correspond to the names of geometric binary predicates and depend on the spatial index implementation.
The default spatial index in GeoPandas currently supports the following values for op:
intersects
contains
within
touches
crosses
overlaps
You can read more about each join type in the Shapely documentation.
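As a quick illustration of what these predicates mean, the following sketch evaluates each of them directly with Shapely on a few hand-made geometries. The shapes are arbitrary and chosen only so that each relationship is easy to see; sjoin applies the same tests pairwise across the two inputs.

```python
from shapely.geometry import LineString, Point, box

square = box(0, 0, 2, 2)
inside_pt = Point(1, 1)
neighbour = box(2, 0, 4, 2)            # shares only the edge x = 2 with `square`
overlapping = box(1, 1, 3, 3)          # partly covers `square`
line = LineString([(-1, 1), (3, 1)])   # runs straight through `square`

results = {
    "intersects": square.intersects(overlapping),  # any shared point at all
    "contains": square.contains(inside_pt),        # point lies in the interior
    "within": inside_pt.within(square),            # the inverse of contains
    "touches": square.touches(neighbour),          # boundaries meet, interiors do not
    "crosses": line.crosses(square),               # line passes through the interior
    "overlaps": square.overlaps(overlapping),      # interiors share area, neither contains the other
}

for predicate, value in results.items():
    print(f"{predicate}: {value}")
```

Note that touching is not overlapping: square.overlaps(neighbour) is False, because the two boxes share a boundary but no interior area.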
The how argument specifies the type of join that will occur and which geometry is retained in the resulting GeoDataFrame. It accepts the following options:
left: use the index from the first (or left_df) GeoDataFrame that you provide to sjoin; retain only the left_df geometry column
right: use the index from the second (or right_df) GeoDataFrame; retain only the right_df geometry column
inner: use the intersection of index values from both GeoDataFrames; retain only the left_df geometry column
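The three options can be compared side by side on a small example. The zones and stops below are invented for illustration: two stops fall inside zone A, one stop falls in no zone, and zone B contains no stops. The sketch uses the default join predicate throughout.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Invented data: zone "B" has no stops, stop "s3" is in no zone.
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10)],
    crs="EPSG:4326",
)
stops = gpd.GeoDataFrame(
    {"stop": ["s1", "s2", "s3"]},
    geometry=[Point(1, 1), Point(5, 5), Point(15, 5)],
    crs="EPSG:4326",
)

# inner: only matching pairs, with the point geometry of `stops`.
inner = gpd.sjoin(stops, zones, how="inner")

# left: every stop is kept; unmatched stops get missing zone attributes.
left = gpd.sjoin(stops, zones, how="left")

# right: every zone is kept, with the polygon geometry of `zones`.
right = gpd.sjoin(stops, zones, how="right")

for label, frame in [("inner", inner), ("left", left), ("right", right)]:
    kinds = sorted(set(frame.geometry.geom_type))
    print(label, len(frame), kinds)
```

With how="right", zone A appears once per matching stop, so the result can have more rows than the right-hand input.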
Note that more complicated spatial relationships can be studied by combining geometric operations with a spatial join. To find all polygons within a given distance of a point, for example, one can first use the buffer method to expand each point into a circle of the appropriate radius, then intersect those buffered circles with the polygons in question.
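A minimal sketch of this buffer-then-join approach follows. The wells, parcels, and search radius are invented, and the coordinates are assumed to be in a projected CRS whose units are metres, so that the buffer distance is meaningful; the join uses the default intersects predicate.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Invented data in a projected CRS (units assumed to be metres).
wells = gpd.GeoDataFrame(
    {"well": ["w1"]},
    geometry=[Point(0, 0)],
    crs="EPSG:3857",
)
parcels = gpd.GeoDataFrame(
    {"parcel": ["near", "far"]},
    geometry=[box(5, -1, 7, 1), box(20, 20, 22, 22)],
    crs="EPSG:3857",
)

search_radius = 10  # hypothetical distance threshold

# Step 1: expand each point into a circle of the search radius.
well_zones = wells.copy()
well_zones["geometry"] = wells.geometry.buffer(search_radius)

# Step 2: keep the parcels that intersect any buffered circle.
nearby = gpd.sjoin(parcels, well_zones, how="inner")

nearby_parcels = list(nearby["parcel"])
print(nearby_parcels)
```

Putting the parcels on the left keeps the original parcel polygons in the result, rather than the buffered circles.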