Spatial data are often more granular than we need. For example, we might have data on sub-national units, but we’re actually interested in studying patterns at the level of countries.
In a non-spatial setting, when all we need are summary statistics of the data, we aggregate our data using the groupby function. But for spatial data, we sometimes also need to aggregate geometric features. In the geopandas library, we can aggregate geometric features using the dissolve function.
dissolve can be thought of as doing three things: (a) it merges all the geometries within a given group into a single geometric feature (using the unary_union method), (b) it aggregates all the rows of data in a group using groupby.aggregate(), and (c) it combines those two results.
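The three steps can be sketched by hand on toy data. This is only an illustration of the idea, not the library's actual implementation: the column names `region` and `population` and the square geometries are made up for the example, and `shapely.ops.unary_union` stands in for the unary_union method mentioned above.

```python
import geopandas
import pandas
from shapely.geometry import box
from shapely.ops import unary_union

# Hypothetical toy data: four unit squares split into two groups.
gdf = geopandas.GeoDataFrame(
    {"region": ["west", "west", "east", "east"], "population": [10, 20, 30, 40]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 0, 6, 1), box(6, 0, 7, 1)],
)

# (a) Merge the geometries within each group into a single feature.
merged_geoms = gdf.groupby("region")["geometry"].agg(
    lambda geoms: unary_union(list(geoms))
)

# (b) Aggregate the attribute columns of each group ('first' is the default in dissolve).
attrs = pandas.DataFrame(gdf.drop(columns="geometry")).groupby("region").agg("first")

# (c) Combine the merged geometries with the aggregated attributes.
manual = geopandas.GeoDataFrame(merged_geoms, geometry="geometry", crs=gdf.crs).join(attrs)

# The one-step equivalent, for comparison.
dissolved = gdf.dissolve(by="region")
print(manual)
print(dissolved)
```

Both results should have one row per region, with the two adjacent squares in each region merged into a single rectangle.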
Suppose we are interested in studying continents, but we only have country-level data like the country dataset included in geopandas. We can easily convert this to a continent-level dataset.
First, let’s look at the simplest case, where we just want continent shapes and names. By default, dissolve passes 'first' to groupby.aggregate.
In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

In [2]: world = world[['continent', 'geometry']]

In [3]: continents = world.dissolve(by='continent')

In [4]: continents.plot();

In [5]: continents.head()
Out[5]:
                                                        geometry
continent
Africa         MULTIPOLYGON (((-16.714 13.595, -17.126 14.374...
Antarctica     MULTIPOLYGON (((-180.000 -84.713, -179.942 -84...
Asia           MULTIPOLYGON (((27.192 40.691, 26.358 40.152, ...
Europe         MULTIPOLYGON (((-177.664 71.133, -178.694 70.8...
North America  MULTIPOLYGON (((-169.529 62.977, -170.291 63.1...
If we are interested in aggregate populations, however, we can pass a different aggregation function to dissolve using the aggfunc argument:
In [6]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

In [7]: world = world[['continent', 'geometry', 'pop_est']]

In [8]: continents = world.dissolve(by='continent', aggfunc='sum')

In [9]: continents.plot(column='pop_est', scheme='quantiles', cmap='YlOrRd');

In [10]: continents.head()
Out[10]:
                                                        geometry     pop_est
continent
Africa         MULTIPOLYGON (((-16.714 13.595, -17.126 14.374...  1219176238
Antarctica     MULTIPOLYGON (((-180.000 -84.713, -179.942 -84...        4050
Asia           MULTIPOLYGON (((27.192 40.691, 26.358 40.152, ...  4389144868
Europe         MULTIPOLYGON (((-177.664 71.133, -178.694 70.8...   746398461
North America  MULTIPOLYGON (((-169.529 62.977, -170.291 63.1...   573042112
The aggfunc argument defaults to 'first', which means that the attribute values from the first row found in each group are assigned to the resulting dissolved GeoDataFrame. However, it also accepts the other summary statistics supported by pandas groupby.aggregate(), including:
‘first’
‘last’
‘min’
‘max’
‘sum’
‘mean’
‘median’
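As a small sketch of how a few of these options behave, the example below dissolves the same toy data with three different aggfunc values. The data are hypothetical (made-up `region` and `population` columns on unit squares) and are built in code so the example does not depend on a bundled dataset.

```python
import geopandas
from shapely.geometry import box

# Hypothetical toy data: four unit squares split into two groups.
gdf = geopandas.GeoDataFrame(
    {"region": ["west", "west", "east", "east"], "population": [10, 20, 30, 40]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 0, 6, 1), box(6, 0, 7, 1)],
)

# Same dissolved geometries each time; only the attribute aggregation differs.
totals = gdf.dissolve(by="region", aggfunc="sum")
means = gdf.dissolve(by="region", aggfunc="mean")
largest = gdf.dissolve(by="region", aggfunc="max")

print(totals["population"])
print(means["population"])
print(largest["population"])
```

Note that functions such as 'sum' or 'mean' are applied to every non-geometry column, so it usually helps to select only the columns that make sense to aggregate before dissolving, as the continent example does with pop_est.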