Data structures#

GeoPandas implements two main data structures, a GeoSeries and a GeoDataFrame. These are subclasses of pandas.Series and pandas.DataFrame, respectively.

GeoSeries#

A GeoSeries is essentially a vector where each entry in the vector is a set of shapes corresponding to one observation. An entry may consist of only one shape (like a single polygon) or multiple shapes that are meant to be thought of as one observation (like the many polygons that make up the State of Hawaii or a country like Indonesia).

GeoPandas has three basic classes of geometric objects (which are actually Shapely objects):

  • Points / Multi-Points

  • Lines / Multi-Lines

  • Polygons / Multi-Polygons

Note that all entries in a GeoSeries do not need to be of the same geometric type, although certain export operations will fail if this is not the case.

Overview of attributes and methods#

The GeoSeries class implements nearly all of the attributes and methods of Shapely objects. When applied to a GeoSeries, they will apply elementwise to all geometries in the series. Binary operations can be applied between two GeoSeries, in which case the operation is carried out elementwise. The two series will be aligned by matching indices. Binary operations can also be applied to a single geometry, in which case the operation is carried out for each element of the series with that geometry. In either case, a Series or a GeoSeries will be returned, as appropriate.

A short summary of a few attributes and methods for GeoSeries is presented here, and a full list can be found in the GeoSeries API reference. There is also a family of methods for creating new shapes by expanding existing shapes or applying set-theoretic operations like “union” described in Geometric manipulations.

Attributes#

  • area: shape area (units of projection – see projections)

  • bounds: tuple of max and min coordinates on each axis for each shape

  • total_bounds: tuple of max and min coordinates on each axis for entire GeoSeries

  • geom_type: type of geometry.

  • is_valid: tests if coordinates make a shape that is reasonable geometric shape according to the Simple Feature Access standard.

Basic methods#

Relationship tests#

GeoDataFrame#

A GeoDataFrame is a tabular data structure that contains a GeoSeries.

The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status - the “active geometry column”. When a spatial method is applied to a GeoDataFrame (or a spatial attribute like area is called), these operations will always act on the active geometry column.

The active geometry column – no matter the name of the corresponding GeoSeries – can be accessed through the geometry attribute (gdf.geometry), and the name of the geometry column can be found by typing gdf.geometry.name or gdf.active_geometry_name.

A GeoDataFrame may also contain other columns with geometrical (shapely) objects, but only one column can be the active geometry at a time. To change which column is the active geometry column, use the GeoDataFrame.set_geometry() method.

An example using the geoda.malaria dataset from geodatasets containing the counties of Colombia:

In [1]: import geodatasets

In [2]: colombia = geopandas.read_file(geodatasets.get_path('geoda.malaria'))

In [3]: colombia.head()
Out[3]: 
   ID      ADM0  ... RP2005                                           geometry
0   1  COLOMBIA  ...  61773  POLYGON ((-71.32639 11.84789, -71.33579 11.855...
1   2  COLOMBIA  ...  36465  POLYGON ((-72.42191 11.79824, -72.4198 11.795,...
2   3  COLOMBIA  ...  18368  POLYGON ((-72.1891 11.5242, -72.1833 11.5323, ...
3   4  COLOMBIA  ...   7566  POLYGON ((-72.638 11.3679, -72.6259 11.3499, -...
4   5  COLOMBIA  ...   9343  POLYGON ((-74.77489 10.93158, -74.7753 10.9338...

[5 rows x 51 columns]

# Plot countries
In [4]: colombia.plot(markersize=.5);
../../_images/colombia_borders.png

Currently, the column named “geometry” with county borders is the active geometry column:

In [5]: colombia.geometry.name
Out[5]: 'geometry'

You can also rename this column to “borders”:

In [6]: colombia = colombia.rename_geometry('borders')

In [7]: colombia.geometry.name
Out[7]: 'borders'

Now, you create centroids and make it the geometry:

In [8]: colombia['centroid_column'] = colombia.centroid

In [9]: colombia = colombia.set_geometry('centroid_column')

In [10]: colombia.plot();
../../_images/colombia_centroids.png

Note: A GeoDataFrame keeps track of the active column by name, so if you rename the active geometry column, you must also reset the geometry:

gdf = gdf.rename(columns={'old_name': 'new_name'}).set_geometry('new_name')

Note 2: Somewhat confusingly, by default when you use the read_file() command, the column containing spatial objects from the file is named “geometry” by default, and will be set as the active geometry column. However, despite using the same term for the name of the column and the name of the special attribute that keeps track of the active column, they are distinct. You can easily shift the active geometry column to a different GeoSeries with the set_geometry() command. Further, gdf.geometry will always return the active geometry column, not the column named geometry. If you wish to call a column named “geometry”, and a different column is the active geometry column, use gdf['geometry'], not gdf.geometry.

Attributes and methods#

Any of the attributes calls or methods described for a GeoSeries will work on a GeoDataFrame – they are just applied to the active geometry column GeoSeries.

However, GeoDataFrames also have a number few extra methods for:

Display options#

GeoPandas has an options attribute with global configuration attributes:

In [11]: import geopandas

In [12]: geopandas.options
Out[12]: 
Options(
  display_precision: None [default: None]
      The precision (maximum number of decimals) of the coordinates in the
      WKT representation in the Series/DataFrame display. By default (None),
      it tries to infer and use 3 decimals for projected coordinates and 5
      decimals for geographic coordinates.
  use_pygeos: False [default: False]
      Deprecated option previously used to enable PyGEOS. It will be removed
      in GeoPandas 1.1.
  io_engine: None [default: None]
      The default engine for ``read_file`` and ``to_file``. Options are
      'pyogrio' and 'fiona'.
  )

The geopandas.options.display_precision option can control the number of decimals to show in the display of coordinates in the geometry column. In the colombia example of above, the default is to show 5 decimals for geographic coordinates:

In [13]: colombia['centroid_column'].head()
Out[13]: 
0    POINT (-71.74594 12.00885)
1    POINT (-72.56514 11.58174)
2    POINT (-72.35203 11.32204)
3    POINT (-73.14121 11.15251)
4    POINT (-74.64555 10.88454)
Name: centroid_column, dtype: geometry

If you want to change this, for example to see more decimals, you can do:

In [14]: geopandas.options.display_precision = 9

In [15]: colombia['centroid_column'].head()
Out[15]: 
0    POINT (-71.745940217 12.008854228)
1    POINT (-72.565144214 11.581744777)
2    POINT (-72.352030378 11.322036612)
3      POINT (-73.1412073 11.152507044)
4    POINT (-74.645551117 10.884543716)
Name: centroid_column, dtype: geometry