Ibis + DuckDB geospatial: a match made on Earth

blog
duckdb
geospatial
Author

Naty Clementi

Published

December 7, 2023

Ibis now has support for DuckDB geospatial functions!

This blogpost showcases some examples of the geospatial API for the DuckDB backend. The material is inspired by the “DuckDB: Spatial Relationships” lesson from Dr. Qiusheng Wu’s course “Spatial Data Management” from the Department of Geography & Sustainability at the University of Tennessee, Knoxville.

Note

You can check Dr. Qiusheng Wu’s full Spatial Data Management course material on its website, and the classes are also on YouTube.

Installation

Install Ibis with the dependencies needed to work with geospatial data using DuckDB:

$ pip install 'ibis-framework[duckdb,geospatial]'

Data

We are going to be working with data from New York City. The database contains multiple tables with information about subway stations, streets, neighborhood, census data and, homicides. The datasets in the database are in NAD83 / UTM zone 18N projection, EPSG:26918.

from pathlib import Path
from zipfile import ZipFile
from urllib.request import urlretrieve

# Download and unzip
url = "https://open.gishub.org/data/duckdb/nyc_data.db.zip"
zip_path = Path("nyc_data.db.zip")
db_path = Path("nyc_data.db")

if not zip_path.exists():
    urlretrieve(url, zip_path)

if not db_path.exists():
    with ZipFile(zip_path) as zip_file:
        zip_file.extract("nyc_data.db")

Let’s get started

The beauty of spatial databases is that they allow us to both store and compute over geometries.

import ibis
from ibis import _

ibis.options.interactive = True

con = ibis.duckdb.connect("nyc_data.db")
con.list_tables()
['nyc_census_blocks',
 'nyc_homicides',
 'nyc_neighborhoods',
 'nyc_streets',
 'nyc_subway_stations']

We have multiple tables with information about New York City. Following Dr. Wu’s class, we’ll take a look at some spatial relations.

We can start by taking a peek at the nyc_subway_stations table.

subway_stations = con.table("nyc_subway_stations")
subway_stations
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ OBJECTID  ID       NAME          ALT_NAME         CROSS_ST         LONG_NAME                                LABEL                              BOROUGH    NGHBHD  ROUTES  TRANSFERS  COLOR         EXPRESS  CLOSED  geom                             ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64float64stringstringstringstringstringstringstringstringstringstringstringstringgeospatial:geometry              │
├──────────┼─────────┼──────────────┼─────────────────┼─────────────────┼─────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────┼────────┼───────────┼──────────────┼─────────┼────────┼──────────────────────────────────┤
│      1.0376.0Cortlandt StNULLChurch St      Cortlandt St (R,W) Manhattan           Cortlandt St (R,W)               ManhattanNULLR,W   R,W      YELLOW      NULLNULL<POINT (583521.854 4507077.863)> │
│      2.02.0Rector St   NULLNULLRector St (1) Manhattan                Rector St (1)                    ManhattanNULL1     1        RED         NULLNULL<POINT (583324.487 4506805.373)> │
│      3.01.0South Ferry NULLNULLSouth Ferry (1) Manhattan              South Ferry (1)                  ManhattanNULL1     1        RED         NULLNULL<POINT (583304.182 4506069.654)> │
│      4.0125.0138th St    Grand ConcourseGrand Concourse138th St / Grand Concourse (4,5) Bronx 138th St / Grand Concourse (4,5) Bronx    NULL4,5   4,5      GREEN       NULLNULL<POINT (590250.106 4518558.02)>  │
│      5.0126.0149th St    Grand ConcourseGrand Concourse149th St / Grand Concourse (4) Bronx   149th St / Grand Concourse (4)   Bronx    NULL4     2,4,5    GREEN       expressNULL<POINT (590454.74 4519145.72)>   │
│      6.045.0149th St    Grand ConcourseGrand Concourse149th St / Grand Concourse (2,5) Bronx 149th St / Grand Concourse (2,5) Bronx    NULL2,5   2,4,5    RED-GREEN   expressNULL<POINT (590465.893 4519168.697)> │
│      7.0127.0161st St    Yankee Stadium River Ave      161st St / Yankee Stadium (B,D,4) Bronx161st St / Yankee Stadium (B,D,4)Bronx    NULLB,D,4 B,D,4    GREEN-ORANGENULLNULL<POINT (590573.169 4520214.766)> │
│      8.0208.0167th St    NULLGrand Concourse167th St (B,D) Bronx                   167th St (B,D)                   Bronx    NULLB,D   B,D      ORANGE      NULLNULL<POINT (591252.831 4520950.353)> │
│      9.0128.0167th St    NULLRiver Ave      167th St (4) Bronx                     167th St (4)                     Bronx    NULL4     4        GREEN       NULLNULL<POINT (590946.397 4521077.319)> │
│     10.0209.0170th St    NULLGrand Concourse170th St (B,D) Bronx                   170th St (B,D)                   Bronx    NULLB,D   B,D      ORANGE      NULLNULL<POINT (591583.611 4521434.847)> │
│                                        │
└──────────┴─────────┴──────────────┴─────────────────┴─────────────────┴─────────────────────────────────────────┴───────────────────────────────────┴───────────┴────────┴────────┴───────────┴──────────────┴─────────┴────────┴──────────────────────────────────┘

Notice that the last column has a geometry type, and in this case it contains points that represent the location of each subway station. Let’s grab the entry for the Broad St subway station.

broad_station = subway_stations.filter(subway_stations.NAME == "Broad St")
broad_station
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ OBJECTID  ID       NAME      ALT_NAME  CROSS_ST  LONG_NAME                   LABEL             BOROUGH    NGHBHD  ROUTES  TRANSFERS  COLOR   EXPRESS  CLOSED  geom                             ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64float64stringstringstringstringstringstringstringstringstringstringstringstringgeospatial:geometry              │
├──────────┼─────────┼──────────┼──────────┼──────────┼────────────────────────────┼──────────────────┼───────────┼────────┼────────┼───────────┼────────┼─────────┼────────┼──────────────────────────────────┤
│    332.0304.0Broad StNULLWall St Broad St (J,M,Z) ManhattanBroad St (J,M,Z)ManhattanNULLJ,M,Z J,M,Z    BROWN expressNULL<POINT (583571.906 4506714.341)> │
└──────────┴─────────┴──────────┴──────────┴──────────┴────────────────────────────┴──────────────────┴───────────┴────────┴────────┴───────────┴────────┴─────────┴────────┴──────────────────────────────────┘

Then convert it to a scalar subquery:

broad_station_subquery = broad_station.select("geom").as_scalar()

geo_equals (ST_Equals)

In DuckDB ST_Equals returns True if two geometries are topologically equal. This means that they have the same dimension and identical coordinate values, although the order of the vertices may be different.

The following is a bit redundant but we can check if our "Broad St" point matches only one point in our data using geo_equals

subway_stations.filter(subway_stations.geom.geo_equals(broad_station_subquery))
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ OBJECTID  ID       NAME      ALT_NAME  CROSS_ST  LONG_NAME                   LABEL             BOROUGH    NGHBHD  ROUTES  TRANSFERS  COLOR   EXPRESS  CLOSED  geom                             ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64float64stringstringstringstringstringstringstringstringstringstringstringstringgeospatial:geometry              │
├──────────┼─────────┼──────────┼──────────┼──────────┼────────────────────────────┼──────────────────┼───────────┼────────┼────────┼───────────┼────────┼─────────┼────────┼──────────────────────────────────┤
│    332.0304.0Broad StNULLWall St Broad St (J,M,Z) ManhattanBroad St (J,M,Z)ManhattanNULLJ,M,Z J,M,Z    BROWN expressNULL<POINT (583571.906 4506714.341)> │
└──────────┴─────────┴──────────┴──────────┴──────────┴────────────────────────────┴──────────────────┴───────────┴────────┴────────┴───────────┴────────┴─────────┴────────┴──────────────────────────────────┘

We can also write this query without using broad_station as a variable, and with the help of the deferred expressions API, also known as the underscore API.

subway_stations.filter(_.geom.geo_equals(_.filter(_.NAME == "Broad St")[["geom"]].as_scalar()))
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ OBJECTID  ID       NAME      ALT_NAME  CROSS_ST  LONG_NAME                   LABEL             BOROUGH    NGHBHD  ROUTES  TRANSFERS  COLOR   EXPRESS  CLOSED  geom                             ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64float64stringstringstringstringstringstringstringstringstringstringstringstringgeospatial:geometry              │
├──────────┼─────────┼──────────┼──────────┼──────────┼────────────────────────────┼──────────────────┼───────────┼────────┼────────┼───────────┼────────┼─────────┼────────┼──────────────────────────────────┤
│    332.0304.0Broad StNULLWall St Broad St (J,M,Z) ManhattanBroad St (J,M,Z)ManhattanNULLJ,M,Z J,M,Z    BROWN expressNULL<POINT (583571.906 4506714.341)> │
└──────────┴─────────┴──────────┴──────────┴──────────┴────────────────────────────┴──────────────────┴───────────┴────────┴────────┴───────────┴────────┴─────────┴────────┴──────────────────────────────────┘

intersect (ST_Intersect)

Let’s locate the neighborhood of the “Broad Street” subway station using the geospatial intersect function. The intersect function returns True if two geometries have any points in common.

boroughs = con.table("nyc_neighborhoods")
boroughs
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ BORONAME       NAME                      geom                                                                             ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringstringgeospatial:geometry                                                              │
├───────────────┼──────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ Brooklyn     Bensonhurst             <MULTIPOLYGON (((582771.426 4495167.427, 584651.294 4497541.643, 585422.281 ...> │
│ Manhattan    East Village            <MULTIPOLYGON (((585508.753 4509691.267, 586826.357 4508984.188, 586726.827 ...> │
│ Manhattan    West Village            <MULTIPOLYGON (((583263.278 4509242.626, 583276.82 4509378.825, 583473.971 4...> │
│ The Bronx    Throggs Neck            <MULTIPOLYGON (((597640.009 4520272.72, 597647.746 4520617.824, 597805.462 4...> │
│ The Bronx    Wakefield-Williamsbridge<MULTIPOLYGON (((595285.205 4525938.798, 595348.545 4526158.777, 595672 4527...> │
│ Queens       Auburndale              <MULTIPOLYGON (((600973.009 4510338.857, 601002.162 4510743.044, 601131.315 ...> │
│ Manhattan    Battery Park            <MULTIPOLYGON (((583408.101 4508093.111, 583356.048 4507665.145, 583260.947 ...> │
│ Manhattan    Carnegie Hill           <MULTIPOLYGON (((588501.208 4515525.88, 588125.03 4514806.77, 587702.963 451...> │
│ Staten IslandMariners Harbor         <MULTIPOLYGON (((570300.108 4497031.156, 570393.836 4497227.426, 570478.075 ...> │
│ Staten IslandRossville               <MULTIPOLYGON (((564664.957 4489358.427, 564771.457 4489415.865, 564783.746 ...> │
│                                                                                 │
└───────────────┴──────────────────────────┴──────────────────────────────────────────────────────────────────────────────────┘
boroughs.filter(boroughs.geom.intersects(broad_station_subquery))
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ BORONAME   NAME                geom                                                                             ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringstringgeospatial:geometry                                                              │
├───────────┼────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ ManhattanFinancial District<MULTIPOLYGON (((583356.048 4507665.145, 583505.038 4507562.36, 583828.604 4...> │
└───────────┴────────────────────┴──────────────────────────────────────────────────────────────────────────────────┘

d_within (ST_DWithin)

We can also find the streets near (say, within 10 meters) the Broad St subway station using the d_within function. The d_within function returns True if the geometries are within a given distance.

streets = con.table("nyc_streets")
streets
┏━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     NAME             ONEWAY  TYPE           geom                                                                             ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int32stringstringstringgeospatial:geometry                                                              │
├───────┼─────────────────┼────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│     1Shore Pky S    NULLresidential  <MULTILINESTRING ((586785.477 4492901.001, 586898.232 4492943.725, 587118.98...> │
│     2NULLNULLfootway      <MULTILINESTRING ((586645.007 4504977.75, 586664.225 4504988.544, 586672.151...> │
│     3Avenue O       NULLresidential  <MULTILINESTRING ((586750.302 4496109.722, 586837.373 4496123.393, 586929.08...> │
│     4Walsh Ct       NULLresidential  <MULTILINESTRING ((586728.696 4497971.053, 586886.358 4498000.536))>             │
│     5NULLNULLmotorway_link<MULTILINESTRING ((586587.053 4510088.25, 586641.734 4510156.835))>              │
│     6Avenue Z       NULLresidential  <MULTILINESTRING ((586792.159 4493279.322, 586978.534 4493308.473, 587056.76...> │
│     7Dank Ct        NULLresidential  <MULTILINESTRING ((586794.754 4493361.729, 586966.095 4493387.928))>             │
│     8Cumberland WalkNULLfootway      <MULTILINESTRING ((586657.468 4505324.904, 586692.489 4505320.983, 586707.76...> │
│     9Cumberland WalkNULLfootway      <MULTILINESTRING ((586670.712 4505521.567, 586667.915 4505500.551, 586657.46...> │
│    10NULLNULLresidential  <MULTILINESTRING ((586598.326 4510424.446, 586602.314 4510430.044, 586604.94...> │
│                                                                                     │
└───────┴─────────────────┴────────┴───────────────┴──────────────────────────────────────────────────────────────────────────────────┘

Using the deferred API, we can check which streets are within d=10 meters of distance.

sts_near_broad = streets.filter(_.geom.d_within(broad_station_subquery, 10))
sts_near_broad
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     NAME       ONEWAY  TYPE          geom                                                                             ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int32stringstringstringgeospatial:geometry                                                              │
├───────┼───────────┼────────┼──────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ 17394Wall St  NULLunclassified<MULTILINESTRING ((583483.954 4506785.646, 583522.11 4506758.431, 583571.509...> │
│ 17399Broad St NULLunclassified<MULTILINESTRING ((583571.509 4506715.578, 583529.136 4506622.066, 583509.85...> │
│ 17445Nassau StNULLunclassified<MULTILINESTRING ((583571.509 4506715.578, 583610.912 4506780.181, 583641.80...> │
└───────┴───────────┴────────┴──────────────┴──────────────────────────────────────────────────────────────────────────────────┘
Note

In the previous query, streets and broad_station are different tables. We use as_scalar() to generate a scalar subquery from a table with a single column (whose shape is scalar).

To visualize the findings, we will convert the tables to GeoPandas DataFrames.

broad_station_gdf = broad_station.to_pandas()
broad_station_gdf.crs = "EPSG:26918"

sts_near_broad_gdf = sts_near_broad.to_pandas()
sts_near_broad_gdf.crs = "EPSG:26918"

streets_gdf = streets.to_pandas()
streets_gdf.crs = "EPSG:26918"
from lonboard import Map, ScatterplotLayer, PathLayer, PolygonLayer
broad_station_layer = ScatterplotLayer.from_geopandas(
    broad_station_gdf, get_fill_color="blue", get_radius=5
)
sts_near_broad_layer = PathLayer.from_geopandas(
    sts_near_broad_gdf, get_color="red", opacity=0.4, get_width=2
)
streets_layer = PathLayer.from_geopandas(streets_gdf, get_color="grey", opacity=0.3)
m = Map(
    [
        broad_station_layer,
        sts_near_broad_layer,
        streets_layer,
    ],
    view_state={"longitude": -74.01066, "latitude": 40.7069, "zoom": 16}
)
m

You can zoom in and out, and hover over the map to check on the street names.

buffer (ST_Buffer)

Next, we’ll take a look at the homicides table and showcase some additional functionality related to polygon handling.

homicides = con.table("nyc_homicides")
homicides
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ INCIDENT_D  BORONAME       NUM_VICTIM  PRIMARY_MO  ID     WEAPON  LIGHT_DARK  YEAR   geom                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ datestringstringstringint32stringstringint32geospatial:geometry              │
├────────────┼───────────────┼────────────┼────────────┼───────┼────────┼────────────┼───────┼──────────────────────────────────┤
│ 2008-01-01Brooklyn     1         NULL7gun   D         2008<POINT (592158.666 4502210.892)> │
│ 2008-01-04Manhattan    1         NULL14gun   D         2008<POINT (588654.952 4517855.383)> │
│ 2008-01-05Queens       1         NULL15gun   D         2008<POINT (605800.815 4505730.608)> │
│ 2008-01-04Queens       1         NULL16knife D         2008<POINT (594255.157 4512250.378)> │
│ 2008-01-05Queens       1         NULL18gun   D         2008<POINT (605498.135 4496052.64)>  │
│ 2008-01-07Brooklyn     1         NULL20gun   D         2008<POINT (592020.999 4505733.647)> │
│ 2008-01-10Manhattan    1         NULL22gun   D         2008<POINT (584055.518 4511774.724)> │
│ 2008-01-10Manhattan    1         NULL23gun   D         2008<POINT (587283.748 4516908.39)>  │
│ 2008-01-13Staten Island1         NULL25gun   D         2008<POINT (570593.125 4498222.78)>  │
│ 2008-01-16Queens       1         NULL27gun   D         2008<POINT (607385.969 4501506.717)> │
│                                 │
└────────────┴───────────────┴────────────┴────────────┴───────┴────────┴────────────┴───────┴──────────────────────────────────┘

Let’s use the buffer method to find homicides near our "Broad St" station point.

The buffer method computes a polygon or multipolygon that represents all points whose distance from a geometry is less than or equal to a given distance.

broad_station.geom.buffer(200)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ GeoBuffer(geom, 200.0)                                                           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ geospatial:geometry                                                              │
├──────────────────────────────────────────────────────────────────────────────────┤
│ <POLYGON ((583771.906 4506714.341, 583768.063 4506675.323, 583756.682 450663...> │
└──────────────────────────────────────────────────────────────────────────────────┘

We can check the area using the area (ST_Area) function, and see that is \(~ \pi r^{2}=125664\)

broad_station.geom.buffer(200).area()
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ GeoArea(GeoBuffer(geom, 200.0)) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64                         │
├─────────────────────────────────┤
│                    124857.80609 │
└─────────────────────────────────┘

To find if there were any homicides in that area, we can find where the polygon resulting from adding the 200 meters buffer to our “Broad St” station point intersects with the geometry column in our homicides table.

h_near_broad = homicides.filter(_.geom.intersects(broad_station.select(_.geom.buffer(200)).as_scalar()))
h_near_broad
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ INCIDENT_D  BORONAME   NUM_VICTIM  PRIMARY_MO  ID     WEAPON  LIGHT_DARK  YEAR   geom                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ datestringstringstringint32stringstringint32geospatial:geometry              │
├────────────┼───────────┼────────────┼────────────┼───────┼────────┼────────────┼───────┼──────────────────────────────────┤
│ 2009-07-07Manhattan1         NULL3544NULLNULL2009<POINT (583443.249 4506757.877)> │
└────────────┴───────────┴────────────┴────────────┴───────┴────────┴────────────┴───────┴──────────────────────────────────┘

It looks like there was one homicide within 200 meters from the “Broad St” station, but from this data we can’t tell the street near which it happened. However, we can check if the homicide point is within a small distance of a street.

h_street = streets.filter(_.geom.d_within(h_near_broad.select(_.geom).as_scalar(), 2))
h_street
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID     NAME       ONEWAY  TYPE          geom                                                                             ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int32stringstringstringgeospatial:geometry                                                              │
├───────┼───────────┼────────┼──────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ 17296Rector Styes   unclassified<MULTILINESTRING ((583184.691 4506868.803, 583257.066 4506835.054, 583324.11...> │
└───────┴───────────┴────────┴──────────────┴──────────────────────────────────────────────────────────────────────────────────┘

Let’s plot this:

broad_station_zone = broad_station.mutate(geom=broad_station.geom.buffer(200))
broad_station_zone = broad_station_zone.to_pandas()
broad_station_zone.crs = "EPSG:26918"

h_near_broad_gdf = h_near_broad.to_pandas()
h_near_broad_gdf.crs = "EPSG:26918"

h_street_gdf = h_street.to_pandas()
h_street_gdf.crs = "EPSG:26918"
broad_station_layer = ScatterplotLayer.from_geopandas(
    broad_station_gdf, get_fill_color="orange", get_radius=5
)

broad_station_zone_layer = PolygonLayer.from_geopandas(
    broad_station_zone, get_fill_color="orange", opacity=0.1
)

h_near_broad_layer = ScatterplotLayer.from_geopandas(
    h_near_broad_gdf, get_fill_color="red", get_radius=5
)

h_street_layer = PathLayer.from_geopandas(
    h_street_gdf, get_color="blue", opacity=0.5, get_width=2
)

streets_layer = PathLayer.from_geopandas(streets_gdf, get_color="grey", opacity=0.3)

mh = Map(
    [
        broad_station_layer,
        broad_station_zone_layer,
        h_near_broad_layer,
        h_street_layer,
        streets_layer,
    ],
    view_state={"longitude": -74.01066, "latitude": 40.7069, "zoom": 16}
)
mh

Functions supported and next steps

At the moment in Ibis we have support for around thirty geospatial functions in DuckDB and we will add some more (see list here).

We also support reading multiple geospatial formats via read_geo().

Here are some resources to learn more about Ibis:

Chat with us on Zulip:

Back to top