Discovering spatial interaction patterns of near repeat crime by spatial association rules mining

Framework for discovering significant spatial transmission pattern of crime occurrence

In this section, a framework for discovering significant spatial interaction patterns of crime is developed. As illustrated in Fig. 1, the proposed framework comprises three steps.

Figure 1

Overview of framework for discovering spatial transmission patterns of crime occurrence.

The proposed method works on a collection of crime points with spatial and temporal information. First, near repeat crime pairs are identified by specifying a spatio-temporal proximity. Together, the near repeat crime pairs form a network whose complexity makes it difficult to discover the dominant patterns. Second, we therefore simplify the network by overlaying it with spatial grids and aggregating it. Finally, indicators are defined to measure the spatial interaction strength, and a spatial association pattern mining approach is developed. The whole framework is designed to discover the most probable spatial transmission routes and the related high flow regions. Each step is explained in the following sections.

Construction of crime transmission network

This study aims to discover spatial interaction patterns from a collection of discrete points. Each point represents a location where a crime incident happened. However, these crime incidents are not totally independent; they are spatially related to each other. The typical phenomenon demonstrating such interaction is near repeat crime. The interaction between a near repeat crime pair can be represented as a “directed link”, and a directed network (denoted as the “transmission network”) can describe the spatial interaction of all crime incidents.

The crime transmission network is composed of a node set V and an edge set E, and is denoted as N = (V, E). Each node in V indicates a crime incident and each edge in E represents the spatio-temporal relation between two incidents. Because the influence of a crime exists only within a limited spatial and temporal range, a spatio-temporal proximity should be defined to identify near repeat crime. Specifically, given two crime incidents cA and cB occurring at timestamps tA and tB, their spatial distance and time difference are denoted as rAB and tAB = tB − tA, respectively. A directed edge eAB from cA to cB is added if the following conditions are satisfied:

$$\left\{ {\begin{array}{*{20}l} {0 \le t_{B} - t_{A} \le \Delta t} \hfill \\ {r_{AB} \le \Delta s} \hfill \\ \end{array} } \right.$$

(1)

where Δs and Δt are two parameters to define the spatio-temporal proximity. In this manner, a crime transmission network can be constructed with the dual constraint of spatial and temporal proximity.
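The construction rule in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `build_transmission_network`, the `(x, y, t)` tuple layout, and the use of planar Euclidean distance on projected coordinates are all assumptions made for the example.

```python
from math import hypot


def build_transmission_network(incidents, delta_s, delta_t):
    """Return directed edges (i, j) between near repeat incidents.

    incidents: list of (x, y, t) tuples; x and y are assumed to be
               projected coordinates, t a numeric timestamp (e.g. days).
    An edge (i, j) is added when incident j occurs no earlier than
    incident i, at most delta_t later, and within distance delta_s.
    """
    edges = []
    for i, (xa, ya, ta) in enumerate(incidents):
        for j, (xb, yb, tb) in enumerate(incidents):
            if i == j:
                continue  # an incident is never paired with itself
            dt = tb - ta
            # Eq. (1) taken literally: two incidents with identical
            # timestamps (dt == 0) are linked in both directions.
            if 0 <= dt <= delta_t and hypot(xb - xa, yb - ya) <= delta_s:
                edges.append((i, j))
    return edges


# Hypothetical incidents: coordinates in metres, time in days.
incidents = [(0, 0, 0), (100, 0, 3), (5000, 0, 4), (150, 0, 20)]
edges = build_transmission_network(incidents, delta_s=200, delta_t=7)
print(edges)  # [(0, 1)]
```

The double loop compares every ordered pair, which is quadratic in the number of incidents. For a dataset of a few thousand points this is usually acceptable; a spatial index would be the natural refinement for larger collections.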

Spatial aggregation based on spatial grids

In the crime transmission network, each edge stands for one near repeat crime pair, so the network as a whole expresses the “spatial interaction” among incidents. Two issues arise when exploring this interaction. First, the spatial scale of the analysis has to be determined. Second, because “near repeat” pairs are judged by spatio-temporal proximity, a single crime incident may form a “close pair” with many other incidents, and all the close pairs together may form a complex network. As illustrated in Fig. 2, network nodes are usually clustered and network edges usually intersect irregularly. With many nodes and edges, it is difficult to extract the dominant spatial interaction patterns from such a network.

Figure 2

Illustrative example of spatial aggregation of original network.

To address the above issues, we overlay the crime transmission network with spatial grids. The advantage of applying spatial grids lies in two aspects. First, spatial interaction has to be explored at a given spatial scale, and the analysis scale is determined by the grid size; by setting different grid sizes, results at multiple scales can be obtained. Second, by overlaying spatial grids on the crime transmission network, each node and edge can be associated with one or several grids, so the network can be simplified greatly by spatial aggregation. In the example in Fig. 2, each circle in sub-figure (a) represents a crime incident, and crime pairs are connected by dashed lines; the dominant spatial patterns are not easy to identify. After the overlay, the close crime pairs can be classified into two categories: “falling in the same grid” and “crossing different grids”. Pairs crossing different grids can be used to analyze the spatial interaction between different regions. In sub-figure (d), each spatial region is represented as a square, and the numbers beside the links give the number of close crime pairs crossing between regions (i.e. the counts obtained by spatial aggregation). In this manner, the original crime transmission network is simplified. Note that the spatial aggregation does not discard any close crime pair: those falling in a single grid are used to measure the strength of spatial interaction, as described in the following section.
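The aggregation step can be illustrated with a short Python sketch. It is a simplified reading of the text under stated assumptions: square cells of side `cell_size` anchored at the coordinate origin, incidents stored as `(x, y, t)` tuples, and edges given as index pairs. The names `grid_of` and `aggregate_edges` are hypothetical, and the sketch does not check whether two grids are adjacent.

```python
from collections import Counter
from math import floor


def grid_of(x, y, cell_size):
    """Map a point to the (column, row) index of its square grid cell."""
    return (floor(x / cell_size), floor(y / cell_size))


def aggregate_edges(incidents, edges, cell_size):
    """Aggregate incident-level edges to grid level.

    Returns two counters:
      flows[(ga, gb)] - number of close pairs crossing from grid ga to gb
      stable[g]       - number of close pairs falling entirely in grid g
    Every edge ends up in exactly one of the two counters, so no close
    pair is discarded.
    """
    flows, stable = Counter(), Counter()
    for i, j in edges:
        ga = grid_of(incidents[i][0], incidents[i][1], cell_size)
        gb = grid_of(incidents[j][0], incidents[j][1], cell_size)
        if ga == gb:
            stable[ga] += 1
        else:
            flows[(ga, gb)] += 1
    return flows, stable


# Hypothetical data: four incidents, four close pairs, 100 m cells.
incidents = [(10, 10, 0), (40, 20, 1), (120, 30, 2), (130, 40, 3)]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
flows, stable = aggregate_edges(incidents, edges, cell_size=100)
print(dict(flows))   # {((0, 0), (1, 0)): 2}
print(dict(stable))  # {(0, 0): 1, (1, 0): 1}
```

Re-running the same function with a different `cell_size` gives the multi-scale view described above without rebuilding the incident-level network.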

Discovery of significant spatial interaction patterns

As described above, the aggregated crime network is a directed network. Each node represents a spatial region (spatial grid) and each edge indicates near repeat pairs crossing between grids. Once the aggregated crime network is obtained, spatial association rule mining can be applied to discover the spatial interaction patterns. Spatio-temporal association rule mining is a powerful tool for discovering interdependence relations in both the spatial and temporal domains. Existing research has shown that it can not only reveal the spatial dependence structure among various spatial features or spatial objects38,39 but also discover dynamic interactions among different spatial regions37,40,41. For example, Verhein and Chawla describe spatial interaction patterns between different regions using spatio-temporal association rules37.

In this study, we also summarize the spatial interaction patterns by applying spatio-temporal association rule mining. To that end, the following definitions are first clarified.

Definition 1

Given two adjacent spatial grids (denoted as GA and GB) and two crime incidents (c1 and c2), if c1 falls in grid GA, c2 falls in GB, and the pair satisfies the spatio-temporal proximity constraint in Eq. (1), then the pair of c1 and c2 is called an instance of flow from GA to GB and denoted as instance (GA → GB). The total number of instance (GA → GB) is called the outflow number of GA and denoted as outNum(GA). Correspondingly, the total number of instance (GB → GA) is called the inflow number of GA and denoted as inNum(GA). In addition, the total number of close pairs falling entirely within grid GA is denoted as stableNum(GA).
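As a small illustration of Definition 1, the flow numbers can be derived from grid-level flow counts as follows. The sketch assumes that outNum and inNum of a grid are summed over all of its partner grids, which is one reading of the definition; the dictionary layout and the function name `flow_numbers` are hypothetical.

```python
from collections import Counter


def flow_numbers(flows):
    """Compute outNum and inNum per grid from directed flow counts.

    flows: mapping (ga, gb) -> number of instance(ga -> gb).
    Assumption: each grid's totals are summed over all partner grids.
    """
    out_num, in_num = Counter(), Counter()
    for (ga, gb), n in flows.items():
        out_num[ga] += n  # instances leaving ga
        in_num[gb] += n   # instances arriving in gb
    return out_num, in_num


# Hypothetical flow counts between three grids labelled A, B and C.
flows = {("A", "B"): 3, ("A", "C"): 1, ("B", "A"): 2}
out_num, in_num = flow_numbers(flows)
print(out_num["A"], in_num["A"])  # 4 2
```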

Definition 2

The spatial region GA is termed a source when its outflow number outNum(GA) is higher than expected under a random assumption. Conversely, the region is termed a sink if its inflow number inNum(GA) is higher than expected under a random assumption. A thoroughfare is a region that meets both the source and sink requirements. Collectively, sources, sinks and thoroughfares are called high flow regions, in which near repeat crime pairs are frequently observed.

Definition 3

High flow regions and transmission routes together describe the spatial interaction pattern between different regions. For regions GA and GB, if the number of instance (GA → GB) is higher than expected under a random assumption, the link is called a significant transmission route from GA to GB, denoted as route (GA → GB); GA is called the antecedent and GB the consequent of the route.

Definition 4

Two further concepts are defined to evaluate the discovered spatial transmission routes. The spatial support of a transmission route r, denoted as Sup(r), is the sum of the spatial areas referenced in the antecedent and consequent of the route. The confidence of a transmission route r, denoted as Conf(r), is defined as the ratio of the number of instance (GA → GB) to the number of instances flowing out of or falling within the antecedent grid. Formally:

$$Sup\left( r \right) = area\left( {G_{A} } \right) + area\left( {G_{B} } \right)$$

(2)

$$Conf\left( r \right) = \frac{{\sum {instance} \;\left( {G_{A} \to G_{B} } \right)}}{{outNum\left( {G_{A} } \right) + stableNum\left( {G_{A} } \right)}}$$

(3)

The first three definitions are used to discover the spatial interaction patterns, while the last one is used to evaluate the discovered results. Spatial support reflects the spatial semantics of a discovered pattern (the size of the spatial area), and confidence indicates the transmission possibility between the antecedent and consequent regions. Support and confidence are both commonly used in Apriori-like association rule mining approaches42, although they carry different meanings in this study.
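Eqs. (2) and (3) reduce to simple arithmetic once the counts are known. The Python sketch below works through one hypothetical route; the helper names, the assumption of equally sized square grids, and all numbers are illustrative only.

```python
def support(cell_size):
    """Eq. (2) for two square grids of equal side length cell_size.

    Assumption: both grids have the same area, so
    Sup(r) = area(GA) + area(GB) = 2 * cell_size ** 2.
    """
    return 2 * cell_size ** 2


def confidence(n_ab, out_num_a, stable_num_a):
    """Eq. (3): share of pairs starting in GA that flow to GB.

    n_ab:         number of instance(GA -> GB)
    out_num_a:    outNum(GA), pairs leaving GA
    stable_num_a: stableNum(GA), pairs staying inside GA
    Returns 0.0 when GA is the origin of no close pair at all.
    """
    denom = out_num_a + stable_num_a
    return n_ab / denom if denom else 0.0


# Hypothetical route: 3 of the 6 pairs starting in GA end in GB.
print(support(500))         # 500000 (square metres for 500 m cells)
print(confidence(3, 4, 2))  # 0.5
```

Including stableNum(GA) in the denominator means that a grid whose close pairs mostly stay inside it yields low confidence for every outgoing route, even if one destination dominates its outflow.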

Based on the above concepts, spatial interaction patterns can be discovered. In spatial association pattern mining, thresholds for the indicators measuring association strength (e.g. outNum and inNum in this study) normally have to be determined in advance. However, determining such thresholds objectively is not easy. Thus, the discovered results are instead evaluated via Monte Carlo (MC) testing. In other words, we aim to find the patterns whose indicators are significantly higher than would be observed by chance. In the current study, MC methods are employed to generate N simulated spatial crime distributions by permuting the temporal information. For example, the statistical significance of a spatial transmission route r can be calculated as:

$$p\left( r \right) = \frac{{\sum {\left( {instance\_num^{obs} \left( r \right) \le instance\_num^{ith\_sim} \left( r \right)} \right)} + 1}}{N + 1}$$

(4)

where \(instance\_num^{obs} \left( r \right)\) represents the number of instance (r) calculated on the real observed data, and \(instance\_num^{ith\_sim} \left( r \right)\) represents the number calculated on the i-th simulated dataset. Then, given a significance level α (0.05 by default), if p(r) is less than α, r is treated as a significant pattern.
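The Monte Carlo test can be sketched as below. The first function is a direct transcription of Eq. (4); the second shows one plausible way to produce a simulated dataset by shuffling timestamps over fixed locations. Both function names and the `(x, y, t)` tuple layout are assumptions, and the source does not specify the exact permutation scheme used.

```python
import random


def mc_p_value(observed, simulated):
    """Eq. (4): pseudo p-value of an observed instance count.

    observed:  instance count of route r on the real data
    simulated: list of N instance counts of r, one per simulation
    Counts the simulations whose value is at least the observed one.
    """
    exceed = sum(1 for s in simulated if observed <= s)
    return (exceed + 1) / (len(simulated) + 1)


def permute_times(incidents, rng):
    """Return a copy of incidents with timestamps shuffled.

    Locations stay fixed, so the spatial distribution is preserved
    while any space-time association is broken (assumed scheme).
    """
    times = [t for _, _, t in incidents]
    rng.shuffle(times)
    return [(x, y, t) for (x, y, _), t in zip(incidents, times)]


# Hypothetical counts from N = 9 simulations, observed count of 10.
print(mc_p_value(10, [3, 5, 12, 10, 7, 2, 4, 6, 8]))  # 0.3

# One simulated dataset from a seeded generator, for reproducibility.
shuffled = permute_times([(0, 0, 1), (5, 5, 2), (9, 9, 3)], random.Random(42))
```

Because of the added 1 in the numerator and denominator, the smallest attainable p-value is 1/(N + 1), so N must be at least 20 for any route to fall below α = 0.05.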

Study area and material description

To evaluate the effectiveness of the proposed approach, we explore the spatial interaction patterns of robbery in the city of Philadelphia, United States. Located in southeastern Pennsylvania, Philadelphia is an economic and cultural anchor of the greater Delaware Valley, with a population of 1,580,863 (based on 2017 census-estimated results). Crime occurrence in Philadelphia consistently ranks above the national average, which is a major concern for the government. The crime-related data can be freely accessed via the OpenDataPhilly website (https://www.opendataphilly.org/), which provides both crime datasets and basic geographic data. The geographic data include administrative divisions and the road network. The crime incidents are recorded with detailed longitude, latitude and timestamps. In this study, we focus on unarmed robbery during the period from January 1st, 2016, to June 30th, 2016. During this period, the total number of unarmed robberies was 1612. We selected robbery as a case study because it is frequently observed in the study region and has a profound effect on the quality of life in urban neighborhoods43. This study aims to find out (1) whether robbery exhibits the near repeat phenomenon, and (2) what kinds of spatial interaction patterns are embedded in that phenomenon. The study region and the distribution of robbery incidents are shown in Fig. 3.

Figure 3

Study region and distribution of robbery incidents.

Source: Ecology - nature.com
