Materials
Data sources
Supplementary Table S1 summarizes the definitions of all the variables and Supplementary Table S2 displays the descriptive statistics of the variables. A detailed description of our data sources is summarized in Supplementary Table S3.
In summary, our mobile phone data, containing Jan 2018 to Apr 2021 visitation records to each national park and the visitors’ respective census block groups, are courtesy of SafeGraph Inc47. The geographical boundaries of national parks that are used to extract records only relevant to national parks are provided by the NPS Land Resources Division48. Finally, the racial and population demographics of each census block group are provided by the 2015-2019 American Community Survey (ACS)16.
The subsequent sections elaborate on how each dataset is used to extract our materials of interest.
Validation of SafeGraph’s mobile-phone dataset
The application of SafeGraph’s mobile-phone dataset to national parks has previously been validated by Yun et al17. Specifically, Yun et al’s17 work showed a close resemblance between the NPS visitor use survey and SafeGraph’s mobile-phone dataset in terms of visitation counts, temporal visitation patterns, racial demographics, and state-level residential origins of the visitors to Yellowstone National Park. However, SafeGraph’s POI classification of “National Parks” remains inconsistent with the NPS’s official definition of a national park. To circumvent this problem, we used shapefiles courtesy of the NPS OpenData48 to extract the most visited POIs that fall within the boundaries of each respective national park. This process is detailed in the subsequent subsections.
Selection of mainland US national parks
We adopted the official and formal definition of national parks as defined and listed by the NPS System49.
We selected national parks within the 48 states of the contiguous U.S. We omitted the parks in Alaska, Hawaii, Puerto Rico, and other minor U.S. islands because out-of-state visitors must travel by air to reach them. These distinct travel patterns could confound our analysis, particularly since air travel faced major disruptions amidst the COVID-19 pandemic50.
New River Gorge National Park was designated a national park only after the onset of the COVID-19 pandemic51. Hence, it is excluded from our study.
Finally, we lack data for White Sands National Park and Dry Tortugas National Park. The former is due to the park’s proximity to White Sands Missile Range and the associated security concerns over mobile device data52. The latter could be attributed to the fact that the park is an island off the coast of Key West, FL53.
In total, we therefore included 48 national parks in our study.
Extraction of POIs
We selected our points of interest (POIs) from the dataset made available by SafeGraph47. While SafeGraph does provide its own classification of “national parks”, that classification remains inconsistent with the NPS’s official definition and formal list of national parks17,49.
Hence, we extracted the POIs that fall within the polygon shapefile of each respective national park. The polygon shapefiles are courtesy of the NPS OpenData48.
We then selected the POI with the highest average monthly visitation records for each distinct national park.
We selected the POI with the highest visitation record because a brief analysis reveals that, in many parks, the top 5 most visited POIs tend to fall within the same vicinity17. Specifically, the top 5 most visited POIs for many large national parks, such as Cuyahoga Valley National Park, Indiana Dunes National Park, and Yellowstone National Park, typically encompass the areas surrounding the park entrances17. This is reasonable, since visitors have to pass through park entrances to enter the parks and gain access to other areas of the park. Hence, selecting only the POI with the highest visitation record for each park prevents us from counting the same visitors more than once across separate POIs.
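The selection procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' pipeline: the field names (`lon`, `lat`, `monthly_visits`), the toy square "park", and the ray-casting point-in-polygon test are all hypothetical stand-ins for the real SafeGraph records and NPS shapefiles, which would normally be handled with a GIS library.

```python
from statistics import mean


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def top_poi(pois, park_polygon):
    """Return the POI inside the park with the highest mean monthly visits."""
    candidates = [
        p for p in pois if point_in_polygon(p["lon"], p["lat"], park_polygon)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: mean(p["monthly_visits"]))


# Hypothetical toy data: a square "park" and three POIs.
park = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
pois = [
    {"name": "entrance", "lon": 0.5, "lat": 0.5, "monthly_visits": [120, 90, 150]},
    {"name": "trailhead", "lon": 1.5, "lat": 1.0, "monthly_visits": [40, 60, 50]},
    {"name": "outside_cafe", "lon": 3.0, "lat": 1.0, "monthly_visits": [500, 500, 500]},
]
selected = top_poi(pois, park)
print(selected["name"])
```

In this toy case the busiest POI overall lies outside the park boundary and is therefore ignored, so the entrance POI is selected.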
Computing census block group-based racial demographics
The aforementioned SafeGraph47 data provides us with the census block group origins of the visitors to each distinct POI. Each census block group is identified by its 12-digit Federal Information Processing Standard (FIPS) code. We are thus able to retrieve the racial demographics of interest (% non-white, % African American, % Hispanic, % Asian, and % Native American) for each visitor’s census block group of origin.
Our study considered only visitations from within the mainland U.S. As such, we excluded visitors from Hawaii, Alaska, Puerto Rico, and other minor U.S. islands, because their visitation patterns are expected to have been abruptly disrupted following the pandemic due to restrictions placed on air travel50. This decision limits the effects of confounding variables and avoids drastically skewing our data.
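As an illustration of how the 12-digit code can be used for this filtering, the sketch below splits a census block group FIPS code into its components (2-digit state, 3-digit county, 6-digit tract, 1-digit block group) and flags origins outside the contiguous U.S. The function names and the example codes are hypothetical; the exact exclusion list used in the study is not given in the text, so the set of state prefixes here is an assumption.

```python
# State-level FIPS prefixes outside the contiguous U.S. (assumed exclusion list).
NON_CONTIGUOUS_STATE_FIPS = {
    "02",  # Alaska
    "15",  # Hawaii
    "60",  # American Samoa
    "66",  # Guam
    "69",  # Northern Mariana Islands
    "72",  # Puerto Rico
    "74",  # U.S. Minor Outlying Islands
    "78",  # U.S. Virgin Islands
}


def parse_cbg_fips(code):
    """Split a 12-digit census block group FIPS code into its parts.

    Codes read as integers lose their leading zero, so the input is
    left-padded back to 12 digits before parsing.
    """
    code = str(code).zfill(12)
    if len(code) != 12 or not code.isdigit():
        raise ValueError(f"not a 12-digit block group code: {code!r}")
    return {
        "state": code[:2],
        "county": code[2:5],
        "tract": code[5:11],
        "block_group": code[11],
    }


def is_contiguous_us(code):
    """True if the block group's state prefix is in the contiguous U.S."""
    return parse_cbg_fips(code)["state"] not in NON_CONTIGUOUS_STATE_FIPS


# Hypothetical example codes, used only to exercise the functions.
print(parse_cbg_fips("560299651001"))
print(is_contiguous_us("560299651001"), is_contiguous_us("150030001001"))
```

Padding with `zfill` is a defensive choice: block group codes for states such as California (prefix `06`) are often stored as integers and would otherwise be parsed with the wrong state prefix.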
Computing distance travelled by visitor to each national park
Likewise, we obtain the distance variable by applying the Haversine formula54 to the POI’s coordinates and the centroid of the visitor’s census block group. We standardize the units of distance to kilometers in our analysis.
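For reference, a minimal Python sketch of the Haversine great-circle distance follows. The function name and the mean Earth radius constant are assumptions made for illustration; the text does not state which radius was used.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# One degree of latitude along a meridian is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))
```

In the study's setting, the first coordinate pair would be the POI and the second the centroid of the visitor's census block group.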
Categorization of visitation records falling before and after COVID-19
We categorize the pre-COVID era as any time period prior to March 2020, and the COVID era as any time period from March 2020 onward. We selected March 2020 because it was the month in which the UN declared COVID-19 a global pandemic55. This declaration was followed by numerous state and local lockdown measures, which drastically impacted American commerce56 and the lifestyles of many Americans57.
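The categorization amounts to a binary indicator on the month of each record. The sketch below is one way to express it, assuming monthly records keyed by year and month; the function names are hypothetical.

```python
from datetime import date

COVID_ERA_START = date(2020, 3, 1)  # March 2020, per the text


def covid_era(year, month):
    """Return 1 for months from March 2020 onward, otherwise 0."""
    return int(date(year, month, 1) >= COVID_ERA_START)


def month_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from start to end, inclusive."""
    y, m = start_year, start_month
    while (y, m) <= (end_year, end_month):
        yield y, m
        m += 1
        if m > 12:
            y, m = y + 1, 1


# Study window stated in the data description: Jan 2018 to Apr 2021.
flags = [covid_era(y, m) for y, m in month_range(2018, 1, 2021, 4)]
print(len(flags) - sum(flags), sum(flags))  # pre-COVID months, COVID-era months
```

Applied to the Jan 2018 to Apr 2021 window, this yields 26 pre-COVID months and 14 COVID-era months.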
Methods and Model
Offsetting visitation counts with the census block group population
We offset our dependent variable, visitation counts, by the census block group population because the racial demographics of the visitors’ origins are measured at the census block group level. This accounts for the fact that one would naturally expect higher visitation counts from more populated census block groups. Hence, the visitation count per thousand population of the census block group serves as a function of our independent variables (COVID-19 era, distance, and racial demographics). This is illustrated in Eq. (1) in the Introduction section.
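A small worked example may help make the offset concrete. The numbers and the function name below are hypothetical; the sketch only shows how the same raw visit count translates into different rates for differently sized block groups.

```python
def visits_per_thousand(visits, population):
    """Visitation count per thousand residents of the origin block group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return visits / (population / 1000)


# Hypothetical block groups: the same 6 visits, different populations.
print(visits_per_thousand(6, 1500))  # smaller block group
print(visits_per_thousand(6, 3000))  # larger block group
```

Here the smaller block group has a rate of 4.0 visits per thousand residents and the larger one 2.0, even though both recorded six visits.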
Gravity Model
We incorporated gravity models into our methodology. In the context of tourism, the gravity model explores the behavior and travel patterns over distances between two unique POIs.
The gravity model was adapted from Newton’s law of universal gravitation in physics58, which states that distance and mass determine the gravitational force between two objects. The gravity model has since been adopted by numerous disciplines in the social sciences, in topics including trade21, tourism19,20, and migration22. For instance, the gravity model is popular in studies involving bilateral trade21 because it allows economists to measure how specific economic indicators (such as GDP) could attract trade between two countries, given the distances between them21.
We elected to use the gravity model because it best represents our research theme: analyzing the changes in visitations to national parks amongst individual racial communities across the U.S. The gravity model allows us to analyze the change in visitations from different racial communities to each specified national park given the required distance of travel. The selection of our variables, which seeks to optimally represent the gravity model while preserving its assumptions, is elaborated in the subsequent subsections.
Our application of the gravity model works as follows: given the \(i^{\mathrm{th}}\) census block group and the \(j^{\mathrm{th}}\) national park, where \(\alpha _k\) denotes the coefficient of the respective independent variable, the gravity model can be expressed as:
$$\begin{aligned} \frac{visitation_{ijt}}{\left( \frac{population_i}{1000}\right) }\propto \frac{race_i^{\alpha _1}*(interaction~terms)^{\alpha _2}}{distance_{ij}^{\alpha _3}} \end{aligned}$$
(2)
which can be remodelled as:
$$\begin{aligned} visitation_{ijt}\propto \frac{race_i^{\alpha _1}*(interaction~terms)^{\alpha _2}*\left( \frac{population_i}{1000}\right) ^{\alpha _4}}{distance_{ij}^{\alpha _3}} \end{aligned}$$
(3)
which, using natural logarithms, can be transformed to:
$$\begin{aligned} \ln (visitation_{ijt})\propto {\alpha _1}\ln (race_i)+{\alpha _2}\ln (interaction~terms)-\alpha _3\ln (distance_{ij})+ {\alpha _4}\ln \left( \frac{population_i}{1000}\right) \end{aligned}$$
(4)
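The step from the multiplicative form in Eq. (3) to the additive logarithmic form can be checked numerically. The sketch below is illustrative only: the exponent values, the input values, and the treatment of the interaction terms as a single positive scalar are all hypothetical. It writes the sign of the distance term out explicitly, since distance sits in the denominator of Eq. (3); equivalently, the sign can be absorbed into the estimated coefficient.

```python
import math


def gravity_level(race, interaction, distance, pop_thousands, alphas, k=1.0):
    """Multiplicative gravity form, with k as the proportionality constant."""
    a1, a2, a3, a4 = alphas
    return k * race**a1 * interaction**a2 * pop_thousands**a4 / distance**a3


def gravity_log(race, interaction, distance, pop_thousands, alphas, k=1.0):
    """Natural log of the multiplicative form, expanded term by term."""
    a1, a2, a3, a4 = alphas
    return (
        math.log(k)
        + a1 * math.log(race)
        + a2 * math.log(interaction)
        - a3 * math.log(distance)  # distance is in the denominator
        + a4 * math.log(pop_thousands)
    )


alphas = (0.5, 0.2, 1.5, 1.0)  # hypothetical exponents
level = gravity_level(0.3, 2.0, 250.0, 1.8, alphas)
log_form = gravity_log(0.3, 2.0, 250.0, 1.8, alphas)
print(math.isclose(math.log(level), log_form))

# Doubling the distance scales predicted visitation by 2 ** (-alpha_3).
ratio = gravity_level(0.3, 2.0, 500.0, 1.8, alphas) / level
print(math.isclose(ratio, 2 ** -1.5))
```

The second check shows why the log form is convenient: each exponent is read directly as an elasticity, so a fixed proportional change in distance produces a fixed proportional change in predicted visitation.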
Model Specification
The gravity model is implemented using panel data with interaction terms19,21. Using panel data allows us to control for unobservable individual effects19,21, such as time-invariant monthly and seasonal fluctuations in park visitations, as illustrated by the peaks and troughs in Fig. 1. The interaction terms allow us to measure the impact of COVID-19 on our selected predictors. Specifically, we selected the random-effects panel approach over the fixed-effects panel model and the pooled ordinary least squares (OLS) model, as supported by the results of the F-tests, Hausman’s chi-squared test, and the Breusch-Pagan (BP) Lagrange Multiplier59 test displayed in Supplementary Table S4.
This results in Eq. (5), given the \(i^{\mathrm{th}}\) census block group’s visitation to the \(j^{\mathrm{th}}\) national park during the \(t^{\mathrm{th}}\) month over the specified race \(race_i\).
$$\begin{aligned} \begin{aligned} \ln \left( visitation_{ijt}\right)&= \beta _0+\beta _1(COVID~era)+\beta _2[\ln (race_{i})] +\beta _3[\ln (distance_{ij})] +\beta _4\left[ \ln \left( \frac{population_{i}}{1000}\right) \right] \\&\quad +\,\beta _5[COVID~era\times \ln (race_{i})] +\beta _6[COVID~era\times \ln (distance_{ij})] +\beta _7[\ln (distance_{ij})\times \ln (race_i)] \\&\quad +\,\beta _8[COVID~era\times \ln (distance_{ij})\times \ln (race_i)]+V_{ijt} \end{aligned} \end{aligned}$$
(5)
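To make the structure of Eq. (5) concrete, the sketch below builds the eight regressors for a single observation. It is not the authors' estimation code: the function name, dictionary keys, and example values are hypothetical, and the text does not say how block groups with a zero race share (whose logarithm is undefined) were handled, so this sketch simply rejects them.

```python
import math


def eq5_regressors(covid_era, race_share, distance_km, population):
    """Return the regressors of Eq. (5) for one block group, park, and month."""
    if race_share <= 0 or distance_km <= 0 or population <= 0:
        raise ValueError("logged inputs must be strictly positive")
    ln_race = math.log(race_share)
    ln_dist = math.log(distance_km)
    ln_pop = math.log(population / 1000)
    return {
        "covid_era": covid_era,                                      # beta_1
        "ln_race": ln_race,                                          # beta_2
        "ln_distance": ln_dist,                                      # beta_3
        "ln_pop_thousands": ln_pop,                                  # beta_4
        "covid_x_ln_race": covid_era * ln_race,                      # beta_5
        "covid_x_ln_distance": covid_era * ln_dist,                  # beta_6
        "ln_distance_x_ln_race": ln_dist * ln_race,                  # beta_7
        "covid_x_ln_distance_x_ln_race": covid_era * ln_dist * ln_race,  # beta_8
    }


# Hypothetical observation: 30% race share, 250 km away, 1,800 residents.
pre = eq5_regressors(0, 0.30, 250.0, 1800)
post = eq5_regressors(1, 0.30, 250.0, 1800)
print(pre["covid_x_ln_distance"], post["covid_x_ln_distance"])
```

Because the COVID-era indicator is binary, every term multiplied by it vanishes for pre-COVID months, so the coefficients on those terms capture how the race, distance, and race-by-distance effects shift in the COVID era.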
The assumptions of log-linearity and the absence of multi-collinearity19,20,21 in our specified model, per Eq. (5), have been tested; the results are reported in Supplementary Table S5.
Consideration of variables in our model
We explored using the area (in km\(^2\)) of each respective park, instead of distance travelled, as the denominator of our gravity model per Eq. (2). However, the substantially lower \(R^2\) values obtained when using a park’s area suggest that it is a poor factor in explaining visitation trends across socio-economic variables. These results are detailed in Supplementary Table S6.
We also initially considered fitting other socio-economic independent variables, including median income and median age, into the same analysis, in the hope of gaining further insights into COVID-19’s impact on park visitation. However, fitting them into the same analysis resulted in high multi-collinearity, as detailed in Supplementary Table S6. Multi-collinearity occurs when an independent variable is highly correlated with another independent variable in an analysis involving multiple independent variables60. This could consequently “undermine the statistical significance of an independent variable”60.
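One common way to quantify multi-collinearity is the variance inflation factor (VIF). The text does not state which diagnostic was used, so the sketch below is an assumption offered for illustration only. It covers the simplest case of two predictors, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between them; the data values are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF of either predictor when the model has exactly two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r**2)


# Hypothetical predictors: one nearly collinear pair, one uncorrelated pair.
x = [1.0, 2.0, 3.0, 4.0]
nearly_collinear = [1.1, 1.9, 3.2, 3.9]
uncorrelated = [1.0, -1.0, -1.0, 1.0]
print(vif_two_predictors(x, nearly_collinear) > 10)
print(vif_two_predictors(x, uncorrelated))
```

A VIF near 1 indicates little shared variation, while large values signal that the coefficient's standard error is inflated; thresholds such as 10 are conventions rather than fixed rules.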
To mitigate concerns of multi-collinearity in our analysis involving different racial groups, we adopt the procedures outlined by Lewis-Beck and Lewis-Beck60, who recommend analyzing each racial composition separately. This means that we analyze the composition of non-white, African American, Asian, Hispanic, and Native American residents separately, each with our other variables.
Finally, we considered analyzing the variables of income and age separately. However, these variables still resulted in high multi-collinearity with the existing independent variables. Furthermore, the analyses involving income and age displayed different characteristics from those involving race, meaning that our random-effects gravity model is not a one-size-fits-all model for analyses involving other variables. These results are detailed in Supplementary Table S6. For this reason, we hope to study variables like age and income in future studies, using a different model.
Source: Ecology - nature.com