Portland Greenways and Intersection Safety

Noelle Matthews; Thomas Sato

Abstract

This study examines how neighborhood greenways, traffic calming features, and related infrastructure influence crash severity for cyclists and pedestrians in Portland, Oregon. We integrated crash data from 2007–2022 with infrastructure, traffic, and environmental datasets to build a normalized intersection-level database. Exploratory analysis showed that crashes involving vulnerable road users were concentrated in high-speed, high-volume corridors without dedicated infrastructure, while intersections with greenways, calming devices, and lower speed limits experienced fewer and less severe crashes. Classification models identified behavioral factors such as drug and alcohol involvement as leading predictors of fatal outcomes, while infrastructure features including lighting, speed limits, and traffic calming devices were strongly associated with injury severity. These findings highlight the combined importance of behavioral interventions and targeted infrastructure investment in reducing crash risk and support data-driven strategies aligned with Portland’s Vision Zero goals.

Introduction

Cities across the United States are reimagining their streets to support safer, more sustainable modes of transportation. As communities work to reduce traffic-related injuries, lower greenhouse gas emissions, and promote active lifestyles, the design of streets for non-motorized users has become central to urban planning.

Portland, Oregon has long been recognized as a national leader in active transportation. Over the past two decades, the city has expanded its network of neighborhood greenways, which are low-traffic, low-speed residential streets designed to prioritize people walking, biking, and rolling. These greenways are paired with traffic calming elements that slow vehicles and discourage cut-through driving (Portland Bureau of Transportation 2023a). Along with other infrastructure investments for pedestrians and cyclists, they are intended to create a connected network of safe, comfortable routes that encourage active travel for daily trips.

Today, roughly 6 percent of Portland commuters travel by bicycle, and the city maintains a Walk Score of 76, reflecting its strong culture of walking and cycling (Portland Bureau of Transportation 2023b; Walk Score 2025). Portland’s adoption of a Vision Zero policy further underscores its commitment to eliminating traffic deaths and serious injuries (Portland Bureau of Transportation 2023c). Despite these efforts, crashes involving cyclists and pedestrians remain a persistent challenge, particularly at intersections where multiple modes converge.

This study examines whether features such as neighborhood greenways; traffic calming strategies, including speed bumps, protected crossings, and traffic diverters; speed limits; and street lighting are associated with reductions in crash frequency or severity for cyclists and pedestrians. By linking crash records from 2007 to 2022 with detailed infrastructure data, we aim to evaluate how street-level design elements relate to safety outcomes for vulnerable road users. Through this analysis, we seek to provide insights that support evidence-based planning and help Portland better assess the impact of its infrastructure investments on those most at risk in its transportation system.

Background

Efforts to improve street safety for pedestrians and cyclists increasingly focus on the role of infrastructure design. While adding bicycle lanes, pedestrian crossings, and traffic calming features has become common practice in urban transportation planning, the effectiveness of these interventions depends on implementation quality, placement, and surrounding traffic conditions.

Neighborhood greenways, introduced earlier as low-speed, low-traffic streets prioritizing people walking, biking, and rolling (Portland Bureau of Transportation 2023a), are central to Portland’s active transportation approach. These routes rely on design elements such as speed bumps, traffic diverters, protected crossings, and wayfinding signage to create a connected, low-stress network for non-motorized travel (Figure 1).

Figure 1: Components of a neighborhood greenway.

Complementing greenways, other traffic calming measures include curb extensions, raised crosswalks, and median islands, which improve visibility, shorten crossing distances, and reduce turning speeds (Figure 2).

Figure 2: Curb extension in Portland’s Centennial neighborhood.

Protected bicycle infrastructure also plays a role. Portland employs a range of bikeway designs, from standard bike lanes to buffered, bollard-protected, and curb-protected facilities, each offering varying levels of physical separation from vehicle traffic (Figure 3).

Figure 3: Examples of bike lane typologies used in Portland.

Since the early 2000s, the Portland Bureau of Transportation (PBOT) has expanded its greenway network to more than 100 miles and paired these investments with Vision Zero and Safe Routes to School initiatives (Portland Bureau of Transportation 2023a). These policies prioritize equitable access and the reduction of traffic-related fatalities, often focusing on high-risk intersections identified through crash data (Portland Bureau of Transportation 2023d).

However, existing research offers mixed findings. While several studies associate protected bike lanes, lighting, and calming measures with reduced crash rates and severity (Reynolds et al. 2009; Lusk et al. 2011), others emphasize broader contextual factors such as traffic volume, land use, and roadway geometry (Oregon Department of Transportation 2022). These inconsistencies highlight the need for localized, high-resolution analysis.

This study addresses this gap by leveraging intersection-level crash and infrastructure data from 2007–2022 to examine how design interventions, particularly greenways and traffic calming measures, relate to safety outcomes for vulnerable road users.

Data & Data Engineering

This project uses spatial and tabular datasets from public sources including the Oregon Department of Transportation (ODOT), the Portland Bureau of Transportation (PBOT), and Oregon’s GEOHub. Our analysis focuses on intersection-level crash patterns from 2007 to 2022 and integrates infrastructure, lighting, traffic volume, and speed environment data.

Early consultations with Portland’s Vision Zero team and a senior transportation planner from Parametrix shaped our approach to sourcing, processing, and interpreting these datasets. These conversations clarified where infrastructure records were maintained, how features were applied in practice, and which fields best supported reliable spatial joins. They also highlighted common issues such as inconsistent street naming, missing coordinates, and duplicated records. This guidance informed our decision to build a normalized schema centered on a master intrsct table and to prioritize features most relevant to active transportation safety, including neighborhood greenways, traffic calming measures, and street lighting coverage.

We created a fully normalized PostgreSQL database anchored by the intrsct table. All supporting datasets were standardized and aligned to WGS 84 (World Geodetic System 1984), a global coordinate reference system used for latitude and longitude positioning. WGS 84 provides a common geographic framework that ensures spatial data from different sources can be accurately overlaid and compared. Once aligned, datasets were joined using either shared identifiers or proximity-based spatial joins. This unified structure allowed for consistent querying, accurate integration of diverse datasets, and intersection-level attribution across all features (Figure 4).

Crash Data (`crashes`)

The crashes table draws on more than 350,000 metro-area records from ODOT. Street names were standardized to ensure alignment across tables, entries lacking valid coordinates were removed, and crashes without an intersection match were assigned an intersection_id through nearest-neighbor joins within 100 meters. Key fields include crash date (crash_dt), severity (crash_svrty_long_desc), highest injury severity (highest_inj_svrty_desc), and participant counts (tot_pedcycl_cnt, tot_ped_cnt).
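To illustrate the nearest-neighbor step, the sketch below assigns a crash to the closest intersection using great-circle (haversine) distance on WGS 84 coordinates and leaves it unmatched if nothing lies within 100 meters. The function names, the (intersection_id, lat, lon) tuple layout, and the sample coordinates are illustrative assumptions. The brute-force loop is only for clarity; at the scale of the full crash table, a spatial index (for example, a PostGIS nearest-neighbor query) would likely be more practical.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS 84 latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_intersection(
    lat: float,
    lon: float,
    intersections: Iterable[Tuple[int, float, float]],
    max_dist_m: float = 100.0,
) -> Optional[int]:
    """Return the id of the closest intersection within max_dist_m, or None if none qualifies."""
    best_id, best_d = None, max_dist_m
    for iid, ilat, ilon in intersections:  # brute force: fine for a sketch, slow at scale
        d = haversine_m(lat, lon, ilat, ilon)
        if d <= best_d:
            best_id, best_d = iid, d
    return best_id


# Illustrative coordinates, not real intrsct records.
intersections = [(1, 45.5231, -122.6765), (2, 45.5300, -122.6800)]
print(nearest_intersection(45.5235, -122.6765, intersections))  # 1 (about 44 m away)
print(nearest_intersection(45.6000, -122.6000, intersections))  # None (no match within 100 m)
```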

Crash data from 2020 and 2021 reflect atypical conditions caused by the COVID-19 pandemic, when traffic volumes declined sharply but crash severity often increased due to higher travel speeds on less congested roads. These years were retained in the dataset to preserve continuity across the study period, though their unique context is addressed in our Data Ethics discussion.

Bicycle Infrastructure (`network`)

PBOT’s bicycle network dataset catalogs on-street bicycle facilities, including neighborhood greenways, bike lanes, buffered bike lanes, shared roadways, and multi-use paths. These facility types represent the range of infrastructure shown in Figure 3 of the Background section, from low-stress residential streets prioritized for bikes to higher-volume corridors with painted lanes or shared-use markings. The dataset includes attributes such as facility type (facility) and installation year (year_built). Records were standardized and spatially joined to intersections in intrsct to enable comparisons of crash outcomes across different infrastructure contexts.

Street Lighting (`slights`)

The slights table includes point-level records of streetlights with attributes such as wattage (watts) and lumen output (lumen). Coordinates were transformed to WGS 84 and joined to their nearest intersection. Aggregated lighting measures, including total wattage and light counts per intersection, provide a proxy for illumination quality.
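A minimal sketch of the per-intersection lighting aggregation follows, assuming each streetlight record already carries the intersection_id assigned by the nearest-intersection join. Treating a missing wattage as zero is an assumption made here for illustration, not a documented choice of the project, and the output keys (light_count, total_watts) are hypothetical names.

```python
from collections import defaultdict


def aggregate_lighting(lights):
    """Per-intersection streetlight count and total wattage.

    lights: iterable of dicts with 'intersection_id' (None if the light matched
    no intersection) and 'watts' (None if missing).
    """
    agg = defaultdict(lambda: {"light_count": 0, "total_watts": 0.0})
    for light in lights:
        iid = light["intersection_id"]
        if iid is None:
            continue  # light fell outside the join distance of every intersection
        agg[iid]["light_count"] += 1
        agg[iid]["total_watts"] += light.get("watts") or 0.0  # assumption: missing watts count as 0
    return dict(agg)


# Made-up sample records.
lights = [
    {"intersection_id": 1, "watts": 100},
    {"intersection_id": 1, "watts": 150},
    {"intersection_id": 2, "watts": None},
    {"intersection_id": None, "watts": 80},
]
print(aggregate_lighting(lights))
# {1: {'light_count': 2, 'total_watts': 250.0}, 2: {'light_count': 1, 'total_watts': 0.0}}
```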

Traffic Calming Features (`calming`)

The calming table documents physical features designed to slow or divert vehicle traffic, such as speed humps, raised crosswalks, curb extensions, and rumble strips. Each record includes a feature type (feature_type) and coordinates, which were spatially joined to intrsct to create intersection-level indicators of traffic calming. These features represent key tools used in Portland’s neighborhood greenways and local safety projects.

Traffic Volume (`vcounts`)

The vcounts table provides vehicle counts by intersection, including AM and PM peak volumes (am_volume, pm_volume) and average daily traffic (avg_daily_traffic). These measures support crash rate calculations and help contextualize risk relative to traffic exposure.

Speed Limits (`slimits`)

Initially excluded due to incomplete identifiers, speed limit data were later processed using proximity-based joins to assign posted speed values (speed_limit) to intersections. This variable supports analysis of speed environments in relation to crash frequency and severity.

Intersections (`intrsct`)

The intrsct table, generated using the osmnx Python package and OpenStreetMap data, serves as the schema’s hub. Each record includes an intersection_id, intersecting street names (from_street, to_street), and geographic coordinates. All other tables are joined to this reference either directly or through spatial proximity.

Population (`population`)

We incorporated citywide annual population estimates in a standalone table. While not tied to the intersection schema, these data enable per capita crash rate calculations and provide demographic context for temporal trends in transportation safety.

Excluded Data

We excluded the recommended_bicycle_routes dataset due to overlap with PBOT’s bicycle network layer and its lack of unique detail. It could not be confidently joined and was unlikely to provide additional explanatory value beyond the infrastructure measures already included.

Schema Normalization

All datasets were aligned to WGS 84, cleaned, and integrated into a relational structure centered on intrsct. Each table contains only fields relevant to analysis and uses consistent naming conventions. This schema enables accurate intersection-level attribution of crashes, infrastructure, and environmental features, supporting both descriptive analysis and modeling of safety outcomes.

Figure 4: ER diagram of the created tables and their relationships.

Together, these steps provided a cohesive database to support our analysis while also raising considerations about how these public datasets are constructed and what inherent limitations they may carry.

Data Ethics

Because this project relies entirely on public data, privacy concerns are minimal. Crash records provided by ODOT are de-identified and contain no personally identifiable information, while infrastructure and traffic datasets describe physical features rather than individuals. However, ethical considerations arise in terms of representativeness, equity, and how these data reflect real-world conditions.

In early conversations with the City of Portland’s Vision Zero team and a senior transportation planner from Parametrix, both emphasized that crash data are inherently incomplete and prone to bias. Many crashes go unreported, particularly those involving minor injuries or property damage. Underreporting disproportionately affects vulnerable populations, including low-income residents and people of color, who may have less trust in law enforcement or limited access to insurance and legal resources that drive formal reporting. This structural bias can skew analyses toward more severe collisions and better-resourced neighborhoods where reporting rates are higher. Similarly, infrastructure records reflect uneven investment patterns, with traffic calming features and greenways concentrated in inner neighborhoods while areas like East Portland remain underrepresented. These disparities create an equity concern because they risk reinforcing existing gaps in safety infrastructure.

The COVID-19 pandemic further complicates interpretation of these data. Traffic volumes dropped sharply in 2020 and 2021, yet crash severity often increased due to higher travel speeds on less congested roads. These disruptions mean that crash trends during these years do not follow typical patterns, and including them without context could distort both descriptive findings and model performance. While we retained these years in our dataset to reflect the full study period, their unique conditions must be considered when interpreting results.

Another limitation in assessing infrastructure effectiveness is the lag between installation and widespread user adoption. Even after new facilities are built, it often takes time for riders to learn how to navigate them or even become aware they exist. For example, PBOT published guidance on how to use its new two-way bikeway and bike signal at NW Naito and Davis, illustrating the need for active education when infrastructure is introduced. Similarly, ongoing advocacy around bike wayfinding reflects that many potential users remain unaware of greenway routes or how to access them. This adoption lag can delay observable safety benefits in crash data, meaning early analyses may underestimate the true long-term impact of improvements.

Finally, our use of spatial joins introduces a methodological approximation. Assigning crashes and infrastructure features to intersections based on a 100-meter buffer was necessary for integration but can result in occasional misalignment in dense urban grids. While these joins were validated visually through mapping, they remain a source of potential imprecision.

By explicitly acknowledging these factors, we aim to frame our findings responsibly. The relationships observed between infrastructure and crash outcomes are associative rather than causal, and any interpretation should consider the biases embedded in how transportation data are collected and reported, as well as how they are shaped by external events such as the pandemic and by behavioral factors such as infrastructure adoption lag.

Data Engineering and Exploration

Building this database required extensive preprocessing and integration across multiple spatial and tabular datasets. These steps were essential to ensure accurate intersection-level joins and to support subsequent exploratory analyses and modeling.

Data Engineering Workflow

Data Acquisition and Cleaning

We obtained data from the Oregon Department of Transportation (ODOT), the Portland Bureau of Transportation (PBOT), and Oregon GEOHub in their original formats, including CSV files, shapefiles, and API outputs. Cleaning steps focused on aligning datasets both geographically and structurally. Crashes were filtered to include only those within Portland city limits and non-highway segments. Records with null or invalid coordinates, such as those present in the crashes and slimits tables, were removed. Street names were standardized across crashes, network, and calming to ensure join consistency. Categorical fields, such as traffic calming feature codes, were encoded uniformly, and naming inconsistencies (e.g., WAY versus WY) were corrected. For crashes without an initial intersection match, we used nearest-neighbor joins to impute intersection_id assignments, validating the resulting joins through interactive mapping. These processes reduced the crashes dataset from 356,280 metro-area records to 94,841 Portland-specific records suitable for integration with infrastructure layers.
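The street-name standardization step might look like the following sketch. The abbreviation maps are hypothetical and deliberately short, and the canonical form chosen here (spelling out WAY rather than WY) is an assumption, since the text does not say which form the project kept.

```python
import re

# Hypothetical, deliberately short maps; the project's full abbreviation list is not documented.
SUFFIXES = {"WY": "WAY", "AVENUE": "AVE", "AV": "AVE", "STREET": "ST",
            "BOULEVARD": "BLVD", "DRIVE": "DR", "ROAD": "RD", "PARKWAY": "PKWY"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
              "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW"}


def normalize_street(name: str) -> str:
    """Uppercase, drop periods, split on other punctuation/whitespace, and canonicalize tokens."""
    cleaned = re.sub(r"[^\w\s]", " ", name.upper().replace(".", ""))
    tokens = cleaned.split()
    return " ".join(DIRECTIONS.get(t, SUFFIXES.get(t, t)) for t in tokens)


print(normalize_street("S.E. 82nd Ave."))        # SE 82ND AVE
print(normalize_street("Northeast Sample Wy"))   # NE SAMPLE WAY
```

Normalizing both sides of a join with the same function means "NE Sample Way" and "Northeast Sample Wy" compare equal.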

Schema Integration

We organized all datasets using a hub-and-spoke relational schema anchored by the intrsct table, which enabled scalable intersection-level analysis and efficient feature aggregation (Figure 4).
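As a rough illustration of the hub-and-spoke layout, the sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column names from_street, to_street, crash_dt, highest_inj_svrty_desc, tot_pedcycl_cnt, tot_ped_cnt, and avg_daily_traffic come from the table descriptions above; the reduced set of tables, the crash_id key, the lat/lon column names, and the sample rows are made up for the example.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL; only a few spoke tables and columns are shown.
SCHEMA = """
CREATE TABLE intrsct (
    intersection_id INTEGER PRIMARY KEY,
    from_street     TEXT,
    to_street       TEXT,
    lat             REAL NOT NULL,
    lon             REAL NOT NULL
);
CREATE TABLE crashes (
    crash_id               INTEGER PRIMARY KEY,
    intersection_id        INTEGER REFERENCES intrsct(intersection_id),
    crash_dt               TEXT,
    highest_inj_svrty_desc TEXT,
    tot_pedcycl_cnt        INTEGER,
    tot_ped_cnt            INTEGER
);
CREATE TABLE vcounts (
    intersection_id   INTEGER PRIMARY KEY REFERENCES intrsct(intersection_id),
    avg_daily_traffic INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO intrsct VALUES (?, ?, ?, ?, ?)",
                 [(1, "STREET A", "STREET B", 45.51, -122.62),
                  (2, "STREET C", "STREET D", 45.55, -122.65)])
conn.executemany("INSERT INTO crashes VALUES (?, ?, ?, ?, ?, ?)",
                 [(10, 1, "2019-05-01", "Suspected Minor Injury", 1, 0),
                  (11, 1, "2020-07-12", "No Apparent Injury", 0, 0),
                  (12, 2, "2021-03-03", "Possible Injury", 0, 1)])
conn.executemany("INSERT INTO vcounts VALUES (?, ?)", [(1, 18000), (2, 3000)])

# Every spoke joins back to the hub on intersection_id.
QUERY = """
SELECT i.intersection_id,
       COUNT(c.crash_id) AS all_crashes,
       SUM(CASE WHEN c.tot_pedcycl_cnt > 0 OR c.tot_ped_cnt > 0 THEN 1 ELSE 0 END) AS vru_crashes,
       v.avg_daily_traffic
FROM intrsct AS i
LEFT JOIN crashes AS c ON c.intersection_id = i.intersection_id
LEFT JOIN vcounts AS v ON v.intersection_id = i.intersection_id
GROUP BY i.intersection_id, v.avg_daily_traffic
ORDER BY i.intersection_id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2, 1, 18000), (2, 1, 1, 3000)]
```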

Feature Engineering

We derived several intersection-level variables to capture infrastructure and environmental context. Lighting characteristics were aggregated from the slights table, including counts of streetlights and total wattage, to serve as proxies for illumination. Traffic calming features, such as speed humps, raised crosswalks, curb extensions, and rumble strips, were encoded as binary indicators from the calming table. Bicycle infrastructure was mapped using the network dataset, including the presence of neighborhood greenways and bike lanes, along with installation years to support pre- and post-installation comparisons. Speed environments were characterized by assigning posted speed limits from the slimits table via proximity-based joins. Traffic exposure was controlled through average daily traffic (ADT) data from vcounts. Additionally, crash severity was collapsed into an ordinal structure encompassing fatal, injury, and property-damage-only categories. Annual population estimates were included separately to provide per-capita context for citywide crash trends, although they were not directly tied to intersection-level records.
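The binary traffic calming indicators could be produced along these lines. The feature_type codes in CALMING_TYPES are placeholders, because the calming table's actual code values are not listed here, and the has_* column naming is likewise hypothetical.

```python
from collections import defaultdict

# Placeholder feature_type codes; the calming table's real code values are not listed in the text.
CALMING_TYPES = ("speed_hump", "raised_crosswalk", "curb_extension", "rumble_strip")


def calming_indicators(calming_rows, intersection_ids):
    """Return {intersection_id: {'has_<type>': 0 or 1, ...}} for every intersection."""
    present = defaultdict(set)
    for row in calming_rows:
        if row["intersection_id"] is not None:  # skip features that matched no intersection
            present[row["intersection_id"]].add(row["feature_type"])
    return {
        iid: {f"has_{t}": int(t in present.get(iid, set())) for t in CALMING_TYPES}
        for iid in intersection_ids
    }


rows = [
    {"intersection_id": 1, "feature_type": "speed_hump"},
    {"intersection_id": 1, "feature_type": "speed_hump"},  # duplicates collapse to one flag
    {"intersection_id": 2, "feature_type": "curb_extension"},
]
flags = calming_indicators(rows, [1, 2, 3])
print(flags[1]["has_speed_hump"], flags[3]["has_curb_extension"])  # 1 0
```

Intersections with no calming records still get a row of zeros, which keeps the indicator table aligned with intrsct.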

Data Validation

Validation procedures confirmed the uniqueness of intersection_id values within the intrsct table and quantified the rate of unmatched joins. Approximately 25 percent of filtered crashes initially lacked an intersection match; these were reassigned using spatial proximity within a 100-meter buffer, with results verified through visual inspection in Leaflet. These steps ensured a robust, geographically coherent database suitable for subsequent exploratory analysis and modeling.
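The two checks described above, key uniqueness in intrsct and the unmatched-join rate, can be expressed as a small validation helper. This sketch also counts orphaned assignments (crash rows pointing at an intersection_id that does not exist in the hub), an extra check added for illustration; the function name and inputs are hypothetical.

```python
from collections import Counter


def validate_joins(intersection_ids, crash_intersection_ids):
    """Report duplicate hub keys, unmatched crashes, and orphaned crash assignments.

    intersection_ids: intersection_id values from intrsct.
    crash_intersection_ids: intersection_id on each crash row (None = no match yet).
    """
    dupes = sorted(iid for iid, n in Counter(intersection_ids).items() if n > 1)
    hub = set(intersection_ids)
    unmatched = sum(1 for iid in crash_intersection_ids if iid is None)
    orphaned = sum(1 for iid in crash_intersection_ids if iid is not None and iid not in hub)
    total = len(crash_intersection_ids)
    return {
        "duplicate_ids": dupes,
        "unmatched": unmatched,
        "unmatched_rate": unmatched / total if total else 0.0,
        "orphaned": orphaned,
    }


report = validate_joins([1, 2, 3, 3], [1, None, 2, None, 5])
print(report)
# {'duplicate_ids': [3], 'unmatched': 2, 'unmatched_rate': 0.4, 'orphaned': 1}
```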

This validated and normalized database provided the foundation for our exploratory data analysis, enabling intersection-level visualizations and statistical modeling of crash and infrastructure patterns.

Analysis

Exploratory Data Analysis (EDA)

Crash Trends

As shown in Figure 5, annual crash counts remained relatively stable over the study period, apart from a temporary decline between 2019 and 2021 that overlaps with the COVID-19 pandemic. Bicycle and pedestrian-involved crashes consistently represent a small but steady share of all reported crashes.

Figure 5: Annual crash counts comparing all crashes with those involving bicycles or pedestrians.

A heat density map (Figure 6) highlights clusters of bicycle and pedestrian-involved crashes across Portland, with concentrations most evident in the downtown core and along key corridors extending into outer neighborhoods. This visualization establishes a baseline view of where crashes are most frequent and sets the stage for examining how infrastructure and speed environments relate to these patterns.

Figure 6: Heat density of bicycle and pedestrian-involved crashes across Portland.

Crash Severity

As shown in Figure 7, the distribution of injury severity shifts toward more serious outcomes for vulnerable road users. Compared to all crashes, bicycle and pedestrian-involved crashes rarely fall into the “no apparent injury/property damage only (PDO)” category and are far more likely to result in suspected minor or serious injuries, with fatalities present in smaller numbers. This pattern reflects the reality that collisions involving people on bikes or on foot almost always result in some degree of injury.

Figure 7: Injury severity distribution comparing all crashes with bicycle and pedestrian-involved crashes.

Infrastructure Coverage and Grouping

Figure 8 shows that most crashes occurred at intersections without bicycle-specific infrastructure, reflecting both the broader street network composition and higher exposure along non-greenway routes. Intersections with greenways or calming features represented a much smaller share of crashes, consistent with their more limited coverage across the city. When greenways and calming features did overlap, crashes were comparatively rare, underscoring their intended role as lower-stress corridors. This figure shows crash counts rather than crash rates, providing a baseline view before incorporating traffic volume in the next section.

Figure 8: Crash counts by infrastructure grouping, comparing intersections with no infrastructure to those with greenways and calming features.

Crash Rates by Infrastructure and Traffic Volume

Crash rates per intersection were examined across combined infrastructure groupings and stratified by traffic volume tertiles. As shown in Figure 9, rates were substantially higher in high-volume areas overall, but intersections with calming features (particularly those paired with greenways) had markedly lower crash rates than those lacking interventions.

In medium- and low-volume areas, crash rates were much lower across all infrastructure types, and differences between groups were smaller. This suggests that infrastructure interventions may provide the greatest safety benefits where traffic exposure is highest and risk is more concentrated.
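One way to compute stratified rates like those in Figure 9 is sketched below: intersections are binned into average daily traffic tertiles, and the mean number of bicycle and pedestrian-involved crashes per intersection is computed for each tertile and infrastructure group. The input field names (adt, group, vru_crashes), the group labels, and the sample data are assumptions; statistics.quantiles with its default exclusive method supplies the cut points, which may differ slightly from how the study drew its tertiles.

```python
import statistics
from collections import defaultdict


def rates_by_tertile(intersections):
    """Mean bicycle/pedestrian-involved crashes per intersection by ADT tertile and group.

    intersections: dicts with 'adt' (average daily traffic), 'group'
    (infrastructure grouping label), and 'vru_crashes' (crash count).
    """
    low_cut, high_cut = statistics.quantiles([i["adt"] for i in intersections], n=3)

    def tertile(adt):
        if adt <= low_cut:
            return "low"
        return "medium" if adt <= high_cut else "high"

    totals = defaultdict(lambda: [0, 0])  # (tertile, group) -> [crash total, intersection count]
    for i in intersections:
        key = (tertile(i["adt"]), i["group"])
        totals[key][0] += i["vru_crashes"]
        totals[key][1] += 1
    return {key: crashes / count for key, (crashes, count) in totals.items()}


# Made-up example data.
sample = [
    {"adt": 1000, "group": "none", "vru_crashes": 0},
    {"adt": 2000, "group": "none", "vru_crashes": 1},
    {"adt": 3000, "group": "calming", "vru_crashes": 1},
    {"adt": 10000, "group": "none", "vru_crashes": 3},
    {"adt": 20000, "group": "none", "vru_crashes": 8},
    {"adt": 30000, "group": "calming", "vru_crashes": 2},
]
for key, rate in sorted(rates_by_tertile(sample).items()):
    print(key, rate)
```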

Figure 9: Bicycle and pedestrian-involved crash rates per intersection by infrastructure grouping and traffic volume tertile, highlighting the stronger impact of calming features in high-volume areas.

Speed Environment

As shown in Figure 10, crashes on greenways are concentrated in lower speed environments, typically 20 to 25 mph, while crashes on regular streets span a wider range of posted speed limits, extending to 35 mph and above. Severe and fatal crashes are more common at the higher end of this range, reinforcing the established relationship between speed and injury severity. A small number of records show unusually low or zero posted speeds, generally reflecting locations such as alleys, parking lots, or other non-standard roadways.

Figure 10: Distribution of posted speed limits at crash locations by street type and crash severity, highlighting that severe crashes are concentrated at higher speeds.

Spatial Patterns

The interactive map in Figure 11 provides a citywide view of bicycle and pedestrian-involved crashes alongside Portland’s active transportation network. The map distinguishes Neighborhood Greenways, other bike facilities, and off-street trails, and overlays traffic calming features to show their distribution relative to crash locations.

Figure 11: Interactive map of bicycle and pedestrian-involved crashes, greenways, other bike facilities, off-street trails, and traffic calming features.

Integrating Findings

The exploratory analysis shows that bicycle and pedestrian-involved crashes are concentrated in high-speed, high-volume corridors with limited infrastructure, while intersections with greenways, calming features, and lower speed limits experience fewer and less severe crashes. These patterns informed the modeling approach by emphasizing variables tied to traffic exposure, speed environments, infrastructure presence, and environmental conditions such as lighting and surface state.

By moving from descriptive trends to formal modeling, we tested whether these observed relationships hold when accounting for multiple factors simultaneously and examined which features most strongly influence crash severity outcomes for different road user groups.

Feature Importance Analysis

Building on the exploratory findings, we applied classification models to assess which features most strongly influence crash severity outcomes. The goal was not to maximize predictive performance but to identify the relative importance of variables highlighted during EDA, such as speed limits, traffic volume, lighting, calming devices, and bike infrastructure.

By focusing on feature importance rather than prediction accuracy, we aimed to evaluate whether the patterns observed in descriptive analysis hold when controlling for other factors. While predictive strength still offers context for interpreting these relationships, the primary value of this modeling approach lies in clarifying which infrastructure, environmental, and contextual features are most associated with injury severity for cyclists and pedestrians.

Data Subsets

To enhance the accuracy and interpretability of predictive models for crash severity, this study employs both aggregate and disaggregated datasets. Specifically, models are trained on all crash data, bicycle-involved crashes, and pedestrian-involved crashes as separate subsets. This approach is grounded in the recognition that different road user types are subject to distinct risk profiles, behavioral dynamics, and environmental interactions.

Modeling all crashes provides a comprehensive understanding of severity factors across the full spectrum of road users. However, disaggregating the data by road user type allows the models to capture subgroup-specific patterns that may be obscured in the aggregate analysis. For instance, factors influencing injury severity in pedestrian-involved crashes, such as lighting conditions or road infrastructure features, may differ substantially from those relevant to motor vehicle collisions or bicycle-related incidents.

Disaggregating the data allows for the identification of road-user-specific risk factors that may be diluted or overlooked in models trained on the full dataset. By isolating the relationships between predictors and severity outcomes within bicycle- or pedestrian-involved crashes, the resulting feature importance measures can offer more targeted insights. This approach supports more nuanced interpretations of the data and facilitates the development of tailored safety interventions for each road user group.

Target Variables

The target variable, crash severity, can be framed in a variety of different but related ways, each capturing a different aspect of the outcome. In this study, we examine three binary classification targets related to crash severity:

Fatal Crash: whether or not the crash resulted in a fatality.

Injury Crash: whether or not any injury was reported.

Major Injury Crash: whether the crash resulted in a severe injury, based on a consolidated classification.

For the third target, we group the injury types shown in Figure 7 into two broader categories: major injuries, comprising “Fatal Injury,” “Suspected Serious Injury,” and “Suspected Minor Injury”; and minor or no injuries, comprising “Possible Injury” and “No Apparent Injury.” This aggregation reflects a simplified but relevant distinction between more and less severe injury outcomes.
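The three targets can be derived from the highest injury severity label as in this sketch. The labels are the ones listed above; deriving the fatal target from the injury label (rather than from crash_svrty_long_desc) and counting every label except “No Apparent Injury” as an injury crash are assumptions made for illustration.

```python
# Injury labels as named in the text (Figure 7).
MAJOR = {"Fatal Injury", "Suspected Serious Injury", "Suspected Minor Injury"}
MINOR_OR_NONE = {"Possible Injury", "No Apparent Injury"}


def severity_targets(label: str) -> dict:
    """Map a highest-injury-severity label to the three binary targets."""
    if label not in MAJOR | MINOR_OR_NONE:
        raise ValueError(f"unexpected severity label: {label!r}")
    return {
        "fatal": int(label == "Fatal Injury"),         # assumption: fatal taken from the injury label
        "injury": int(label != "No Apparent Injury"),  # assumption: "Possible Injury" counts as injury
        "major_injury": int(label in MAJOR),
    }


print(severity_targets("Suspected Minor Injury"))  # {'fatal': 0, 'injury': 1, 'major_injury': 1}
```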

Although all three targets serve as proxies for crash severity, each provides a unique lens for analysis. For instance, fatal crashes are rare but socially critical, while the injury-based targets may capture broader systemic risk patterns across road users.

Because all three targets exhibit varying degrees of class imbalance, most severely the fatal-crash target, model interpretation must be situated within the context of imbalanced learning. Figure 12 presents the distribution of each crash severity target by road user type to illustrate the degree of imbalance across populations.

Figure 12: Proportion of crash severity outcomes by road user type.

Of particular note is the class imbalance between fatal and non-fatal crashes. Of the 94,841 crashes in the final dataset, 373 (~0.4%) were fatal; of those, 23 involved a bicycle and 148 involved a pedestrian. The imbalance persists within the bicycle, pedestrian, and motorist subsets.

Unsurprisingly, most reported crashes that involve a bicycle or pedestrian also involve an injury, so the injury target is heavily imbalanced within those subsets. Crashes involving only motorists, by contrast, are split roughly evenly between injury and non-injury outcomes, which keeps the injury target reasonably balanced when a model is trained on all crashes rather than on the bicycle or pedestrian subsets.

By contrast, when injury severity is consolidated into major and minor categories as described above, the two classes are far more balanced for bicycle- and pedestrian-involved crashes.

Predictor Subsets

Given the complexity and variation across road users and crash severity outcomes, it is equally important to consider the types of predictors used in modeling. To address this, we implement two modeling strategies: one utilizing all available predictors, and another constrained to infrastructure-specific features (e.g., speed limits, bike lanes, lighting levels).

While models with all predictors allow for a more complete understanding of the multifaceted contributors to crash severity, they may also incorporate variables that are less actionable from a planning or policy standpoint. In contrast, infrastructure-only models are designed to isolate the influence of modifiable aspects of the built environment, which are most relevant to decision-makers in transportation planning and roadway design.

By comparing the feature importance outputs from these two model types, we can assess the extent to which crash severity can be explained by infrastructure alone, versus more complex contextual or behavioral variables. This dual approach supports both broad analytical insight and practical, intervention-oriented recommendations.

Below, Table 1 and Table 2 describe the sets of predictors that were used in the modeling process.

Table 1: Description of all predictors used in models.

Table 2: Description of infrastructural predictors used in models.

Modeling

Because feature importance is central to this study, we selected classification models that yield interpretable insights into the predictors of crash severity. Four modeling techniques were employed: Random Forest, standard Logistic Regression, Logistic Regression with LASSO (L1) regularization, and Logistic Regression with Ridge (L2) regularization. These methods were chosen to provide multiple perspectives on variable importance while balancing model simplicity and interpretability.

Models were trained for each combination of data subsets (all crashes, bicycle-involved crashes, pedestrian-involved crashes), target outcomes (injury vs. non-injury, fatal vs. non-fatal, degree of injury), predictor subsets (all features versus infrastructure-specific features), and model type. Each model was trained and evaluated using a five-fold cross-validation scheme on an 80/20 train-test split.
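
A minimal sketch of the experimental grid and data split, assuming the five folds are drawn from the 80% training portion and that the split is a plain random shuffle. The subset, target, and model labels are hypothetical strings, not identifiers from the study's code.

```python
import itertools
import random

# Hypothetical labels mirroring the dimensions listed above.
DATA_SUBSETS = ("all", "bicycle", "pedestrian")
TARGETS = ("injured_vs_non_injured", "fatal_vs_non_fatal", "degree_of_injury")
PREDICTOR_SETS = ("all", "infrastructure")
MODEL_TYPES = ("logistic", "lasso", "ridge", "random_forest")


def configuration_grid():
    """Every (data subset, target, predictor set, model type) combination."""
    return list(itertools.product(DATA_SUBSETS, TARGETS, PREDICTOR_SETS, MODEL_TYPES))


def split_and_fold(n_rows, seed, test_frac=0.2, n_folds=5):
    """Shuffle row indices, hold out a test set, and cut the rest into CV folds.

    Assumption: folds come from the training portion only, and the split is
    purely random (not stratified by outcome class).
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_frac)
    test, train = indices[:n_test], indices[n_test:]
    # Round-robin assignment gives folds whose sizes differ by at most one.
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, test, folds


grid = configuration_grid()
print(len(grid))  # 3 * 3 * 2 * 4 = 72 configurations

train, test, folds = split_and_fold(n_rows=1000, seed=42)
print(len(train), len(test), [len(f) for f in folds])  # 800 200 [160, 160, 160, 160, 160]
```

With rare outcomes such as fatalities, a stratified split, which samples each outcome class separately, would guarantee that every fold contains some positive cases; the plain shuffle here does not.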

To account for the class imbalance common in crash severity data, model performance was assessed using Cohen’s Kappa statistic rather than accuracy. Kappa measures agreement between predicted and observed classes beyond what would be expected by chance, whereas accuracy can be high for a model that simply predicts the majority class.
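
The following Python sketch, using hypothetical labels rather than study data, computes Kappa as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each side's class frequencies. It shows why accuracy alone can mislead on imbalanced outcomes: a model that never predicts a fatality is 95% accurate yet has a Kappa of zero.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    # Observed agreement: share of cases where the prediction matches the label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from each side's marginal class frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum((true_counts[c] / n) * (pred_counts[c] / n)
              for c in set(true_counts) | set(pred_counts))
    if p_e == 1.0:
        return float("nan")  # both sides use one identical class: undefined
    return (p_o - p_e) / (1 - p_e)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: 100 crashes, 5 fatal (1) and 95 non-fatal (0).
y_true = [1] * 5 + [0] * 95
always_non_fatal = [0] * 100  # never predicts a fatality
catches_some = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93  # 3 hits, 2 misses, 2 false alarms

print(accuracy(y_true, always_non_fatal), cohen_kappa(y_true, always_non_fatal))  # 0.95 0.0
print(accuracy(y_true, catches_some), round(cohen_kappa(y_true, catches_some), 2))  # 0.96 0.58
```

Accuracy barely moves between the two models (0.95 vs. 0.96), while Kappa separates a useless classifier from one that detects most fatal crashes.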

To assess model stability and the consistency of feature importance, each model configuration was trained and evaluated across five iterations with different random seeds. For each configuration, the minimum and maximum Kappa values are reported, along with the three most influential features determined by a majority vote across iterations. This framework yields a detailed comparison of model performance and key predictors across varying data partitions and model specifications.
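
One way to implement this aggregation, sketched in Python under the assumption that each run produces a Kappa value and a list of features ordered by importance (for example, absolute coefficients for the linear models or impurity-based importance for Random Forest; the exact importance measure is an assumption here). Ties at a rank are kept rather than broken arbitrarily, and the five runs below are made up for illustration.

```python
from collections import Counter


def summarize_runs(runs, top_k=3):
    """Aggregate repeated runs of one model configuration.

    Each run is a dict with a "kappa" value and a "ranking" list of feature
    names ordered from most to least important (hypothetical structure).
    """
    kappas = [run["kappa"] for run in runs]
    top_features = []
    for position in range(top_k):
        # Count which feature occupied this rank in each run.
        votes = Counter(run["ranking"][position]
                        for run in runs if len(run["ranking"]) > position)
        if not votes:
            top_features.append([])  # no run ranked any feature here
            continue
        best = max(votes.values())
        # Keep every feature tied for the most votes at this rank.
        top_features.append(sorted(f for f, c in votes.items() if c == best))
    return {"kappa_min": min(kappas), "kappa_max": max(kappas),
            "top_features": top_features}


# Five made-up runs (different seeds) of a single configuration.
runs = [
    {"kappa": 0.12, "ranking": ["Drugs Involved", "Alcohol Involved", "Crash Hour"]},
    {"kappa": 0.08, "ranking": ["Drugs Involved", "Crash Speed Involved", "Alcohol Involved"]},
    {"kappa": 0.10, "ranking": ["Drugs Involved", "Alcohol Involved", "Crash Hour"]},
    {"kappa": 0.05, "ranking": ["Drugs Involved", "Crash Speed Involved", "Total Watts"]},
    {"kappa": 0.11, "ranking": ["Drugs Involved", "Alcohol Involved", "Total Watts"]},
]
summary = summarize_runs(runs)
print(summary["kappa_min"], summary["kappa_max"])  # 0.05 0.12
print(summary["top_features"])
# [['Drugs Involved'], ['Alcohol Involved'], ['Crash Hour', 'Total Watts']]
```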

Results

This section presents the results of our classification models across combinations of crash types (all, bicycle-involved, pedestrian-involved), outcome variables (fatal vs. non-fatal, injured vs. non-injured, degree of injury), predictor sets (all features, infrastructure-only), and modeling approaches (Logistic Regression, LASSO, Ridge, Random Forest). Our focus is on model interpretability, with an emphasis on feature importance rankings rather than predictive performance alone. Cohen’s Kappa is used as the primary evaluation metric due to class imbalance. For each modeling configuration, we report the range of Kappa values and the top three contributing features. A detailed table is presented in the Appendix.

Modeling Results - All Crashes

The figures below display heatmaps of each feature’s rank within each model. The left side of each plot shows results for models trained on all predictors, and the right side shows results for models trained on infrastructural predictors only. Predictors that no model ranked among its top three most important features are omitted from the visual. For example, in Figure 13a, each of the four models with the target “Fatal vs Non-Fatal” was run five times, and within each of these models “Drugs Involved” was ranked most important across all five runs.
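
A sketch of how the heatmap cells could be assembled, assuming each model’s aggregated top three is stored as three tiers of feature names (ties allowed). The model labels and tiers are hypothetical. Features absent from every top three never become rows, matching the filtering rule described above, and an empty tier simply leaves no feature at that rank.

```python
def rank_table(model_top_features):
    """Build feature-by-model rank cells for a heatmap.

    model_top_features maps a model label to a list of tiers (rank 1, 2, 3),
    each a list of feature names. Only features ranked somewhere become rows.
    """
    features = sorted({feature
                       for tiers in model_top_features.values()
                       for tier in tiers
                       for feature in tier})
    table = {}
    for feature in features:
        row = {}
        for model, tiers in model_top_features.items():
            # Rank 1-3 if the feature appears in a tier, else None (blank cell).
            row[model] = next((rank for rank, tier in enumerate(tiers, start=1)
                               if feature in tier), None)
        table[feature] = row
    return table


# Hypothetical top-three tiers for two models; LASSO has an empty third tier.
example = {
    "LASSO | Fatal vs Non-Fatal": [["Drugs Involved"], ["Alcohol Involved"], []],
    "Random Forest | Fatal vs Non-Fatal": [["Drugs Involved"], ["Total Watts"],
                                           ["Speed Limit", "Crash Hour"]],
}
for feature, row in rank_table(example).items():
    print(feature, row)
```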

Some models show more than one feature at a given rank within the top three. These are ties: the features occurred equally often at that rank across the five model runs.

The LASSO penalty can shrink predictor coefficients exactly to zero. If this happens to every predictor in the dataset, no features appear in that model’s top three. This occurs in some of the bicycle and pedestrian data subsets in Figure 13b and Figure 13c.
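
For intuition about why LASSO can leave nothing to rank, consider the simplest case: least squares with orthonormal predictors, where the LASSO estimate of each coefficient is its unpenalized estimate soft-thresholded by the penalty strength. This is an illustrative stand-in, not the study’s model; penalized logistic regression has no closed form, but it shares the key property that a large enough penalty sets every coefficient to zero. The coefficients below are hypothetical.

```python
def soft_threshold(z, lam):
    """LASSO estimate for one coefficient under an orthonormal design.

    Minimizing 0.5 * (z - b)**2 + lam * abs(b) over b gives
    sign(z) * max(abs(z) - lam, 0).
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized (least-squares) coefficients.
ols = {"Drugs Involved": 0.9, "Speed Limit": 0.3, "Bike Lane": -0.2}

for lam in (0.1, 0.5, 1.0):
    lasso = {name: soft_threshold(b, lam) for name, b in ols.items()}
    survivors = [name for name, b in lasso.items() if b != 0.0]
    print(f"lambda={lam}: nonzero = {survivors}")
# Once lam >= max |coefficient| (0.9 here), every coefficient is zero and
# there is nothing left to rank.
```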

Figure 13a: Feature importance heatmap across model configurations (all crashes)

When analyzing models trained on the complete crash dataset, distinct patterns emerge in the predictors for different severity targets. Notably, the Fatal vs. Non-Fatal classification models consistently identify Drug Involvement as the most influential predictor across all model types. This variable is followed in importance by Crash Speed Involvement and Alcohol Involvement, suggesting that these behavioral factors are key determinants of fatal crash outcomes in the full dataset.

Models targeting Degree of Injury outcomes exhibit similar preferences, frequently selecting Drug Involvement and Alcohol Involvement as top-ranked features. However, both Degree of Injury and Injured vs Non-Injured models also rank road condition features such as “Road Surface – Unknown” highly, particularly in LASSO and Ridge regression. This repeated prominence suggests a potential association between uncertain or missing surface condition data and injury severity, possibly reflecting either a true risk factor or a proxy for incomplete contextual information.

In contrast to the linear models, Random Forests tended to emphasize numeric and continuous features such as Total Watts, Speed Limit, and Crash Hour. This divergence in feature selection likely reflects the Random Forest’s capacity to capture non-linear relationships and complex interactions that are less accessible to linear or penalized regression approaches.

In the models trained on infrastructural features only, the presence of other infrastructure features (Infrastructure Present) emerges as particularly important in predicting injury outcomes. For models predicting Fatal vs. Non-Fatal outcomes using infrastructure-only features, the Sum of Traffic Control Devices appears as the most predictive feature, suggesting a potential mitigating effect of traffic control infrastructure on crash fatality.

Modeling Results - Bicycle Only Crashes

Figure 13b: Feature importance heatmap across model configurations (bicycle crashes)

Models trained exclusively on bicycle-involved crashes exhibit patterns largely consistent with those observed in the full crash dataset. In particular, for the Fatal vs. Non-Fatal outcome, Drug Involvement remains the most influential predictor across all model configurations. This variable also appears frequently among top-ranked features in models predicting Degree of Injury, alongside Alcohol Involvement and Crash Speed Involvement, reinforcing the significance of behavioral factors in severe bicycle-related crash outcomes.

Within the Degree of Injury models, both LASSO and Ridge regressions highlight snowy or icy road conditions as among the most predictive features. This finding suggests that adverse environmental conditions may play a heightened role in determining injury severity when bicycles are involved, potentially due to cyclists’ increased vulnerability to surface hazards.

Overall, feature rankings in the bicycle-only models remain broadly consistent with those trained on the full dataset, particularly in the prioritization of behavioral and environmental risk factors.

In contrast, the infrastructure-only models for bicycle crashes do not reveal strong or consistent patterns across target outcomes. Nevertheless, some model-specific tendencies emerge: Bike Lane presence is frequently identified as a top feature in linear models, while Total Watts appears as a dominant predictor in Random Forest models. These results suggest that while infrastructure variables may influence crash severity for cyclists, their effects may be more context-dependent and less universally predictive than behavioral or environmental factors.

Modeling Results - Pedestrian Only Crashes

Figure 13c: Feature importance heatmap across model configurations (pedestrian crashes)

Pedestrian-involved crash models continue to reflect patterns observed in the broader dataset, with behavioral features playing a central role in predicting crash severity. Across models targeting both Fatal vs. Non-Fatal and Degree of Injury outcomes, Drug Involvement remains the most frequently selected feature. Notably, in regularized models (LASSO and Ridge), Marijuana Involved consistently ranks as the most influential predictor, suggesting a potential substance-specific association with pedestrian crash outcomes.

While no dominant pattern emerges in the Injured vs. Non-Injured models, certain features such as Driveway or Alley presence and the Sum of Traffic Calming Devices appear with some regularity in the linear models. Additionally, for Fatal vs. Non-Fatal predictions, the Hit and Run indicator is frequently identified as a top feature, reflecting its potential link to crash severity in pedestrian incidents.

In the infrastructure-only models, Bike Lane presence emerges as a leading predictor in Degree of Injury models using logistic regression, with broader infrastructure presence indicators following closely. In the Injured vs. Non-Injured models, the Sum of Traffic Calming Devices appears again as a contributing feature. For Fatal vs. Non-Fatal predictions, Speed Limit is the most important predictor in logistic regression models and ranks second in Random Forest models. Total Watts also appears consistently across all three target outcomes, underscoring the potential role of visibility and lighting in pedestrian crash severity.

Model Performance

Across all modeling configurations, predictive performance was generally modest, with no single combination of model type, outcome variable, and predictor set consistently outperforming the others. Nonetheless, certain configurations produced comparatively stronger results. Models predicting Fatal vs. Non-Fatal outcomes for pedestrian-only crashes using all predictors achieved some of the highest Kappa values, indicating improved capacity to account for the pronounced class imbalance in this subset. Similarly, models targeting Degree of Injury for bicycle-only crashes with all predictors tended to outperform other bicycle crash configurations.

In the full crash dataset, the Injured vs. Non-Injured outcome with all predictors yielded the highest average Kappa values (approximately 0.10) across all four modeling approaches. Linear models, in particular, showed relatively stronger performance for Fatal vs. Non-Fatal predictions.

Performance varied substantially across repeated runs, reinforcing the decision to aggregate results over five random seeds. For example, a logistic regression model predicting Fatal vs. Non-Fatal outcomes for bicycle–motor vehicle crashes with all predictors achieved a Kappa of 0.665 in its highest-performing run but dropped to 0.000 in its lowest, despite identical settings aside from the random seed. This variability highlights the instability of results in smaller or more imbalanced subsets and underscores the need for aggregation to obtain reliable estimates of feature importance.
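
The hypothetical confusion-matrix counts below (not the study’s data) illustrate how sensitive Kappa is when a test fold contains only a handful of positive cases, and show one way a Kappa of exactly 0.000 can arise. With three fatal crashes among 50, catching two of them gives a Kappa of about 0.65, while a run that predicts no fatalities scores exactly 0, even though the two runs differ in only three predictions.

```python
def kappa_from_counts(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix (positive = fatal)."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement: product of the true and predicted marginals per class.
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)
    p_no = ((tn + fp) / n) * ((tn + fn) / n)
    p_e = p_yes + p_no
    if p_e == 1.0:
        return float("nan")  # undefined when both sides use a single class
    return (p_o - p_e) / (1 - p_e)


# Hypothetical test fold: 50 crashes, only 3 of them fatal.
run_a = kappa_from_counts(tp=2, fp=1, fn=1, tn=46)  # catches 2 of 3 fatal crashes
run_b = kappa_from_counts(tp=0, fp=0, fn=3, tn=47)  # predicts no fatal crashes
print(round(run_a, 3), run_b)  # 0.645 0.0
```

Because a model that predicts only the majority class has observed agreement equal to chance agreement, its Kappa collapses to zero regardless of how accurate it looks.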

While overall predictive performance was limited, the primary objective of this analysis is not to maximize classification accuracy, but rather to identify consistent and interpretable predictors of crash severity. This emphasis on feature importance allows us to extract meaningful patterns from the models even when predictive performance is modest, providing a more robust basis for understanding the underlying risk factors.

Summary of Results

Taken together, the results across all crash types and modeling configurations reveal several consistent insights into the factors most associated with crash severity. Behavioral factors, particularly drug involvement, emerge as the most influential predictors of fatal outcomes across all road user types. This pattern is especially pronounced in regularized models, where alcohol involvement and crash speed consistently appear as top predictors.

Across degree of injury and injury vs. non-injury outcomes, the influence of environmental and contextual features becomes more apparent. Lighting (Total Watts) and crash timing (Crash Hour) rank highly in Random Forest models, which can capture non-linear interactions, while road surface conditions (especially unknown or icy surfaces) rank highly in the regularized linear models. The prominence of unknown road surface in the linear models raises the possibility that data gaps or ambiguous reporting may be correlated with crash severity.

Infrastructure-only models, while generally less predictive than those using the full feature set, still highlight important built environment characteristics. Features such as the presence of bike lanes, traffic calming devices, and speed limits frequently emerge as top predictors of injury outcomes. For fatality predictions, traffic control infrastructure appears to play a meaningful role, suggesting that strategically placed interventions may help mitigate the most severe outcomes. The variation in feature importance across model types further underscores the value of using multiple modeling approaches to gain complementary perspectives on risk factors.

Overall, the findings support an understanding of crash severity where behavioral interventions, environmental awareness, and infrastructure improvements all have roles to play. By disaggregating analyses by crash type and outcome severity, these results offer a more granular view of the conditions under which crashes result in serious harm. These insights can directly inform targeted safety strategies for motorists, bicyclists, and pedestrians alike.

Conclusions

This study examined the relationship between neighborhood greenways, traffic calming features, and crash severity for cyclists and pedestrians in Portland. Through intersection-level analysis and classification modeling, we found that crashes involving vulnerable road users are concentrated in high-speed, high-volume corridors that lack dedicated infrastructure, while intersections with calming features, greenways, and lower posted speed limits see fewer and less severe crashes.

Our modeling results reinforced these findings, showing that while behavioral factors such as drug and alcohol involvement remain the strongest predictors of fatal outcomes, infrastructure-related features including speed limits, lighting, bike lanes, and traffic calming devices consistently emerged as important contributors to injury severity, particularly for pedestrian crashes. These results suggest that infrastructure alone cannot eliminate the risk of severe crashes but can play a meaningful role in mitigating their likelihood and impact, especially when paired with enforcement and education strategies that address behavioral risk factors.

These insights align with Portland’s Vision Zero goals and point to specific areas for investment. Expanding traffic calming measures in high-volume corridors, improving lighting in pedestrian-heavy areas, and closing infrastructure gaps in underserved neighborhoods may deliver measurable safety benefits. Additionally, better integration of behavioral and environmental interventions could further reduce risk for the city’s most vulnerable road users.

At the same time, this analysis is limited by underreporting of minor crashes, the lag in infrastructure adoption, and the disruptions caused by the COVID-19 pandemic. Future research could build on this work by incorporating more granular exposure data, examining post-installation trends over longer time horizons, and exploring neighborhood-level equity impacts to ensure that infrastructure improvements are distributed fairly across the city.

By combining descriptive and modeling approaches, this study offers a clearer view of how street design and context relate to crash severity. These findings provide evidence to support targeted infrastructure investments and reinforce the importance of pairing physical improvements with broader safety strategies to protect people walking, biking, and rolling in Portland.

References

Lusk, Anne C., Peter G. Furth, Patrick Morency, Luis F. Miranda-Moreno, Walter C. Willett, and Jack T. Dennerlein. 2011. “Risk of Injury for Bicycling on Cycle Tracks Versus in the Street.” Injury Prevention 17 (2): 131–35. https://doi.org/10.1136/ip.2010.028696.

Oregon Department of Transportation. 2022. “Bicycle and Pedestrian Infrastructure Evaluation Report.” https://www.oregon.gov/odot/Programs/Documents/BikePed-Infrastructure-Evaluation.pdf.

Portland Bureau of Transportation. 2023a. “Portland’s Neighborhood Greenways.” https://www.portland.gov/transportation/greenways.

———. 2023b. “Vision Zero Action Plan: 2023 Update.” https://www.portland.gov/sites/default/files/2023/vision-zero-action-plan-update-2023.pdf.

———. 2023c. “Vision Zero Action Plan: 2023 Update.” https://www.portland.gov/sites/default/files/2023/vision-zero-action-plan-update-2023.pdf.

———. 2023d. “Vision Zero: Crash Data and Trends.” https://www.portland.gov/transportation/vision-zero/crash-data.

Reynolds, Conor C., M. Anne Harris, Kay Teschke, Peter A. Cripton, and Meghan Winters. 2009. “The Impact of Transportation Infrastructure on Bicycling Injuries and Crashes: A Review of the Literature.” Environmental Health 8 (47): 1–19. https://doi.org/10.1186/1476-069X-8-47.

Walk Score. 2025. “Portland, OR Walk Score.” https://www.walkscore.com/OR/Portland.

Appendix

Below is the full results table, aggregated across every model configuration: