This document outlines the complete data processing pipeline for generating jurisdiction-level EV charging infrastructure maps. The pipeline processes utility circuit line data, federal funding zones, environmental indicators, and demographic data to create priority and feasibility pixel grids.
Update Frequency: Utility circuit line data should be updated twice annually. Other datasets are updated as needed based on availability from source agencies.
- Data Acquisition - Download utility circuit line data from each provider
- Data Cleaning - Standardize columns, convert units, add utility identifiers
- Concatenation - Combine all utility lines into single dataset
- Pixelation - Convert utility lines to 100m x 100m pixel grid
- Attribute Joining - Add demographic, environmental, and funding attributes
- Output Generation - Create jurisdiction-specific priority and feasibility files
Source: PG&E GRIP Portal
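Two acquisition methods are available: a manual export from the portal (steps below) or a scripted pull from the ArcGIS Feature Server.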
- Navigate to the GRIP portal
- In the layer list, expand ICA > ICA Results
- Click options menu (three dots) for "ICA, Load Capacity (kW)"
- Select Export > GeoJSON
Note: This method may encounter server timeout issues with large datasets.
Pull data directly from the ArcGIS Feature Server:
import requests
import geopandas as gpd

base_url = "https://services2.arcgis.com/mJaJSax0KPHoCNB6/arcgis/rest/services/DRPComplianceRelProd/FeatureServer/3/query"
params = {
    "where": "1=1",
    "outFields": "*",
    "f": "geojson",
    "resultOffset": 0,
    "resultRecordCount": 1000,
}

# Page through the layer until the server returns no more features
features = []
while True:
    print(f"Fetching offset {params['resultOffset']}")
    # A timeout makes a stalled request raise an error instead of waiting forever
    response = requests.get(base_url, params=params, timeout=120)
    response.raise_for_status()
    data = response.json()
    if "features" not in data or not data["features"]:
        break
    features.extend(data["features"])
    params["resultOffset"] += params["resultRecordCount"]

pge = gpd.GeoDataFrame.from_features(features)

Data Processing:
# Retain only necessary columns (copy so the new column is not added to a slice)
pge = pge[['LoadCapacity_kW', 'geometry']].copy()
# Add utility identifier
pge['Utility'] = 'pge'
# Set CRS and save
pge = gpd.GeoDataFrame(pge, geometry='geometry')
pge.set_crs(epsg=4326, inplace=True)
pge.to_file('pge_load.geojson', driver='GeoJSON')

Note:
Manual download from the website times out.
Scripted downloads have also stalled partway through, which makes the step hard to automate unattended. At the time of writing, the dataset appears to contain 1,289,568 records.
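Because the scripted download can stall, one option is to give each page request a timeout and retry it a few times before giving up. The following is a minimal sketch, not the method this pipeline currently uses: `fetch_all_pages` and `fetch_pge_page` are hypothetical helper names, and the retry count, wait time, and 120-second timeout are arbitrary assumptions. The offset advances by the number of features actually received, in case the server caps the page size below the requested value.

```python
import time


def fetch_all_pages(fetch_page, page_size=1000, max_retries=3, wait_seconds=5.0):
    """Collect features page by page, retrying a page when the request fails.

    fetch_page(offset, count) must return a list of GeoJSON features
    (an empty list once the data is exhausted).
    """
    features = []
    offset = 0
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                page = fetch_page(offset, page_size)
                break
            except Exception:
                # Give up after the last allowed attempt, otherwise wait and retry
                if attempt == max_retries:
                    raise
                time.sleep(wait_seconds * attempt)
        if not page:
            return features
        features.extend(page)
        # Advance by what was actually received, not by what was requested
        offset += len(page)


def fetch_pge_page(offset, count):
    """Hypothetical page fetcher for the Feature Server query shown above."""
    import requests  # third-party; imported here so fetch_all_pages stays stdlib-only

    base_url = "https://services2.arcgis.com/mJaJSax0KPHoCNB6/arcgis/rest/services/DRPComplianceRelProd/FeatureServer/3/query"
    params = {"where": "1=1", "outFields": "*", "f": "geojson",
              "resultOffset": offset, "resultRecordCount": count}
    response = requests.get(base_url, params=params, timeout=120)
    response.raise_for_status()
    return response.json().get("features", [])
```

With these helpers, `features = fetch_all_pages(fetch_pge_page)` would replace the `while` loop, and the result can be passed to `gpd.GeoDataFrame.from_features` as before.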
Source: SDG&E ICM API Explorer
Data Acquisition:
- Access the ICM API Explorer (account creation may be required)
- Navigate to Load Capacity Grids map
- Download as GeoJSON or Shapefile
Data Processing:
import geopandas as gpd

# Load data
sdge = gpd.read_file("path/to/sdge.geojson")
# Verify load columns are identical
sdge['equal'] = sdge['ICAWOF_UNILOAD'] == sdge['ICAWNOF_UNILOAD']
sdge.loc[~sdge['equal']]  # Should return an empty table
# Convert MW to kW
sdge['load_kw'] = sdge['ICAWOF_UNILOAD'] * 1000
# Retain only necessary columns (copy so the new column is not added to a slice)
sdge = sdge[['load_kw', 'geometry']].copy()
# Add utility identifier
sdge['Utility'] = 'sdge'
# Set CRS and save
sdge = gpd.GeoDataFrame(sdge, geometry='geometry')
sdge.set_crs(epsg=4326, inplace=True)
sdge.to_file('sdge_load.geojson', driver='GeoJSON')

Note:
No login was required to download at the time of testing. The GeoJSON download failed to execute, but the Shapefile download worked. The Shapefile truncates field names, so the script must be adjusted: ICAWOF_UNILOAD becomes ICAWOF_UNI and ICAWNOF_UNILOAD becomes ICAWNOF_UN. The Shapefile is also in Pseudo-Mercator (EPSG:3857), so the set_crs call must be replaced with to_crs to reproject the data to EPSG:4326.
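The two Shapefile adjustments described in the note can be folded into one loader. This is a hedged sketch rather than the project's script: `pick_column` and `process_sdge` are hypothetical names, and the path passed in is a placeholder. It accepts either the full or the truncated field name and reprojects when the file declares a CRS.

```python
def pick_column(columns, candidates):
    """Return the first candidate name that is present in columns."""
    for name in candidates:
        if name in columns:
            return name
    raise KeyError(f"None of {candidates} found in {list(columns)}")


def process_sdge(path):
    """Load the SDG&E download (GeoJSON or Shapefile) and standardize it."""
    import geopandas as gpd  # third-party; imported lazily for this sketch

    sdge = gpd.read_file(path)
    # Full name in GeoJSON, truncated name in the Shapefile
    load_col = pick_column(sdge.columns, ["ICAWOF_UNILOAD", "ICAWOF_UNI"])
    sdge["load_kw"] = sdge[load_col] * 1000  # MW to kW
    sdge = sdge[["load_kw", "geometry"]].copy()
    sdge["Utility"] = "sdge"
    # Reproject when a CRS is declared (e.g. Pseudo-Mercator); otherwise assume WGS84
    if sdge.crs is not None:
        sdge = sdge.to_crs(epsg=4326)
    else:
        sdge = sdge.set_crs(epsg=4326)
    return sdge
```

The result can then be written with `to_file('sdge_load.geojson', driver='GeoJSON')` exactly as in the script above.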
Source: LADWP Power GIS Portal
Data Acquisition:
- Click "Download the 34.5 KV data" link
- Unzip downloaded file to extract .kmz file
- Convert .kmz to .gdb using the ArcGIS "KML To Layer" tool (it accepts .kmz files)
Data Processing:
import geopandas as gpd
import pandas as pd
from bs4 import BeautifulSoup

# Load geodatabase
ladwp = gpd.read_file("path/to/ladwp.gdb")

# Extract popup information
def extract_popup_info(html_content):
    """Parse the key/value rows of the second table in a PopupInfo HTML string."""
    soup = BeautifulSoup(html_content, 'html.parser')
    data = {}
    table = soup.find_all('table')[1]
    for row in table.find_all('tr'):
        cols = row.find_all('td')
        if len(cols) == 2:
            key = cols[0].get_text(strip=True)
            value = cols[1].get_text(strip=True)
            data[key] = value
    return data

popup_info_df = ladwp['PopupInfo'].apply(extract_popup_info)
popup_info_expanded = pd.json_normalize(popup_info_df.tolist())
gdf_expanded = ladwp.drop(columns=['PopupInfo']).join(popup_info_expanded)
# Extract minimum capacity value from range and store it as a number, not a string
gdf_expanded['min_value'] = pd.to_numeric(
    gdf_expanded['CAPACITY_RANGE_KW'].str.extract(r'^\s*(\d+)', expand=False)
)
# Retain only necessary columns (copy so the new column is not added to a slice)
ladwp = gdf_expanded[['min_value', 'geometry']].copy()
# Add utility identifier
ladwp['Utility'] = 'ladwp'
# Set CRS and save
ladwp = gpd.GeoDataFrame(ladwp, geometry='geometry')
ladwp.set_crs(epsg=4326, inplace=True)
ladwp.to_file('ladwp_load.geojson', driver='GeoJSON')

Note:
The 34.5 kV zip file on the website was initially corrupted... After LADWP was contacted, the file was updated and could be retrieved...
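The minimum-capacity extraction can be checked on single strings before it is applied to the whole column. The sketch below is an illustration under assumptions: the exact format of the CAPACITY_RANGE_KW values is not shown in this document, so the sample formats it tolerates (including thousands separators, where the pattern `^\s*(\d+)` would stop at the comma) are guesses, and `min_capacity_kw` is a hypothetical helper name.

```python
import re

# Leading run of digits, optionally containing thousands separators
_LEADING_NUMBER = re.compile(r"^\s*(\d[\d,]*)")


def min_capacity_kw(capacity_range):
    """Return the leading number of a capacity range string as an int, or None."""
    if not isinstance(capacity_range, str):
        return None
    match = _LEADING_NUMBER.match(capacity_range)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```

It could be applied with `gdf_expanded['CAPACITY_RANGE_KW'].map(min_capacity_kw)`; rows that do not start with a number come back as missing values and are worth inspecting by hand.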
Source: SCE DRP Portal
Data Acquisition:
- Click "ESRI API" tab
- Navigate to "ICA Layer" > "ICA - Circuit Segments"
- Download as GeoJSON or Shapefile
- Also download "ICA - Circuit Segments, Non-3 Phase" if available
Note: SCE provides separate files for 3-phase and non-3-phase circuits. Verify whether these datasets contain unique data before concatenating. If datasets are identical, only one is needed.
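One way to run the uniqueness check on the two SCE downloads is to compare a per-feature key across them. This is a sketch under assumptions: the key used here (geometry bytes plus one attribute value) is only one reasonable choice, and `overlap_summary` and `feature_keys` are hypothetical helper names.

```python
def overlap_summary(keys_a, keys_b):
    """Count features unique to each dataset and features shared by both."""
    set_a, set_b = set(keys_a), set(keys_b)
    return {
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "shared": len(set_a & set_b),
    }


def feature_keys(gdf, value_column):
    """Build one hashable key per row from the geometry bytes and a value column."""
    return list(zip(gdf.geometry.to_wkb(), gdf[value_column]))
```

For two GeoDataFrames `three_phase` and `non_three_phase`, `overlap_summary(feature_keys(three_phase, 'ica_overall_load'), feature_keys(non_three_phase, 'ica_overall_load'))` reports how many features are shared. If both "only" counts are zero, the files are identical and one of them is enough.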
Data Processing:
import geopandas as gpd

# Load data
socaled = gpd.read_file("path/to/socaled.geojson")
# Convert MW to kW (column is stored as string)
socaled['load_kw'] = (socaled['ica_overall_load'].astype('float')) * 1000
# Retain only necessary columns (copy so the new column is not added to a slice)
socaled = socaled[['load_kw', 'geometry']].copy()
# Add utility identifier
socaled['Utility'] = 'socaled'
# Set CRS and save
socaled = gpd.GeoDataFrame(socaled, geometry='geometry')
socaled.set_crs(epsg=4326, inplace=True)
socaled.to_file('socaled_load.geojson', driver='GeoJSON')

Combine all processed utility datasets into a single file:
import pandas as pd
import geopandas as gpd

# Load all utility files
pge = gpd.read_file('pge_load.geojson')
ladwp = gpd.read_file('ladwp_load.geojson')
sdge = gpd.read_file('sdge_load.geojson')
socaled = gpd.read_file('socaled_load.geojson')
# Concatenate
utility_lines = pd.concat([pge, ladwp, sdge, socaled], ignore_index=True)
# Set CRS and save
utility_lines = gpd.GeoDataFrame(utility_lines, geometry='geometry')
utility_lines.set_crs(epsg=4326, inplace=True)
utility_lines.to_file('utility_lines.geojson', driver='GeoJSON')

Output: Save utility_lines.geojson to jurisdiction_script/data/other/
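The per-utility files written above store capacity under different column names (LoadCapacity_kW, load_kw, and min_value), so the concatenated file ends up with several partly empty capacity columns. If a single capacity column is wanted, the columns can be renamed before concatenation. This is an assumption about what is desirable: the downstream scripts may expect the current names, so check them first. `CAPACITY_COLUMNS` and `rename_mapping` are hypothetical names.

```python
# Capacity column written by each per-utility processing script
CAPACITY_COLUMNS = {
    "pge": "LoadCapacity_kW",
    "sdge": "load_kw",
    "ladwp": "min_value",
    "socaled": "load_kw",
}


def rename_mapping(utility, target="load_kw"):
    """Return a columns mapping suitable for DataFrame.rename."""
    source = CAPACITY_COLUMNS[utility]
    return {} if source == target else {source: target}
```

For example, `pge = pge.rename(columns=rename_mapping("pge"))` would be applied to each frame before the `pd.concat` call.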
Convert utility circuit lines into a 100m x 100m pixel grid covering areas within 75 meters of utility infrastructure.
Command:
cd jurisdiction_script
python create_utility_pixels.py \
    -i data/other/utility_lines.geojson \
    -o data/grids/utilities_pixels.json \
    -b 75

Process:
- Creates 100m x 100m grid covering California (~98 million grid points)
- Buffers utility lines by 75 meters
- Clips grid to areas within utility buffer (~2 million pixels)
- Converts point centroids to square polygons
- Saves output to data/grids/utilities_pixels.json
Performance Requirements:
- Memory: 16-32GB RAM
- Processing Time: 45-90 minutes
- Output Size: ~400-500MB
Output: Save utilities_pixels.json to jurisdiction_script/data/grids/
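The grid, buffer, clip, and square-conversion steps can be illustrated at small scale in plain Python. This sketch does not reproduce create_utility_pixels.py, whose internals are not shown here: all function names are hypothetical, coordinates are assumed to be in a projected CRS with meter units (EPSG:3310, California Albers, would be one candidate), and a single straight segment stands in for the utility lines. At the scale of the real run, vectorized geospatial operations would be needed instead.

```python
import math


def grid_centroids(minx, miny, maxx, maxy, size=100.0):
    """Yield centroids of size x size cells covering a bounding box (meters)."""
    nx = max(1, math.ceil((maxx - minx) / size))
    ny = max(1, math.ceil((maxy - miny) / size))
    for j in range(ny):
        for i in range(nx):
            yield (minx + (i + 0.5) * size, miny + (j + 0.5) * size)


def within_buffer(point, segment, distance=75.0):
    """True if point lies within distance of the line segment."""
    (px, py), ((ax, ay), (bx, by)) = point, segment
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    # Position of the nearest point along the segment, clamped to its ends
    t = 0.0 if length_sq == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy)) <= distance


def pixel_square(cx, cy, size=100.0):
    """Return the closed corner ring of the square cell centered on (cx, cy)."""
    h = size / 2.0
    return [(cx - h, cy - h), (cx + h, cy - h), (cx + h, cy + h), (cx - h, cy + h), (cx - h, cy - h)]


def pixels_near_segment(segment, bbox, size=100.0, distance=75.0):
    """Keep grid cells whose centroid is within distance of the segment."""
    return [pixel_square(cx, cy, size)
            for cx, cy in grid_centroids(*bbox, size=size)
            if within_buffer((cx, cy), segment, distance)]
```

For a horizontal segment from (0, 50) to (300, 50) inside a 300 m x 300 m box, only the bottom row of cells is kept, because the next row of centroids sits 100 m from the line.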
Configuration files are located in jurisdiction_script/config/ as YAML files.
Update the following paths:
- Feasibility pixels: Update to reference new utilities_pixels.json
- Utility lines: Update to reference new utility_lines.geojson
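After editing a configuration file, a quick check that the referenced files exist can catch path typos before a long run. The key names used in the YAML files are not listed in this document, so this sketch searches every string value instead of assuming specific keys. `find_references` and `missing_paths` are hypothetical helper names.

```python
from pathlib import Path


def find_references(config, filenames):
    """Return every string value in a nested config that ends with one of filenames."""
    found = []
    if isinstance(config, dict):
        for value in config.values():
            found.extend(find_references(value, filenames))
    elif isinstance(config, (list, tuple)):
        for item in config:
            found.extend(find_references(item, filenames))
    elif isinstance(config, str) and config.endswith(tuple(filenames)):
        found.append(config)
    return found


def missing_paths(paths, base_dir="."):
    """Return the referenced paths that do not exist relative to base_dir."""
    return [p for p in paths if not (Path(base_dir) / p).exists()]
```

A configuration loaded with PyYAML's `yaml.safe_load` can be passed directly as `config`, with `filenames=["utilities_pixels.json", "utility_lines.geojson"]`. An empty result from `missing_paths` means every referenced file was found.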
Execute the main processing script:
cd jurisdiction_script
python jscript.py config_file

Replace config_file with the appropriate configuration file name (without the .yaml extension).
Example:
python jscript.py alameda_berkeley

Output: Priority and feasibility JSON files will be generated in jurisdiction_script/out/
- [jurisdiction]_priority.json
- [jurisdiction]_feasibility.json
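When the utility data is refreshed, every jurisdiction usually needs to be regenerated. The sketch below builds one command per configuration file. It assumes that every .yaml file in config/ is a jurisdiction configuration and that the commands are run from jurisdiction_script; `build_commands` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def build_commands(config_dir="config", script="jscript.py"):
    """Return one command (as an argument list) per YAML file in config_dir."""
    stems = sorted(path.stem for path in Path(config_dir).glob("*.yaml"))
    return [[sys.executable, script, stem] for stem in stems]
```

Each command can then be executed with `subprocess.run(cmd, check=True)`, which stops the batch at the first jurisdiction that fails.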
| Data Type | Source |
|---|---|
| California County Boundaries | US Census TIGER/Line |
| California Place Boundaries | US Census TIGER/Line Places |
| Utility | Source |
|---|---|
| Pacific Gas & Electric (PG&E) | PG&E DRP Integration Capacity Map |
| Southern California Edison (SCE) | SCE DRP Portal |
| San Diego Gas & Electric (SDG&E) | SDG&E ICM API Explorer |
| Los Angeles Dept. of Water & Power (LADWP) | LADWP Power GIS Portal |
| Data Type | Source |
|---|---|
| CalEnviroScreen 4.0 | OEHHA CalEnviroScreen |
| EJScreen | Harvard Dataverse |
| CEJST | Harvard Dataverse |
| Data Type | Source |
|---|---|
| Non-White Population (2021 5-yr ACS) | Census Data Portal |
| Disability Characteristics (2021 5-yr ACS) | Census Data Portal |
| Commute Time (2021 5-yr ACS) | Census Data Portal |
Current Implementation:
- EJScreen and CEJST indicators use percentile rankings across US census tracts
- CalEnviroScreen provides intra-state (California-only) percentile comparisons
- This provides both interstate and intrastate comparisons for California
Future Considerations: When expanding to states outside California:
- CalEnviroScreen is California-specific and unavailable for other states
- Consider using EJScreen's intrastate tract comparison option
- This would maintain both inter- and intra-state comparison capabilities using CEJST (interstate) and EJScreen (intrastate)
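The difference between interstate and intrastate percentiles can be shown with a toy calculation. The scores below are made up for illustration, and the percentile definition (share of tracts at or below a value) is an assumption; EJScreen, CEJST, and CalEnviroScreen each define their own rankings.

```python
def percentile_rank(scores, value):
    """Percentage of scores that are less than or equal to value."""
    scores = list(scores)
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(1 for s in scores if s <= value) / len(scores)


# Made-up indicator scores keyed by (state, tract id); illustration only
tracts = {("CA", 1): 40.0, ("CA", 2): 70.0, ("CA", 3): 90.0,
          ("NV", 1): 10.0, ("NV", 2): 20.0, ("TX", 1): 30.0}

target = tracts[("CA", 1)]
# Interstate: rank against every tract in the country
national = percentile_rank(tracts.values(), target)
# Intrastate: rank against California tracts only
in_state = percentile_rank((v for (state, _), v in tracts.items() if state == "CA"), target)
```

In this toy data the same tract ranks higher nationally than within California, which is why the two kinds of comparison are kept side by side.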
Common Issues:
- API URL Changes: Utility provider API endpoints may change. Check source portals for updated URLs.
- Memory Issues: Pixelation process requires significant RAM. Close other applications or use a machine with more memory.
- Timeout Errors: When downloading large datasets, use API-based methods rather than direct downloads.
- Missing Dependencies: Ensure all required Python packages are installed:
  conda install -c conda-forge geopandas numpy pandas scipy matplotlib pyyaml fiona shapely beautifulsoup4 requests
- CRS Mismatches: All output files should use EPSG:4326 (WGS84). Verify CRS after loading external datasets.
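With GeoPandas, the declared CRS of a loaded dataset is available as `gdf.crs`. A GeoJSON file often carries no CRS metadata, though, so a coordinate range check is a useful extra guard against data that was saved without reprojection. The sketch below is only a heuristic (projected coordinates close to the origin would still pass), and the function names are hypothetical.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_wgs84(feature_collection):
    """True if every coordinate fits longitude/latitude ranges.

    Geometries without a "coordinates" member (e.g. GeometryCollection) are skipped.
    """
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        for x, y in iter_positions(geometry.get("coordinates", [])):
            if not (-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0):
                return False
    return True
```

The input is the dictionary returned by `json.load` for a GeoJSON file. A `False` result for a file that should be in EPSG:4326 suggests it is still in a projected CRS such as Pseudo-Mercator.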