diff --git a/README.md b/README.md index 9a46610e..99a2dd86 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,7 @@ [![Create a Slack Account with us](https://img.shields.io/badge/Create_Slack_Account-The_Carpentries-071159.svg)](https://swc-slack-invite.herokuapp.com/) ## The Lesson -Check out https://carpentries-incubator.github.io/geospatial-python/ to view the lesson as it currently stands. Episodes 1-7 and Episode 12 are currently complete. Epsiodes 1-7 have been taught previously in workshop settings, read more about that experience and feedback [here](https://carpentries.org/blog/2020/03/teaching-a-new-geospatial-python-lesson/). Other episodes still need to be fleshed out and we welcome contributions! +Check out https://carpentries-incubator.github.io/geospatial-python/ to view the lesson as it currently stands. Episodes 1-7 and Episode 12 are currently complete. Episodes 1-7 have been taught previously in workshop settings, read more about that experience and feedback [here](https://carpentries.org/blog/2020/03/teaching-a-new-geospatial-python-lesson/). Other episodes still need to be fleshed out and we welcome contributions! ## Teaching this lesson? Do you want to teach Introduction to Geospatial Raster and Vector Data with Python? This material is open-source and freely available. @@ -20,7 +20,7 @@ Please see the current list of [issues](https://github.com/carpentries-incubator repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter [Contributing to a Project](http://git-scm.com/book/en/v2/GitHub-Contributing-to-a-Project) in Pro Git by Scott Chacon. -Look for the tag ![good_first_issue](https://img.shields.io/badge/-good%20first%20issue-gold.svg). This indicates that the mantainers will welcome a pull request fixing this issue. +Look for the tag ![good_first_issue](https://img.shields.io/badge/-good%20first%20issue-gold.svg). This indicates that the maintainers will welcome a pull request fixing this issue.
We are looking for more active developers of this lesson. Edits to existing lesson episodes or additions to new episodes are welcome. If you are interested in contributing a new episode, you can either start by creating a new GitHub Issue to start discussion or dive in and submit a new Pull Request. diff --git a/episodes/02-intro-vector-data.md b/episodes/02-intro-vector-data.md index 4607b72f..9fc14cfc 100644 --- a/episodes/02-intro-vector-data.md +++ b/episodes/02-intro-vector-data.md @@ -60,7 +60,7 @@ are represented by which vector type. State boundaries are shown as polygons. The Fisher Tower location is represented by a purple point. There are no line features shown. -Note, that at a different scale the Fischer Tower coudl also have been represented as a polygon. +Note that at a different scale the Fisher Tower could also have been represented as a polygon. Keep in mind that the purpose for which the dataset is created, and how it is meant to be used, determines which vector type it uses. :::: @@ -92,7 +92,7 @@ their features to real-world locations. Like raster data, vector data can also come in many different formats. For this workshop, we will use the GeoPackage format. GeoPackage is developed by the [Open Geospatial Consortium](https://www.ogc.org/) and *is an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information* (source: [https://www.geopackage.org/](https://www.geopackage.org/)). A GeoPackage file, with extension **.gpkg**, is a single file that contains the geometries of features, their attributes and information about the coordinate reference system (CRS) used. -Another vector format that you will probably come accross quite often is a Shapefile. Although we will not be using this format in this lesson we do believe it is useful to understand how the Shapefile format works.
Shapefile is a multi-file format, with each shapefile consisting of multiple files in the same directory, of which `.shp`, `.shx`, and `.dbf` files are mandatory. Other non-mandatory but very important files are `.prj` and `shp.xml` files. +Another vector format that you will probably come across quite often is a Shapefile. Although we will not be using this format in this lesson, we do believe it is useful to understand how the Shapefile format works. Shapefile is a multi-file format, with each shapefile consisting of multiple files in the same directory, of which `.shp`, `.shx`, and `.dbf` files are mandatory. Other non-mandatory but very important files are `.prj` and `shp.xml` files. - The `.shp` file stores the feature geometry itself - `.shx` is a positional index of the feature geometry to allow quickly searching forwards and backwards through the geographic coordinates of each vertex in the vector diff --git a/episodes/03-crs.md b/episodes/03-crs.md index a37d3731..014c19a0 100644 --- a/episodes/03-crs.md +++ b/episodes/03-crs.md @@ -56,7 +56,7 @@ degrees) and defines the starting point (i.e. where is [0,0]?) so the angles reference a meaningful spot on the earth. Common global datums are WGS84 and NAD83. Datums can also be local - fit to a particular area of the globe, but ill-fitting outside the area of intended use. For instance, local cadastre, land registry and mapping agencies require high accuracy for their datasets, which can be obtained using a local system. In this lesson, we will use the -[WGS84 datum](https://www.linz.govt.nz/data/geodetic-system/datums-projections-and-heights/geodetic-datums/world-geodetic-system-1984-wgs84). The datum is often also refered to as the Geographical Coordinate System. +[WGS84 datum](https://www.linz.govt.nz/data/geodetic-system/datums-projections-and-heights/geodetic-datums/world-geodetic-system-1984-wgs84). The datum is often also referred to as the Geographical Coordinate System.
* **Projection:** A mathematical transformation of the angular measurements on a round earth to a flat surface (i.e. paper or a computer screen). The units @@ -91,7 +91,7 @@ stem of the fruit. What other parameters could be included in this analogy? ## Which projection should I use? -A well know projection is the [Mercator projection](https://en.wikipedia.org/wiki/Mercator_projection) introduced by the Flemisch cartographer Gerardus Mercator in the 16th Century. This is a so-called cilindrical projection, meaning that a virtual cilinder is placed around the globe to flatten it. This type of projections are relatively accurate near to the equator, but towards the poles blows things up (more info on cylindrical projections [here](https://gisgeography.com/cylindrical-projection/). The main advantage of the Mercator projection is that it is very suitable for navigation purposes since it always shows North as *up* and South and as *down* - in the 17th century this projection was essential for sailors to navigate the oceans. +A well-known projection is the [Mercator projection](https://en.wikipedia.org/wiki/Mercator_projection), introduced by the Flemish cartographer Gerardus Mercator in the 16th century. This is a so-called cylindrical projection, meaning that a virtual cylinder is placed around the globe to flatten it. This type of projection is relatively accurate near the equator, but blows things up towards the poles (more info on cylindrical projections [here](https://gisgeography.com/cylindrical-projection/)). The main advantage of the Mercator projection is that it is very suitable for navigation purposes since it always shows North as *up* and South as *down* - in the 17th century this projection was essential for sailors to navigate the oceans.
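As an illustrative aside (not part of the lesson's code): the pole-ward inflation described above can be quantified with a few lines of standard-library Python, since for a spherical Mercator model the local east-west scale factor is 1/cos(latitude). This sketch uses a unit-radius sphere for simplicity.

```python
import math

def mercator_y(lat_deg, radius=1.0):
    """Northing of the spherical Mercator projection for a given latitude."""
    lat = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + lat / 2))

def mercator_scale(lat_deg):
    """Local scale factor relative to the equator: 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Features at 60 degrees latitude are drawn twice as large as at the
# equator, and the inflation grows quickly towards the poles.
for lat in (0, 30, 60, 80):
    print(f"lat {lat:2d} deg: scale x{mercator_scale(lat):.2f}")
```

This is why Greenland looks comparable in size to Africa on a Mercator map, even though it is roughly 14 times smaller in reality.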
To decide if a projection is right for your data, answer these questions: diff --git a/episodes/05-access-data.md b/episodes/05-access-data.md index a3bcd615..8f8e729e 100644 --- a/episodes/05-access-data.md +++ b/episodes/05-access-data.md @@ -521,7 +521,7 @@ in the NASA Common Metadata Repository (CMR) and it can be accessed from the STAC `https://cmr.earthdata.nasa.gov/stac/LPCLOUD`. - Using `pystac_client`, search for all assets of the Landsat 8 collection (`HLSL30.v2.0`) from February to March - 2021, intersecting the point with longitude/latitute coordinates (-73.97, 40.78) deg. + 2021, intersecting the point with longitude/latitude coordinates (-73.97, 40.78) deg. - Visualize an item's thumbnail (asset key `browse`). :::: solution diff --git a/episodes/06-raster-intro.md b/episodes/06-raster-intro.md index a49cf097..b5af4419 100644 --- a/episodes/06-raster-intro.md +++ b/episodes/06-raster-intro.md @@ -159,7 +159,7 @@ This can give us a quick view of the values of our array, but only at the corner rhodes_red.plot() ``` -![Raster plot with rioxarray](fig/E06/rhodes_red_B04.png){alt="raster plot with defualt setting"} +![Raster plot with rioxarray](fig/E06/rhodes_red_B04.png){alt="raster plot with default setting"} Notice that `rioxarray` helpfully allows us to plot this raster with spatial coordinates on the x and y axis (this is not the default in many cases with other functions or libraries). Nice plot! However, it probably took a while for it to load; therefore it would make sense to resample it. @@ -189,9 +189,9 @@ Let's plot this one. ```python rhodes_red_80.plot() ``` -![Raster plot 80 x 80 meter resolution with rioxarray](fig/E06/rhodes_red_80_B04.png){alt="raster plot with defualt setting"} +![Raster plot 80 x 80 meter resolution with rioxarray](fig/E06/rhodes_red_80_B04.png){alt="raster plot with default setting"} -This plot shows the satellite measurement of the band `red` for Rhodes before the wildfire.
According to the [Sentinel-2 documentaion](https://sentiwiki.copernicus.eu/web/s2-mission#S2-Mission-MSI-Instrument), this is a band with the central wavelength of 665nm. It has a spatial resolution of 10m. Note that the `band=1` in the image title refers to the ordering of all the bands in the `DataArray`, not the Sentinel-2 band number `04` that we saw in the pystac search results. +This plot shows the satellite measurement of the band `red` for Rhodes before the wildfire. According to the [Sentinel-2 documentation](https://sentiwiki.copernicus.eu/web/s2-mission#S2-Mission-MSI-Instrument), this is a band with the central wavelength of 665nm. It has a spatial resolution of 10m. Note that the `band=1` in the image title refers to the ordering of all the bands in the `DataArray`, not the Sentinel-2 band number `04` that we saw in the pystac search results. :::callout @@ -205,7 +205,7 @@ rhodes_red_80.plot(robust=True) Now the color limit is set in a way fitting most of the values in the image. We have a better view of the ground pixels. -For a customized displaying range, you can also manually specifying the keywords `vmin` and `vmax`. For example ploting between `100` and `2000`: +For a customized display range, you can also manually specify the keywords `vmin` and `vmax`. For example, plotting between `100` and `2000`: ```python rhodes_red_80.plot(vmin=100, vmax=2000) @@ -382,7 +382,7 @@ You may notice that `rhodes_red_80.quantile` and `numpy.percentile` didn't requi ::: ## Dealing with Missing Data -So far, we have visualized a band of a Sentinel-2 scene and calculated its statistics. However, as you can see on the image it also contains an artificial band to the top left where data is missing. In order to calculate meaningfull statistics, we need to take missing data into account. Raster data often has a "no data value" associated with it and for raster datasets read in by `rioxarray`. This value is referred to as `nodata`.
This is a value assigned to pixels where data is missing or no data were collected. There can be different cases that cause missing data, and it's common for other values in a raster to represent different cases. The most common example is missing data at the edges of rasters. +So far, we have visualized a band of a Sentinel-2 scene and calculated its statistics. However, as you can see in the image, it also contains an artificial band to the top left where data is missing. In order to calculate meaningful statistics, we need to take missing data into account. Raster data often has a "no data value" associated with it; for raster datasets read in by `rioxarray`, this value is referred to as `nodata`. This is a value assigned to pixels where data is missing or no data were collected. There can be different cases that cause missing data, and it's common for other values in a raster to represent different cases. The most common example is missing data at the edges of rasters. By default the shape of a raster is always rectangular. So if we have a dataset that has a shape that isn't rectangular, like most satellite images, some pixels at the edge of the raster will have no data values. This often happens when the data were collected by a sensor which only flew over some part of a defined region, and is also almost inevitable because the earth is not flat and we work with geographic and projected coordinate systems. diff --git a/episodes/07-vector-data-in-python.md b/episodes/07-vector-data-in-python.md index a8839b07..4ea9ef4b 100644 --- a/episodes/07-vector-data-in-python.md +++ b/episodes/07-vector-data-in-python.md @@ -21,7 +21,7 @@ questions: In the preceding episodes, we have prepared, selected and downloaded raster data from before and after the wildfire event in the summer of 2023 on the Greek island of Rhodes.
To evaluate the impact of this wildfire on the vital infrastructure and built-up areas we are going to create a subset of vector data representing these assets. In this episode you will learn how to extract vector data with specific characteristics like the type of attributes or their locations. The dataset that we will generate in this episode can later on be compared with the scorched areas, which we determine by analyzing the satellite images in [Episode 9: Raster Calculations in Python](09-raster-calculations.md). -We'll be examining vector datasets that represent the valuable assests of Rhodes. As mentioned in [Episode 2: Introduction to Vector Data](02-intro-vector-data.md), vector data uses points, lines, and polygons to depict specific features on the Earth's surface. These geographic elements can have one or more attributes, like 'name' and 'population' for a city. In this episode we'll be using two open data sources: the Database of Global Administrative Areas (GADM) dataset to generate a polygon for the island of Rhodes and and Open Street Map data for the vital infrastructure and valuable assets. +We'll be examining vector datasets that represent the valuable assets of Rhodes. As mentioned in [Episode 2: Introduction to Vector Data](02-intro-vector-data.md), vector data uses points, lines, and polygons to depict specific features on the Earth's surface. These geographic elements can have one or more attributes, like 'name' and 'population' for a city. In this episode we'll be using two open data sources: the Database of Global Administrative Areas (GADM) dataset to generate a polygon for the island of Rhodes and OpenStreetMap data for the vital infrastructure and valuable assets. To handle the vector data in Python we use the package [`geopandas`](https://geopandas.org/en/stable/). This package allows us to open, manipulate, and write vector datasets with Python.
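As a side note for reviewers, the "DataFrame plus geometry" idea can be made concrete with a small hedged sketch. The feature names and population values below are invented for illustration, and it assumes `geopandas` and `shapely` are installed, as in the lesson's environment:

```python
import geopandas as gpd
from shapely.geometry import Point

# A GeoDataFrame behaves like a pandas DataFrame with an extra
# `geometry` column and a CRS attached (made-up example features).
gdf = gpd.GeoDataFrame(
    {"name": ["Rhodes town", "Lindos"], "population": [50000, 3500]},  # invented attribute values
    geometry=[Point(28.22, 36.44), Point(28.09, 36.09)],  # one point geometry per feature
    crs="EPSG:4326",  # lon/lat coordinates (WGS84)
)

print(gdf.shape)  # pandas-style inspection: (2 features, 3 columns)
print(gdf.crs)    # the coordinate reference system travels with the data
# gdf.plot() would draw the two points on a map
```

Reading a real file works the same way: `gpd.read_file(...)` returns such a `GeoDataFrame`, with one row per feature and one column per attribute.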
@@ -81,7 +81,7 @@ GID_3 GID_0 COUNTRY GID_1 NAME_1 \ [326 rows x 17 columns] ``` -The data are read into the variable fields as a `GeoDataFrame`. This is an extened data format of `pandas.DataFrame`, with an extra column `geometry`. To explore the dataframe you can call this variable just like a `pandas dataframe` by using functions like `.shape`, `.head` and `.tail` etc. +The data are read into the variable `gdf_greece` as a `GeoDataFrame`. This is an extended data format of `pandas.DataFrame`, with an extra column `geometry`. To explore the dataframe you can treat this variable just like a `pandas.DataFrame`, using attributes and methods like `.shape`, `.head()` and `.tail()`. To visualize the polygons we can apply the [`plot()`](https://geopandas.org/en/stable/docs/user_guide/mapping.html) function to the `GeoDataFrame` we have loaded, `gdf_greece`: @@ -127,7 +127,7 @@ gdf_rhodes.to_file('rhodes.gpkg') Now that we have the boundary of our study area, we will make use of this to select the main roads in our study area. We will make the following processing steps: 1. Select roads of study area -2. Select key infrastruture: 'primary', 'secondary', 'tertiary' +2. Select key infrastructure: 'primary', 'secondary', 'tertiary' 3. Create a 100m buffer around the roads. This buffer will be regarded as the infrastructure region. (note that this buffer is arbitrary and can be changed afterwards if you want!) #### Step 1: Select roads of study area @@ -164,9 +164,9 @@ gdf_roads.explore() ![](fig/E07/rhodes_highways.png){alt="rhodes_highways"} -#### Step 2: Select key infrastruture +#### Step 2: Select key infrastructure -As you will find out while exploring the roads dataset, information about the type of roads is stored in the `fclass` column.
To get an overview of the different values that are present in the collumn `fclass` , we can use the [`unique()`](https://pandas.pydata.org/docs/reference/api/pandas.unique.html) function from pandas: +As you will find out while exploring the roads dataset, information about the type of roads is stored in the `fclass` column. To get an overview of the different values that are present in the column `fclass`, we can use the [`unique()`](https://pandas.pydata.org/docs/reference/api/pandas.unique.html) function from pandas: ```python gdf_roads['fclass'].unique() @@ -206,7 +206,7 @@ key_infra.plot() Now that we have selected the key infrastructure, we want to create a 100m buffer around it. This buffer will be regarded as the infrastructure region. -As you might have notice, the numbers on the x and y axis of our plots represent Lon Lat coordinates, meaning that the data is not yet projected. The current data has a geographic coordinate system with measures in degrees but not meter. Creating a buffer of 100 meters is not possible. Therfore, in order to create a 100m buffer, we first need to project our data. In our case we decided to project the data as +As you might have noticed, the numbers on the x and y axis of our plots represent Lon Lat coordinates, meaning that the data is not yet projected. The current data has a geographic coordinate system with measures in degrees, not meters, so creating a buffer of 100 meters is not directly possible. Therefore, in order to create a 100m buffer, we first need to project our data. In our case we decided to project the data as
@@ -300,9 +300,9 @@ key_infra_buffer_200 = buffer_crs(key_infra, 200) Now that we have a buffered dataset for the key infrastructure of Rhodes, our next step is to create a dataset with all the built-up areas. To do so we will use the land use data from OSM, which we prepared for you in the file `data/osm_landuse.gpkg`. This file includes the land use data for the entire Greece. We assume the built-up regions to be the union of three types of land use: "commercial", "industrial", and "residential". -Note that for the simplicity of this course, we limit the built-up regions to these three types of land use. In reality, the built-up regions can be more complex also there is definately more high quality (e.g. local government). +Note that for the simplicity of this course, we limit the built-up regions to these three types of land use. In reality, the built-up regions can be more complex also there is definitely more high quality (e.g. local government). -Now it will be up to you to create a dataset with valueable assets. You should be able to complete this task by yourself with the knowledge you have gained from the previous steps and links to the documentation we provided. +Now it will be up to you to create a dataset with valuable assets. You should be able to complete this task by yourself with the knowledge you have gained from the previous steps and links to the documentation we provided. 
:::challenge ## Exercise: Get the built-up regions diff --git a/episodes/08-crop-raster-data.md b/episodes/08-crop-raster-data.md index 98376a11..222215b7 100644 --- a/episodes/08-crop-raster-data.md +++ b/episodes/08-crop-raster-data.md @@ -194,7 +194,7 @@ gdf_rhodes = gdf_rhodes.to_crs(red.rio.crs) red_clip = red.rio.clip(gdf_rhodes["geometry"]) -# Step 5 - assing nan values to no data +# Step 5 - assign nan values to no data red_clip_nan = red_clip.where(red_clip!=red_clip.rio.nodata) diff --git a/episodes/09-raster-calculations.md b/episodes/09-raster-calculations.md index cebc31f1..0ee8d690 100644 --- a/episodes/09-raster-calculations.md +++ b/episodes/09-raster-calculations.md @@ -33,7 +33,7 @@ $$ NDVI = \frac{NIR - red}{NIR + red} $$ $$ NDWI = \frac{green - NIR}{green + NIR} $$ -* A custom index derived from two of the **short-wave infrared (SWIR)** bands (with wavelenght ~1600 nm and ~2200 nm, +* A custom index derived from two of the **short-wave infrared (SWIR)** bands (with wavelength ~1600 nm and ~2200 nm, respectively): $$ INDEX = \frac{SWIR_{16} - SWIR_{22}}{SWIR_{16} + SWIR_{22}}$$ diff --git a/episodes/10-zonal-statistics.md b/episodes/10-zonal-statistics.md index 924a7b0b..b28ae359 100644 --- a/episodes/10-zonal-statistics.md +++ b/episodes/10-zonal-statistics.md @@ -31,7 +31,7 @@ We have created `assets.gpkg` in Episode "Vector data in Python", which contain import rioxarray burned = rioxarray.open_rasterio('burned.tif') -# Load assests polygons +# Load assets polygons import geopandas as gpd assets = gpd.read_file('assets.gpkg') ``` diff --git a/episodes/12-data-cube.md b/episodes/12-data-cube.md index 71813882..01e540bf 100644 --- a/episodes/12-data-cube.md +++ b/episodes/12-data-cube.md @@ -51,7 +51,7 @@ search = client.search( item_collection = search.item_collection() ``` -[odc-stac](https://odc-stac.readthedocs.io/en/latest/?badge=latest) can ingest directly our search results and create a Xarray DataSet object from the STAC metadata that 
are present in the `item_collection`. By specifying `groupby='solar_day'`, odc-stac automatically groups and merges images corresponding to the same date of acquisition. `chunks={...}` sets up the resulting data cube using Dask arrays, thus enabling lazy loading (and further operations). `use_overviews=True` tells odc-stac to direcly load lower-resolution versions of the images from the overviews, if these are available in Cloud Optimized Geotiffs (COGs). We set the resolution of the data cube using the `resolution` argument, and define the AoI using the bounding box (`bbox`). We decided to set the resolution to 20 in order to limit the size of the images a bit. +[odc-stac](https://odc-stac.readthedocs.io/en/latest/?badge=latest) can directly ingest our search results and create an Xarray `Dataset` object from the STAC metadata that are present in the `item_collection`. By specifying `groupby='solar_day'`, odc-stac automatically groups and merges images corresponding to the same date of acquisition. `chunks={...}` sets up the resulting data cube using Dask arrays, thus enabling lazy loading (and further operations). `use_overviews=True` tells odc-stac to directly load lower-resolution versions of the images from the overviews, if these are available in Cloud Optimized GeoTIFFs (COGs). We set the resolution of the data cube using the `resolution` argument, and define the AoI using the bounding box (`bbox`). We decided to set the resolution to 20 m in order to limit the size of the images a bit. ```python import odc.stac
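As a conceptual illustration of what `groupby='solar_day'` achieves — items acquired on the same date end up merged into one layer of the cube — here is a hedged, plain-Python sketch with invented timestamps (no STAC access or odc-stac involved):

```python
from collections import defaultdict
from datetime import datetime

# Invented acquisition timestamps: two items from the same solar day
# plus one from a later date.
timestamps = [
    "2023-07-15T09:05:51+00:00",
    "2023-07-15T09:05:55+00:00",  # same day as the first -> grouped together
    "2023-07-20T09:05:49+00:00",
]

# Group by calendar date, mimicking how same-day items are merged.
groups = defaultdict(list)
for ts in timestamps:
    groups[datetime.fromisoformat(ts).date()].append(ts)

print({day.isoformat(): len(items) for day, items in sorted(groups.items())})
# -> {'2023-07-15': 2, '2023-07-20': 1}
```

The real implementation additionally mosaics the pixel data of each group, but the grouping key is the same idea: the acquisition date rather than the exact timestamp.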