Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
[![Create a Slack Account with us](https://img.shields.io/badge/Create_Slack_Account-The_Carpentries-071159.svg)](https://swc-slack-invite.herokuapp.com/)

## The Lesson
Check out https://carpentries-incubator.github.io/geospatial-python/ to view the lesson as it currently stands. Episodes 1-7 and Episode 12 are currently complete. Epsiodes 1-7 have been taught previously in workshop settings, read more about that experience and feedback [here](https://carpentries.org/blog/2020/03/teaching-a-new-geospatial-python-lesson/). Other episodes still need to be fleshed out and we welcome contributions!
Check out https://carpentries-incubator.github.io/geospatial-python/ to view the lesson as it currently stands. Episodes 1-7 and Episode 12 are currently complete. Episodes 1-7 have been taught previously in workshop settings, read more about that experience and feedback [here](https://carpentries.org/blog/2020/03/teaching-a-new-geospatial-python-lesson/). Other episodes still need to be fleshed out and we welcome contributions!

## Teaching this lesson?
Do you want to Introduction to Geospatial Raster and Vector Data with Python? This material is open-source and freely available.
Expand All @@ -20,7 +20,7 @@ Please see the current list of [issues](https://github.com/carpentries-incubator
repository. For making your contribution, we use the GitHub flow, which is
nicely explained in the chapter [Contributing to a Project](http://git-scm.com/book/en/v2/GitHub-Contributing-to-a-Project) in Pro Git
by Scott Chacon.
Look for the tag ![good_first_issue](https://img.shields.io/badge/-good%20first%20issue-gold.svg). This indicates that the mantainers will welcome a pull request fixing this issue.
Look for the tag ![good_first_issue](https://img.shields.io/badge/-good%20first%20issue-gold.svg). This indicates that the maintainers will welcome a pull request fixing this issue.

We are looking for more active developers of this lesson. Edits to existing lesson episodes or additions to new episodes are welcome. If you are interested in contributing a new episode, you can either start by creating a new Github Issue to start discussion or dive in and submit a new Pull Request.

Expand Down
4 changes: 2 additions & 2 deletions episodes/02-intro-vector-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ are represented by which vector type.

State boundaries are shown as polygons. The Fisher Tower location is
represented by a purple point. There are no line features shown.
Note, that at a different scale the Fischer Tower coudl also have been represented as a polygon.
Note, that at a different scale the Fischer Tower could also have been represented as a polygon.
Keep in mind that the purpose for which the dataset is created and aimed to be used for determines
which vector type it uses.
::::
Expand Down Expand Up @@ -92,7 +92,7 @@ their features to real-world locations.
Like raster data, vector data can also come in many different formats. For this
workshop, we will use the GeoPackage format. GeoPackage is developed by the [Open Geospatial Consortium](https://www.ogc.org/) and is *is an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information* (source: [https://www.geopackage.org/](https://www.geopackage.org/)). A GeoPackage file, with extension **.gpkg**, is a single file that contains the geometries of features, their attributes and information about the coordinate reference system (CRS) used.

Another vector format that you will probably come accross quite often is a Shapefile. Although we will not be using this format in this lesson we do believe it is useful to understand how the Shapefile format works. Shapefile is a multi-file format, with each shapefile consisting of multiple files in the same directory, of which `.shp`, `.shx`, and `.dbf` files are mandatory. Other non-mandatory but very important files are `.prj` and `shp.xml` files.
Another vector format that you will probably come across quite often is a Shapefile. Although we will not be using this format in this lesson we do believe it is useful to understand how the Shapefile format works. Shapefile is a multi-file format, with each shapefile consisting of multiple files in the same directory, of which `.shp`, `.shx`, and `.dbf` files are mandatory. Other non-mandatory but very important files are `.prj` and `shp.xml` files.

- The `.shp` file stores the feature geometry itself
- `.shx` is a positional index of the feature geometry to allow quickly searching forwards and backwards the geographic coordinates of each vertex in the vector
Expand Down
4 changes: 2 additions & 2 deletions episodes/03-crs.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ degrees) and defines the starting point (i.e. where is [0,0]?) so the angles
reference a meaningful spot on the earth. Common global datums are WGS84 and
NAD83. Datums can also be local - fit to a particular area of the globe, but
ill-fitting outside the area of intended use. For instance local cadastre, land registry and mapping agencies require a high quality for their datasets, which can be obtained using a local system. In this lesson, we will use the
[WGS84 datum](https://www.linz.govt.nz/data/geodetic-system/datums-projections-and-heights/geodetic-datums/world-geodetic-system-1984-wgs84). The datum is often also refered to as the Geographical Coordinate System.
[WGS84 datum](https://www.linz.govt.nz/data/geodetic-system/datums-projections-and-heights/geodetic-datums/world-geodetic-system-1984-wgs84). The datum is often also referred to as the Geographical Coordinate System.

* **Projection:** A mathematical transformation of the angular measurements on a
round earth to a flat surface (i.e. paper or a computer screen). The units
Expand Down Expand Up @@ -91,7 +91,7 @@ stem of the fruit. What other parameters could be included in this analogy?

## Which projection should I use?

A well know projection is the [Mercator projection](https://en.wikipedia.org/wiki/Mercator_projection) introduced by the Flemisch cartographer Gerardus Mercator in the 16th Century. This is a so-called cilindrical projection, meaning that a virtual cilinder is placed around the globe to flatten it. This type of projections are relatively accurate near to the equator, but towards the poles blows things up (more info on cylindrical projections [here](https://gisgeography.com/cylindrical-projection/). The main advantage of the Mercator projection is that it is very suitable for navigation purposes since it always shows North as *up* and South and as *down* - in the 17th century this projection was essential for sailors to navigate the oceans.
A well know projection is the [Mercator projection](https://en.wikipedia.org/wiki/Mercator_projection) introduced by the Flemisch cartographer Gerardus Mercator in the 16th Century. This is a so-called cylindrical projection, meaning that a virtual cylinder is placed around the globe to flatten it. This type of projections are relatively accurate near to the equator, but towards the poles blows things up (more info on cylindrical projections [here](https://gisgeography.com/cylindrical-projection/). The main advantage of the Mercator projection is that it is very suitable for navigation purposes since it always shows North as *up* and South and as *down* - in the 17th century this projection was essential for sailors to navigate the oceans.

To decide if a projection is right for your data, answer these questions:

Expand Down
2 changes: 1 addition & 1 deletion episodes/05-access-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -521,7 +521,7 @@ in the NASA Common Metadata Repository (CMR) and it can be accessed from the STA
`https://cmr.earthdata.nasa.gov/stac/LPCLOUD`.

- Using `pystac_client`, search for all assets of the Landsat 8 collection (`HLSL30.v2.0`) from February to March
2021, intersecting the point with longitude/latitute coordinates (-73.97, 40.78) deg.
2021, intersecting the point with longitude/latitude coordinates (-73.97, 40.78) deg.
- Visualize an item's thumbnail (asset key `browse`).

:::: solution
Expand Down
10 changes: 5 additions & 5 deletions episodes/06-raster-intro.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ This can give us a quick view of the values of our array, but only at the corner
rhodes_red.plot()
```

![Raster plot with rioxarray](fig/E06/rhodes_red_B04.png){alt="raster plot with defualt setting"}
![Raster plot with rioxarray](fig/E06/rhodes_red_B04.png){alt="raster plot with default setting"}

Notice that `rioxarray` helpfully allows us to plot this raster with spatial coordinates on the x and y axis (this is not the default in many cases with other functions or libraries). Nice plot! However, it probably took a while for it to load therefore it would make sense to resample it.

Expand Down Expand Up @@ -189,9 +189,9 @@ Lets plot this one.
```python
rhodes_red_80.plot()
```
![Raster plot 80 x 80 meter resolution with rioxarray](fig/E06/rhodes_red_80_B04.png){alt="raster plot with defualt setting"}
![Raster plot 80 x 80 meter resolution with rioxarray](fig/E06/rhodes_red_80_B04.png){alt="raster plot with default setting"}

This plot shows the satellite measurement of the band `red` for Rhodes before the wildfire. According to the [Sentinel-2 documentaion](https://sentiwiki.copernicus.eu/web/s2-mission#S2-Mission-MSI-Instrument), this is a band with the central wavelength of 665nm. It has a spatial resolution of 10m. Note that the `band=1` in the image title refers to the ordering of all the bands in the `DataArray`, not the Sentinel-2 band number `04` that we saw in the pystac search results.
This plot shows the satellite measurement of the band `red` for Rhodes before the wildfire. According to the [Sentinel-2 documentation](https://sentiwiki.copernicus.eu/web/s2-mission#S2-Mission-MSI-Instrument), this is a band with the central wavelength of 665nm. It has a spatial resolution of 10m. Note that the `band=1` in the image title refers to the ordering of all the bands in the `DataArray`, not the Sentinel-2 band number `04` that we saw in the pystac search results.


:::callout
Expand All @@ -205,7 +205,7 @@ rhodes_red_80.plot(robust=True)

Now the color limit is set in a way fitting most of the values in the image. We have a better view of the ground pixels.

For a customized displaying range, you can also manually specifying the keywords `vmin` and `vmax`. For example ploting between `100` and `2000`:
For a customized displaying range, you can also manually specifying the keywords `vmin` and `vmax`. For example plotting between `100` and `2000`:

```python
rhodes_red_80.plot(vmin=100, vmax=2000)
Expand Down Expand Up @@ -382,7 +382,7 @@ You may notice that `rhodes_red_80.quantile` and `numpy.percentile` didn't requi
:::

## Dealing with Missing Data
So far, we have visualized a band of a Sentinel-2 scene and calculated its statistics. However, as you can see on the image it also contains an artificial band to the top left where data is missing. In order to calculate meaningfull statistics, we need to take missing data into account. Raster data often has a "no data value" associated with it and for raster datasets read in by `rioxarray`. This value is referred to as `nodata`. This is a value assigned to pixels where data is missing or no data were collected. There can be different cases that cause missing data, and it's common for other values in a raster to represent different cases. The most common example is missing data at the edges of rasters.
So far, we have visualized a band of a Sentinel-2 scene and calculated its statistics. However, as you can see on the image it also contains an artificial band to the top left where data is missing. In order to calculate meaningful statistics, we need to take missing data into account. Raster data often has a "no data value" associated with it and for raster datasets read in by `rioxarray`. This value is referred to as `nodata`. This is a value assigned to pixels where data is missing or no data were collected. There can be different cases that cause missing data, and it's common for other values in a raster to represent different cases. The most common example is missing data at the edges of rasters.

By default the shape of a raster is always rectangular. So if we have a dataset that has a shape that isn't rectangular, like most satellite images, some pixels at the edge of the raster will have no data values. This often happens when the data were collected by a sensor which only flew over some part of a defined region and is also almost by default because of the fact that the earth is not flat and that we work with geographic and projected coordinate system.

Expand Down
16 changes: 8 additions & 8 deletions episodes/07-vector-data-in-python.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ questions:

In the preceding episodes, we have prepared, selected and downloaded raster data from before and after the wildfire event in the summer of 2023 on the Greek island of Rhodes. To evaluate the impact of this wildfire on the vital infrastructure and built-up areas we are going to create a subset of vector data representing these assets. In this episode you will learn how to extract vector data with specific characteristics like the type of attributes or their locations. The dataset that we will generate in this episode can lateron be confronted with scorched areas which we determine by analyzing the satellite images [Episode 9: Raster Calculations in Python](09-raster-calculations.md).

We'll be examining vector datasets that represent the valuable assests of Rhodes. As mentioned in [Episode 2: Introduction to Vector Data](02-intro-vector-data.md), vector data uses points, lines, and polygons to depict specific features on the Earth's surface. These geographic elements can have one or more attributes, like 'name' and 'population' for a city. In this episode we'll be using two open data sources: the Database of Global Administrative Areas (GADM) dataset to generate a polygon for the island of Rhodes and and Open Street Map data for the vital infrastructure and valuable assets.
We'll be examining vector datasets that represent the valuable assets of Rhodes. As mentioned in [Episode 2: Introduction to Vector Data](02-intro-vector-data.md), vector data uses points, lines, and polygons to depict specific features on the Earth's surface. These geographic elements can have one or more attributes, like 'name' and 'population' for a city. In this episode we'll be using two open data sources: the Database of Global Administrative Areas (GADM) dataset to generate a polygon for the island of Rhodes and and Open Street Map data for the vital infrastructure and valuable assets.

To handle the vector data in python we use the package [`geopandas`](https://geopandas.org/en/stable/). This package allows us to open, manipulate, and write vector dataset through python.

Expand Down Expand Up @@ -81,7 +81,7 @@ GID_3 GID_0 COUNTRY GID_1 NAME_1 \
[326 rows x 17 columns]
```

The data are read into the variable fields as a `GeoDataFrame`. This is an extened data format of `pandas.DataFrame`, with an extra column `geometry`. To explore the dataframe you can call this variable just like a `pandas dataframe` by using functions like `.shape`, `.head` and `.tail` etc.
The data are read into the variable fields as a `GeoDataFrame`. This is an extended data format of `pandas.DataFrame`, with an extra column `geometry`. To explore the dataframe you can call this variable just like a `pandas dataframe` by using functions like `.shape`, `.head` and `.tail` etc.

To visualize the polygons we can use the [`plot()`](https://geopandas.org/en/stable/docs/user_guide/mapping.html) function to the `GeoDataFrame` we have loaded `gdf_greece`:

Expand Down Expand Up @@ -127,7 +127,7 @@ gdf_rhodes.to_file('rhodes.gpkg')
Now that we have the boundary of our study area, we will make use this to select the main roads in our study area. We will make the following processing steps:

1. Select roads of study area
2. Select key infrastruture: 'primary', 'secondary', 'tertiary'
2. Select key infrastructure: 'primary', 'secondary', 'tertiary'
3. Create a 100m buffer around the rounds. This buffer will be regarded as the infrastructure region. (note that this buffer is arbitrary and can be changed afterwards if you want!)

#### Step 1: Select roads of study area
Expand Down Expand Up @@ -164,9 +164,9 @@ gdf_roads.explore()

![](fig/E07/rhodes_highways.png){alt="rhodes_highways"}

#### Step 2: Select key infrastruture
#### Step 2: Select key infrastructure

As you will find out while exploring the roads dataset, information about the type of roads is stored in the `fclass` column. To get an overview of the different values that are present in the collumn `fclass` , we can use the [`unique()`](https://pandas.pydata.org/docs/reference/api/pandas.unique.html) function from pandas:
As you will find out while exploring the roads dataset, information about the type of roads is stored in the `fclass` column. To get an overview of the different values that are present in the column `fclass` , we can use the [`unique()`](https://pandas.pydata.org/docs/reference/api/pandas.unique.html) function from pandas:

```python
gdf_roads['fclass'].unique()
Expand Down Expand Up @@ -206,7 +206,7 @@ key_infra.plot()

Now that we selected the key infrastructure, we want to create a 100m buffer around them. This buffer will be regarded as the infrastructure region.

As you might have notice, the numbers on the x and y axis of our plots represent Lon Lat coordinates, meaning that the data is not yet projected. The current data has a geographic coordinate system with measures in degrees but not meter. Creating a buffer of 100 meters is not possible. Therfore, in order to create a 100m buffer, we first need to project our data. In our case we decided to project the data as
As you might have notice, the numbers on the x and y axis of our plots represent Lon Lat coordinates, meaning that the data is not yet projected. The current data has a geographic coordinate system with measures in degrees but not meter. Creating a buffer of 100 meters is not possible. Therefore, in order to create a 100m buffer, we first need to project our data. In our case we decided to project the data as
WGS 84 / UTM zone 31N, with EPSG code 32631 ([see chapter 03 for more information about the CRS and EPSG codes](/episodes/03-crs.md).

To project our data we use [.to_crs](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_crs.html). We first define a variable with the EPSG value (in our case 32631), which we then us in the to_crs function.
Expand Down Expand Up @@ -300,9 +300,9 @@ key_infra_buffer_200 = buffer_crs(key_infra, 200)

Now that we have a buffered dataset for the key infrastructure of Rhodes, our next step is to create a dataset with all the built-up areas. To do so we will use the land use data from OSM, which we prepared for you in the file `data/osm_landuse.gpkg`. This file includes the land use data for the entire Greece. We assume the built-up regions to be the union of three types of land use: "commercial", "industrial", and "residential".

Note that for the simplicity of this course, we limit the built-up regions to these three types of land use. In reality, the built-up regions can be more complex also there is definately more high quality (e.g. local government).
Note that for the simplicity of this course, we limit the built-up regions to these three types of land use. In reality, the built-up regions can be more complex also there is definitely more high quality (e.g. local government).

Now it will be up to you to create a dataset with valueable assets. You should be able to complete this task by yourself with the knowledge you have gained from the previous steps and links to the documentation we provided.
Now it will be up to you to create a dataset with valuable assets. You should be able to complete this task by yourself with the knowledge you have gained from the previous steps and links to the documentation we provided.

:::challenge
## Exercise: Get the built-up regions
Expand Down
2 changes: 1 addition & 1 deletion episodes/08-crop-raster-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,7 @@ gdf_rhodes = gdf_rhodes.to_crs(red.rio.crs)

red_clip = red.rio.clip(gdf_rhodes["geometry"])

# Step 5 - assing nan values to no data
# Step 5 - assign nan values to no data

red_clip_nan = red_clip.where(red_clip!=red_clip.rio.nodata)

Expand Down
2 changes: 1 addition & 1 deletion episodes/09-raster-calculations.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ $$ NDVI = \frac{NIR - red}{NIR + red} $$

$$ NDWI = \frac{green - NIR}{green + NIR} $$

* A custom index derived from two of the **short-wave infrared (SWIR)** bands (with wavelenght ~1600 nm and ~2200 nm,
* A custom index derived from two of the **short-wave infrared (SWIR)** bands (with wavelength ~1600 nm and ~2200 nm,
respectively):

$$ INDEX = \frac{SWIR_{16} - SWIR_{22}}{SWIR_{16} + SWIR_{22}}$$
Expand Down
Loading