# How to create a cloud-native fiboa dataset using fiboa CLI
NOTE: This article is outdated, we’ll update it soon! Sorry for any inconvenience caused!
## Once when starting with fiboa

You usually only need to do the following steps once:
- Install Python 3.9 or later, GDAL 3.8 or later, tippecanoe, and the AWS CLI
- If you have trouble installing Python, GDAL, etc., consider using a conda environment.
  Install Anaconda and run the following commands using the env.yml file provided in this repository:

  ```
  conda env create --file="https://raw.githubusercontent.com/fiboa/fiboa.github.io/refs/heads/main/data/env.yml"
  conda activate fiboa
  ```
- Clone the fiboa CLI repository and switch into the new folder:

  ```
  git clone https://github.com/fiboa/cli
  cd cli
  ```

- Install the dependencies of the fiboa CLI repository and the CLI itself:

  ```
  pip install -e .
  ```

- Check whether the CLI works:

  ```
  fiboa --version
  ```
## Once when creating a new dataset
- Create a converter using the fiboa CLI converters; see the template.
- If the dataset is available under an open license, we also want to create a test. Assuming the converter is named `xx_yy`:
  1. Install development dependencies: `pip install -r /path/to/requirements.txt`
  2. Create a new folder for the test data: `mkdir tests/data-files/convert/xx_yy`
  3. Create a subset of the dataset: `ogr2ogr tests/data-files/convert/xx_yy/input_file.gpkg -limit 100 /path/to/input_file.gpkg`
  4. Update the test file `tests/test_convert.py` to include your converter
  5. Run the tests: `pytest`

  Iterate on steps 4 and 5 until the tests succeed.
- Register at Source Cooperative and email hello@source.coop for permission to publish in the fiboa organization.
- Create a new repository in the fiboa organization, e.g. `@fiboa/xx-yy`. You'll find it at https://source.coop/fiboa/xx-yy/
- Create a new folder for your data, e.g. `data`: `mkdir data`
- Create a README file at `data/README.md` and a license file at `data/LICENSE.txt`. An example repository with a README etc. can be found at https://source.coop/fiboa/de-nrw/
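To give a concrete feel for the converter step above, here is a minimal, self-contained sketch of a converter's core idea: renaming the source dataset's attributes to fiboa property names. All names below (`COLUMNS`, `convert_row`, the example attributes) are illustrative only, not the actual fiboa CLI API; the real converter template additionally covers data sources, licenses, value migrations, and more.

```python
# Illustrative only: the core idea of a dataset converter is a mapping
# from source column names to fiboa property names, applied per feature.
# These names are NOT the actual fiboa CLI API.

# Source dataset column -> fiboa property (hypothetical source columns)
COLUMNS = {
    "FIELD_ID": "id",
    "AREA_HA": "area",
    "GEOMETRY": "geometry",
}

def convert_row(row: dict) -> dict:
    """Rename the attributes of one source feature; drop unmapped columns."""
    return {new: row[old] for old, new in COLUMNS.items() if old in row}

feature = {"FIELD_ID": "42", "AREA_HA": 1.5, "UNUSED": "x"}
print(convert_row(feature))  # {'id': '42', 'area': 1.5}
```

The real converters work on whole tables rather than single rows, but the mapping dictionary is the part you typically adapt per dataset.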
## Each time you update the dataset
- Go to the parent folder of the folder that contains your data (e.g. `data`) in the CLI.
- Run the converter, e.g. for `xx_yy`:

  ```
  fiboa convert xx_yy -o data/xx-yy.parquet -h https://source.coop/fiboa/xx-yy/ --collection
  ```

  Make sure there are no errors (usually in red) or warnings (usually in yellow).
- Validate the result, e.g. `fiboa validate data/xx-yy.parquet --data`, and run the tests with `pytest`.
- Move the collection.json into a stac folder: `mkdir data/stac` and `mv data/collection.json data/stac`
- Update the README file at `data/README.md`.
- Create a PMTiles file:

  ```
  ogr2ogr -t_srs EPSG:4326 geo.json data/xx-yy.parquet
  tippecanoe -zg --projection=EPSG:4326 -o data/xx-yy.pmtiles -l xx-yy geo.json --drop-densest-as-needed
  ```

- Edit the STAC Collection: update the paths and everything else that you want to customize. Also don't forget to add a link to the PMTiles file using the corresponding STAC extension.
- Create a new API key at https://source.coop/repositories/fiboa/xx-yy/manage
- Set the environment variables as follows (Linux/macOS):

  ```
  export AWS_ENDPOINT_URL=https://data.source.coop
  export AWS_ACCESS_KEY_ID=<Access Key ID>
  export AWS_SECRET_ACCESS_KEY=<Secret Access Key>
  ```

  Note: Windows users may need to change the commands slightly, e.g. use `$env:AWS_ENDPOINT_URL="https://data.source.coop"` instead of `export AWS_ENDPOINT_URL=https://data.source.coop`.
- Upload the data to Source Cooperative:

  ```
  aws s3 sync data s3://fiboa/xx-yy/
  ```
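For intuition about the validation step above, the data-level checks conceptually look like the simplified sketch below. This is illustrative only: `fiboa validate --data` validates against the real fiboa schema and is far more thorough, and the property names and rules here are assumptions for the example.

```python
# Simplified sketch of the kind of per-feature checks a data-level
# validation performs. Illustrative only; `fiboa validate --data`
# validates against the actual fiboa schema.

def validate_feature(props: dict) -> list:
    """Return a list of human-readable errors for one feature's properties."""
    errors = []
    if not props.get("id"):
        errors.append("missing required property: id")
    area = props.get("area")
    if area is not None and area <= 0:
        errors.append("area must be greater than 0")
    return errors

print(validate_feature({"id": "1", "area": 2.5}))  # []
print(validate_feature({"area": -1}))
```

If the validator reports errors, fix the converter and re-run the convert and validate steps until the output is clean.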
If you've created and published a STAC Collection for your dataset, make sure to add it to the STAC catalog that combines all datasets into a single catalog. This will also publish your dataset to the fiboa data overview page. Create a PR that adds your STAC Collection as a child link to the following file: https://github.com/fiboa/fiboa.github.io/blob/main/stac/catalog.json. See also the README for an alternative.
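A child link entry in that catalog.json generally follows the standard STAC link object shape, as in the sketch below. The `href` is an assumption based on the upload and `data/stac` steps above; adjust it and the `title` to your dataset.

```json
{
  "rel": "child",
  "href": "https://data.source.coop/fiboa/xx-yy/stac/collection.json",
  "type": "application/json",
  "title": "xx-yy"
}
```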