Panodata Community

Luftdatenpumpe

About

»Luftdatenpumpe« is a toolkit for efficiently processing air particulate data from open data sources.

It can filter by station-id, sensor-id and sensor-type, and applies reverse geocoding during the data transformation steps. After processing, the software stores measurement data into timeseries and RDBMS databases (InfluxDB and PostGIS), and can optionally publish data to MQTT or simply output it as JSON.
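The filter step of that pipeline can be sketched in a few lines of Python. This is an illustration only, not the project's actual code; the field names (`station_id`, `sensor_type`) and the record layout are assumptions made for the example.

```python
import json

def filter_readings(readings, station_ids=None, sensor_types=None):
    """Yield only the readings matching the requested stations and sensor types.

    Hypothetical sketch of Luftdatenpumpe's filtering step; the record
    schema used here is illustrative.
    """
    for reading in readings:
        if station_ids and reading["station_id"] not in station_ids:
            continue
        if sensor_types and reading["sensor_type"] not in sensor_types:
            continue
        yield reading

readings = [
    {"station_id": 28, "sensor_type": "SDS011", "P1": 7.5},
    {"station_id": 99, "sensor_type": "DHT22", "temperature": 21.0},
]

# Keep only readings from station 28 and emit them as JSON.
selected = list(filter_readings(readings, station_ids={28}))
print(json.dumps(selected))
```

After filtering, each surviving record proceeds to the enrichment and storage stages described below.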

Data coverage

Luftdatenpumpe currently processes live and historical data from luftdaten.info, the IRCELINE network and the OpenAQ platform.

Description

InfluxDB is used for storing timeseries data, while PostGIS stores station metadata (the station list). On top of these databases, rich and dense Grafana dashboards combine advanced GIS queries with powerful timeseries queries.
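To make the timeseries side of this split concrete, the sketch below serializes a single measurement into InfluxDB's line protocol, the text format InfluxDB ingests. The measurement name `airdata` and the tag names are made up for the example and are not necessarily the schema Luftdatenpumpe uses.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one reading as an InfluxDB line-protocol record.

    Illustrative helper; tags identify the series (station, sensor),
    fields carry the measured values, and the timestamp is nanoseconds.
    """
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "airdata",
    tags={"station_id": "28", "sensor_id": "658"},
    fields={"P1": 7.5, "P2": 3.2},
    timestamp_ns=1609459200000000000,
)
print(line)
# airdata,sensor_id=658,station_id=28 P1=7.5,P2=3.2 1609459200000000000
```

Station metadata, by contrast, is relational and geospatial in nature, which is why it lives in PostGIS rather than in the timeseries store.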

The software framework provided by Luftdatenpumpe is flexible enough to ingest and process raw data from further and different data sources, in order to provide reliable and harmonised air quality data at your fingertips, embedded into meaningful visualizations.

As a platform, the software provides various components as a basis for your own development projects. Thanks to its variety of interfaces to different data sources, data sinks and output formats, it offers extensive integration possibilities for acquiring and processing environmental information and beyond.

The software is mainly developed in the Python programming language and is available from GitHub [1] and PyPI [2].

[1] panodata/luftdatenpumpe on GitHub

[2] luftdatenpumpe on PyPI

Features

  1. Luftdatenpumpe acquires measurement readings either from the live-data API of luftdaten.info or from the archived CSV files published at archive.luftdaten.info. To minimize the impact on upstream servers, all data is reasonably cached.
  2. While iterating the readings, it optionally filters on station-id, sensor-id or sensor-type, and restricts processing to the corresponding stations and sensors.
  3. Then, each station’s location information gets enhanced by
    • attaching its geospatial position as a Geohash.
    • attaching a synthetic real-world address resolved using the reverse geocoding service Nominatim by OpenStreetMap.
  4. Information about stations can be
    • displayed on STDOUT or STDERR in JSON format.
    • filtered and transformed interactively through jq, the swiss army knife of JSON manipulation.
    • stored into RDBMS databases like PostgreSQL using the fine dataset package. Because dataset is built on top of SQLAlchemy, all major databases are supported.
    • queried using advanced geospatial features when running PostGIS; see the Luftdatenpumpe PostGIS tutorial for details.
  5. Measurement readings can be
    • displayed on STDOUT or STDERR in JSON format, which allows for piping into jq again.
    • forwarded to MQTT.
    • stored into InfluxDB and then
    • displayed in Grafana.
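The Geohash enrichment mentioned in step 3 can be sketched with a minimal, stdlib-only encoder. A real deployment would use a dedicated library; this function is only an illustration of how a latitude/longitude pair maps to the Geohash attached to each station.

```python
# Base32 alphabet used by Geohash (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair as a Geohash string.

    Bisects the longitude and latitude ranges alternately, emitting one
    base32 character per five bits. Illustrative implementation only.
    """
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, bit_count, even = 0, 0, True
    hash_chars = []
    while len(hash_chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if value >= mid:
            bits |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            hash_chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(hash_chars)

print(geohash_encode(57.64911, 10.40744))  # -> u4pruydqqvj
```

Because Geohashes share prefixes for nearby locations, they make a compact station identifier for indexing and map display.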