Discussing ambiguities with locations in Italy

@centralinedalbasso reported in “Access to InfluxDB or Grafana instance with data from luftdaten.info for Italy” that, while operating luftdatenpumpe, they observed ambiguities in reverse-geocoded location names.

In this topic, we hope to identify the issues and mitigate them appropriately.

The issues we are observing are:

  • Some cities are not displayed with a city filter.
  • The address taken from OpenStreetMap contains several errors.
  • For example, the Italian region of “Emilia Romagna” is indicated as
    EMR, Emilia Romagna, Pianura Padana.

Improving Luftdatenpumpe regarding ambiguity with locations

In general, we are happy to accept pull requests that solve problems users observe when using Luftdatenpumpe. Collaborating on them is important for recognizing and mitigating problems. Creating an issue on the GitHub repository might be another way to encourage that communication.

It would be great to improve Luftdatenpumpe wherever something fails within the data processing and produces wrong results. To do so, we need to discuss the problem you are observing in more detail.

Communication

I just created an appropriate issue on the GitHub repository and will be happy to continue the discussion in one way or another. Thanks already for bringing this up!

We will investigate this using some jq recipes we worked out while renovating Luftdatenpumpe the other day.

Investigation

Dear @centralinedalbasso,

we tried to investigate this a bit and put some comments into the GitHub issue “Ambiguity with locations in Italy” (Issue #15, panodata/luftdatenpumpe). Maybe you can spot some problems there, so that we can better pinpoint the issue together. We will be happy to hear back from you about this.

With kind regards,
Andreas.

Getting into it

First, we tried to investigate potential issues by invoking

luftdatenpumpe stations --network=ldi --country=IT --reverse-geocode --progress | jq '[ .[].location.address ]'

and received results for state == "Emilia-Romagna" like

  {
    "country_code": "IT",
    "country": "Italia",
    "state": "Emilia-Romagna",
    "county": "Parma",
    "postcode": "43036",
    "town": "Fidenza",
    "road": "Via Marco Polo",
    "house_number": "14",
    "city": "Fidenza"
  }

which look good.

On the other hand, we received a couple of results where there might be an ambiguity regarding "county": "Reggio nell'Emilia" and "city": "Reggio nell'Emilia":

  {
    "country_code": "IT",
    "country": "Italia",
    "state": "Emilia-Romagna",
    "county": "Reggio nell'Emilia",
    "postcode": "42121",
    "city": "Reggio nell'Emilia",
    "suburb": "San Pietro esterna",
    "road": "Piazza Guglielmo Marconi",
    "house_number": "11",
    "neighbourhood": "Porta San Pietro"
  }

This might be the reason for the complaints. However, we are not exactly sure about the issue yet.
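
To list all records of this kind, a small filter over the address objects may help. The following Python sketch assumes the JSON output of the command above has been loaded into a list of station dictionaries. The sample data and the helper name same_county_and_city are illustrative only and not part of Luftdatenpumpe.

```python
import json

# Sample records shaped like the `location.address` objects shown above.
# In practice, load the output of `luftdatenpumpe stations ...` with json.load().
stations = json.loads("""
[
  {"location": {"address": {"state": "Emilia-Romagna", "county": "Parma", "city": "Fidenza"}}},
  {"location": {"address": {"state": "Emilia-Romagna", "county": "Reggio nell'Emilia", "city": "Reggio nell'Emilia"}}}
]
""")


def same_county_and_city(stations):
    """Return the address objects whose county and city carry the same name."""
    matches = []
    for station in stations:
        address = station.get("location", {}).get("address", {})
        county = address.get("county")
        if county is not None and county == address.get("city"):
            matches.append(address)
    return matches


for address in same_county_and_city(stations):
    print(address["city"])
```

A match does not necessarily indicate an error, since a county may simply share its name with its main city. The filter only narrows down which records to look at.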

List of all location names within Emilia Romagna

Then, we computed the sorted list of all reverse geocoded location names within state == "Emilia-Romagna" using this invocation:

luftdatenpumpe stations --network=ldi --country=IT --reverse-geocode --progress | \
    jq '[ map(select(.location.address.state == "Emilia-Romagna")) | .[].name ] | sort'
[
  "Autostrada Adriatica, Borgo Panigale-Reno, Bologna, Emilia-Romagna, IT",
  "Casalecchio di Reno, Unione dei comuni Valli del Reno, Lavino e Samoggia, Emilia-Romagna, IT",
  "Corso Canalchiaro, Centro Storico, Modena, Emilia-Romagna, IT",
  "Il Palazzo, Varano de' Melegari, Parma, Emilia-Romagna, IT",
  "Largo Del Pozzo, Buon Pastore-Sant'Agnese-San Damaso, Modena, Emilia-Romagna, IT",
  "Piazza Guglielmo Marconi, San Pietro esterna, Reggio nell'Emilia, Emilia-Romagna, IT",
  "Ramo Casalecchio, Borgo Panigale-Reno, Bologna, Emilia-Romagna, IT",
  "Salsomaggiore Terme, Parma, Emilia-Romagna, IT",
  "Strada Montanara, Vigatto, Parma, Emilia-Romagna, IT",
  "Strada Prinzera, Fornovo di Taro, Parma, Emilia-Romagna, IT",
  "Strada Provinciale di Val Nure, Ca' del Lupo, Vigolzone, Piacenza, Emilia-Romagna, IT",
  "Strada Scortichiere - Bianchi, Ferrè, Varsi, Parma, Emilia-Romagna, IT",
  "Strada Traversante Ravadese, San Martino, Parma, Emilia-Romagna, IT",
  "Strada del Conservatorio, Parma Centro, Parma, Emilia-Romagna, IT",
  "Strada di Fornio, Alseno, Parma, Emilia-Romagna, IT",
  "Strada statale per Reggio, San Prospero di Correggio, Correggio, Pianura Reggiana, Emilia-Romagna, IT",
  "Strada statale per Reggio, San Prospero di Correggio, Correggio, Pianura Reggiana, Emilia-Romagna, IT",
  "Tangenziale Nord Giosuè Carducci, Crocetta-San Lazzaro-Modena Est, Modena, Emilia-Romagna, IT",
  "Tangenziale Sud, Montanara, Parma, Emilia-Romagna, IT",
  "Via 27 Gennaio, Molinetto, Parma, Emilia-Romagna, IT",
  "Via Antonio Gramsci, Calderara di Reno, Unione Terre d'Acqua, Emilia-Romagna, IT",
  "Via Antonio Gramsci, Fidenza, Parma, Emilia-Romagna, IT",
  "Via Aurelio Nicolodi, Molinetto, Parma, Emilia-Romagna, IT",
  "Via Berretta Rossa, Borgo Panigale-Reno, Bologna, Emilia-Romagna, IT",
  "Via Bologna, San Leonardo, Parma, Emilia-Romagna, IT",
  "Via Carlo Casalegno, Cittadella, Parma, Emilia-Romagna, IT",
  "Via Carlo Marx, Lubiana, Parma, Emilia-Romagna, IT",
  "Via Carlo Sigonio, Centro Storico, Modena, Emilia-Romagna, IT",
  "Via Cornini Malpeli, Fidenza, Parma, Emilia-Romagna, IT",
  "Via Croce, Infrangibile, Piacenza, Emilia-Romagna, IT",
  "Via Don Carlo Gnocchi, Navile, Bologna, Emilia-Romagna, IT",
  "Via Don Carlo Gnocchi, Navile, Bologna, Emilia-Romagna, IT",
  "Via Enzo Ferrari, Fidenza, Parma, Emilia-Romagna, IT",
  "Via Eustachio Manfredi, San Donato-San Vitale, Bologna, Emilia-Romagna, IT",
  "Via Ferdinando Magellano, Fidenza, Parma, Emilia-Romagna, IT",
  "Via Fernando Croci, Cella, Reggio nell'Emilia, Emilia-Romagna, IT",
  "Via Firenze, Sorbolo Mezzani, Parma, Emilia-Romagna, IT",
  "Via Francesca Edera De Giovanni, Navile, Bologna, Emilia-Romagna, IT",
  "Via Fratelli Cervi, Carrozzone-Betonica, Reggio nell'Emilia, Emilia-Romagna, IT",
  "Via Gesso, Zola Predosa, Unione dei comuni Valli del Reno, Lavino e Samoggia, Emilia-Romagna, IT",
  "Via Ghironda, Zola Predosa, Unione dei comuni Valli del Reno, Lavino e Samoggia, Emilia-Romagna, IT",
  "Via Giambattista Vico, San Donato-San Vitale, Bologna, Emilia-Romagna, IT",
  "Via Girolamo Magnani, Fidenza, Parma, Emilia-Romagna, IT",
  "Via Giuseppe Galluzzi, Cortemaggiore, Piacenza, Emilia-Romagna, IT",
  "Via Leopoldo Nobili, Bagnolo in Piano, Terra di Mezzo, Emilia-Romagna, IT",
  "Via Marco Polo, Fidenza, Parma, Emilia-Romagna, IT",
  "Via Martiri Croce del Biacco, San Donato-San Vitale, Bologna, Emilia-Romagna, IT",
  "Via Monte Cimone, Crocetta, Reggio nell'Emilia, Emilia-Romagna, IT",
  "Via Paolo Ferrari, Centro Storico, Modena, Emilia-Romagna, IT",
  "Via Peschiere, San Giovanni in Persiceto, Unione Terre d'Acqua, Emilia-Romagna, IT",
  "Via Roberto Vittorangeli, Rosta Nuova, Reggio nell'Emilia, Emilia-Romagna, IT",
  "Via Roma, Quattro Castella, Colline Matildiche, Emilia-Romagna, IT",
  "Via S. De Beauvoir, Fidenza, Parma, Emilia-Romagna, IT",
  "Via San Donato, San Donato-San Vitale, Bologna, Emilia-Romagna, IT",
  "Via Sereni, Ca' dei Benatti, Campogalliano, Unione delle Terre d'Argine, Emilia-Romagna, IT",
  "Via Trento, Fidenza, Parma, Emilia-Romagna, IT",
  "Via Volturno, Molinetto, Parma, Emilia-Romagna, IT",
  "Via del Cestello, Santo Stefano, Bologna, Emilia-Romagna, IT",
  "Via del Lavoro, Casalecchio di Reno, Unione dei comuni Valli del Reno, Lavino e Samoggia, Emilia-Romagna, IT",
  "Via del Popolo, San Martino, Parma, Emilia-Romagna, IT",
  "Via dell'Isonzo, Saragozza-Porto, Bologna, Emilia-Romagna, IT",
  "Via della Villa, San Donato-San Vitale, Bologna, Emilia-Romagna, IT",
  "Via di Corticella, Navile, Bologna, Emilia-Romagna, IT",
  "Via fratelli Cairoli, Fidenza, Parma, Emilia-Romagna, IT",
  "Viale Duca Alessandro, Cittadella, Parma, Emilia-Romagna, IT",
  "Viale Mario Giurini, Santo Stefano, Bologna, Emilia-Romagna, IT"
]

We are seeing no obvious abnormalities here.

List of all cities within Emilia Romagna

A list of unique city names within Emilia Romagna can be generated using

luftdatenpumpe stations --network=ldi --country=IT --reverse-geocode --progress | \
    jq '[ map(select(.location.address.state == "Emilia-Romagna")) | .[].location.address.city ] | unique'
[
  "Alseno",
  "Bagnolo in Piano",
  "Bentivoglio",
  "Bologna",
  "Calderara di Reno",
  "Campogalliano",
  "Casalecchio di Reno",
  "Correggio",
  "Cortemaggiore",
  "Fidenza",
  "Fornovo di Taro",
  "Modena",
  "Parma",
  "Piacenza",
  "Quattro Castella",
  "Reggio nell'Emilia",
  "Sala Baganza",
  "Salsomaggiore Terme",
  "San Giovanni in Persiceto",
  "Sorbolo Mezzani",
  "Varano de' Melegari",
  "Varsi",
  "Vigolzone",
  "Zola Predosa"
]

Again, we don’t see any unusual city names.

Observations

Now I might see what you are talking about, @centralinedalbasso.

As seen on https://weather.hiveeyes.org/grafana/d/ioUrPwQiz/luftdaten-info-verlauf?var-ldi_station_countrycode=IT, there are multiple entries for Emilia Romagna.

[Screenshot: multiple entries for Emilia Romagna in the Grafana dashboard]

The two entries, Emilia-Romagna and EMR, each yield a different set of cities.

State “Emilia-Romagna”

[Screenshot: cities listed for state “Emilia-Romagna”]

State “EMR”

[Screenshot: cities listed for state “EMR”]

Outlook

While I haven’t investigated this more thoroughly yet, I don’t see any references to the three-letter acronyms for Italian states anymore on current invocations of Luftdatenpumpe. So, these could well be leftovers from early data coming from Nominatim which are still in our database.
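
To check a database export for such leftovers, one could flag state values that look like three-letter codes. The Python sketch below works under stated assumptions: the list of state values is made up for illustration, and the rule "exactly three uppercase ASCII letters" is our guess at what the old codes look like, based on the EMR example.

```python
import re
from collections import Counter

# Hypothetical state values as they might appear in a database
# holding both old and new records.
states = ["Emilia-Romagna", "EMR", "Lombardia", "EMR", "Emilia-Romagna", "Veneto"]

# Assumption: leftover codes consist of exactly three uppercase ASCII letters.
ACRONYM = re.compile(r"[A-Z]{3}")


def acronym_states(states):
    """Count the state values that look like three-letter codes."""
    return Counter(s for s in states if ACRONYM.fullmatch(s))


print(acronym_states(states))
```

Any value reported here would be a candidate for purging or re-geocoding. Full region names like "Lombardia" are not matched.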

Current observations

You can find out for yourself using

luftdatenpumpe stations --network=ldi --country=IT --reverse-geocode --progress | \
    jq '[ .[].location.address.state ] | unique'

which yields

[
  "Abruzzo",
  "Calabria",
  "Campania",
  "Emilia-Romagna",
  "Friuli Venezia Giulia",
  "Lazio",
  "Liguria",
  "Lombardia",
  "Piemonte",
  "Puglia",
  "Sicilia",
  "Toscana",
  "Trentino-Alto Adige/Südtirol",
  "Umbria",
  "Veneto"
]

Conclusion

So, I conclude that the people from Nominatim and the OpenStreetMap community have improved their database since then. However, we still have to figure out why there are leftovers in our database and purge them appropriately.

When setting up a fresh system with Luftdatenpumpe, everything should be all right. Please let us know otherwise.

Ambiguities within OSM

@centralinedalbasso asked for

We would like to share that the place_id coming from the reverse geocoder is an internal id from Nominatim, so it should not be relied upon. While the osm_id referencing OSM elements might feel better, one still has to take care with its properties.

The place_id has no relevance for OSM, it is only an internal Nominatim identifier. Between different Nominatim instances, the place_id for the same OSM object may differ.

The Nominatim’s place_id is only an internal parameter of the engine. You cannot use place_id for anything, it is a technical database key and depends on a single Nominatim instance.

There are good candidates to “persistent place-identifier”, but all fails in the main property, that is to ensure persistence. In this context of non-permanent IDs, the most important example is the Nominatim’s place_id that is “independent of geometry”.
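
As a rough illustration of the difference, the following Python sketch builds a key from osm_type and osm_id instead of place_id. The function name osm_key and the two sample responses are hypothetical. Both fields appear in Nominatim's JSON output, and an osm_id is only unique together with its osm_type. As noted above, even this key is not guaranteed to be persistent.

```python
def osm_key(result):
    """Build an identifier from osm_type and osm_id, ignoring place_id."""
    osm_type = result.get("osm_type")
    osm_id = result.get("osm_id")
    if osm_type is None or osm_id is None:
        return None
    return f"{osm_type}/{osm_id}"


# Two hypothetical responses for the same OSM object,
# coming from different Nominatim instances.
a = {"place_id": 111, "osm_type": "way", "osm_id": 4242}
b = {"place_id": 999, "osm_type": "way", "osm_id": 4242}

# The place_id differs between instances, the OSM-based key does not.
print(a["place_id"] == b["place_id"], osm_key(a) == osm_key(b))
```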

Dear @centralinedalbasso,

we are back from our travels and would like to ask whether you have been able to resolve the ambiguities in one way or another. Otherwise, we will be happy to come back to this topic as time permits.

Thanks again for pointing out these issues to us. Feedback like this from the community is important to us, as we see much room for improvement in the current state of ingesting, processing and visualizing data from open data and citizen science projects, and in making it available with a focus on local regions of Europe, like your use case for Emilia-Romagna.

With kind regards,
Andreas.

We have reset the Nominatim cache and the Redis database that is also used for caching, and purged all items with country code == “IT” from the PostgreSQL database on the server machine. After repopulating it, the three-letter codes for regions within Italy are gone and everything should be fine again. See also:

https://weather.hiveeyes.org/grafana/d/AOerEQQmk/luftdaten-info-karte?var-ldi_station_countrycode=IT

Thanks a bunch for bringing this to our attention.

yes, now there are no more duplications!
thanks
