Panodata Community

Problem backfilling a large historical dataset into InfluxDB

Help! When pushing data from NSIDC into InfluxDB, the import croaks with

{"error":"engine: error rolling WAL segment: error opening new segment file for wal (2): open /var/lib/influxdb/wal/nsidc/autogen/55885/_00001.wal: too many open files"}

You might want to have a look at this…

If it’s really the client, as also outlined in [1], please try invoking

ulimit -n 65535

before running your import program.

[1] Too many open file on client · Issue #4569 · influxdata/influxdb · GitHub


According to /lib/systemd/system/influxdb.service, the InfluxDB service itself is already running with

LimitNOFILE=65536

Currently, I’m hesitant to increase the overall server limits even further.
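As a sanity check, one can verify which limit the running daemon actually inherited (rather than what the shell reports) by inspecting /proc. A small sketch; it is shown against the current shell, since the influxd PID differs per system:

```shell
# Print the effective open-file limit of a process from /proc.
# $$ is the current shell; substitute the influxd PID, e.g.
# "$(pidof influxd)", to verify what the service really got.
grep "Max open files" /proc/$$/limits
```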

However, it looks like InfluxDB is creating a huge number of shards on the nsidc database, which might not be intended.

root@eltiempo:~# l /var/lib/influxdb/wal/nsidc/autogen/ | wc -l
1479

Is there any way you could share your import program with us? Maybe we can optimize this detail.

So, it makes sense for InfluxDB to operate like that when the time series covers a huge timespan. Is this the case with your specific dataset?

At least I can say: we have a maximum of one record a day. Within the last ~20 years: daily; before that (back until 1978) we have 2–4 days between each … four records.
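That timespan would explain the shard count. A back-of-the-envelope sketch; the ~43-year span and the 7-day default shard group duration (which InfluxDB uses for infinite-retention policies) are assumptions based on this thread and the InfluxDB documentation:

```shell
# Rough shard-group estimate for ~43 years of data (1978 onwards).
days=$((43 * 365))
# Default shard group duration on an infinite retention policy: 7 days.
default_shards=$((days / 7))
# With SHARD DURATION 52w, one shard covers roughly a year.
yearly_shards=$((days / (7 * 52)))
echo "7d shard groups:  ${default_shards}"
echo "52w shard groups: ${yearly_shards}"
```

With the 7-day default, that is on the order of two thousand shard groups (weeks with no data produce no shard, which is why the observed number is lower), versus a few dozen with yearly shards.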


Background on “shard group duration”

So, you might consider creating the database with a specific shard group duration.


Recommendation

According to the recommendation for backfilling data cited above, this might help you along:

CREATE DATABASE <database_name> WITH SHARD DURATION 52w

When the documentation says “we highly recommend temporarily setting a longer shard group duration so fewer shards are created”, how, and to what value, should I revert afterwards?

Just leave it as it is, since it should reasonably match the time resolution of this dataset, right?
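For completeness: if one ever did want to revert an existing database to the default, InfluxDB 1.x offers ALTER RETENTION POLICY. A hedged sketch, using the database and policy names from this thread and assuming the 7-day default that applies to infinite-retention policies:

```sql
-- Revert the shard group duration on the "autogen" policy of the
-- "nsidc" database back to the 7-day default. Note: this only
-- affects newly created shard groups, not existing ones.
ALTER RETENTION POLICY "autogen" ON "nsidc" SHARD DURATION 1w
```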

The current database contains just 45 shards (probably matching the number of years, i.e. blocks of 52 weeks each)

root@eltiempo:~# l /var/lib/influxdb/data/nsidc/autogen | wc -l
45

each containing only a few kB worth of data

root@eltiempo:~# du -sch /var/lib/influxdb/data/nsidc/autogen/*
44K	/var/lib/influxdb/data/nsidc/autogen/61615
44K	/var/lib/influxdb/data/nsidc/autogen/61616
44K	/var/lib/influxdb/data/nsidc/autogen/61617
44K	/var/lib/influxdb/data/nsidc/autogen/61618

So, when querying and processing it, nobody will suffer.


P.S.: Unless there are further experiences to report regarding this…