Description
The s3 input plugin does not store its position within the file it is processing when it detects that it should stop.
From the log file, you can see that the following code was called:
@logger.warn("Logstash S3 input, stop reading in the middle of the file, we will read it again when logstash is started")
Because it simply stops processing without storing the position it has already reached, Logstash parses the same lines again when it is restarted, which leads to duplicate events.
I would have expected the S3 input plugin to either
a) continue processing till the end of the file and then update the sincedb file, or to
b) stop processing immediately, before it reads the next line, and then write the current position in the file to the sincedb file, as the file input plugin does.
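To make option b) concrete, here is a minimal Ruby sketch of the behaviour I have in mind. It is not the plugin's code: `LineCheckpoint`, `process_with_checkpoint`, the JSON checkpoint file and the `stop` callable are all hypothetical names invented for illustration, and the real sincedb format is different. The sketch records a line count rather than a byte offset, on the assumption that a count also works for streams that cannot seek (for example compressed objects).

```ruby
require "json"

# Hypothetical checkpoint store: maps an object key to the number of lines
# already emitted. This is NOT the plugin's real sincedb format.
class LineCheckpoint
  def initialize(path)
    @path = path
  end

  # Number of lines already processed for this key (0 if unknown).
  def read(key)
    return 0 unless File.exist?(@path)
    JSON.parse(File.read(@path)).fetch(key, 0)
  end

  # Persist the number of lines processed for this key.
  def write(key, lines_done)
    data = File.exist?(@path) ? JSON.parse(File.read(@path)) : {}
    data[key] = lines_done
    File.write(@path, JSON.generate(data))
  end
end

# Yields each not-yet-processed line of `io` to the block.
# `stop` is any callable that returns true once shutdown was requested;
# it is checked before each new line is emitted.
# Returns true if the whole file was consumed, false if interrupted.
def process_with_checkpoint(key, io, checkpoint, stop)
  already_done = checkpoint.read(key)
  lines_done = 0
  completed = true
  io.each_line do |line|
    lines_done += 1
    next if lines_done <= already_done # skip lines emitted before the restart
    if stop.call
      lines_done -= 1                  # this line was not emitted
      completed = false
      break
    end
    yield line.chomp
  end
  checkpoint.write(key, lines_done)    # written on completion and on interruption
  completed
end
```

With something like this, a restart would skip the lines counted in the checkpoint and continue with the first unprocessed line. Option a) would correspond to ignoring `stop` until the end of the file and only then updating the checkpoint.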
- Version: tested with 3.4.1, but the latest version (3.6.0) has the same shutdown handling
- My config:
input {
  s3 {
    aws_credentials_file => "/usr/share/logstash/.aws/credentials"
    sincedb_path => "/opt/logstash/data/plugins/inputs/s3/sincedb_file"
    region => "us-east-1"
    bucket => "mybucket"
    prefix => "myfolder/"
    interval => 60
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
    codec => json
  }
}
output {
  stdout { codec => rubydebug }
}
- Steps to Reproduce:
Upload a very large file to S3 and let the S3 plugin ingest it.
While it is busy ingesting the file, shut Logstash down (e.g. by sending it SIGTERM).
You should then notice the following log message:
Logstash S3 input, stop reading in the middle of the file, we will read it again when logstash is started
When you then start Logstash again, it processes the same records again.