
s3 input plugin not handling shutdown correctly, leading to duplicates once started again #226

@PadaKwaak

Description

The s3 input plugin does not store its position in the file it was processing at the moment it detected that it should stop.
From the log file, you can see that the following code was called:

@logger.warn("Logstash S3 input, stop reading in the middle of the file, we will read it again when logstash is started")

Because it simply stops processing without storing how far it has already read, restarting Logstash makes it parse the same lines a second time, which leads to duplicate events.
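The effect can be reproduced outside Logstash with a small Ruby simulation. This is a hypothetical sketch, not the plugin's source: `process_file`, `stop_after` and the `emitted` array are made-up stand-ins for the per-line read loop, the shutdown request and the pipeline. It assumes, as the log message above suggests, that the loop returns early without recording any progress, so the restarted run begins again at the first line.

```ruby
# Hypothetical simulation, NOT the plugin's code: a per-line stop check that
# returns early without writing any checkpoint.

# Emits each element of +lines+ into +out+. +stop_after+ simulates a shutdown
# request arriving once that many lines have been emitted.
# Returns true if the whole file was processed, false if it stopped early.
def process_file(lines, out, stop_after: nil)
  lines.each_with_index do |line, i|
    # Stop requested: bail out without recording how far we got.
    return false if stop_after && i >= stop_after
    out << line
  end
  true
end

lines = (1..10).map { |n| "event-#{n}" }
emitted = []

# First run: shutdown arrives after 4 lines; nothing is checkpointed.
completed = process_file(lines, emitted, stop_after: 4)

# Restart: no progress was recorded, so the file is read from line 1 again.
process_file(lines, emitted) unless completed

duplicates = emitted.tally.select { |_, count| count > 1 }.keys
puts "emitted #{emitted.size} events, #{duplicates.size} duplicated"
# prints "emitted 14 events, 4 duplicated"
```

The first four events are emitted twice, which matches the duplicates described in this issue.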

I would have expected the S3 input plugin to either
a) continue processing until the end of the file and then update the sincedb file, or
b) stop processing immediately, before it reads the next line, and write the current position in the file to the sincedb file, as the file input plugin does.
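Option (b) could look roughly like the following sketch. It assumes a checkpoint that stores the object key together with the number of lines already emitted. `LineCheckpoint`, `process_object`, the JSON file layout and the key `myfolder/big.json` are all hypothetical and do not reflect the plugin's actual sincedb format.

```ruby
require "json"
require "tmpdir"

# Hypothetical checkpoint for option (b): persists the object key and the
# number of lines already emitted. Not the plugin's real sincedb format.
class LineCheckpoint
  def initialize(path)
    @path = path
  end

  # Number of lines already processed for +key+ (0 if nothing is recorded).
  def lines_done(key)
    return 0 unless File.exist?(@path)
    data = JSON.parse(File.read(@path))
    data["key"] == key ? data["lines"] : 0
  end

  def write(key, lines)
    File.write(@path, JSON.generate({ "key" => key, "lines" => lines }))
  end
end

# Emits the lines of +key+ into +out+, resuming after the checkpoint. On a
# stop request it records the current position before returning, so no line
# is emitted twice. Returns true when the object was fully processed.
def process_object(key, lines, out, checkpoint, stop_after: nil)
  start = checkpoint.lines_done(key)
  lines.each_with_index do |line, i|
    next if i < start              # already emitted in an earlier run
    if stop_after && i >= stop_after
      checkpoint.write(key, i)     # lines 0...i have been emitted
      return false
    end
    out << line
  end
  checkpoint.write(key, lines.size)
  true
end

lines = (1..10).map { |n| "event-#{n}" }
emitted = []
Dir.mktmpdir do |dir|
  checkpoint = LineCheckpoint.new(File.join(dir, "sincedb.json"))
  # First run is interrupted after 4 lines; the restart resumes at line 5.
  process_object("myfolder/big.json", lines, emitted, checkpoint, stop_after: 4)
  process_object("myfolder/big.json", lines, emitted, checkpoint)
end
puts "emitted #{emitted.size} events, #{emitted.uniq.size} unique"
# prints "emitted 10 events, 10 unique"
```

Option (a) would instead check for a stop request only between objects, so a file is always finished before the sincedb is updated. That is simpler, but for a very large file it can delay shutdown considerably. Either way, a line-count checkpoint is only valid if the object has not changed between runs, so a real implementation would also need to detect that (for example by comparing its last-modified time).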

  • Version: tested with 3.4.1, but the latest version (3.6.0) has the same shutdown handling
  • My config:
input {
    s3 {
        aws_credentials_file => "/usr/share/logstash/.aws/credentials"
        sincedb_path => "/opt/logstash/data/plugins/inputs/s3/sincedb_file"

        region => "us-east-1"
        bucket => "mybucket"
        prefix => "myfolder/"

        interval => 60
        additional_settings => {
          force_path_style => true
          follow_redirects => false
        }

        codec => json
    }
}

output {
    stdout { codec => rubydebug }
}
  • Steps to Reproduce:
    Upload a very large file to S3 and let the S3 plugin ingest it.
    While it is still ingesting the file, shut down Logstash (e.g. by sending it SIGTERM).
    You should then notice the following log message:
Logstash S3 input, stop reading in the middle of the file, we will read it again when logstash is started

When you start Logstash again, it processes the same records a second time.
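To make the duplicates easy to count, the large test file can be generated with a unique id per event. The sketch below is a hypothetical helper, not part of Logstash; the file name `large_test.json`, the `id`/`message` fields and the event counts are arbitrary assumptions, and the one-object-per-line layout is assumed to suit the `codec => json` setting in the config above.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper for the reproduction steps: writes newline-delimited
# JSON (one object per line) where every event has a unique "id", so
# duplicates are easy to spot after a restart.
def write_test_file(path, count)
  File.open(path, "w") do |f|
    count.times do |i|
      f.puts(JSON.generate({ "id" => i, "message" => "test event #{i}" }))
    end
  end
end

# Returns the ids that occur more than once in +ids+ (the "id" values of the
# events Logstash emitted, collected however is convenient).
def duplicate_ids(ids)
  ids.tally.select { |_, n| n > 1 }.keys.sort
end

# Small self-check: write 5 events, then simulate a replay of the first two.
sample_ids = Dir.mktmpdir do |dir|
  path = File.join(dir, "large_test.json")
  write_test_file(path, 5)
  File.readlines(path).map { |line| JSON.parse(line)["id"] }
end
replayed = sample_ids + sample_ids.first(2)
p duplicate_ids(replayed)  # prints [0, 1]
```

For the actual reproduction, call `write_test_file` with a count in the millions and upload the result under the configured prefix, for example with `aws s3 cp large_test.json s3://mybucket/myfolder/`.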
