s3 input plugin not handling shutdown correctly, leading to duplicates once started again #226

The S3 input plugin does not store its position in the file it was processing when it detects that it should stop.
The log file shows that the following line was executed:
```ruby
@logger.warn("Logstash S3 input, stop reading in the middle of the file, we will read it again when logstash is started")
```

Because the plugin simply stops processing without recording how far it got, it parses the same lines again when Logstash is restarted, which leads to duplicate events.
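A minimal Ruby model of this behaviour, assuming a simplified line reader. `read_object` and its `stop_after` parameter are hypothetical stand-ins for the shutdown check and are not the plugin's actual code. The first run stops after two lines without saving a position, so the second run starts from the beginning and emits those two lines again:

```ruby
require "stringio"

# Hypothetical model of the current behaviour: reading stops when a
# shutdown is requested, but the position reached is never recorded.
def read_object(io, events, stop_after: nil)
  processed = 0
  io.each_line do |line|
    # Simulated shutdown request after `stop_after` lines.
    return :stopped if stop_after && processed >= stop_after
    events << line.chomp
    processed += 1
  end
  :finished
end

content = "a\nb\nc\nd\n"
events = []
# First run: shutdown after two lines; no position is saved.
read_object(StringIO.new(content), events, stop_after: 2)
# Second run: with no saved position, reading restarts at offset 0.
read_object(StringIO.new(content), events)
p events # => ["a", "b", "a", "b", "c", "d"]
```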

I would have expected the S3 input plugin to either
a) continue processing until the end of the file and then update the sincedb file, or
b) stop processing immediately, before reading the next line, and write the current position in the file to the sincedb file, as the file input plugin does.
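Option b) could look roughly like the Ruby sketch below. It assumes a hypothetical sincedb that maps an object key to a byte offset, modelled on the file input's approach; this is not the S3 plugin's actual sincedb format, and `read_from`, `stop_after` and the key name are illustrative only. The shutdown check runs before each line is read, and the returned offset is saved so the next run resumes where the previous one stopped:

```ruby
require "stringio"

# Hypothetical sketch of option b): check for shutdown *before* reading
# the next line and return the byte offset of the first unread line.
def read_from(io, offset, events, stop_after: nil)
  io.seek(offset)
  processed = 0
  until io.eof?
    break if stop_after && processed >= stop_after # simulated shutdown
    events << io.gets.chomp
    processed += 1
  end
  io.pos
end

content = "a\nb\nc\nd\n"
sincedb = Hash.new(0) # stand-in for the sincedb file: key => byte offset
key = "myfolder/example.json" # hypothetical object key
events = []

# First run: shutdown after two lines, but the offset (4) is saved.
sincedb[key] = read_from(StringIO.new(content), sincedb[key], events, stop_after: 2)
# Second run resumes at the saved offset, so nothing is emitted twice.
sincedb[key] = read_from(StringIO.new(content), sincedb[key], events)
p events # => ["a", "b", "c", "d"]
```

Option a) corresponds to ignoring the shutdown check until the end of the object and only then updating the sincedb, which avoids duplicates at the cost of a slower shutdown for large files.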

- Version: tested with 3.4.1, but the latest version (3.6.0) has the same shutdown handling
- My config:
```
input {
    s3 {
        aws_credentials_file => "/usr/share/logstash/.aws/credentials"
        sincedb_path => "/opt/logstash/data/plugins/inputs/s3/sincedb_file"

        region => "us-east-1"
        bucket => "mybucket"
        prefix => "myfolder/"

        interval => 60
        additional_settings => {
          force_path_style => true
          follow_redirects => false
        }

        codec => json
    }
}

output {
    stdout { codec => rubydebug }
}
```
- Steps to Reproduce:
Upload a very large file to S3 and let the S3 plugin ingest it.
While it is still ingesting the file, shut Logstash down (e.g. by sending it SIGTERM).
You should then notice the following log message:
```
Logstash S3 input, stop reading in the middle of the file, we will read it again when logstash is started
```
When you then start Logstash again, it processes the same records a second time.
