Sentinel 2A: datastrip in item.id does not match datastrip in item.properties.s2:datastrip_id

I was debugging why I had duplicate tiles in a certain pipeline and found that this is related to the `datastrip_id` property, which depends on the downlink station (see the [S2 specification](https://sentinel.esa.int/documents/247904/685211/sentinel-2-products-specification-document)). The data seems to be the same for both items, although I only checked this for one band. Why do you keep duplicate data when multiple downlink stations are used?

I then noticed that the datastrip ID included in `item.id` does not match the one provided in `item.properties.s2:datastrip_id`. I'm not sure whether this is important, but I thought it was worth mentioning. Please see the example below.


```python
from copy import deepcopy

import pandas as pd
import planetary_computer
import pystac_client

def items_to_dataframe(items):
    _items = []
    for i in items:
        _i = deepcopy(i)
        _items.append(_i)
    df = pd.DataFrame(pd.json_normalize(_items))
    for field in ["properties.datetime"]:
        if field in df:
            df[field] = pd.to_datetime(df[field])
    df = df.sort_values("properties.datetime")
    return df


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

roi = {
    "type": "Polygon",
    "coordinates": [
        [
            [146.0678527, -15.3746464],
            [147.0909455, -15.3765786],
            [147.0913918, -16.369226],
            [146.0632786, -16.3671625],
            [146.0678527, -15.3746464],
        ]
    ],
}


search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=roi,
    datetime="2022-01-01/2022-11-01",
)

items = search.item_collection()

items_ = [i.to_dict() for i in items]
df = items_to_dataframe(items_)


def split_id(x):
    return pd.Series(x.id.split("_"))


df[
    [
        "mission_id",
        "product_level",
        "datetake_start_time",
        "relative_orbit_number",
        "tilenumber",
        "id_datastrip",
    ]
] = df.apply(split_id, axis=1)

# two example items for which I found the same data but different datastrips
SAME_DATA_DIFFERENT_DATASTRIP = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]

df_ = df.loc[df["id"].isin(SAME_DATA_DIFFERENT_DATASTRIP)].copy()

# makes it a bit easier to see the difference
def split_s2_datstrip(x):
    return x["properties.s2:datastrip_id"].split("_")[6]


df_["s2_datastrip"] = df_.apply(split_s2_datstrip, axis=1)
df_[["id_datastrip", "s2_datastrip"]]

```
id_datastrip | s2_datastrip
-- | --
20220227T190716 | 20220227T190717
20220212T221526 | 20220212T221527
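
To quantify the mismatch, the two timestamps in each row of the table above can be parsed and subtracted. This is a small sketch that assumes both values follow the `YYYYMMDDTHHMMSS` layout; the helper name `offset_seconds` is made up for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"

# (id_datastrip, s2_datastrip) pairs copied from the table above
pairs = [
    ("20220227T190716", "20220227T190717"),
    ("20220212T221526", "20220212T221527"),
]


def offset_seconds(id_value, property_value):
    """Return how many seconds the property timestamp is ahead of the ID timestamp."""
    id_time = datetime.strptime(id_value, TIMESTAMP_FORMAT)
    property_time = datetime.strptime(property_value, TIMESTAMP_FORMAT)
    return (property_time - id_time).total_seconds()


offsets = [offset_seconds(a, b) for a, b in pairs]
print(offsets)  # [1.0, 1.0]
```

In both examples the value in `s2:datastrip_id` is exactly one second later than the one in `item.id`.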
