Sentinel 2A: datastrip in item.id does not match datastrip in item.properties.s2:datastrip_id #396

@FlorisCalkoen

I was debugging why I had duplicate tiles in a pipeline and found that it was related to the datastrip_id property, which depends on the downlink station (see the S2 specification). The data seems to be the same for both items, although I only checked this for one band. Why do you keep duplicate data when multiple downlink stations are used?
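As a possible workaround for the duplicates, here is a minimal sketch that groups item IDs by every field except the trailing timestamp and keeps one ID per group. The helper names `group_by_id_prefix` and `keep_latest` are hypothetical, and keeping the most recent trailing timestamp is an assumption on my side, not a documented rule for which item is the right one.

```python
from collections import defaultdict


def group_by_id_prefix(item_ids):
    """Group item IDs by everything before the last underscore-separated field."""
    groups = defaultdict(list)
    for item_id in item_ids:
        prefix, _, suffix = item_id.rpartition("_")
        groups[prefix].append(suffix)
    return groups


def keep_latest(item_ids):
    """Keep one ID per prefix: the one whose trailing timestamp is the latest.

    Assumption: YYYYMMDDTHHMMSS strings sort chronologically when compared
    as plain strings, so max() picks the most recent one.
    """
    groups = group_by_id_prefix(item_ids)
    return sorted(f"{prefix}_{max(suffixes)}" for prefix, suffixes in groups.items())


# The two item IDs from this issue.
ids = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]
deduplicated = keep_latest(ids)
print(deduplicated)
```

This only looks at the ID strings, so it does not confirm that the underlying data are identical for every band.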

Anyway, I then noticed that the datastrip ID included in item.id does not match the one provided in item.properties.s2:datastrip_id; in the example below the two differ by one second. I'm not sure whether this is important, but I thought it was worth mentioning.

import pandas as pd
import planetary_computer
import pystac_client


def items_to_dataframe(items):
    """Flatten a list of STAC item dicts into a DataFrame sorted by datetime."""
    # json_normalize expands nested keys into dotted column names,
    # e.g. "properties.datetime" and "properties.s2:datastrip_id".
    df = pd.json_normalize(items)
    for field in ["properties.datetime"]:
        if field in df:
            df[field] = pd.to_datetime(df[field])
    df = df.sort_values("properties.datetime")
    return df


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

roi = {
    "type": "Polygon",
    "coordinates": [
        [
            [146.0678527, -15.3746464],
            [147.0909455, -15.3765786],
            [147.0913918, -16.369226],
            [146.0632786, -16.3671625],
            [146.0678527, -15.3746464],
        ]
    ],
}


search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=roi,
    datetime="2022-01-01/2022-11-01",
)

items = search.item_collection()

items_ = [i.to_dict() for i in items]
df = items_to_dataframe(items_)


def split_id(x):
    # Split the item ID into its underscore-separated fields.
    return pd.Series(x["id"].split("_"))


df[
    [
        "mission_id",
        "product_level",
        "datatake_start_time",
        "relative_orbit_number",
        "tilenumber",
        "id_datastrip",
    ]
] = df.apply(split_id, axis=1)

# two items that appear to contain the same data but have different datastrips
SAME_DATA_DIFFERENT_DATASTRIP = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]

df_ = df.loc[df["id"].isin(SAME_DATA_DIFFERENT_DATASTRIP)].copy()

# extract the timestamp field of s2:datastrip_id to make the difference easier to see
def split_s2_datastrip(x):
    return x["properties.s2:datastrip_id"].split("_")[6]


df_["s2_datastrip"] = df_.apply(split_s2_datastrip, axis=1)
df_[["id_datastrip", "s2_datastrip"]]
id_datastrip     s2_datastrip
20220227T190716  20220227T190717
20220212T221526  20220212T221527
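
To quantify the mismatch without querying the catalog, here is a small sketch that parses both timestamps and computes the offset in seconds. The helper names are hypothetical, and the datastrip ID string is a placeholder: only its seventh underscore-separated field (index 6, as in the snippet above) is taken from the output shown here; the other fields are made up for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"


def id_timestamp(item_id):
    """Parse the last underscore-separated field of the item ID."""
    return datetime.strptime(item_id.split("_")[-1], TIMESTAMP_FORMAT)


def datastrip_timestamp(datastrip_id):
    """Parse the field at index 6 of s2:datastrip_id."""
    return datetime.strptime(datastrip_id.split("_")[6], TIMESTAMP_FORMAT)


def offset_seconds(item_id, datastrip_id):
    """Return the datastrip timestamp minus the item ID timestamp, in seconds."""
    delta = datastrip_timestamp(datastrip_id) - id_timestamp(item_id)
    return delta.total_seconds()


item_id = "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716"
# Placeholder datastrip ID: SITE, SENSING and BASELINE are NOT real values.
datastrip_id = "S2A_OPER_MSI_L2A_DS_SITE_20220227T190717_SENSING_BASELINE"

offset = offset_seconds(item_id, datastrip_id)
print(offset)
```

Running the same function over the whole dataframe would show whether the one-second offset is constant or just a coincidence for these two items.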
