I was just debugging why I had duplicate tiles in a certain pipeline. I found out that this was related to the datastrip_id property, which depends on the downlink station (see the S2 specification). The data seems to be the same for both items, although I only checked this for one band. Why do you keep duplicate data when multiple downlink stations are used?
Anyway, I then noticed that the datastrip ID included in item.id does not match the one provided in item.properties.s2:datastrip_id. Not sure if this is important, but I thought it would be worth mentioning. Please see the example below.
from copy import deepcopy
import pandas as pd
import planetary_computer
import pystac_client
def items_to_dataframe(items):
    _items = []
    for i in items:
        _i = deepcopy(i)
        _items.append(_i)
    df = pd.DataFrame(pd.json_normalize(_items))
    for field in ["properties.datetime"]:
        if field in df:
            df[field] = pd.to_datetime(df[field])
    df = df.sort_values("properties.datetime")
    return df
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
roi = {
    "type": "Polygon",
    "coordinates": [
        [
            [146.0678527, -15.3746464],
            [147.0909455, -15.3765786],
            [147.0913918, -16.369226],
            [146.0632786, -16.3671625],
            [146.0678527, -15.3746464],
        ]
    ],
}
search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=roi,
    datetime="2022-01-01/2022-11-01",
)
items = search.item_collection()
items_ = [i.to_dict() for i in items]
df = items_to_dataframe(items_)
def split_id(x):
    return pd.Series(x.id.split("_"))
df[
    [
        "mission_id",
        "product_level",
        "datetake_start_time",
        "relative_orbit_number",
        "tilenumber",
        "id_datastrip",
    ]
] = df.apply(split_id, axis=1)
# two examples for which I found the same data but different datastrips
SAME_DATA_DIFFERENT_DATASTRIP = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]
df_ = df.loc[df["id"].isin(SAME_DATA_DIFFERENT_DATASTRIP)].copy()
# makes it a bit easier to see the difference
def split_s2_datstrip(x):
    return x["properties.s2:datastrip_id"].split("_")[6]
df_["s2_datastrip"] = df_.apply(split_s2_datstrip, axis=1)
df_[["id_datastrip", "s2_datastrip"]]

| id_datastrip | s2_datastrip |
|---|---|
| 20220227T190716 | 20220227T190717 |
| 20220212T221526 | 20220212T221527 |
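
To quantify the mismatch, here is a minimal standard-library sketch that uses only the IDs and timestamps shown above. The helper names `offset_seconds` and `group_duplicates` are hypothetical, and I am assuming both timestamps follow the `YYYYMMDDTHHMMSS` layout seen in the table. It measures how far the two timestamps are apart and groups item IDs that differ only in their last field, which is how the duplicate tiles show up.

```python
from collections import defaultdict
from datetime import datetime

# Assumed layout of the timestamps shown in the table above.
TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"


def offset_seconds(id_ts, datastrip_ts):
    """Seconds by which the s2:datastrip_id timestamp is later than the one in item.id."""
    start = datetime.strptime(id_ts, TIMESTAMP_FORMAT)
    end = datetime.strptime(datastrip_ts, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


def group_duplicates(item_ids):
    """Group item IDs that are identical except for their final underscore-separated field."""
    groups = defaultdict(list)
    for item_id in item_ids:
        key = item_id.rsplit("_", 1)[0]  # everything before the last field
        groups[key].append(item_id)
    # Keep only keys that occur more than once.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Timestamp pairs copied from the table above: (id_datastrip, s2_datastrip).
pairs = [
    ("20220227T190716", "20220227T190717"),
    ("20220212T221526", "20220212T221527"),
]
offsets = [offset_seconds(a, b) for a, b in pairs]
print(offsets)  # [1.0, 1.0]

duplicates = group_duplicates(
    [
        "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
        "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
    ]
)
print(duplicates)
```

For both examples the timestamp in `s2:datastrip_id` is exactly one second later than the one in `item.id`. I have only checked these two items, so I cannot say whether the offset is the same everywhere.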