
Conversation

@justine202429 commented Oct 4, 2025

@justine202429 marked this pull request as ready for review on October 13, 2025 09:26
@Alvaro-Kothe (Member) left a comment

It needs tests and an entry in whatsnew.

@Alvaro-Kothe added the IO Excel (read_excel, to_excel) label on Oct 14, 2025
Comment on lines 1597 to 1600
with ExcelWriter("test.xlsx", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1", merge_cells=merge_cells)

reader = ExcelFile("test.xlsx")
Member

Don't read from or write to the CWD; either use a temporary file or use an in-memory buffer.
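A minimal sketch of the in-memory option, assuming the openpyxl engine is installed; the helper name `roundtrip_excel` is hypothetical. Writing to an `io.BytesIO` buffer keeps the test from touching the current working directory (pytest's built-in `tmp_path` fixture is the file-based alternative).

```python
from io import BytesIO

import pandas as pd


def roundtrip_excel(df: pd.DataFrame, engine: str = "openpyxl") -> pd.DataFrame:
    """Hypothetical helper: write df to an in-memory workbook and read it back."""
    buf = BytesIO()  # nothing is written to the current working directory
    with pd.ExcelWriter(buf, engine=engine) as writer:
        df.to_excel(writer, sheet_name="Sheet1")
    buf.seek(0)  # rewind so read_excel starts at the beginning of the buffer
    return pd.read_excel(buf, sheet_name="Sheet1", index_col=0)


df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
result = roundtrip_excel(df)
pd.testing.assert_frame_equal(result, df)
```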

DummyClass.assert_called_and_reset()

@td.skip_if_no("openpyxl")
def test_to_excel_multiindex_nan_in_columns(self, merge_cells):
Member

Test all Excel engines.
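A standalone sketch of running one test against several writer engines with `pytest.mark.parametrize`, skipping any engine whose package is not installed. The engine-to-module mapping and the test name are assumptions for illustration. Inside pandas' own `test_writers.py`, existing engine parametrization and fixtures such as `tmp_excel` may already provide this, in which case the hard-coded `engine="openpyxl"` and the `td.skip_if_no("openpyxl")` marker could presumably just be dropped.

```python
from io import BytesIO

import pandas as pd
import pytest

# Assumed mapping: writer engine name -> module that must be importable.
WRITER_MODULES = {"openpyxl": "openpyxl", "xlsxwriter": "xlsxwriter", "odf": "odf"}


@pytest.mark.parametrize("engine", sorted(WRITER_MODULES))
def test_roundtrip_each_engine(engine):
    pytest.importorskip(WRITER_MODULES[engine])  # skip engines that are not installed
    if engine != "odf":
        # .xlsx files are read back with openpyxl, whichever engine wrote them
        pytest.importorskip("openpyxl")
    df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
    buf = BytesIO()
    with pd.ExcelWriter(buf, engine=engine) as writer:
        df.to_excel(writer, sheet_name="Sheet1")
    buf.seek(0)  # read_excel infers .xlsx vs .ods from the buffer contents
    result = pd.read_excel(buf, sheet_name="Sheet1", index_col=0)
    pd.testing.assert_frame_equal(result, df)
```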

Comment on lines 1601 to 1605
result = pd.read_excel(reader, index_col=0, header=[0, 1])

original_values = df.to_numpy()
result_values = result.to_numpy()
tm.assert_numpy_array_equal(original_values, result_values)
Member

Create an expected DataFrame and compare it with tm.assert_frame_equal.
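To illustrate why comparing `to_numpy()` arrays is weaker than comparing whole frames: two frames with identical values but different column labels pass an array comparison, yet fail `pandas.testing.assert_frame_equal` (the public name of the function pandas' test suite calls as `tm.assert_frame_equal`). The placeholder label below is hypothetical and only stands in for a header value that went wrong during the round-trip.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y")])
original = pd.DataFrame([[1, 2]], columns=columns)

# Same values, but one header label replaced by a hypothetical placeholder.
mangled = original.copy()
mangled.columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "placeholder")])


def frames_equal(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True if assert_frame_equal accepts the pair (labels included)."""
    try:
        pd.testing.assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


values_only = np.array_equal(original.to_numpy(), mangled.to_numpy())  # True: labels ignored
whole_frame = frames_equal(original, mangled)  # False: column labels differ
```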

@mathbruu

pre-commit.ci autofix

@mathbruu

done

with ExcelFile(tmp_excel) as reader:
    result = pd.read_excel(reader, index_col=0, header=[0, 1])

tm.assert_numpy_array_equal(result.to_numpy(), df.to_numpy())
Member

This doesn't test the header.

Author

The test validates that the data survives the Excel round-trip. NaN values in the headers are written correctly (verified with openpyxl), but they cannot be read back because Excel treats empty cells as blanks. This is an Excel limitation, not a code bug.
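One way to test the header without going through `read_excel` is to load the written workbook with openpyxl and inspect the raw header cells. A minimal sketch, assuming openpyxl is installed and the default `merge_cells=True` layout (one sheet row per column level); the helper name `written_header_rows` is hypothetical.

```python
from io import BytesIO

import numpy as np
import openpyxl
import pandas as pd


def written_header_rows(df: pd.DataFrame, n_rows: int, merge_cells: bool = True) -> list:
    """Hypothetical helper: return the first n_rows of the sheet as raw cell values."""
    buf = BytesIO()
    with pd.ExcelWriter(buf, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Sheet1", merge_cells=merge_cells)
    buf.seek(0)
    ws = openpyxl.load_workbook(buf)["Sheet1"]
    # values_only=True yields plain Python values; blank cells come back as None
    rows = ws.iter_rows(min_row=1, max_row=n_rows, values_only=True)
    return [list(row) for row in rows]


columns = pd.MultiIndex.from_arrays([["a", "a"], pd.Categorical(["x", np.nan])])
df = pd.DataFrame([[1, 2]], columns=columns)
header = written_header_rows(df, n_rows=2)
# header[1] is the second column level: index column, then "x", then the NaN slot.
```

Asserting on `header[1][2]` (for example, that it is `None`, i.e. a blank cell) is a candidate regression check, assuming a blank cell is the intended output; such a check should be confirmed to fail without the patch.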

buf = BytesIO()
df.to_excel(buf)

def test_to_excel_multiindex_nan_in_columns(self, merge_cells, tmp_excel):
Member

This test passes on main.


I don't understand: is it not supposed to pass?

Member

The tests are expected to fail on main without the patch. If they’re passing, it means the bug isn’t actually being reproduced, so you are not truly verifying the fix.


I’m a bit confused: this test case doesn’t exist on main at all.
I only created it in this branch, so I don’t understand how it could be “passing on main.”
Is there something I’m missing?

Member

If you remove your patch with

git restore --source=upstream/main -- pandas/io/formats/excel.py

and run the tests with

pytest pandas/tests/io/excel/test_writers.py

The test that you created still passes. Hence, it's not testing your fix.

GH#62340: Use original column values (with NaN) instead of NBSP-filled
values when writing MultiIndex headers to Excel.

- Modify _format_header_mi() to use columns.get_level_values() to get
  the original column values with NaN preserved
- Add test to verify MultiIndex structure and data integrity are
  preserved during Excel round-trip
- Note: read_excel() limitation means NaN in headers become empty cells
  in Excel and cannot be reconstructed on read, but data values are
  correctly preserved
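The commit message's point about `get_level_values()` can be seen on a small MultiIndex: a NaN in a categorical level is not stored in `levels` (its code is -1), but `get_level_values()` returns it in place. This only illustrates the public `MultiIndex` API; it is not the `_format_header_mi()` change itself.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([["a", "a"], pd.Categorical(["x", np.nan])])

# The level itself only stores the non-missing categories...
level = columns.levels[1]
# ...and the missing entry is encoded as code -1.
codes = columns.codes[1]
# get_level_values() expands the codes to one value per column, keeping NaN.
values = columns.get_level_values(1)
missing = pd.isna(values)
```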

Labels

IO Excel (read_excel, to_excel)


Development

Successfully merging this pull request may close these issues.

BUG: NaN categorical in multi-level column gets replaced in to_excel output

3 participants