@vstinner
Member

@vstinner vstinner commented Oct 8, 2025

@vstinner vstinner marked this pull request as draft October 11, 2025 21:57
@vstinner
Member Author

I'm converting this PR to a draft for now, since the API seems to be misused by third-party projects. I also proposed PyDict_FromItems(), which is a different abstraction: #139963

@vstinner vstinner force-pushed the dict_presized branch 3 times, most recently from eb555c6 to 8bb9715 Compare October 12, 2025 12:40
@vstinner
Member Author

I rewrote the PR to add a unicode_keys parameter: PyObject* PyDict_NewPresized(Py_ssize_t size, int unicode_keys).

@methane
Member

methane commented Oct 13, 2025

There are two news entries.

@vstinner
Member Author

vstinner commented Oct 13, 2025

Benchmark on PyDict_New() vs PyDict_NewPresized() with Unicode keys:

Benchmark        setitem    presized
dict-5           2.02 us    1.95 us: 1.04x faster
dict-10          3.72 us    3.52 us: 1.06x faster
dict-25          8.99 us    8.14 us: 1.10x faster
dict-100         31.7 us    29.3 us: 1.08x faster
dict-500         147 us     136 us: 1.08x faster
dict-1,000       293 us     272 us: 1.07x faster
Geometric mean   (ref)      1.06x faster

Benchmark hidden because not significant (1): dict-1

UPDATE: I re-ran the benchmark after fixing a refleak in the benchmark code (Py_DECREF() the key and value after PyDict_SetItem()).
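As a rough cross-check of the table, the per-row speedups and their geometric mean can be recomputed from the printed timings. This is only a sketch of the arithmetic: the timings are the rounded values shown above, the hidden dict-1 row is not included, and the names `timings` and `speedups` are illustrative, so the result may differ slightly from what pyperf reports.

```python
from statistics import geometric_mean

# (setitem, presized) timings in microseconds, copied from the visible rows.
timings = {
    "dict-5": (2.02, 1.95),
    "dict-10": (3.72, 3.52),
    "dict-25": (8.99, 8.14),
    "dict-100": (31.7, 29.3),
    "dict-500": (147.0, 136.0),
    "dict-1,000": (293.0, 272.0),
}


def speedups(rows):
    """Return reference_time / new_time for every benchmark row."""
    return {name: ref / new for name, (ref, new) in rows.items()}


ratios = speedups(timings)
mean_ratio = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean of the visible rows: {mean_ratio:.2f}x")
```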

Code:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index c14f925b4e7..d2044b55f76 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -2595,6 +2595,80 @@ create_managed_weakref_nogc_type(PyObject *self, PyObject *Py_UNUSED(args))
 }
 
 
+static PyObject *
+bench_dict_new(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &loops, &size)) {  // pyperf passes loops first
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_New();
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            assert(PyDict_SetItem(d, key, value) == 0);
+            Py_DECREF(key);
+            Py_DECREF(value);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
+static PyObject *
+bench_dict_presized(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &loops, &size)) {  // pyperf passes loops first
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_NewPresized(size, 1);
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            assert(PyDict_SetItem(d, key, value) == 0);
+            Py_DECREF(key);
+            Py_DECREF(value);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
 static PyMethodDef TestMethods[] = {
     {"set_errno",               set_errno,                       METH_VARARGS},
     {"test_config",             test_config,                     METH_NOARGS},
@@ -2691,6 +2765,8 @@ static PyMethodDef TestMethods[] = {
     {"toggle_reftrace_printer", toggle_reftrace_printer, METH_O},
     {"create_managed_weakref_nogc_type",
         create_managed_weakref_nogc_type, METH_NOARGS},
+    {"bench_dict_new", bench_dict_new, METH_VARARGS},
+    {"bench_dict_presized", bench_dict_presized, METH_VARARGS},
     {NULL, NULL} /* sentinel */
 };

bench_new.py:

import pyperf
import _testcapi

runner = pyperf.Runner()
for size in (1, 5, 10, 25, 100, 500, 1_000):
    runner.bench_time_func(f'dict-{size:,}', _testcapi.bench_dict_new, size)

bench_presized.py:

import pyperf
import _testcapi

runner = pyperf.Runner()
for size in (1, 5, 10, 25, 100, 500, 1_000):
    runner.bench_time_func(f'dict-{size:,}', _testcapi.bench_dict_presized, size)
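For readers without a patched `_testcapi`, the same measurement pattern can be sketched in pure Python. pyperf's `bench_time_func()` calls its callback as `time_func(loops, *args)`, so `loops` arrives first and the dict size second. The function name `bench_dict_setitem` is hypothetical, and its timings are not comparable with the C numbers in this thread because the loop overhead is interpreted.

```python
import time


def bench_dict_setitem(loops, size):
    """Pure-Python analogue of bench_dict_new(): time `loops` dict builds.

    Arguments follow pyperf's time_func(loops, *args) convention.
    """
    t0 = time.perf_counter()
    for _ in range(loops):
        d = {}
        for i in range(size):
            # Key creation is timed too, as in the C benchmark above.
            d[str(i)] = i
        assert len(d) == size
    return time.perf_counter() - t0


# With pyperf installed, this would be registered the same way as the
# C functions, for example:
#   runner.bench_time_func('dict-5', bench_dict_setitem, 5)
elapsed = bench_dict_setitem(1000, 5)
print(f"1000 loops of a 5-item dict took {elapsed:.6f} s")
```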

@vstinner
Member Author

I opened capi-workgroup/decisions#80 to ask the C API Working Group about this API.

@vstinner
Member Author

vstinner commented Oct 13, 2025

Benchmark on PyDict_New() vs PyDict_NewPresized() with integer keys:

Benchmark        setitem    presized
dict-5           1.59 us    1.56 us: 1.02x faster
dict-10          3.18 us    3.15 us: 1.01x faster
dict-25          7.76 us    6.80 us: 1.14x faster
dict-100         26.8 us    26.0 us: 1.03x faster
dict-500         123 us     119 us: 1.03x faster
dict-1,000       249 us     241 us: 1.03x faster
Geometric mean   (ref)      1.04x faster

Benchmark hidden because not significant (1): dict-1

UPDATE: I re-ran the benchmark after fixing a refleak in the benchmark code.

@davidhewitt
Contributor

This seems useful to me for PyO3 👍

I am unsure how reliably we will be able to use the unicode_keys hint. My feeling is that in cases where we're confident about the key types, we would have been able to use the proposed PyDict_FromItems anyway.

@vstinner
Member Author

> I am unsure how reliably we will be able to use the unicode_keys hint. My feeling is that in cases where we're confident about the key types, we would have been able to use the proposed PyDict_FromItems anyway.

Correct.

If you know your input data, you can set the unicode_keys hint in advance, before consuming the iterator. You can use PyDict_NewPresized() in this case.

If you don't know your input data, you might need to consume the iterator and store keys and values in a temporary array, and then call PyDict_FromItems() which computes the unicode_keys hint for you.

@davidhewitt
Contributor

This seems the wrong way around to me as a user: if I don't know my input data, I'd rather not collect it into a temporary array, since a large dataset would mean a big temporary allocation.

If I know the input data, I was thinking I could even allocate the items on the stack before calling PyDict_FromItems.

@davidhewitt
Contributor

Or are you saying that it is more efficient to use PyDict_NewPresized and repeated calls to PyDict_SetItem than to use PyDict_FromItems?

@vstinner
Member Author

> Or are you saying that it is more efficient to use PyDict_NewPresized and repeated calls to PyDict_SetItem than to use PyDict_FromItems?

Oh, I didn't know which function is faster, so I ran benchmarks: #139963 (comment). PyDict_FromItems() is faster than PyDict_NewPresized()+PyDict_SetItem().

@davidhewitt
Contributor

👍 that matches what I was thinking then:

  • When I know the data I will use PyDict_FromItems
  • When I don't know the data (e.g. a Rust iterator) I could maybe use PyDict_NewPresized with the iterator's size hint, but the performance difference may not be large enough to justify it, and I should perhaps just continue to use PyDict_New.
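The decision described in the two bullets above can be sketched in Python, where `operator.length_hint()` plays a role similar to a Rust iterator's size hint. The function `pick_constructor` is hypothetical and only returns the name of the C function that this reasoning would lead to; it does not call any C API.

```python
from operator import length_hint


def pick_constructor(items):
    """Return the name of the C API the reasoning above would choose."""
    if isinstance(items, (list, tuple)):
        # The data is known: keys and values are already materialized.
        return "PyDict_FromItems"
    if length_hint(items, 0) > 0:
        # Streaming input that still exposes a usable size hint.
        return "PyDict_NewPresized"
    # No size information at all: fall back to an empty dict.
    return "PyDict_New"


print(pick_constructor([("a", 1), ("b", 2)]))
print(pick_constructor(iter([("a", 1), ("b", 2)])))
print(pick_constructor(pair for pair in [("a", 1)]))
```

A list iterator reports its remaining length through `__length_hint__`, while a generator reports nothing, which is why the last call falls back to `PyDict_New`.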

@methane
Member

methane commented Oct 18, 2025

@vstinner You forgot Py_DECREF(key); Py_DECREF(value); after PyDict_SetItem().

Excluding PyUnicode_FromFormat(), the performance difference between PyDict_NewPresized + PyDict_SetItem and PyDict_FromItems was negligible.

Benchmark        presized   fromitems
dict-1           44.3 ns    44.0 ns: 1.01x faster
dict-100         2.97 us    3.02 us: 1.02x slower
dict-1,000       30.3 us    30.6 us: 1.01x slower
Geometric mean   (ref)      1.00x slower

Benchmark hidden because not significant (2): dict-10, dict-10,000

diff --git a/Modules/_testcapi/dict.c b/Modules/_testcapi/dict.c
index 795b2e67b67..90d7c374722 100644
--- a/Modules/_testcapi/dict.c
+++ b/Modules/_testcapi/dict.c
@@ -269,6 +269,113 @@ dict_newpresized(PyObject *self, PyObject *args)
     return PyDict_NewPresized(size, unicode_keys);
 }

+static PyObject *
+bench_dict_newpresized(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyObject **keys = (PyObject **)PyMem_Malloc(size * sizeof(PyObject*));
+    if (keys == NULL) {
+        return PyErr_NoMemory();
+    }
+    PyObject **values = (PyObject **)PyMem_Malloc(size * sizeof(PyObject*));
+    if (values == NULL) {
+        return PyErr_NoMemory();
+    }
+    for (Py_ssize_t i=0; i < size; i++) {
+        PyObject *key = PyUnicode_FromFormat("%zi", i);
+        assert(key != NULL);
+
+        PyObject *value = PyLong_FromLong(i);
+        assert(value != NULL);
+
+        keys[i] = key;
+        values[i] = value;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_NewPresized(size, 1);
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            assert(PyDict_SetItem(d, keys[i], values[i]) == 0);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    for (Py_ssize_t i=0; i < size; i++) {
+        Py_DECREF(keys[i]);
+        Py_DECREF(values[i]);
+    }
+    PyMem_Free(keys);
+    PyMem_Free(values);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+extern PyObject *
+_PyDict_FromItems(PyObject *const *keys, Py_ssize_t keys_offset,
+                  PyObject *const *values, Py_ssize_t values_offset,
+                  Py_ssize_t length);
+
+static PyObject *
+bench_dict_fromitems(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyObject **keys = (PyObject **)PyMem_Malloc(size * sizeof(PyObject*));
+    if (keys == NULL) {
+        return PyErr_NoMemory();
+    }
+    PyObject **values = (PyObject **)PyMem_Malloc(size * sizeof(PyObject*));
+    if (values == NULL) {
+        return PyErr_NoMemory();
+    }
+    for (Py_ssize_t i=0; i < size; i++) {
+        PyObject *key = PyUnicode_FromFormat("%zi", i);
+        assert(key != NULL);
+
+        PyObject *value = PyLong_FromLong(i);
+        assert(value != NULL);
+
+        keys[i] = key;
+        values[i] = value;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = _PyDict_FromItems(keys, 1, values, 1, size);
+        assert(d != NULL);
+        Py_DECREF(d);
+
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    for (Py_ssize_t i=0; i < size; i++) {
+        Py_DECREF(keys[i]);
+        Py_DECREF(values[i]);
+    }
+    PyMem_Free(keys);
+    PyMem_Free(values);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+

 static PyMethodDef test_methods[] = {
     {"dict_containsstring", dict_containsstring, METH_VARARGS},
@@ -282,6 +389,8 @@ static PyMethodDef test_methods[] = {
     {"dict_popstring_null", dict_popstring_null, METH_VARARGS},
     {"test_dict_iteration", test_dict_iteration, METH_NOARGS},
     {"dict_newpresized", dict_newpresized, METH_VARARGS},
+    {"bench_dict_newpresized", bench_dict_newpresized, METH_VARARGS},
+    {"bench_dict_fromitems", bench_dict_fromitems, METH_VARARGS},

@methane
Member

methane commented Oct 18, 2025

_PyDict_FromItems() is roughly:

  • check whether all the keys are unicode
  • PyDict_NewPresized()
  • PyDict_SetItem()

So the performance difference should be very small.

On a free-threaded build, PyDict_SetItem() locks the dict on every call, whereas PyDict_FromItems() locks the dict only once. So PyDict_FromItems() can be faster.
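The three steps listed above can be modelled in pure Python. This is a sketch and not the C implementation: the name `dict_from_items_model` is hypothetical, it assumes "unicode" means exact `str` objects, and since Python code cannot presize a dict, the hint is only computed and returned.

```python
def dict_from_items_model(keys, values):
    """Model the three steps; return the dict and the unicode_keys hint."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")

    # Step 1: the hint is true only if every key is an exact str
    # (assumption: str subclasses do not count).
    unicode_keys = all(type(key) is str for key in keys)

    # Step 2: stand-in for PyDict_NewPresized(len(keys), unicode_keys).
    result = {}

    # Step 3: one insertion per item, like repeated PyDict_SetItem() calls.
    for key, value in zip(keys, values):
        result[key] = value

    return result, unicode_keys


print(dict_from_items_model(["a", "b"], [1, 2]))
print(dict_from_items_model([1, "b"], ["x", "y"]))
```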

@vstinner
Member Author

vstinner commented Dec 5, 2025

> @vstinner You forgot Py_DECREF(key); Py_DECREF(value); after PyDict_SetItem().

Ooops :-( I fixed the benchmarks and re-ran them.
