
Conversation

@david-cortes-intel (Contributor) commented Oct 27, 2025

Description

Logistic regression calls the L-BFGS-B solver from SciPy with mostly default parameters, mimicking scikit-learn. Those defaults were chosen for a general-purpose solver aimed at non-convex, bound-constrained problems. For a comparatively easier problem like L2-regularized logistic regression they are not optimal: better approximations of the true Hessian, obtained by keeping more correction pairs in the underlying quasi-Newton approximation, are typically beneficial on high-dimensional datasets.
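The effect can be illustrated with SciPy directly: the L-BFGS-B solver exposes the number of stored correction pairs through the `maxcor` option (default 10). The sketch below is not the daal4py code; the synthetic data, the penalty strength `lam`, and the `maxcor=30` value are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Synthetic L2-regularized logistic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = (X @ rng.standard_normal(50) > 0).astype(float)
lam = 1.0  # hypothetical L2 penalty strength

def loss_grad(w):
    """Negative log-likelihood plus L2 penalty, with its gradient."""
    z = X @ w
    loss = np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)
    grad = X.T @ (expit(z) - y) + lam * w
    return loss, grad

w0 = np.zeros(X.shape[1])
# SciPy's default: 10 correction pairs for the quasi-Newton approximation.
res_default = minimize(loss_grad, w0, jac=True, method="L-BFGS-B",
                       options={"maxcor": 10})
# More correction pairs give a richer limited-memory Hessian model, which
# often reduces iteration counts on high-dimensional convex problems.
res_more = minimize(loss_grad, w0, jac=True, method="L-BFGS-B",
                    options={"maxcor": 30})
```

Since the problem is strictly convex, both runs reach essentially the same optimum; the difference shows up in the iteration count (`res.nit`), not in the final loss.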

Benchmarks on an Ice Lake machine (note: comparison is main branch against this, so relative numbers below 1 mean improvement in this PR):
[screenshot 1: benchmark results table]

[screenshot 2: benchmark results table]

Note: in many of these benchmarks the solver does not reach convergence, so the comparison is not always apples-to-apples. Cases where neither branch reached convergence are highlighted in yellow in the screenshots above. Non-converged runs emit a warning like:

/export/users/dcortes/repos/sklex-benchenv/daal4py/sklearn/linear_model/logistic_path.py:266: ConvergenceWarning: lbfgs failed to converge after 200 iteration(s) (status=1):
STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=200).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Hence, even though some cases might look like performance degradations here, those runs are actually reaching better solutions within the same iteration budget, so the timing comparison is not exact.
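When one run hits the iteration cap, wall time alone is misleading; the final objective value has to be compared alongside it. A small SciPy sketch of the phenomenon, using a hypothetical least-squares stand-in problem with an iteration cap chosen purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative convex problem: linear least squares.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 40))
b = rng.standard_normal(100)

def f(w):
    """0.5 * ||A w - b||^2 and its gradient."""
    r = A @ w - b
    return 0.5 * (r @ r), A.T @ r

# A run capped at a few iterations stops early (status 1: iteration limit
# reached, the same condition as the ConvergenceWarning above)...
capped = minimize(f, np.zeros(40), jac=True, method="L-BFGS-B",
                  options={"maxiter": 5})
# ...while an uncapped run spends more iterations but reaches a better
# (lower) objective value.
full = minimize(f, np.zeros(40), jac=True, method="L-BFGS-B")
```

Comparing `capped.fun` against `full.fun`, and checking `status` and `nit`, shows whether a "faster" run simply stopped at a worse solution.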

Note 2: the docs CI job is failing due to a rate-limiting issue with medium.com.


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • All CI jobs are green or I have provided justification why they aren't.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.

@david-cortes-intel david-cortes-intel added the enhancement New feature or request label Oct 27, 2025
codecov bot commented Oct 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 80.47% <ø> (ø)
github 82.09% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.
see 1 file with indirect coverage changes


@david-cortes-intel david-cortes-intel marked this pull request as ready for review October 27, 2025 08:41
@david-cortes-intel david-cortes-intel changed the title [Experiment, do NOT merge] ENH: Increase correction pairs used for quasi Newton approximations ENH: Increase correction pairs used for quasi Newton approximations Oct 27, 2025
@Vika-F (Contributor) left a comment

Approving, as this is really an improvement.
But we need a follow-up on the benchmarking methodology: should we fix the tolerance so that both sides reach an equally accurate solution, e.g. when comparing against stock scikit-learn?

@david-cortes-intel (Contributor, Author) commented:
> Approving, as this is really an improvement. But we need a follow-up on the benchmarking methodology: should we fix the tolerance so that both sides reach an equally accurate solution, e.g. when comparing against stock scikit-learn?

Yes, I'm working on a fix.

@david-cortes-intel (Contributor, Author) commented:

PR for the benchmarks: IntelPython/scikit-learn_bench#190

@david-cortes-intel david-cortes-intel merged commit 6d108d1 into uxlfoundation:main Oct 27, 2025
34 of 36 checks passed
