@razdoburdin
Collaborator

@razdoburdin razdoburdin commented Nov 20, 2025

Description

This PR adds a few synthetic cases for the XGBoost regression benchmarks.

Checklist:

Completeness and readability

  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

@razdoburdin razdoburdin marked this pull request as draft November 20, 2025 16:21
@david-cortes-intel
Contributor

Wasn't this meant to replace a sparse dataset?

Also, in order to make it more realistic, how about adding more noise, irrelevant variables, skewed distributions, and so on? The first two are controllable as arguments to make_regression, while the latter would require manually applying transformations (e.g. binning).
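
As a rough illustration of the suggestion above (a sketch, not code from this PR): `noise` and `n_informative` are real arguments of scikit-learn's `make_regression`, while the helper name `make_noisy_regression`, the sizes, and the choice of which columns to transform are assumptions made only for this example.

```python
import numpy as np
from sklearn.datasets import make_regression


def make_noisy_regression(n_samples=1000, n_features=20, n_informative=5,
                          noise=10.0, n_bins=16, seed=0):
    """Hypothetical helper: synthetic regression data with extra realism."""
    # noise adds Gaussian noise to the target; features beyond
    # n_informative do not contribute to the target (irrelevant variables).
    X, y = make_regression(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=n_informative,
        noise=noise,
        random_state=seed,
    )
    # Skewed distribution: exponentiating a standard-normal column
    # yields a log-normal (right-skewed) feature.
    X[:, 0] = np.exp(X[:, 0])
    # Binning: replace a column by its quantile-bin index (0 .. n_bins - 1).
    edges = np.quantile(X[:, 1], np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    X[:, 1] = np.digitize(X[:, 1], edges)
    return X, y


X, y = make_noisy_regression()
```

The transformations are applied after the target is generated, so if a transformed column happens to be informative, its relationship to the target is no longer linear.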

@razdoburdin
Collaborator Author

> Wasn't this meant to replace a sparse dataset?
>
> Also, in order to make it more realistic, how about adding more noise, irrelevant variables, skewed distributions, and so on? The first two are controllable as arguments to make_regression, while the latter would require manually applying transformations (e.g. binning).

This is the case I used for the EMR vs GNR performance comparison. We can add more realistic cases later.

Comment on lines +21 to +22
"n_estimators": 128,
"max_depth": 8
Contributor

Do these parameters make sense without reducing the learning rate? I guess in this case it'd be a toy problem with very high predictability, but would the tree structure end up being similar to what you'd get for the original epsilon data?

Collaborator Author

Sorry for the misunderstanding.

This is not a replacement for epsilon. It is the toy case I used for the EMR vs GNR performance comparison. We need to have this case in a public benchmark to be able to share it.

@razdoburdin razdoburdin marked this pull request as ready for review November 20, 2025 16:40
@razdoburdin razdoburdin merged commit b8e821c into IntelPython:main Nov 24, 2025
13 of 15 checks passed
@razdoburdin razdoburdin deleted the xgb_syntetic branch November 24, 2025 09:53