@fupelaqu commented Dec 5, 2025

Add support for bulk indexing from multiple data sources:

Data Sources

| Source Type | Format | Description |
|-------------|--------|-------------|
| In-Memory | Scala objects | Direct streaming from collections |
| JSON | Text | Newline-delimited JSON (NDJSON) |
| JSON Array | Text | JSON array with nested structures |
| Parquet | Binary | Columnar storage format |
| Delta Lake | Directory | ACID transactional data lake |
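
For reference, the two text formats differ in layout: NDJSON places one complete JSON document per line, while the JSON Array format wraps all documents in a single top-level array (which is what allows nested structures to span lines). The sample records below are illustrative only:

```
// NDJSON: one self-contained document per line
{"id": "1", "name": "Keyboard"}
{"id": "2", "name": "Mouse"}

// JSON Array: a single array, documents may contain nested objects
[
  {"uuid": "a1", "name": "Alice", "address": {"city": "Paris"}},
  {"uuid": "b2", "name": "Bob", "address": {"city": "Lyon"}}
]
```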

Examples:

```scala
// High-performance file indexing
implicit val options: BulkOptions = BulkOptions(
  defaultIndex = "products",
  maxBulkSize = 10000,
  balance = 16,
  disableRefresh = true
)

implicit val hadoopConf: Configuration = new Configuration()

// Load from Parquet
client.bulkFromFile(
  filePath = "/data/products.parquet",
  format = Parquet,
  idKey = Some("id")
).foreach { result =>
  result.indices.foreach(client.refresh)
  println(s"Indexed ${result.successCount} docs at ${result.metrics.throughput} docs/sec")
}

// Load from Delta Lake
client.bulkFromFile(
  filePath = "/data/delta-products",
  format = Delta,
  idKey = Some("id"),
  update = Some(true)
).foreach { result =>
  println(s"Updated ${result.successCount} products from Delta Lake")
}

// Load JSON Array with nested objects
client.bulkFromFile(
  filePath = "/data/persons.json",
  format = JsonArray,
  idKey = Some("uuid")
).foreach { result =>
  println(s"Indexed ${result.successCount} persons with nested structures")
}
```
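
The plain NDJSON source listed in the table has no example above; a minimal sketch, assuming `bulkFromFile` accepts a `Json` format value analogous to `Parquet`, `Delta`, and `JsonArray`:

```scala
// Sketch only: assumes format = Json selects the NDJSON reader,
// following the same bulkFromFile signature as the examples above
client.bulkFromFile(
  filePath = "/data/products.ndjson",
  format = Json,
  idKey = Some("id")
).foreach { result =>
  println(s"Indexed ${result.successCount} docs from NDJSON")
}
```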

@fupelaqu marked this pull request as ready for review December 5, 2025 20:39
@fupelaqu merged commit e4b0e40 into main Dec 6, 2025
2 checks passed
@fupelaqu deleted the feature/bulkFromSourceFile branch December 9, 2025 11:05