Should you store the actual (chunked) text inside your graph database? #1253

sp88011 · 2025-04-28T13:50:59Z

There's probably no one-size-fits-all approach here, so I'm looking for some general guidance and things you would consider in answering this question.

Should you store the actual text of a document in the graph database?

This guide suggests that each chunk node has a text property alongside a property that stores the embedding.
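A minimal sketch of that layout, assuming a hypothetical `:Chunk` label with `id`, `text`, and `embedding` properties (the guide's exact schema may differ). The Cypher upsert keeps the chunk text and its vector on the same node, so a vector hit can return the text without a second lookup.

```python
from typing import Any, Dict, List

# Hypothetical schema: a :Chunk node keyed by `id` that holds the raw chunk
# text and its embedding vector side by side.
UPSERT_CHUNK = (
    "MERGE (c:Chunk {id: $id}) "
    "SET c.text = $text, c.embedding = $embedding"
)


def chunk_params(chunk_id: str, text: str, embedding: List[float]) -> Dict[str, Any]:
    """Build the parameter map for UPSERT_CHUNK."""
    if not embedding:
        raise ValueError("embedding must be non-empty")
    # Coerce every component to float so the property is a homogeneous list.
    return {"id": chunk_id, "text": text, "embedding": [float(x) for x in embedding]}
```

With the official Neo4j Python driver (5.x), this could be run as something like `driver.execute_query(UPSERT_CHUNK, chunk_params(...))`; check the driver documentation for your version.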

On the other hand, this blog post seems to suggest that no actual text is stored inside the graph database:

Each chunk is represented as a chunk node in Neo4j with properties like document position, and offset and length of text.
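The pointer-only alternative could be sketched as follows. It assumes (hypothetically) that each chunk node stores the S3 key of the *preprocessed* text file plus a byte offset and length, so the text can be fetched with a ranged `get_object` call from boto3. If the stored offsets are character offsets instead, they would first need converting to byte offsets, because a UTF-8 character can span several bytes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ChunkRef:
    """What a pointer-style chunk node might hold instead of the text itself."""
    doc_key: str   # hypothetical: S3 key of the extracted-text file
    position: int  # index of the chunk within its document
    offset: int    # byte offset of the chunk in that file
    length: int    # chunk length in bytes


def fetch_chunk_text(s3_client: Any, bucket: str, ref: ChunkRef) -> str:
    """Fetch only this chunk's bytes via an HTTP Range request.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    if ref.length <= 0:
        return ""
    # HTTP byte ranges are inclusive on both ends.
    byte_range = f"bytes={ref.offset}-{ref.offset + ref.length - 1}"
    resp = s3_client.get_object(Bucket=bucket, Key=ref.doc_key, Range=byte_range)
    return resp["Body"].read().decode("utf-8")
```

The trade-off is one extra network round trip per chunk, which is where a cache in front of S3 starts to matter.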

I'm expecting 50-100 GB of data, so it seems redundant to keep the chunk text in the graph database when the full documents are already stored in conventional object storage such as S3.
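To put rough numbers on the redundancy, here is a back-of-the-envelope estimate under hypothetical parameters: about one byte per character of extracted text, fixed-size chunks with optional overlap, and float32 embeddings (1536 dimensions is used only as an example). It ignores database overhead and indexes.

```python
import math
from typing import Tuple


def estimate_storage_gb(corpus_gb: float, chunk_bytes: int, overlap_bytes: int,
                        embedding_dim: int, bytes_per_value: int = 4) -> Tuple[int, float, float]:
    """Return (number of chunks, GB of chunk text, GB of embeddings)."""
    stride = chunk_bytes - overlap_bytes  # how far each new chunk advances
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    n_chunks = math.ceil(corpus_gb * 1e9 / stride)
    text_gb = n_chunks * chunk_bytes / 1e9
    embedding_gb = n_chunks * embedding_dim * bytes_per_value / 1e9
    return n_chunks, text_gb, embedding_gb
```

For 100 GB of text with 1,000-byte chunks and no overlap, this gives 100 million chunks, about 100 GB of chunk text, and about 614 GB of embeddings; with a 200-byte overlap the chunk text grows to about 125 GB. Under these assumptions, the duplicated text is not necessarily the dominant storage cost.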

In what cases is there a really significant benefit to having the actual text available directly in the graph database?

tessaherself · 2025-11-09T01:52:33Z

Hey,
I'd be interested to know how you ended up implementing it and whether you were happy with it.
So far, I have always stored the actual data in the DB as well. In MongoDB, I even stored images for multimodal use cases.

If I had 100 GB of data, I would probably keep it in S3, but preprocessed, of course. If I kept only the raw PDF, XLSX, etc. files, I would have to run text extraction, OCR, and so on all over again...
Also, if speed matters, as it does in most use cases, I would probably build a caching layer that always keeps the most frequently used files in memory, so that looking up text by position and so on takes milliseconds.
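A minimal sketch of such a cache, assuming a hypothetical `loader` callable that returns a document's full preprocessed text (for example, read from S3). It keeps at most `capacity` documents in memory and, when full, evicts the one with the fewest recorded requests. This is a simple LFU policy; `functools.lru_cache` would be an easier swap-in, but it evicts by recency rather than frequency.

```python
from collections import Counter
from typing import Callable, Dict


class FrequencyCache:
    """Keeps the most frequently requested documents' text in memory."""

    def __init__(self, loader: Callable[[str], str], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader        # hypothetical: fetches full text by document id
        self._capacity = capacity
        self._store: Dict[str, str] = {}
        self._hits: Counter = Counter()  # request counts per document id

    def get_text(self, doc_id: str, offset: int, length: int) -> str:
        """Return `length` characters starting at `offset` in the document."""
        self._hits[doc_id] += 1
        if doc_id not in self._store:
            if len(self._store) >= self._capacity:
                # Evict the cached document with the fewest recorded requests.
                victim = min(self._store, key=lambda d: self._hits[d])
                del self._store[victim]
            self._store[doc_id] = self._loader(doc_id)
        return self._store[doc_id][offset:offset + length]
```

Request counts are never decayed here, so documents that were popular long ago stay favored; a production cache would probably age them or bound memory by size rather than document count.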

Best,
Tess
