Replies: 1 comment
Hey, if I had 100 GB of data, I would probably keep it in S3, but preprocessed, of course. I would not keep only the raw PDF, XLSX, etc. files, because then I would need to run text extraction, OCR, and so on all over again. Best,
There's probably no one-size-fits-all approach here, so I'm looking for some general guidance and things you would consider in answering this question.
Should you store the actual text of a document in the graph database?
This guide suggests that each chunk has a `text` property alongside a property to store the embedding. On the other hand, this blog post seems to suggest that no actual text is stored inside the graph database.
I'm expecting 50-100GB of data and so it would seem redundant to have the chunks of text inside the graph database alongside the full document inside classic storage, like S3.
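To make the two options concrete, here is a minimal sketch of what a chunk record might look like in each case. All names (`ChunkInline`, `ChunkByReference`, `object_key`, `resolve_text`) are hypothetical and not taken from the guide or the blog post, and a plain dict stands in for object storage such as S3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChunkInline:
    """Option A: the chunk text lives in the graph next to the embedding."""
    chunk_id: str
    text: str
    embedding: list[float]


@dataclass
class ChunkByReference:
    """Option B: the graph keeps only the embedding and a pointer to the text."""
    chunk_id: str
    embedding: list[float]
    object_key: str  # hypothetical key of the preprocessed text in object storage
    start: int       # character offset where the chunk begins
    end: int         # character offset where the chunk ends (exclusive)


def resolve_text(chunk: ChunkByReference, store: dict[str, str]) -> str:
    """Fetch the chunk text from external storage (a dict stands in for S3)."""
    full_text = store[chunk.object_key]
    return full_text[chunk.start:chunk.end]


# Stand-in for a bucket holding already-extracted text.
store = {"docs/a.txt": "Graph databases store nodes and edges."}
chunk = ChunkByReference("c1", [0.1, 0.2], "docs/a.txt", start=0, end=15)
retrieved = resolve_text(chunk, store)
```

With option B, every retrieved chunk costs an extra lookup outside the graph before its text can be used, which is the trade-off against the duplicated storage of option A.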
In what cases is there a really significant benefit to having the actual text available directly in the graph database?