Commit f85c485

init new version
1 parent c6b94b6 commit f85c485

6 files changed: +498 -279 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+main
+*.log

README.md

Lines changed: 0 additions & 277 deletions
@@ -7,283 +7,6 @@
 ⚠️ This Project is still in development and not ready for production use. ⚠️
 
 # OuroborosDB
-A embedded database built around the concept of event trees, emphasizing data deduplication and data integrity checks. By structuring data into event trees, OuroborosDB ensures efficient and intuitive data management. Key features include:
-
-- Data Deduplication: Eliminates redundant data through efficient chunking and hashing mechanisms.
-- Data Integrity Checks: Uses SHA-512 hashes to verify the integrity of stored data.
-- Event-Based Architecture: Organizes data hierarchically for easy retrieval and management.
-- Scalable Concurrent Processing: Optimized for concurrent processing to handle large-scale data.
-- Log Management and Indexing: Provides efficient logging and indexing for performance monitoring.
-- Non-Deletable Events: Once stored, events cannot be deleted or altered, ensuring the immutability and auditability of the data.
-- (To be implemented) Temporary Events: Allows the creation of temporary events that can be marked as temporary and safely cleaned up later for short-term data storage needs.
-
-
-## Table of Contents
-
-- [OuroborosDB](#ouroborosdb)
-  - [Table of Contents](#table-of-contents)
-  - [Installation](#installation)
-  - [Usage](#usage)
-    - [Initialization](#initialization)
-    - [Storing Files](#storing-files)
-    - [Retrieving Files](#retrieving-files)
-    - [Event Tree Management](#event-tree-management)
-      - [Creating Root Event](#creating-root-event)
-      - [Fetching Root Events by Title](#fetching-root-events-by-title)
-      - [Creating Child Events](#creating-child-events)
-      - [Fetching Child Events](#fetching-child-events)
-  - [Testing](#testing)
-  - [Benchmarking](#benchmarking)
-    - [Benchmark current state of the codebase](#benchmark-current-state-of-the-codebase)
-    - [Benchmark Versions](#benchmark-versions)
-  - [OuroborosDB Performance Version Differences](#ouroborosdb-performance-version-differences)
-  - [OuroborosDB Performance Changelog](#ouroborosdb-performance-changelog)
-  - [1.0.0 Features](#100-features)
-  - [Future Features](#future-features)
-  - [Current Problems and things to research:](#current-problems-and-things-to-research)
-  - [DB performance aims](#db-performance-aims)
-  - [Name and Logo](#name-and-logo)
-  - [License](#license)
-
-## Installation
-
-OuroborosDB requires Go 1.21.5+
-
-```bash
-go get -u github.com/i5heu/OuroborosDB
-```
-
-## Usage
-
-### Initialization
-
-**OuroborosDB** can be initialized with a configuration struct that includes paths for storage and other settings.
-
-```go
-import "OuroborosDB"
-
-func initializeDB() *OuroborosDB.OuroborosDB {
-	db, err := OuroborosDB.NewOuroborosDB(OuroborosDB.Config{
-		Paths: []string{"./data/storage"},
-		MinimumFreeGB: 1,
-		GarbageCollectionInterval: 10, // Minutes
-	})
-	if err != nil {
-		log.Fatalf("Failed to initialize OuroborosDB: %v", err)
-	}
-	return db
-}
-```
-
-### Storing Files
-
-Files can be stored within events using the `StoreFile` method.
-
-```go
-import (
-	"OuroborosDB/internal/storage"
-	"OuroborosDB"
-)
-
-func storeFile(db *OuroborosDB.OuroborosDB, parentEvent storage.Event) storage.Event {
-	fileContent := []byte("This is a sample file content")
-	metadata := []byte("sample.txt")
-
-	event, err := db.DB.StoreFile(storage.StoreFileOptions{
-		EventToAppendTo: parentEvent,
-		Metadata: metadata,
-		File: fileContent,
-	})
-	if err != nil {
-		log.Fatalf("Failed to store file: %v", err)
-	}
-	return event
-}
-```
-
-### Retrieving Files
-
-Files can be retrieved by providing the event from which they were stored.
-
-```go
-func retrieveFile(db *OuroborosDB.OuroborosDB, event storage.Event) []byte {
-	content, err := db.DB.GetFile(event)
-	if err != nil {
-		log.Fatalf("Failed to retrieve file: %v", err)
-	}
-	return content
-}
-```
-
-### Event Tree Management
-
-#### Creating Root Event
-
-Create a root event to represent the top level of an event tree.
-
-```go
-func createRootEvent(db *OuroborosDB.OuroborosDB, title string) storage.Event {
-	rootEvent, err := db.DB.CreateRootEvent(title)
-	if err != nil {
-		log.Fatalf("Failed to create root event: %v", err)
-	}
-	return rootEvent
-}
-```
-
-#### Fetching Root Events by Title
-
-```go
-func getRootEventsByTitle(db *OuroborosDB.OuroborosDB, title string) []storage.Event {
-	events, err := db.DB.GetRootEventsWithTitle(title)
-	if err != nil {
-		log.Fatalf("Failed to fetch root events by title: %v", err)
-	}
-	return events
-}
-```
-
-#### Creating Child Events
-
-```go
-func createChildEvent(db *OuroborosDB.OuroborosDB, parentEvent storage.Event) storage.Event {
-	childEvent, err := db.DB.CreateNewEvent(storage.EventOptions{
-		HashOfParentEvent: parentEvent.EventHash,
-	})
-	if err != nil {
-		log.Fatalf("Failed to create child event: %v", err)
-	}
-	return childEvent
-}
-```
-
-#### Fetching Child Events
-
-```go
-func getChildEvents(db *OuroborosDB.OuroborosDB, parentEvent storage.Event) []storage.Event {
-	children, err := db.Index.GetDirectChildrenOfEvent(parentEvent.EventHash)
-	if err != nil {
-		log.Fatalf("Failed to fetch child events: %v", err)
-	}
-	return children
-}
-```
-
-## Testing
-
-```bash
-go test ./...
-```
-
-## Benchmarking
-### Benchmark current state of the codebase
-```bash
-go test -run='^$' -bench=.
-```
-### Benchmark Versions
-Works with committed changes and version/commits that are reachable by `git checkout`.
-You also need to have installed `benchstat` to compare the benchmarks, install it with `go get golang.org/x/perf/cmd/benchstat@latest`
-
-```bash
-# add versions to bench.sh
-bash bench.sh
-# Now look in benchmarks/combined_benchmarks_comparison to see the results
-```
-## OuroborosDB Performance Version Differences
-```bash
-goos: linux
-goarch: amd64
-pkg: github.com/i5heu/ouroboros-db
-cpu: AMD Ryzen 9 5900X 12-Core Processor
-│ benchmarks/v0.0.5.txt │ benchmarks/v0.0.8.txt │ benchmarks/main.txt │
-│ sec/op │ sec/op vs base │ sec/op vs base │
-_setupDBWithData/RebuildIndex-24 414.3m ± 12% 423.4m ± 7% ~ (p=0.310 n=6)
-_Index_RebuildingIndex/RebuildIndex-24 14.85m ± 22% 13.29m ± 30% ~ (p=0.699 n=6) 17.83m ± 12% +20.06% (p=0.015 n=6)
-_Index_GetDirectChildrenOfEvent/GetChildrenOfEvent-24 2.408µ ± 11% 2.472µ ± 9% ~ (p=0.937 n=6) 2.288µ ± 11% ~ (p=0.180 n=6)
-_Index_GetChildrenHashesOfEvent/GetChildrenHashesOfEvent-24 38.81n ± 6% 40.58n ± 9% ~ (p=0.071 n=6) 38.48n ± 6% ~ (p=0.394 n=6)
-_DB_StoreFile/StoreFile-24 109.0µ ± 8% 103.5µ ± 14% ~ (p=0.310 n=6) 106.0µ ± 13% ~ (p=0.132 n=6)
-_DB_GetFile/GetFile-24 2.338µ ± 4% 2.374µ ± 6% ~ (p=0.394 n=6) 2.323µ ± 5% ~ (p=0.619 n=6)
-_DB_GetEvent/GetEvent-24 3.186µ ± 12% 3.274µ ± 6% ~ (p=0.485 n=6) 3.228µ ± 8% ~ (p=0.699 n=6)
-_DB_GetMetadata/GetMetadata-24 2.547µ ± 14% 2.532µ ± 13% ~ (p=0.818 n=6) 2.531µ ± 7% ~ (p=0.699 n=6)
-_DB_GetAllRootEvents/GetAllRootEvents-24 11.17m ± 13% 10.88m ± 17% ~ (p=0.699 n=6) 11.26m ± 11% ~ (p=0.937 n=6)
-_DB_GetRootIndex/GetRootIndex-24 1.479m ± 10% 1.488m ± 3% ~ (p=0.589 n=6) 1.428m ± 13% ~ (p=0.699 n=6)
-_DB_GetRootEventsWithTitle/GetRootEventsWithTitle-24 6.123µ ± 8% 6.469µ ± 14% ~ (p=0.310 n=6) 6.533µ ± 9% ~ (p=0.093 n=6)
-_DB_CreateRootEvent/CreateRootEvent-24 83.08µ ± 14% 80.44µ ± 13% ~ (p=0.937 n=6) 89.10µ ± 18% ~ (p=0.394 n=6)
-_DB_CreateNewEvent/CreateNewEvent-24 26.78µ ± 22% 28.04µ ± 13% ~ (p=1.000 n=6) 29.43µ ± 10% ~ (p=0.394 n=6)
-_setupDBWithData/setupDBWithData-24 442.0m ± 7%
-_DB_fastMeta/CreateNewEvent_with_FastMeta-24 28.64µ ± 13%
-_DB_fastMeta/GetEvent_with_FastMeta-24 3.281µ ± 17%
-_Index_GetParentHashOfEvent/GetParentHashOfEvent-24 44.01n ± 12%
-_Index_RebuildIFastMeta/RebuildFastMeta-24 596.8µ ± 8%
-_Index_GetEventHashesByFastMetaParameter/GetEventHashesByFastMetaParameter-24 40.20n ± 13%
-_Index_GetEventsByFastMeta/GetEventsByFastMeta-24 2.014m ± 10%
-geomean 63.40µ 63.47µ +0.11% 33.13µ +2.51% ¹
-¹ benchmark set differs from baseline; geomeans may not be comparable
-
-```
-
-## OuroborosDB Performance Changelog
-
-- **v0.0.14** - Major refactor of the Event type and introduction of FastMeta which should speed up search
-- **v0.0.3** - Switch from `gob` to `protobuf` for serialization
-- **v0.0.2** - Create tests and benchmarks
-
-
-## 1.0.0 Features
-🚧 = currently in development
-
-- [x] Data Deduplication
-- [x] Basic Data Store and Retrieval
-- [x] Child to Parent Index
-- [ ] Data Basics
-- [ ] 🚧 Data Compression with LZMA
-- [ ] Erasure coding
-- [ ] Encryption
-- [ ] Data Integrity Checks
-- [ ] Distributed System
-- [ ] 🚧 Bootstrap System
-- [ ] Authentication
-- [ ] Message Distribution
-- [ ] Broadcast
-- [ ] Unicast
-- [ ] Data Replication
-- [ ] DHT for Sharding? - maybe full HT is enough
-- [ ] Data Collection
-- [ ] Find and Collect Data in Network
-- [ ] Allow other nodes that are faster to collect data and send it to the slower with zstd
-
-## Future Features
-- [ ] Full Text Search - blevesearch/bleve
-- [ ] Semantic Search with API requests for Embeddings
-- [ ] Is the deletion of not Temporary Events a good idea?
-- Maybe if only some superUser can delete them with a key or something.
-- [ ] It would be nice to have pipelines that can run custom JS or webassembly to do arbitrary things.
-- With http routing we could build a webserver that can run inside a "pipeline" in the database. sksksk
-- They should be usable as scraper or time or event based notificators.
-- Like if this event gets a child recursively, upload this tree to a server.
-- this would need a virtual folder structure that is represented in an event.
-- with this we could also build a webdav server that can be used to access parts of the database.
-
-## Current Problems and things to research:
-- [ ] Garbage Collection would delete Chunks that in the process of being used in a new event
-- [ ] Deletion of Temporary Events is not yet discovered
-- [ ] We have EventChilds that are used as either
-- A Item in the "category" of the Event
-- New Information that replaces it's Parent
-- Changes to the Parent (think patches)
-We need to reflect this in the Event Structure to lower chunk lookups
-If we implement a potential DeltaEvent, we need to provide tooling for it.
-- is it like git where we have a diff of the file?
-- is it a new file that replaces the old one?
-- we already have the chunk system in place. But this seams to not be suitable for text files - so we would need a text based delta event?
-
-
-## DB performance aims
-| ID | Environment | Requirements |
-| ----------------------------- | ------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------- |
-| #TARGET_Store-100TB | simulated for each component (real tests cost a lot of money) | 🔜 Store 100TB of row data (400M Chunks)<br> 🔜 having retrieval times of random single chunks and events of under 10ms |
-| #TARGET_Retrieval-1GB-16s | 10 nodes globally distributed with test chunks spread globally | 🔜 Retrieve 1GB of data in less than 16 seconds (This is a full 0.5GB/s retrieval speed of 40'000 250KB chunks) |
-| #TARGET_SplitBrain-Resilience | 3 partitions, 3 nodes each, 100GB new raw data per partition and 1M new events | 🔜 Recover from a network partition in under 300 seconds |
 
 ## Name and Logo
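
The removed README text describes deduplication "through efficient chunking and hashing mechanisms" and SHA-512 integrity checks, but the diff does not show how either works. The following is a minimal sketch of the general technique only, assuming fixed-size chunks and an in-memory map. `ChunkStore`, `Put`, `Get`, and `chunkSize` are hypothetical names chosen for illustration and are not part of the OuroborosDB API.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// chunkSize is a hypothetical, deliberately tiny chunk size for illustration.
const chunkSize = 8

// ChunkStore is a hypothetical in-memory, content-addressed chunk store.
// Each chunk is keyed by its SHA-512 digest, so identical chunks are stored once.
type ChunkStore struct {
	chunks map[[sha512.Size]byte][]byte
}

// NewChunkStore returns an empty store.
func NewChunkStore() *ChunkStore {
	return &ChunkStore{chunks: make(map[[sha512.Size]byte][]byte)}
}

// Put splits data into fixed-size chunks, stores each previously unseen chunk,
// and returns the ordered chunk hashes needed to rebuild data.
func (s *ChunkStore) Put(data []byte) [][sha512.Size]byte {
	var hashes [][sha512.Size]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[start:end]
		h := sha512.Sum512(chunk)
		if _, seen := s.chunks[h]; !seen {
			// Copy the chunk so the store does not alias the caller's slice.
			s.chunks[h] = append([]byte(nil), chunk...)
		}
		hashes = append(hashes, h)
	}
	return hashes
}

// Get reassembles the data and re-hashes every chunk as an integrity check.
func (s *ChunkStore) Get(hashes [][sha512.Size]byte) ([]byte, error) {
	var out []byte
	for _, h := range hashes {
		chunk, ok := s.chunks[h]
		if !ok {
			return nil, fmt.Errorf("missing chunk %x", h[:8])
		}
		if sha512.Sum512(chunk) != h {
			return nil, fmt.Errorf("corrupt chunk %x", h[:8])
		}
		out = append(out, chunk...)
	}
	return out, nil
}

func main() {
	store := NewChunkStore()
	a := store.Put([]byte("AAAAAAAABBBBBBBB"))
	b := store.Put([]byte("BBBBBBBBAAAAAAAA"))
	data, err := store.Get(b)
	// Two files of two chunks each share the same two unique chunks.
	fmt.Println(len(a), len(b), len(store.chunks), string(data), err)
}
```

Both inputs consist of the same two 8-byte chunks in a different order, so the store holds two chunks rather than four. A real implementation would likely use content-defined chunking and persistent storage; fixed-size chunks are used here only to keep the sketch short.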

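
The figures in the removed "DB performance aims" table can be checked with simple arithmetic. The sketch below assumes decimal units (1 KB = 1,000 bytes) and the 250 KB chunk size stated in the table; `chunkCount` and `throughput` are hypothetical helper names.

```go
package main

import "fmt"

// Decimal unit sizes in bytes (an assumption; the README does not say
// whether it means decimal or binary units).
const (
	kb = 1e3
	gb = 1e9
	tb = 1e12
)

// chunkCount returns how many chunks of chunkBytes make up totalBytes.
func chunkCount(totalBytes, chunkBytes float64) float64 {
	return totalBytes / chunkBytes
}

// throughput returns the required transfer rate in bytes per second.
func throughput(totalBytes, seconds float64) float64 {
	return totalBytes / seconds
}

func main() {
	// Average chunk size implied by "100TB ... (400M Chunks)".
	fmt.Printf("100 TB / 400M chunks = %.0f KB per chunk\n", 100*tb/400e6/kb)

	// Number of 250 KB chunks in 1 GB.
	fmt.Printf("1 GB / 250 KB = %.0f chunks\n", chunkCount(1*gb, 250*kb))

	// Rate needed to move 1 GB in 16 seconds, in megabytes and gigabits.
	bps := throughput(1*gb, 16)
	fmt.Printf("1 GB in 16 s = %.1f MB/s = %.1f Gbit/s\n", bps/1e6, bps*8/1e9)
}
```

Under these assumptions, the 100 TB target matches the 250 KB chunk size, but 1 GB corresponds to 4,000 chunks and 62.5 MB/s (0.5 Gbit/s). The table's "40'000" chunks and "0.5GB/s" may therefore refer to 10 GB or to gigabits; the diff does not say which, so this reading is an inference.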
go.mod

Lines changed: 32 additions & 1 deletion
@@ -4,4 +4,35 @@ go 1.24.5
 
 require github.com/sirupsen/logrus v1.9.3
 
-require golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8 // indirect
+require (
+	github.com/cespare/xxhash/v2 v2.3.0 // indirect
+	github.com/cloudflare/circl v1.6.0 // indirect
+	github.com/dgraph-io/badger/v4 v4.8.0 // indirect
+	github.com/dgraph-io/ristretto/v2 v2.2.0 // indirect
+	github.com/dustin/go-humanize v1.0.1 // indirect
+	github.com/go-logr/logr v1.4.3 // indirect
+	github.com/go-logr/stdr v1.2.2 // indirect
+	github.com/go-ole/go-ole v1.2.6 // indirect
+	github.com/google/flatbuffers v25.2.10+incompatible // indirect
+	github.com/i5heu/ouroboros-crypt v1.1.0 // indirect
+	github.com/i5heu/ouroboros-kv v1.1.1 // indirect
+	github.com/ipfs/boxo v0.33.0 // indirect
+	github.com/ipfs/go-log/v2 v2.6.0 // indirect
+	github.com/klauspost/compress v1.18.0 // indirect
+	github.com/klauspost/cpuid/v2 v2.2.10 // indirect
+	github.com/klauspost/reedsolomon v1.12.5 // indirect
+	github.com/libp2p/go-buffer-pool v0.1.0 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
+	github.com/shirou/gopsutil v2.21.11+incompatible // indirect
+	github.com/whyrusleeping/chunker v0.0.0-20181014151217-fe64bd25879f // indirect
+	github.com/yusufpapurcu/wmi v1.2.4 // indirect
+	go.opentelemetry.io/auto/sdk v1.1.0 // indirect
+	go.opentelemetry.io/otel v1.37.0 // indirect
+	go.opentelemetry.io/otel/metric v1.37.0 // indirect
+	go.opentelemetry.io/otel/trace v1.37.0 // indirect
+	go.uber.org/multierr v1.11.0 // indirect
+	go.uber.org/zap v1.27.0 // indirect
+	golang.org/x/net v0.41.0 // indirect
+	golang.org/x/sys v0.34.0 // indirect
+	google.golang.org/protobuf v1.36.6 // indirect
+)
