Add auth support, split-by-type option, and pagination for large graphs #26
base: main
Conversation
- Add `--username` and `--password` options for FalkorDB authentication
- Add `--split-by-type`/`--no-split-by-type` flags to control CSV file organization
- Implement pagination to export all nodes and edges (fixes 10k limit)
- Add progress feedback during export
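For context, a typical invocation exercising these options might look like the following (graph name and credentials are placeholders, not taken from the PR):

```bash
# Authenticated export with one CSV per node label / edge type (the default)
python main.py social --username myuser --password mypass

# Authenticated export into a single nodes.csv and edges.csv
python main.py social --username myuser --password mypass --no-split-by-type
```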
Walkthrough

Export tool now accepts optional …

Changes
Sequence Diagram(s)

sequenceDiagram
participant CLI
participant Export as export_graph
participant DB as FalkorDB
participant CSV as CSV Writer
rect rgb(245,250,255)
note over CLI,Export: CLI invokes export with auth & options
CLI->>Export: graph_name, host, port, username?, password?, split_by_type
end
rect rgb(225,245,225)
note over Export,DB: Batched Node Retrieval (last_id)
loop while batch returned
Export->>DB: Query nodes WHERE id > last_id LIMIT batch_size
DB-->>Export: node batch
Export->>Export: Accumulate nodes by label / log progress
end
end
rect rgb(255,245,230)
note over Export,CSV: Conditional Node Output
alt split_by_type = True
Export->>CSV: Write per-label node CSVs
else
Export->>CSV: Write combined `nodes.csv` (with label column)
end
end
rect rgb(225,245,225)
note over Export,DB: Batched Edge Retrieval (last_id)
loop while batch returned
Export->>DB: Query edges WHERE id > last_id LIMIT batch_size
DB-->>Export: edge batch
Export->>Export: Accumulate edges by type / log progress
end
end
rect rgb(255,245,230)
note over Export,CSV: Conditional Edge Output
alt split_by_type = True
Export->>CSV: Write per-type edge CSVs
else
Export->>CSV: Write combined `edges.csv` (with type column)
end
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–60 minutes
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Codecov Report

❌ Patch coverage is …
Additional details and impacted files

```
@@ Coverage Diff @@
##           main      #26      +/-   ##
==========================================
- Coverage   86.53%   85.43%   -1.11%
==========================================
  Files           1        1
  Lines          52      103      +51
==========================================
+ Hits           45       88      +43
- Misses          7       15       +8
```
Flags with carried forward coverage won't be shown.
- Document authentication options (--username, --password)
- Explain --split-by-type and --no-split-by-type modes
- Add comprehensive usage examples
- Document pagination support for large graphs
- Update output section with file naming patterns
Actionable comments posted: 6
🧹 Nitpick comments (3)
main.py (3)
59-75: Memory usage still O(total). Stream to CSV per batch.

For very large graphs, accumulate-then-write is memory heavy. Append per batch with the header written once.
```python
# per label/type file: write each batch
df.to_csv(filename, mode="a", index=False, header=not os.path.exists(filename))
```

Optionally precompute the column union to stabilize headers.
Also applies to: 115-130
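As a rough sketch of that append-per-batch approach (helper name and call site are illustrative, not from main.py):

```python
import os
import pandas as pd

def flush_batch(rows, filename):
    """Append one batch of dict rows to a CSV, writing the header only on the first write."""
    if not rows:
        return
    pd.DataFrame(rows).to_csv(
        filename, mode="a", index=False, header=not os.path.exists(filename)
    )

# Inside the pagination loop, per label (or per edge type):
#   flush_batch(batch_nodes_for_label, f"nodes_{label}.csv")
# then clear the per-label buffer so memory stays bounded.
```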
148-158: Simplify boolean flag with BooleanOptionalAction.

You don’t need two flags; argparse can generate --split-by-type/--no-split-by-type automatically.
```diff
-    parser.add_argument(
-        "--split-by-type",
-        action="store_true",
-        default=True,
-        help="Create separate CSV files per node label and edge type (default: True)"
-    )
-    parser.add_argument(
-        "--no-split-by-type",
-        dest="split_by_type",
-        action="store_false",
-        help="Create single CSV files for all nodes and edges"
-    )
+    parser.add_argument(
+        "--split-by-type",
+        dest="split_by_type",
+        action=argparse.BooleanOptionalAction,
+        default=True,
+        help="Create separate CSV files per node label and edge type (use --no-split-by-type to combine)"
+    )
```

Requires Python 3.9+.
59-64: Optional: mitigate CSV injection when opening in Excel.

If properties can start with =, +, -, @, consider prefixing dangerous values before writing.
```python
def sanitize_cell(v):
    if isinstance(v, str) and v and v[0] in "=+-@":
        return "'" + v
    return v

# apply before DataFrame(...)
rows = [{k: sanitize_cell(v) for k, v in row.items()} for row in rows]
```

Also applies to: 115-121
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
main.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.12)
🔇 Additional comments (1)
main.py (1)
7-16: No issues found — FalkorDB auth kwargs are correct.

The FalkorDB Python client accepts host, port, username, and password as constructor kwargs, and the code correctly passes these parameters. No changes needed.
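For illustration, that constructor usage looks roughly like this (values are placeholders):

```python
from falkordb import FalkorDB

# Connection via constructor kwargs, as described in the comment above
db = FalkorDB(host="localhost", port=6379, username="myuser", password="mypass")
g = db.select_graph("social")
```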
```python
    while True:
        nodes_result = g.ro_query(
            f"MATCH (n) RETURN ID(n), labels(n), properties(n) SKIP {skip} LIMIT {batch_size}"
        )
```
🧩 Analysis chain
Pagination via SKIP/LIMIT scales poorly at high offsets; consider ID-based pagination and parameters.
Large SKIPs become O(N). Prefer WHERE ID(x) > last_id ORDER BY ID(x) LIMIT $limit. Also parameterize skip/limit to avoid string interpolation.
Example pattern:
```python
# Nodes
q = "MATCH (n) WHERE ID(n) > $last_id RETURN ID(n), labels(n), properties(n) ORDER BY ID(n) LIMIT $limit"
params = {"last_id": last_id, "limit": batch_size}
# Repeat, updating last_id from the final record
```

Also applies to: 83-86
🌐 Web query:
Does FalkorDB support parameterized queries in Python (ro_query with params)? Recommended pattern for paginating large result sets (ID-based) in FalkorDB/RedisGraph?
💡 Result:
Short answers
- Yes — FalkorDB supports parameterized queries and you can pass params from Python. The GRAPH.QUERY docs show the parameterized syntax and a Python example; the read-only command GRAPH.RO_QUERY exists as well, and the client follows the same query/params pattern. [1][2]
Recommended pattern for paginating large result sets (ID / cursor based)
- Use cursor (ID or indexed timestamp) pagination rather than OFFSET/LIMIT. Keep an indexed, monotonic property on nodes (e.g. created_at or a numeric id) and paginate with a WHERE clause plus ORDER BY and LIMIT:
- Example Cypher (use your node property name instead of my_id):
- MATCH (n:MyLabel) WHERE n.my_id > $cursor ORDER BY n.my_id ASC LIMIT $limit RETURN n
- Pass cursor and limit as parameters (so queries are cached and safe). On the first page use cursor = 0 (or null-handling) and each page set cursor = last_row.my_id from the previous page.
- Benefits: uses an index so the database jumps to the next range (fast, stable under inserts), avoids scanning/skipping rows (OFFSET) and reduces memory/IO pressure.
- Caveats:
- Ensure the paginating property is indexed and monotonic (or use created_at if strictly increasing).
- If you need strict snapshot semantics across many pages, you’ll need additional application-level handling because cursor pagination is eventually consistent with concurrent writes.
- Use FalkorDB configuration limits (RESULTSET_SIZE / timeouts) to avoid runaway queries on unexpectedly large requests. [3][4][5]
References
- GRAPH.QUERY (parameterized query examples, Python). [1] https://docs.falkordb.com/commands/graph.query.html
- GRAPH.RO_QUERY (read-only query command). [2] https://docs.falkordb.com/commands/graph.ro-query.html
- FalkorDB configuration (RESULTSET_SIZE, timeouts). [3] https://docs.falkordb.com/getting-started/configuration.html
- Pagination best-practices / avoid OFFSET (cursor/id pagination recommended). [4] https://configzen.com/blog/master-pagination-mysql-pitfalls
- Cursor-based pagination overview (why it’s preferred for large sets). [5] https://www.carens.dev/a-friendly-guide-to-pagination
If you want, I can show the exact Python call form for graph.ro_query with params using the FalkorDB Python client.
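As a rough illustration of that call form, a cursor-style loop with parameters could look like the following; the `uid` property, the starting cursor value, and the `.result_set` attribute are assumptions for the sketch, not details taken from this PR:

```python
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("social")

batch_size = 10000
cursor = -1  # assumes uid is a non-negative, monotonically increasing, indexed property
while True:
    result = g.ro_query(
        "MATCH (n) WHERE n.uid > $cursor "
        "RETURN n.uid, labels(n), properties(n) "
        "ORDER BY n.uid LIMIT $limit",
        {"cursor": cursor, "limit": batch_size},
    )
    rows = result.result_set
    if not rows:
        break
    # ... accumulate or stream the rows ...
    cursor = rows[-1][0]  # advance the cursor to the last uid seen
```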
I'll inspect the actual code in main.py to verify the current pagination implementation and determine whether the refactoring suggestion is applicable and correctly targeted.
🏁 Script executed:
```bash
#!/bin/bash
# Inspect the code at lines 25-28 and 83-86 in main.py
echo "=== Lines 25-28 ===" && sed -n '25,28p' main.py
echo ""
echo "=== Lines 83-86 ===" && sed -n '83,86p' main.py
echo ""
echo "=== Context around lines 25-28 (20-35) ===" && sed -n '20,35p' main.py
echo ""
echo "=== Context around lines 83-86 (75-95) ===" && sed -n '75,95p' main.py
```

Length of output: 1486
Replace SKIP/LIMIT with parameterized cursor-based pagination using indexed node/edge properties, not internal IDs.
FalkorDB's internal IDs are not immutable—higher IDs may be reassigned to lower positions after deletion, making ID-based cursors unreliable for pagination. Instead:
- Use indexed properties (e.g., `created_at`, `uuid`, or a stable `node_id` attribute) for cursor pagination.
- Parameterize queries to avoid string interpolation and improve performance.
For nodes (lines 25–28):
```python
q = "MATCH (n) WHERE n.id > $last_id RETURN ID(n), labels(n), properties(n) ORDER BY n.id LIMIT $limit"
params = {"last_id": last_id, "limit": batch_size}
nodes_result = g.ro_query(q, params)
```

For edges (lines 83–86):
```python
q = "MATCH (a)-[e]->(b) WHERE e.id > $last_id RETURN ID(e), TYPE(e), ID(a), ID(b), properties(e) ORDER BY e.id LIMIT $limit"
params = {"last_id": last_id, "limit": batch_size}
edges_result = g.ro_query(q, params)
```

Ensure the pagination property (n.id, e.id) is indexed and monotonically increasing.
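If a property like n.id is used as the cursor, it also needs an index; in FalkorDB indexes are created per label, so something along these lines would be required for each exported label (label and property names here are illustrative):

```python
# One index per label whose nodes carry the cursor property
g.query("CREATE INDEX FOR (p:Person) ON (p.id)")
```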
```python
            labels = record[1]
            props = record[2] or {}

            # Handle nodes with multiple labels or no labels
            if labels:
                for label in labels:
                    node = {"id": node_id}
                    node.update(props)
                    nodes_by_label[label].append(node)
            else:
```
Add a labels(n) type guard to avoid iterating characters.
If labels(n) yields a string, the current loop will iterate per-char. Guard it.
```diff
-            labels = record[1]
+            labels = record[1] or []
+            if not isinstance(labels, (list, tuple)):
+                labels = [labels]
```

🤖 Prompt for AI Agents
In main.py around lines 35 to 44 the code may iterate a string label one
character at a time; before the loop ensure labels is treated as a sequence of
labels (not a str) — e.g. if labels is a str, wrap it as a single-item list (or
otherwise convert into the expected iterable) and then run the existing for
label in labels logic; keep the existing None/empty handling intact.
```python
                    node = {"id": node_id}
                    node.update(props)
                    nodes_by_label[label].append(node)
            else:
                # Handle nodes without labels
                node = {"id": node_id}
                node.update(props)
                nodes_by_label["unlabeled"].append(node)

    # Export each node label to its own CSV file
    for label, nodes in nodes_by_label.items():
        filename = f"nodes_{label}.csv"
        pd.DataFrame(nodes).to_csv(filename, index=False)
        print(f"✅ Exported {len(nodes)} nodes with label '{label}' to {filename}")
```
Props can overwrite IDs — use non-colliding field names.
props may contain keys like 'id', overwriting your synthetic identifiers and corrupting exports.
Apply:
```diff
-                    node = {"id": node_id}
+                    node = {"node_id": node_id}
                     node.update(props)
 ...
-                node = {"id": node_id}
+                node = {"node_id": node_id}
                 node.update(props)
 ...
-            edge = {
-                "id": edge_id,
-                "from_id": from_id,
-                "to_id": to_id
-            }
+            edge = {
+                "edge_id": edge_id,
+                "from_id": from_id,
+                "to_id": to_id
+            }
             edge.update(props)
```

Also applies to: 98-105
🤖 Prompt for AI Agents
In main.py around lines 41 to 48 (and likewise at lines 98 to 105), props may
contain an "id" key and will overwrite the synthetic node_id when you call
node.update(props); fix by building nodes so props cannot collide with the
synthetic id—create the node dict with the synthetic id under "id" (or use a
reserved key like "_id") and merge props into a separate key or filter out "id"
from props before updating (e.g., copy props minus "id" or nest props under
"props") so the synthetic identifier is preserved.
```python
        filename = f"nodes_{label}.csv"
        pd.DataFrame(nodes).to_csv(filename, index=False)
        print(f"✅ Exported {len(nodes)} nodes with label '{label}' to {filename}")
```
🛠️ Refactor suggestion | 🟠 Major
Sanitize filenames to avoid OS/path issues.
Label/type may contain spaces or separators. Sanitize before composing filenames.
```diff
-        filename = f"nodes_{label}.csv"
+        filename = f"nodes_{safe_filename(label)}.csv"
 ...
-        filename = f"edges_{edge_type}.csv"
+        filename = f"edges_{safe_filename(edge_type)}.csv"
```

Add once near imports:
```python
import re

def safe_filename(name: str) -> str:
    name = str(name).strip().replace(" ", "_")
    name = re.sub(r"[^A-Za-z0-9_.-]+", "_", name)
    return name[:200] or "unnamed"
```

Also applies to: 118-120
🤖 Prompt for AI Agents
In main.py around lines 62 to 64 (and also apply to lines 118 to 120), filenames
are composed directly from label/type which may contain spaces or unsafe
characters; add the provided safe_filename helper near the imports (and import
re) and use it when building filenames (e.g., filename =
f"nodes_{safe_filename(label)}.csv") so labels are sanitized, truncated to a
safe length, and invalid chars replaced.
```python
        all_nodes = []
        for label, nodes in nodes_by_label.items():
            for node in nodes:
                node["label"] = label
                all_nodes.append(node)
        if all_nodes:
            pd.DataFrame(all_nodes).to_csv("nodes.csv", index=False)
            print(f"✅ Exported {len(all_nodes)} nodes to nodes.csv")
```
Avoid mutating records and avoid 'label'/'type' key collisions.
Use copy with reserved column names to prevent clobbering user props.
```diff
-    for label, nodes in nodes_by_label.items():
-        for node in nodes:
-            node["label"] = label
-            all_nodes.append(node)
+    for label, nodes in nodes_by_label.items():
+        for node in nodes:
+            all_nodes.append({**node, "node_label": label})
 ...
-    for edge_type, edges in edges_by_type.items():
-        for edge in edges:
-            edge["type"] = edge_type
-            all_edges.append(edge)
+    for edge_type, edges in edges_by_type.items():
+        for edge in edges:
+            all_edges.append({**edge, "edge_type": edge_type})
```

Also applies to: 124-129
🤖 Prompt for AI Agents
In main.py around lines 67-74 (and similarly 124-129) you are mutating the
original node dicts and potentially clobbering user keys named "label" or
"type"; instead, create a shallow copy of each node (e.g., node_copy =
node.copy()) and add a reserved column name (use a prefixed name unlikely to
collide, e.g., "_label" or "__label" and "_type"/"__type") to the copy before
appending; then write the list of copies to CSV so original records remain
unmodified and user properties are preserved.
```python
    parser.add_argument("--username", help="FalkorDB username (optional)")
    parser.add_argument("--password", help="FalkorDB password (optional)")
    parser.add_argument(
```
Passwords on the command line leak via ps/history — prefer env var or prompt.
Accepting --password on the command line is risky. Fall back to an environment variable or an interactive prompt when a username is set.
```python
import os, getpass
...
args = parser.parse_args()
if args.username and not args.password:
    args.password = os.getenv("FALKORDB_PASSWORD") or getpass.getpass("FalkorDB password: ")
```

Document FALKORDB_PASSWORD in the README.
Also applies to: 160-163
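With that change, usage would look something like this (the FALKORDB_PASSWORD variable name follows the suggestion above and is not an existing convention of the tool):

```bash
# Non-interactive: the password comes from the environment
FALKORDB_PASSWORD='s3cret' python main.py social --username myuser

# Interactive: omit the password and the tool prompts via getpass
python main.py social --username myuser
```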
🤖 Prompt for AI Agents
In main.py around lines 145-147 and 160-163, avoid accepting plaintext passwords
via the --password CLI flag; when a username is provided and no password was
supplied on the command line, read the password from the FALKORDB_PASSWORD
environment variable or prompt interactively using getpass.getpass() instead.
Update argument parsing to treat --password as optional (or remove it) and add
post-parse logic that if args.username and not args.password then set
args.password = os.getenv("FALKORDB_PASSWORD") or prompt the user; ensure
imports for os and getpass are added and that interactive prompting only occurs
when stdin is a TTY. Also update the README to document the FALKORDB_PASSWORD
environment variable and the interactive prompt behavior.
Actionable comments posted: 0
🧹 Nitpick comments (5)
README.md (5)
97-101: Tighten auth examples to avoid teaching insecure habits.

Great examples. Add a one-line caution under the auth examples to note CLI password exposure; consider showing a safer pattern (e.g., prompt) if the tool supports it.
Suggested tweak:
**Export with authentication:**

```diff
-python main.py social --username myuser --password mypass
+python main.py social --username myuser --password '<enter-password>'
```

Note: Passing passwords on the command line can leak via history/`ps`. Prefer an interactive prompt or environment configuration if supported.

Also applies to: 102-106, 107-111, 112-116, 117-121

---

125-143: Document how unlabeled nodes are handled in split mode.

If unlabeled nodes exist, specify the filename convention (e.g., nodes_UNLABELED.csv) and whether multiple labels produce one row per node (union of labels) or a label-joined representation.

Proposed addition:

```diff
 #### Node files: `nodes_<label>.csv`
 Each node label gets its own file (e.g., `nodes_Person.csv`, `nodes_Company.csv`):
 - `id`: Node ID
 - Additional columns for each node property
+Notes:
+- Unlabeled nodes are written to `nodes_UNLABELED.csv` (if present).
+- Nodes with multiple labels appear in each corresponding label file.
```
153-160: Clarify column unification and missing values in combined mode.

State that the schema is the union of properties across labels/types and that missing properties are left blank/NA, so users know how to parse.
Suggested text:
```diff
 #### nodes.csv
 All nodes in a single file:
 - `id`: Node ID
 - `label`: Node label/type
 - Additional columns for each node property
+Schema notes: columns are the union of all encountered node properties; properties absent on a node are left empty.
@@
 #### edges.csv
 All edges in a single file:
 - `id`: Edge ID
 - `type`: Relationship type
 - `from_id`: Source node ID
 - `to_id`: Target node ID
 - Additional columns for each edge property
+Schema notes: columns are the union of all encountered edge properties; properties absent on an edge are left empty.
```
161-164: Add --batch-size CLI flag for configurable pagination (currently hardcoded to 10,000).

Verified: batch_size is hardcoded to 10,000 in main.py (lines 21, 79) with no CLI or environment variable override. While this matches the README documentation, adding a `--batch-size` flag would improve usability for users with memory constraints or specific performance requirements.
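A possible shape for such a flag, assuming export_graph were extended with a batch_size parameter (neither exists in the current code):

```python
parser.add_argument(
    "--batch-size",
    type=int,
    default=10000,
    help="Number of nodes/edges fetched per query (default: 10000)",
)

# Later, once export_graph accepts it:
# export_graph(args.graph_name, args.host, args.port,
#              args.username, args.password, args.split_by_type,
#              batch_size=args.batch_size)
```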
76-93: Add documentation clarifications for flag exclusivity and password security.

The code correctly implements mutually exclusive flags via `dest="split_by_type"` with opposite actions (`store_true` vs. `store_false`), with `default=True`. However, the README help text does not clarify this relationship or the default behavior. Additionally, the implementation passes passwords directly from the CLI with no environment variable or interactive input alternative, making the security warning in the proposed diff valid and actionable.

Apply the suggested documentation update to clarify flag behavior and add the security note.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
README.md (2 hunks)
🔇 Additional comments (2)
README.md (2)
20-22: Nice additions to call out.

Feature bullets align with the PR scope; no issues.
80-81: Wording LGTM.

“by label/type” matches the two output modes described below.
- Test authentication parameter acceptance
- Test split-by-type mode (both True and False)
- Test combined mode output (single nodes.csv and edges.csv)
- Test default behavior (split mode)
- Test pagination with larger graphs (100 nodes)
Actionable comments posted: 1
🧹 Nitpick comments (3)
test_main.py (3)
247-254: Test doesn't validate authentication functionality.

The test passes `None` for both username and password, which only verifies that the function accepts these parameters but doesn't test actual authentication behavior. Consider adding test cases that (a sketch of the failure case follows the list):
- Use real credentials when the FalkorDB test instance has authentication enabled
- Test authentication failure scenarios with invalid credentials
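A sketch of the invalid-credentials case, assuming a FalkorDB test instance that actually enforces authentication (which the current fixtures do not provide):

```python
import pytest
from main import export_graph

def test_export_rejects_bad_credentials():
    """Hypothetical: requires an auth-enabled FalkorDB instance on localhost:6379."""
    with pytest.raises(Exception):
        export_graph("social", "localhost", 6379,
                     username="wrong_user", password="wrong_pass")
```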
256-272: Use exact counts for more precise assertions.

The assertions use `>=` for minimum counts, but since the test_graph fixture creates a deterministic dataset (exactly 3 nodes and 2 edges), the test should assert exact values for more precision.

Apply this diff to strengthen the assertions:
```diff
-        assert len(nodes_df) >= 3, "Should have at least 3 nodes (2 Person + 1 Company)"
+        assert len(nodes_df) == 3, "Should have exactly 3 nodes (2 Person + 1 Company)"

         # Verify edges.csv has type column
         edges_df = pd.read_csv("edges.csv")
         assert "type" in edges_df.columns, "edges.csv should have type column"
-        assert len(edges_df) >= 2, "Should have at least 2 edges (WORKS_FOR + KNOWS)"
+        assert len(edges_df) == 2, "Should have exactly 2 edges (WORKS_FOR + KNOWS)"
```
297-298: Replace bare except with specific exceptions.

The bare `except` clauses on lines 297 and 321 are overly broad and can hide unexpected errors. Additionally, silently ignoring exceptions makes debugging harder.

Apply this diff to catch specific exceptions:
```diff
         # Clear any existing data
         try:
             g.query("MATCH (n) DETACH DELETE n")
-        except:
-            pass
+        except Exception as e:
+            # Graph might not exist yet or be empty
+            print(f"Warning: Failed to clear graph: {e}")

         # Create 100 nodes (simulating pagination, though batch size is 10000)
```

And similarly for the cleanup section:
```diff
         # Cleanup
         try:
             g.query("MATCH (n) DETACH DELETE n")
-        except:
-            pass
+        except Exception as e:
+            print(f"Warning: Failed to cleanup graph: {e}")
```

Also applies to: 321-322
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test_main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test_main.py (1)
main.py (1)
export_graph (7-135)
🪛 Ruff (0.14.2)
test_main.py
247-247: Unused method argument: temp_dir
(ARG002)
256-256: Unused method argument: temp_dir
(ARG002)
274-274: Unused method argument: temp_dir
(ARG002)
288-288: Unused method argument: temp_dir
(ARG002)
297-297: Do not use bare except
(E722)
297-298: try-except-pass detected, consider logging the exception
(S110)
321-321: Do not use bare except
(E722)
321-322: try-except-pass detected, consider logging the exception
(S110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.12)
🔇 Additional comments (1)
test_main.py (1)
274-286: LGTM!

Good test coverage for verifying the default split-by-type behavior, including negative assertions to ensure combined files aren't created.
```diff
 import pandas as pd

-def export_graph(graph_name, host, port):
+def export_graph(graph_name, host, port, username=None, password=None, split_by_type=True):
```
use url instead
```python
    while True:
        nodes_result = g.ro_query(
            f"MATCH (n) RETURN ID(n), labels(n), properties(n) SKIP {skip} LIMIT {batch_size}"
```
It will be faster if, instead of skip, you filter with id(n) > last id
id is not necessarily sortable... I think current approach is "good enough"...
this query specifically, under the hood, scans the nodes by id
it will be slow when the graph is very large
…nation

- Use redis:// URL format for connection instead of separate parameters
- Replace SKIP-based pagination with ID-based filtering (WHERE ID(n) > last_id)
- Add ORDER BY ID(n)/ID(e) for consistent pagination
- Improves performance on large graphs by avoiding expensive SKIP operations
The ID-based pagination approach failed because FalkorDB/Cypher doesn't support ORDER BY on ID() function. Reverting to SKIP-based pagination while keeping the URL connection improvement from AviAvni's review.
Actionable comments posted: 1
♻️ Duplicate comments (8)
main.py (8)
27-31: Parameterize queries instead of string interpolation.

Avoid f-string SQL/Cypher; improves safety and plan caching.
```diff
-        nodes_result = g.ro_query(
-            f"MATCH (n) WHERE ID(n) > {last_id} RETURN ID(n), labels(n), properties(n) ORDER BY ID(n) LIMIT {batch_size}"
-        )
+        nodes_query = (
+            "MATCH (n) WHERE ID(n) > $last_id "
+            "RETURN ID(n), labels(n), properties(n) "
+            "ORDER BY ID(n) LIMIT $limit"
+        )
+        nodes_result = g.ro_query(nodes_query, {"last_id": last_id, "limit": batch_size})
```

```diff
-        edges_result = g.ro_query(
-            f"MATCH (a)-[e]->(b) WHERE ID(e) > {last_id} RETURN ID(e), TYPE(e), ID(a), ID(b), properties(e) ORDER BY ID(e) LIMIT {batch_size}"
-        )
+        edges_query = (
+            "MATCH (a)-[e]->(b) WHERE ID(e) > $last_id "
+            "RETURN ID(e), TYPE(e), ID(a), ID(b), properties(e) "
+            "ORDER BY ID(e) LIMIT $limit"
+        )
+        edges_result = g.ro_query(edges_query, {"last_id": last_id, "limit": batch_size})
```

Also applies to: 87-91
66-66: Sanitize filenames derived from labels/types.

Avoid spaces/unsafe chars in filenames.
Add helper near imports:
```diff
+import re
+
+def safe_filename(name: str) -> str:
+    name = str(name).strip().replace(" ", "_")
+    name = re.sub(r"[^A-Za-z0-9_.-]+", "_", name)
+    return name[:200] or "unnamed"
```

Use it:
```diff
-        filename = f"nodes_{label}.csv"
+        filename = f"nodes_{safe_filename(label)}.csv"
```

```diff
-        filename = f"edges_{edge_type}.csv"
+        filename = f"edges_{safe_filename(edge_type)}.csv"
```

Also applies to: 124-124
71-75: Don’t mutate records; avoid 'label'/'type' collisions.

Use copies and reserved column names to prevent clobbering user props.
```diff
-    all_nodes = []
-    for label, nodes in nodes_by_label.items():
-        for node in nodes:
-            node["label"] = label
-            all_nodes.append(node)
+    all_nodes = []
+    for label, nodes in nodes_by_label.items():
+        for node in nodes:
+            all_nodes.append({**node, "node_label": label})
```

```diff
-    all_edges = []
-    for edge_type, edges in edges_by_type.items():
-        for edge in edges:
-            edge["type"] = edge_type
-            all_edges.append(edge)
+    all_edges = []
+    for edge_type, edges in edges_by_type.items():
+        for edge in edges:
+            all_edges.append({**edge, "edge_type": edge_type})
```

Also applies to: 129-133
27-61: Cursor pagination over internal IDs can miss/duplicate rows; use a monotonic, indexed property.

Internal IDs are not guaranteed stable across deletes; prefer WHERE n.my_cursor > $cursor ORDER BY n.my_cursor LIMIT $limit (same for edges).
If you have a property like created_at or uid, switch to it:
```python
# Example (nodes)
q = ("MATCH (n) WHERE EXISTS(n.uid) AND n.uid > $cursor "
     "RETURN ID(n), labels(n), properties(n) ORDER BY n.uid LIMIT $limit")
params = {"cursor": last_uid, "limit": batch_size}
```

Confirm which properties are suitable and indexed.
Also applies to: 87-119
38-45: Guard labels type to avoid iterating characters.

labels can be a string; iterating would split into chars.
```diff
-            labels = record[1]
+            labels = record[1] or []
+            if not isinstance(labels, (list, tuple)):
+                labels = [labels]
```
44-51: Protect synthetic IDs from collision with props.

props may contain 'id'; node.update(props) will overwrite it.
```diff
-                node = {"id": node_id}
-                node.update(props)
+                node = {"node_id": node_id}
+                node.update({k: v for k, v in props.items() if k != "node_id"})
 ...
-            node = {"id": node_id}
-            node.update(props)
-            nodes_by_label["unlabeled"].append(node)
+            node = {"node_id": node_id}
+            node.update({k: v for k, v in props.items() if k != "node_id"})
+            nodes_by_label["__unlabeled__"].append(node)
```
103-109: Edge props can overwrite IDs; use non‑colliding keys.

Same collision risk as nodes.
```diff
-        edge = {
-            "id": edge_id,
-            "from_id": from_id,
-            "to_id": to_id
-        }
-        edge.update(props)
+        edge = {
+            "edge_id": edge_id,
+            "from_id": from_id,
+            "to_id": to_id
+        }
+        edge.update({k: v for k, v in props.items() if k not in ("edge_id", "from_id", "to_id")})
```
151-163: Avoid plaintext --password; support env var or interactive prompt.

CLI args leak via ps/history. Prefer FALKORDB_PASSWORD or getpass when username is set.
```diff
     args = parser.parse_args()
-    export_graph(args.graph_name, args.host, args.port, args.username, args.password, args.split_by_type)
+    import os, getpass, sys
+    if args.username and not args.password:
+        args.password = os.getenv("FALKORDB_PASSWORD")
+        if not args.password and sys.stdin.isatty():
+            args.password = getpass.getpass("FalkorDB password: ")
+    export_graph(args.graph_name, args.host, args.port, args.username, args.password, args.split_by_type)
```

Also document FALKORDB_PASSWORD in README.
Also applies to: 166-168
🧹 Nitpick comments (2)
main.py (2)
62-79: Memory scale: stream to CSV instead of accumulating the whole graph.

Very large graphs will blow memory. Write per batch (append mode) and clear buffers.
I can provide a targeted patch that flushes each batch to CSV with header=not exists and mode="a" per label/type, keeping memory bounded.
Also applies to: 120-136
154-164: Simplify flags with BooleanOptionalAction; consider exposing batch size.

Cleaner UX and configurability.
- Replace the paired flags with:
- parser.add_argument("--split-by-type", action=argparse.BooleanOptionalAction, default=True)
- Optionally add:
- parser.add_argument("--batch-size", type=int, default=10000)
```python
    if username and password:
        url = f"redis://{username}:{password}@{host}:{port}"
    elif username:
        url = f"redis://{username}@{host}:{port}"
    else:
        url = f"redis://{host}:{port}"
```
Percent‑encode credentials in URL (special chars break the URI).
Usernames/passwords containing @/:?#& etc. will yield invalid URLs.
Apply:
```diff
-    if username and password:
-        url = f"redis://{username}:{password}@{host}:{port}"
-    elif username:
-        url = f"redis://{username}@{host}:{port}"
+    if username and password:
+        u = quote(username, safe="")
+        p = quote(password, safe="")
+        url = f"redis://{u}:{p}@{host}:{port}"
+    elif username:
+        u = quote(username, safe="")
+        url = f"redis://{u}@{host}:{port}"
     else:
         url = f"redis://{host}:{port}"
```

Add import once:
```diff
-from collections import defaultdict
+from collections import defaultdict
+from urllib.parse import quote
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    if username and password:
        u = quote(username, safe="")
        p = quote(password, safe="")
        url = f"redis://{u}:{p}@{host}:{port}"
    elif username:
        u = quote(username, safe="")
        url = f"redis://{u}@{host}:{port}"
    else:
        url = f"redis://{host}:{port}"
```
🤖 Prompt for AI Agents
In main.py around lines 10-15, the code builds a redis URL with raw
username/password which breaks when credentials contain reserved chars; import
urllib.parse.quote_plus once at the top and use it to percent-encode username
and password (e.g. encoded_user = quote_plus(username, safe='') and encoded_pass
= quote_plus(password, safe='')) before inserting into the f-strings, keeping
the same conditional branches (use encoded_user and encoded_pass where
appropriate) so the generated redis:// URL is always valid.
Actionable comments posted: 0
🧹 Nitpick comments (3)
main.py (3)
7-17: Consider adding connection error handling.

The function directly connects to FalkorDB without handling potential connection failures (invalid credentials, unreachable host, etc.). Adding a try-except block would provide better user feedback.
Example:
```python
try:
    db = FalkorDB(url=url)
    g = db.select_graph(graph_name)
except Exception as e:
    print(f"❌ Failed to connect to FalkorDB: {e}")
    raise
```
23-23: Consider extracting batch size as a module constant.

The batch size `10000` is hardcoded in both node and edge pagination loops (lines 23 and 81). Defining it once as a module-level constant would improve maintainability.

```diff
+BATCH_SIZE = 10000
+
 def export_graph(graph_name, host, port, username=None, password=None, split_by_type=True):
     ...
-    batch_size = 10000
+    batch_size = BATCH_SIZE
```
149-154: Redundant default in --split-by-type argument.

The `default=True` is unnecessary when using `action="store_true"` because argparse already defaults unset boolean flags to `False`, and the `--no-split-by-type` flag handles the False case.

```diff
     parser.add_argument(
         "--split-by-type",
         action="store_true",
-        default=True,
         help="Create separate CSV files per node label and edge type (default: True)"
     )
```

Note: The current implementation works correctly; this is just a minor cleanup.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
main.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.12)
🔇 Additional comments (1)
main.py (1)
164-164: LGTM!

The function call correctly passes all new parameters including authentication credentials and the split_by_type flag.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test_main.py (1)
289-323: Pagination test still doesn't validate multi-batch retrieval.

This test creates only 100 nodes, but the batch size in main.py is 10,000, so pagination logic (multiple SKIP/LIMIT iterations) is never exercised. The previous review flagged this and was marked as addressed in commit bcf6139, but the test still doesn't validate multi-batch pagination.
🧹 Nitpick comments (1)
test_main.py (1)
298-299: Use specific exception type or log the error.

The bare `except` clause should specify the expected exception type (e.g., `except Exception as e:`) or log the error for debugging. This applies to both cleanup blocks in this test.

Apply this diff to improve error handling:
```diff
         # Clear any existing data
         try:
             g.query("MATCH (n) DETACH DELETE n")
-        except:
-            pass
+        except Exception:
+            pass  # Graph might not exist yet
```

And similarly for the cleanup block at lines 320-323:
```diff
         # Cleanup
         try:
             g.query("MATCH (n) DETACH DELETE n")
-        except:
-            pass
+        except Exception:
+            pass  # Ignore cleanup errors
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test_main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test_main.py (1)
main.py (1)
export_graph (7-137)
🪛 Ruff (0.14.2)
test_main.py
241-241: Unused method argument: temp_dir
(ARG002)
245-245: Do not assert blind exception: Exception
(B017)
248-248: Unused method argument: temp_dir
(ARG002)
257-257: Unused method argument: temp_dir
(ARG002)
275-275: Unused method argument: temp_dir
(ARG002)
289-289: Unused method argument: temp_dir
(ARG002)
298-298: Do not use bare except
(E722)
298-299: try-except-pass detected, consider logging the exception
(S110)
322-322: Do not use bare except
(E722)
322-323: try-except-pass detected, consider logging the exception
(S110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.12)
🔇 Additional comments (3)
test_main.py (3)
241-246: LGTM! Connection error test improved.

The change to test `localhost:1` is more reliable than the previous approach. The static analysis warnings about unused `temp_dir` and blind exception are false positives—`temp_dir` is used by pytest for its side effect (directory change and cleanup), and the blind exception assert is acceptable for verifying connection failures.
257-273: LGTM! Comprehensive combined mode test.

This test properly validates the `split_by_type=False` behavior by checking for combined files, required columns (`label` and `type`), and minimum data counts. Good coverage.
275-287: LGTM! Good default behavior validation.

This test correctly validates that `split_by_type=True` is the default by verifying separate files are created and combined files are not. The negative assertions are particularly valuable.
```python
    def test_export_with_auth_params(self, test_graph, temp_dir):
        """Test that export_graph accepts username and password parameters."""
        # Test that function accepts auth parameters (even if server doesn't require them)
        export_graph(test_graph, "localhost", 6379, username=None, password=None)

        # Should create the expected files
        assert os.path.exists("nodes_Person.csv"), "nodes_Person.csv should be created"
        assert os.path.exists("edges_WORKS_FOR.csv"), "edges_WORKS_FOR.csv should be created"
```
Test with actual username/password values.
This test passes username=None, password=None, which only verifies the function signature accepts auth parameters but doesn't test that credentials are actually used. Consider adding a test with real values (e.g., username="test_user", password="test_pass") to verify the authentication URL construction works correctly, even if the test server doesn't enforce auth.
Apply this diff to strengthen the test:
```diff
     def test_export_with_auth_params(self, test_graph, temp_dir):
-        """Test that export_graph accepts username and password parameters."""
-        # Test that function accepts auth parameters (even if server doesn't require them)
-        export_graph(test_graph, "localhost", 6379, username=None, password=None)
+        """Test that export_graph accepts and handles username and password parameters."""
+        # Test with actual auth parameters (even if server doesn't require them)
+        # This verifies the URL construction logic handles credentials correctly
+        export_graph(test_graph, "localhost", 6379, username="test_user", password="test_pass")

         # Should create the expected files
         assert os.path.exists("nodes_Person.csv"), "nodes_Person.csv should be created"
         assert os.path.exists("edges_WORKS_FOR.csv"), "edges_WORKS_FOR.csv should be created"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def test_export_with_auth_params(self, test_graph, temp_dir):
        """Test that export_graph accepts and handles username and password parameters."""
        # Test with actual auth parameters (even if server doesn't require them)
        # This verifies the URL construction logic handles credentials correctly
        export_graph(test_graph, "localhost", 6379, username="test_user", password="test_pass")

        # Should create the expected files
        assert os.path.exists("nodes_Person.csv"), "nodes_Person.csv should be created"
        assert os.path.exists("edges_WORKS_FOR.csv"), "edges_WORKS_FOR.csv should be created"
```
🧰 Tools
🪛 Ruff (0.14.2)
248-248: Unused method argument: temp_dir
(ARG002)
🤖 Prompt for AI Agents
In test_main.py around lines 248 to 255, the test only passes username=None and
password=None which doesn't verify that credentials are actually used; update
the test to also call export_graph with concrete credentials (e.g.,
username="test_user", password="test_pass"), run the export, and keep the
existing assertions that the expected files (nodes_Person.csv and
edges_WORKS_FOR.csv) are created; ensure you do this as a separate call within
the same test (or a new test) so the function's authentication parameters are
exercised without removing the original None-based check.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test_main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test_main.py (1)
main.py (1)
export_graph (7-137)
🪛 Ruff (0.14.2)
test_main.py
241-241: Unused method argument: temp_dir
(ARG002)
250-250: Unused method argument: temp_dir
(ARG002)
268-268: Unused method argument: temp_dir
(ARG002)
282-282: Unused method argument: temp_dir
(ARG002)
291-291: Do not use bare except
(E722)
291-292: try-except-pass detected, consider logging the exception
(S110)
315-315: Do not use bare except
(E722)
315-316: try-except-pass detected, consider logging the exception
(S110)
🔇 Additional comments (3)
test_main.py (3)
250-266: LGTM!

This test properly validates the combined mode functionality by verifying both the presence of combined CSV files and the expected structure with label/type columns.
268-280: LGTM!

This test correctly validates that `split_by_type=True` is the default behavior by checking both positive cases (separate files exist) and negative cases (combined files don't exist).
241-248: This review comment is incorrect.

The suggestion to add test calls with actual credentials (`username="test_user"`, `password="test_pass"`) would fail against the real test FalkorDB instance, which has no authentication enabled. The tests use real FalkorDB connections (as shown by the `falkordb_connection` fixture connecting to `localhost:6379`), not mocks.

The current test is appropriate for this integration test environment: it verifies the function accepts the credential parameters and successfully creates output files with `username=None, password=None`, which correctly matches the test server's configuration. The `export_graph` function already contains proper credential URL construction logic (main.py lines 10-15) that handles all three cases (both credentials, username-only, and no credentials).

To test credential handling with authentication would require either: (1) mocking the FalkorDB connection, (2) configuring test infrastructure with auth enabled, or (3) unit testing the URL construction logic directly. None of these changes are necessary given the current integration test approach.
Likely an incorrect or invalid review comment.
```python
    def test_export_pagination_large_graph(self, falkordb_connection, temp_dir):
        """Test that pagination works with larger datasets."""
        # Create a graph with more nodes than would fit in a single batch
        large_graph_name = "large_test_graph"
        g = falkordb_connection.select_graph(large_graph_name)

        # Clear any existing data
        try:
            g.query("MATCH (n) DETACH DELETE n")
        except:
            pass

        # Create 100 nodes (simulating pagination, though batch size is 10000)
        g.query("UNWIND range(1, 100) AS i CREATE (:TestNode {id: i, name: 'Node' + toString(i)})")

        # Create some edges
        g.query("MATCH (a:TestNode), (b:TestNode) WHERE a.id < b.id AND b.id - a.id = 1 CREATE (a)-[:NEXT]->(b)")

        export_graph(large_graph_name, "localhost", 6379)

        # Verify all nodes were exported
        assert os.path.exists("nodes_TestNode.csv"), "nodes_TestNode.csv should be created"
        nodes_df = pd.read_csv("nodes_TestNode.csv")
        assert len(nodes_df) == 100, f"Should have 100 nodes, got {len(nodes_df)}"

        # Verify edges were exported
        assert os.path.exists("edges_NEXT.csv"), "edges_NEXT.csv should be created"
        edges_df = pd.read_csv("edges_NEXT.csv")
        assert len(edges_df) == 99, f"Should have 99 edges, got {len(edges_df)}"

        # Cleanup
        try:
            g.query("MATCH (n) DETACH DELETE n")
        except:
            pass
```
🧩 Analysis chain
Test doesn't validate multi-batch pagination.
Although marked as addressed in a previous review, this test still creates only 100 nodes while batch_size in main.py is 10,000. This means pagination logic (multiple iterations with SKIP/LIMIT) is never exercised—all data is fetched in a single batch.
Consider one of these approaches:
- Mock the batch_size to a smaller value (e.g., 10) for testing
- Rename the test to `test_export_moderate_dataset` to reflect what it actually validates
- Create a separate pagination-specific test with a mocked batch size
Option 1: Mock batch_size to properly test pagination
+ @pytest.fixture(autouse=False)
+ def mock_batch_size(self, monkeypatch):
+ """Mock batch size for pagination testing."""
+ import main
+ # Store original batch_size if it becomes a module-level constant
+ # For now, we'll need to patch it within the function
+ yield
+
def test_export_pagination_large_graph(self, falkordb_connection, temp_dir):
"""Test that pagination works with larger datasets."""
+ # Patch batch_size to test pagination with smaller dataset
+ import main
+ original_code = main.export_graph.__code__
+ # Note: Proper mocking would require refactoring main.py to make batch_size configurable
+
# Create a graph with more nodes than would fit in a single batch
large_graph_name = "large_test_graph"Better approach: Refactor main.py to accept batch_size as a parameter, then test with a small value like 10.
Looking at the review comment, I need to verify whether the test actually fails to exercise pagination logic. Let me gather information about the batch_size and export_graph implementation.
🏁 Script executed:
#!/bin/bash
# Find and examine main.py for batch_size and export_graph implementation
echo "=== Searching for batch_size definition ==="
rg -n "batch_size" --type=py -C 3
echo -e "\n=== Finding export_graph function ==="
ast-grep --pattern $'def export_graph($$$) {
$$$
}'
echo -e "\n=== Alternative: search for export_graph function definition ==="
rg -n "def export_graph" --type=py -A 30Length of output: 3254
🏁 Script executed:
```bash
#!/bin/bash
# Examine main.py for batch_size definition and export_graph implementation
echo "=== Searching for batch_size ==="
rg -n "batch_size" --type=py -C 2
echo -e "\n=== Searching for export_graph function ==="
rg -n "def export_graph" --type=py -A 40
```

Length of output: 3261
Test does not exercise multi-batch pagination logic.
The review is correct: batch_size is hardcoded to 10,000 in main.py (lines 23, 81), and the test creates only 100 nodes. This means the pagination while loop executes once and exits (100 < 10,000), never testing the SKIP/LIMIT iteration behavior.
The proposed diff in the review is incomplete. Since batch_size is a hardcoded constant inside export_graph, it cannot be easily mocked without refactoring.
Recommended fixes:
- Refactor `export_graph` to accept `batch_size` as a parameter with a sensible default (10,000), then pass a small value (e.g., 10) in this test.
- Rename the test to `test_export_moderate_dataset` to reflect that it validates correctness on a subset of data, not pagination behavior.
- Create a separate `test_export_pagination` test only after refactoring to support configurable batch size (see the sketch after this list).
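A rough sketch of that refactor and the pagination test it would enable; the signature change and test body are hypothetical, not existing code:

```python
# main.py: make the batch size configurable, keeping the current value as default
def export_graph(graph_name, host, port, username=None, password=None,
                 split_by_type=True, batch_size=10000):
    ...

# test_main.py: with batch_size=10, the 100-node fixture forces multiple batches
import pandas as pd
from main import export_graph

def test_export_pagination_multiple_batches(falkordb_connection, temp_dir):
    export_graph("large_test_graph", "localhost", 6379, batch_size=10)
    nodes_df = pd.read_csv("nodes_TestNode.csv")
    assert len(nodes_df) == 100
```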
🧰 Tools
🪛 Ruff (0.14.2)
282-282: Unused method argument: temp_dir
(ARG002)
291-291: Do not use bare except
(E722)
291-292: try-except-pass detected, consider logging the exception
(S110)
315-315: Do not use bare except
(E722)
315-316: try-except-pass detected, consider logging the exception
(S110)
Changes
Problem Solved
The exporter was previously limited to 10,000 nodes/edges due to FalkorDB's default result limit. This PR adds pagination to fetch all data in batches.
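In outline, the batching described here looks like the following simplified sketch (the `.result_set` access is an assumption about the FalkorDB client result object, and the real code also processes labels/properties per record):

```python
batch_size = 10000
skip = 0
all_records = []
while True:
    result = g.ro_query(
        f"MATCH (n) RETURN ID(n), labels(n), properties(n) SKIP {skip} LIMIT {batch_size}"
    )
    records = result.result_set
    if not records:
        break
    all_records.extend(records)
    skip += batch_size  # fetch the next page until an empty batch is returned
```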
New Features
Summary by CodeRabbit
New Features
Documentation
Tests