Add support for Java #58
Fixes #50

Add support for Java code analysis using tree-sitter.

* Add `api/analyzers/java/analyzer.py` to implement `JavaAnalyzer` class for parsing Java code and extracting method and class details.
* Modify `api/analyzers/source_analyzer.py` to import `JavaAnalyzer` and add `.java` to the list of supported analyzers.
* Add `tree-sitter-java` dependency to `pyproject.toml`.
* Modify `api/__init__.py` to import `JavaAnalyzer`.
* Modify `api/analyzers/__init__.py` to import `JavaAnalyzer`.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/FalkorDB/code-graph-backend/issues/50?shareId=XXXX-XXXX-XXXX-XXXX).
Walkthrough
The pull request introduces Java language support to the code analysis system.
Sequence Diagram
sequenceDiagram
participant SA as SourceAnalyzer
participant JA as JavaAnalyzer
participant G as Graph
SA->>JA: first_pass(path, file, graph)
JA->>JA: Parse Java source file
JA->>G: Add class entities
JA->>G: Add method entities
SA->>JA: second_pass(path, file, graph)
JA->>JA: Identify method calls
JA->>G: Connect method relationships
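The two-pass flow in the diagram can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `ToyGraph`, `ToyJavaAnalyzer`, and the regex-based matching are hypothetical stand-ins for the real graph and for tree-sitter parsing.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical stand-in for the real graph: entity names plus call edges."""
    classes: set = field(default_factory=set)
    methods: set = field(default_factory=set)
    calls: set = field(default_factory=set)  # (caller, callee) pairs


class ToyJavaAnalyzer:
    # Regexes are a crude substitute for tree-sitter queries, for illustration only.
    CLASS_RE = re.compile(r"\bclass\s+(\w+)")
    METHOD_RE = re.compile(r"\b\w+\s+(\w+)\s*\([^)]*\)\s*\{")
    CALL_RE = re.compile(r"\b(\w+)\s*\(")

    def first_pass(self, source: str, graph: ToyGraph) -> None:
        """Register every class and method so the second pass can resolve names."""
        graph.classes.update(self.CLASS_RE.findall(source))
        graph.methods.update(self.METHOD_RE.findall(source))

    def second_pass(self, source: str, graph: ToyGraph) -> None:
        """Connect callers to callees, using only entities known from pass one."""
        for match in self.METHOD_RE.finditer(source):
            caller = match.group(1)
            body = self._body(source, match.end())
            for callee in self.CALL_RE.findall(body):
                if callee in graph.methods and callee != caller:
                    graph.calls.add((caller, callee))

    @staticmethod
    def _body(source: str, start: int) -> str:
        """Return text from just after the opening brace up to its matching close."""
        depth = 1
        for i in range(start, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    return source[start:i]
        return source[start:]
```

Splitting the work this way means a call to a method declared later in the file can still be resolved, because every declaration is already in the graph when the second pass starts.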
Actionable comments posted: 1
🧹 Nitpick comments (4)
api/analyzers/java/analyzer.py (3)
1-3: Replace star imports with explicit imports.
Using `from ..utils import *` and `from ...entities import *` can unintentionally pollute the namespace, causing potential naming conflicts and making it harder to track dependencies.
-from ..utils import *
-from ...entities import *
+from ..utils import find_child_of_type # or any specific methods
+from ...entities import Function, Class, File # etc.
🧰 Tools
🪛 Ruff (0.8.2)
3-3: `from ..utils import *` used; unable to detect undefined names (F403)
79-109: Include docstrings or comments for class extraction logic.
The `process_class_declaration` method works as intended, but adding more descriptive docstrings or inline comments explaining how class modifiers (e.g., `public`, `abstract`) might be handled in future versions can improve maintainability.
🧰 Tools
🪛 Ruff (0.8.2)
79-79: `Class` may be undefined, or defined from star imports (F405)
92-92: `find_child_of_type` may be undefined, or defined from star imports (F405)
106-106: `Class` may be undefined, or defined from star imports (F405)
164-223: Add defensive checks for node traversal in second_pass.
When traversing the AST (`caller.parent.parent`), unexpected structures can cause an `AttributeError` when an intermediate node is `None`. Consider validating intermediate nodes to avoid runtime errors.
- method_calls = query_call_exp.captures(caller.parent.parent)
+ caller_parent = caller.parent
+ if caller_parent is None or caller_parent.parent is None:
+     continue
+ method_calls = query_call_exp.captures(caller_parent.parent)
🧰 Tools
🪛 Ruff (0.8.2)
219-219: `Function` may be undefined, or defined from star imports (F405)
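The same guard can be factored into a small helper. The sketch below is only an illustration: `FakeNode` is a hypothetical stand-in for a tree-sitter node, and `nth_ancestor` is not part of the PR or of tree-sitter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a tree-sitter Node; only `parent` and `type` matter here."""
    type: str
    parent: Optional["FakeNode"] = None


def nth_ancestor(node: Optional[FakeNode], levels: int) -> Optional[FakeNode]:
    """Walk `levels` steps up the tree, returning None if the chain ends early."""
    current = node
    for _ in range(levels):
        if current is None:
            return None
        current = current.parent
    return current
```

With such a helper, the call site would check `nth_ancestor(caller, 2)` for `None` once instead of testing each intermediate parent separately.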
api/analyzers/__init__.py (1)
2-2: Consider adding `JavaAnalyzer` to `__all__` or referencing it within the module.
If the analyzer is imported here primarily for others to use, listing it in `__all__` clarifies that it's part of the public API. If it's not used in this module, the import could be flagged as unused.
 __all__ = [
     "SourceAnalyzer",
+    "JavaAnalyzer",
 ]
🧰 Tools
🪛 Ruff (0.8.2)
2-2: `.java.analyzer.JavaAnalyzer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
* api/analyzers/__init__.py (1 hunks)
* api/analyzers/java/analyzer.py (1 hunks)
* api/analyzers/source_analyzer.py (2 hunks)
* pyproject.toml (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/analyzers/__init__.py
2-2: .java.analyzer.JavaAnalyzer imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
api/analyzers/java/analyzer.py
3-3: from ..utils import * used; unable to detect undefined names
(F403)
5-5: from ...entities import * used; unable to detect undefined names
(F403)
22-22: Function may be undefined, or defined from star imports
(F405)
35-35: find_child_of_type may be undefined, or defined from star imports
(F405)
44-44: find_child_of_type may be undefined, or defined from star imports
(F405)
52-52: find_child_of_type may be undefined, or defined from star imports
(F405)
59-59: find_child_of_type may be undefined, or defined from star imports
(F405)
60-60: find_child_of_type may be undefined, or defined from star imports
(F405)
70-70: Function may be undefined, or defined from star imports
(F405)
79-79: Class may be undefined, or defined from star imports
(F405)
92-92: find_child_of_type may be undefined, or defined from star imports
(F405)
106-106: Class may be undefined, or defined from star imports
(F405)
130-130: File may be undefined, or defined from star imports
(F405)
219-219: Function may be undefined, or defined from star imports
(F405)
🔇 Additional comments (3)
api/analyzers/java/analyzer.py (1)
110-163: Check query capture structure.
Your query usage checks `'class' in captures` and `'method' in captures`. Ensure the `query.captures()` method returns a dictionary with those keys; some Tree-sitter APIs return lists of `(node, capture_name)` pairs instead. If so, consider a different approach to iterating over captures.
-captures = query.captures(tree.root_node)
-if 'class' in captures:
-    classes = captures['class']
+captures_list = query.captures(tree.root_node)
+for (node, capture_name) in captures_list:
+    if capture_name == 'class':
+        # process node
🧰 Tools
🪛 Ruff (0.8.2)
130-130: `File` may be undefined, or defined from star imports (F405)
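One way to stay robust to either return shape is to normalize the result before using it. The helper below is a hypothetical sketch (the name `captures_by_name` is not from the PR or from tree-sitter) and is exercised here with plain strings in place of nodes.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Tuple, Union

# Either {capture_name: [nodes]} or an iterable of (node, capture_name) pairs.
Captures = Union[Mapping[str, List[Any]], Iterable[Tuple[Any, str]]]


def captures_by_name(captures: Captures) -> Dict[str, List[Any]]:
    """Normalize either capture shape into {capture_name: [nodes]}."""
    if isinstance(captures, Mapping):
        return {name: list(nodes) for name, nodes in captures.items()}
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for node, name in captures:
        grouped[name].append(node)
    return dict(grouped)
```

The analyzer could then write `captures_by_name(query.captures(tree.root_node)).get('class', [])` and iterate over the result regardless of which shape the installed binding returns.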
api/analyzers/source_analyzer.py (1)
11-11: Java analyzer addition looks good.
The `.java` extension is now mapped to `JavaAnalyzer`. This integration is consistent with existing analyzers, ensuring that Java files are processed in the same pipeline. Good job!
Also applies to: 20-21
pyproject.toml (1)
16-16: Dependency addition is consistent.
`tree-sitter-java` is correctly added to support the Java analyzer. Ensure the version `^0.23.2` remains compatible with your other tree-sitter dependencies.
✅ Verification successful
Tree-sitter dependencies are consistently versioned
The verification confirms that all tree-sitter dependencies are using compatible versions:
* tree-sitter: ^0.23.2
* tree-sitter-c: ^0.23.2
* tree-sitter-java: ^0.23.2
* tree-sitter-python: ^0.23.6 (patch version bump, still compatible)
The newly added `tree-sitter-java` dependency matches the core `tree-sitter` version and the other language parsers.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check installed versions for consistent tree-sitter libraries
grep "tree-sitter" pyproject.toml
Length of output: 143
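For readers unfamiliar with caret constraints, the range implied by `^0.23.2` can be sketched as below. This is a simplified illustration of Poetry-style caret semantics for plain three-part versions only; the function names are hypothetical, and real resolvers also handle pre-release tags and shorter version strings.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_upper_bound(base: Version) -> Version:
    """Exclusive upper bound: bump the leftmost non-zero component."""
    major, minor, patch = base
    if major > 0:
        return major + 1, 0, 0
    if minor > 0:
        return 0, minor + 1, 0
    return 0, 0, patch + 1


def satisfies_caret(version: str, constraint: str) -> bool:
    """True if `version` falls inside the range implied by a '^X.Y.Z' constraint."""
    base = parse(constraint.lstrip("^"))
    return base <= parse(version) < caret_upper_bound(base)
```

Under these rules `^0.23.2` admits any `0.23.x` release from `0.23.2` upward but excludes `0.24.0`, which is why the `0.23.6` Python grammar sits in the same range as the other packages.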
    def process_method_declaration(self, node: Node, path: Path, source_code: str) -> Optional[Function]:
        """
        Processes a method declaration node to extract method details.

        Args:
            node (Node): The AST node representing a method declaration.
            path (Path): The file path where the method is defined.

        Returns:
            Optional[Function]: A Function object containing details about the method, or None if the method name cannot be determined.
        """

        # Extract method name
        res = find_child_of_type(node, 'identifier')
        if res is None:
            return None

        identifier = res[0]
        method_name = identifier.text.decode('utf-8')
        logger.info(f"Method declaration: {method_name}")

        # Extract method return type
        res = find_child_of_type(node, 'type')
        ret_type = 'Unknown'
        if res is not None:
            ret_type = res[0]
            ret_type = ret_type.text.decode('utf-8')

        # Extract method parameters
        args = []
        res = find_child_of_type(node, 'formal_parameters')
        if res is not None:
            parameters = res[0]

            # Extract arguments and their types
            for child in parameters.children:
                if child.type == 'formal_parameter':
                    arg_type = find_child_of_type(child, 'type')[0].text.decode('utf-8')
                    arg_name = find_child_of_type(child, 'identifier')[0].text.decode('utf-8')
                    args.append((arg_name, arg_type))

        # Extract method definition line numbers
        start_line = node.start_point[0]
        end_line = node.end_point[0]

        # Create Function object
        docs = ''
        src = source_code[node.start_byte:node.end_byte]
        f = Function(str(path), method_name, docs, ret_type, src, start_line, end_line)

        # Add arguments to Function object
        for arg in args:
            name, type_ = arg
            f.add_argument(name, type_)

        return f
💡 Codebase verification
The review comment is correct - direct indexing of find_child_of_type results is unsafe
The implementation of find_child_of_type in utils.py returns either None or a tuple of (Node, int). The code in analyzer.py correctly checks for None in most places, but in the parameter processing section it directly indexes the result with [0] without checking:
arg_type = find_child_of_type(child, 'type')[0].text.decode('utf-8')
arg_name = find_child_of_type(child, 'identifier')[0].text.decode('utf-8')
This raises a TypeError if find_child_of_type returns None, because None cannot be indexed. The suggested fix, checking the result before indexing and providing a fallback value, is the correct approach.
🔗 Analysis chain
Validate the existence of method type and parameter identifiers.
Accessing [0] from calls like find_child_of_type(child, 'type')[0] may cause an index error if the list is empty. Consider checking for empty results before indexing.
-arg_type = find_child_of_type(child, 'type')[0].text.decode('utf-8')
+arg_type_nodes = find_child_of_type(child, 'type')
+if arg_type_nodes:
+    arg_type = arg_type_nodes[0].text.decode('utf-8')
+else:
+    arg_type = 'Unknown'
Let me verify the concern by checking the implementation of find_child_of_type and its usage in the codebase.
Let me check the implementation of find_child_of_type to understand its return type.
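The None check discussed in this comment could be centralized in one helper rather than repeated per field. The sketch below is an assumption-laden illustration: `FakeNode` is a hypothetical stand-in for a tree-sitter node, `find_child_of_type` is reimplemented here only from the behaviour described in this review (returns `None` or a `(Node, int)` tuple), and `child_text` is a hypothetical name.

```python
from typing import Optional, Tuple


class FakeNode:
    """Hypothetical stand-in for a tree-sitter Node with `type`, `text`, and `children`."""

    def __init__(self, type_: str, text: bytes = b"", children=None):
        self.type = type_
        self.text = text
        self.children = children or []


def find_child_of_type(node: FakeNode, child_type: str) -> Optional[Tuple[FakeNode, int]]:
    """Assumed behaviour of the helper: (child, index) for the first match, else None."""
    for index, child in enumerate(node.children):
        if child.type == child_type:
            return child, index
    return None


def child_text(node: FakeNode, child_type: str, default: str = "Unknown") -> str:
    """Decoded text of the first matching child, or `default` when there is none."""
    found = find_child_of_type(node, child_type)
    if found is None:
        return default
    return found[0].text.decode("utf-8")
```

Parameter extraction would then read `arg_type = child_text(child, 'type')` and `arg_name = child_text(child, 'identifier')`, with no unchecked indexing.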
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
# Find the implementation of find_child_of_type function
ast-grep --pattern 'def find_child_of_type'
# Search for other usages of find_child_of_type to understand the pattern
rg "find_child_of_type" -A 2
Length of output: 2948
Script:
#!/bin/bash
# Get the content of utils.py to see the implementation
cat api/analyzers/utils.py
Length of output: 674
🧰 Tools
🪛 Ruff (0.8.2)
22-22: Function may be undefined, or defined from star imports
(F405)
35-35: find_child_of_type may be undefined, or defined from star imports
(F405)
44-44: find_child_of_type may be undefined, or defined from star imports
(F405)
52-52: find_child_of_type may be undefined, or defined from star imports
(F405)
59-59: find_child_of_type may be undefined, or defined from star imports
(F405)
60-60: find_child_of_type may be undefined, or defined from star imports
(F405)
70-70: Function may be undefined, or defined from star imports
(F405)
Summary by CodeRabbit
New Features
Dependencies
* `tree-sitter-java` library to project dependencies
Improvements