bug: powerIterationClustering fails if src or dst columns are not Int #757

@metanoid

Description

GraphFrames allows the id of a node, and the src and dst of an edge, to be of type string.
However, the powerIterationClustering wrapper internally assumes that these columns are of type int or bigint, so calling it on a graph with string IDs crashes.

To Reproduce

Steps to reproduce the behavior:

# Create a Vertex DataFrame with unique ID column "id"
v = spark.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = spark.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# Create a GraphFrame
from graphframes import GraphFrame

g = GraphFrame(v, e)
# Fails here: "src" and "dst" are strings, not int or bigint
g.powerIterationClustering(k=2, maxIter=1)

IllegalArgumentException: requirement failed: Column src must be of type equal to one of the following types: [int, bigint] but was actually of type string.
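
A possible workaround, not confirmed anywhere in this issue, is to map the string IDs to integers before building the GraphFrame and to map the cluster assignments back afterwards. The plain-Python sketch below only illustrates that remapping on the vertex and edge data from the reproduction. The helper names build_id_mapping and remap_edges are hypothetical, and with real DataFrames the same step would presumably be done with joins rather than Python lists.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def build_id_mapping(vertex_ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign each distinct vertex id an integer, in first-seen order.

    Hypothetical helper for illustration; not part of GraphFrames.
    """
    mapping: Dict[Hashable, int] = {}
    for vertex_id in vertex_ids:
        if vertex_id not in mapping:
            mapping[vertex_id] = len(mapping)
    return mapping


def remap_edges(
    edges: Iterable[Tuple[Hashable, Hashable, str]],
    mapping: Dict[Hashable, int],
) -> List[Tuple[int, int, str]]:
    """Replace the src and dst of every edge with its integer id."""
    return [(mapping[src], mapping[dst], rel) for src, dst, rel in edges]


# Same vertex ids and edges as in the reproduction above
vertex_ids = ["a", "b", "c"]
edges = [
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]

id_map = build_id_mapping(vertex_ids)
int_edges = remap_edges(edges, id_map)

# Inverse mapping, to translate integer ids in the result back to strings
reverse_map = {int_id: str_id for str_id, int_id in id_map.items()}
```

The integer-keyed edges would then be used to build the edge DataFrame, with reverse_map used to recover the original string IDs in the clustering output.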
