GraphFrames allows the id of a node and the src and dst of an edge to be of type string.
However, the Power Iteration Clustering wrapper internally assumes that all of these are int or bigint, so calling it on a graph with string IDs crashes.
To Reproduce
Steps to reproduce the behavior:
# Assumes an active SparkSession named `spark` (e.g. the pyspark shell)
from graphframes import GraphFrame

# Create a Vertex DataFrame with unique ID column "id" (string IDs)
v = spark.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
], ["id", "name", "age"])
# Create an Edge DataFrame with "src" and "dst" columns (string IDs)
e = spark.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
], ["src", "dst", "relationship"])
# Create a GraphFrame
g = GraphFrame(v, e)
# Fails because src and dst are strings
g.powerIterationClustering(k=2, maxIter=1)
The last call fails with:
IllegalArgumentException: requirement failed: Column src must be of type equal to one of the following types: [int, bigint] but was actually of type string.
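
A possible workaround until this is fixed, as a minimal sketch rather than a confirmed fix: map the string IDs to integers before building the DataFrames. The helper name `remap_ids_to_int` and the trailing original-id field are hypothetical and not part of GraphFrames. The sketch assumes the id is the first field of each vertex row, src and dst are the first two fields of each edge row, and every edge endpoint also appears as a vertex (otherwise the lookup raises `KeyError`).

```python
def remap_ids_to_int(vertices, edges):
    """Replace string vertex IDs with consecutive integers (hypothetical helper).

    vertices: list of tuples whose first element is the string id.
    edges: list of tuples whose first two elements are src and dst.
    Returns (int_vertices, int_edges, id_map).
    """
    # Assign each distinct string id an integer, in order of first appearance.
    id_map = {}
    for row in vertices:
        id_map.setdefault(row[0], len(id_map))

    # Swap in the integer id and keep the original string id as a trailing
    # field so results can be mapped back afterwards.
    int_vertices = [
        (id_map[row[0]],) + tuple(row[1:]) + (row[0],) for row in vertices
    ]
    # Rewrite src and dst; any remaining edge attributes are left untouched.
    int_edges = [
        (id_map[row[0]], id_map[row[1]]) + tuple(row[2:]) for row in edges
    ]
    return int_vertices, int_edges, id_map


# Same data as in the reproduction steps above.
vertices = [("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30)]
edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]

int_vertices, int_edges, id_map = remap_ids_to_int(vertices, edges)
```

The remapped rows could then be passed to spark.createDataFrame with the columns ["id", "name", "age", "original_id"] and ["src", "dst", "relationship"]. Python ints are inferred as bigint, which is one of the types named in the error message, so the type check quoted above should pass. I have not verified the clustering output with this approach.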