Open
Description
# spark is from the previous example
sc = spark.sparkContext
# A text dataset is pointed to by path.
# The path can be either a single text file or a directory of text files
path = "examples/src/main/resources/people.txt"
df1 = spark.read.text(path)
df1.show()
# +-----------+
# |      value|
# +-----------+
# |Michael, 29|
# |   Andy, 30|
# | Justin, 19|
# +-----------+
# You can use 'lineSep' option to define the line separator.
# The line separator handles all `\r`, `\r\n` and `\n` by default.
df2 = spark.read.text(path, lineSep=",")
df2.show()
# +-----------+
# |      value|
# +-----------+
# |    Michael|
# |   29\nAndy|
# | 30\nJustin|
# |       19\n|
# +-----------+
# You can also use 'wholetext' option to read each input file as a single row.
df3 = spark.read.text(path, wholetext=True)
df3.show()
# +--------------------+
# |               value|
# +--------------------+
# |Michael, 29\nAndy...|
# +--------------------+
# "output" is a folder which contains multiple text files and a _SUCCESS file.
df1.write.text("output")
# You can specify the compression format using the 'compression' option.
df1.write.text("output_compressed", compression="gzip")
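To make the row shapes above easier to reason about, here is a minimal plain-Python analogy of how the `lineSep` and `wholetext` options carve the file into rows. It is only an illustration, not how Spark reads files: `split_records` is a hypothetical helper, and the file contents are assumed from the `df1` output shown earlier.

```python
import re

# Assumed contents of people.txt, reconstructed from the df1 output above.
content = "Michael, 29\nAndy, 30\nJustin, 19\n"


def split_records(text, line_sep=None, wholetext=False):
    """Hypothetical helper: mimic how the reader options shape rows."""
    if wholetext:
        # The whole file becomes a single row.
        return [text]
    if line_sep is None:
        # Default case: treat \r\n, \r and \n as line separators.
        parts = re.split(r"\r\n|\r|\n", text)
    else:
        # Custom separator: split on that exact string only.
        parts = text.split(line_sep)
    # Drop the empty record left behind by a trailing separator.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


default_rows = split_records(content)              # like spark.read.text(path)
comma_rows = split_records(content, line_sep=",")  # like lineSep=","
whole_rows = split_records(content, wholetext=True)  # like wholetext=True

print(default_rows)
print(comma_rows)
print(whole_rows)
```

With `line_sep=","` the newlines are no longer separators, so they stay inside the values. That is why `df2` shows rows such as `29\nAndy`.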