pyspark.sql.SparkSession.readStream

property SparkSession.readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.

New in version 2.0.0.

Returns
DataStreamReader

Notes

This API is evolving.

Examples

>>> spark.readStream
<pyspark.sql.streaming.readwriter.DataStreamReader object ...>

The example below uses the Rate source, which continuously generates rows. It computes each value modulo 3 and writes the resulting stream to the console sink. The streaming query is stopped after about 3 seconds.

>>> import time
>>> df = spark.readStream.format("rate").load()
>>> df = df.selectExpr("value % 3 as v")
>>> q = df.writeStream.format("console").start()
>>> time.sleep(3)
>>> q.stop()