pyspark.sql.streaming.DataStreamReader.orc

DataStreamReader.orc(path, mergeSchema=None, pathGlobFilter=None, recursiveFileLookup=None)

Loads an ORC file stream, returning the result as a DataFrame.

New in version 2.3.0.

Parameters
mergeSchema : str or bool, optional

sets whether to merge the schemas collected from all ORC part-files. This option overrides spark.sql.orc.mergeSchema, which supplies the default value when the option is not set.

pathGlobFilter : str or bool, optional

an optional glob pattern; only files whose paths match the pattern are included. The syntax follows org.apache.hadoop.fs.GlobFilter. It does not change the behavior of partition discovery.

recursiveFileLookup : str or bool, optional

recursively scan a directory for files. Using this option disables partition discovery.

Examples

>>> orc_sdf = spark.readStream.schema(sdf_schema).orc(tempfile.mkdtemp())
>>> orc_sdf.isStreaming
True
>>> orc_sdf.schema == sdf_schema
True
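
The example above leaves every option at its default. As a minimal sketch of passing the three optional parameters explicitly, the helper below wraps the same readStream.schema(...).orc(...) chain. The name read_orc_stream and the option values chosen here ("*.orc", True, False) are illustrative assumptions, not part of the PySpark API or recommended settings.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type hints only; pyspark must be installed to actually call the helper.
    from pyspark.sql import DataFrame, SparkSession
    from pyspark.sql.types import StructType


def read_orc_stream(spark: SparkSession, path: str, schema: StructType) -> DataFrame:
    """Hypothetical helper: open an ORC file stream with all three options set."""
    return (
        spark.readStream
        # As in the example above, the schema is supplied up front.
        .schema(schema)
        .orc(
            path,
            # Overrides spark.sql.orc.mergeSchema for this read only.
            mergeSchema=True,
            # Illustrative pattern: include only files whose names end in .orc.
            pathGlobFilter="*.orc",
            # Left off here, since enabling it disables partition discovery.
            recursiveFileLookup=False,
        )
    )
```

With a live session and the sdf_schema from the example above, read_orc_stream(spark, tempfile.mkdtemp(), sdf_schema) should return a DataFrame whose isStreaming attribute is True.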