pyspark.sql.functions.schema_of_csv
pyspark.sql.functions.schema_of_csv(csv, options=None)
CSV Function: Parses a CSV string and infers its schema in DDL format.
New in version 3.0.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
  - csv : Column or str
    A CSV string or a foldable string column containing a CSV string.
  - options : dict, optional
    Options to control parsing. Accepts the same options as the CSV datasource. See Data Source Option for the version you use.
- Returns
  - Column
    A string representation of a StructType parsed from the given CSV.
Examples
Example 1: Inferring the schema of a CSV string with different data types
>>> from pyspark.sql import functions as sf
>>> df = spark.range(1)
>>> df.select(sf.schema_of_csv(sf.lit('1|a|true'), {'sep':'|'})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv(1|a|true)                    |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+
Example 2: Inferring the schema of a CSV string with missing values
>>> from pyspark.sql import functions as sf
>>> df = spark.range(1)
>>> df.select(sf.schema_of_csv(sf.lit('1||true'), {'sep':'|'})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv(1||true)                     |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+
Example 3: Inferring the schema of a CSV string with a different delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.range(1)
>>> df.select(sf.schema_of_csv(sf.lit('1;a;true'), {'sep':';'})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv(1;a;true)                    |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+
Example 4: Inferring the schema of a CSV string with quoted fields
>>> from pyspark.sql import functions as sf
>>> df = spark.range(1)
>>> df.select(sf.schema_of_csv(sf.lit('"1","a","true"'), {'sep':','})).show(truncate=False)
+-------------------------------------------+
|schema_of_csv("1","a","true")              |
+-------------------------------------------+
|STRUCT<_c0: INT, _c1: STRING, _c2: BOOLEAN>|
+-------------------------------------------+
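Because the function returns the schema as a DDL-formatted string column, one common follow-up is to pass that result to `from_csv` so that a column of CSV text is parsed with the inferred types. The sketch below is illustrative only: the helper name `parse_with_inferred_schema` and its parameters are hypothetical and not part of PySpark, and it assumes a DataFrame with a string column of CSV rows. The sample row is wrapped in `sf.lit` because `schema_of_csv` requires a foldable (constant) input.

```python
def parse_with_inferred_schema(df, column, sample, options=None):
    """Parse the CSV text in `column` using a schema inferred from `sample`.

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    # Imported lazily so the helper can be defined without PySpark being loaded.
    from pyspark.sql import functions as sf

    # Copy the options so the caller's dict is never mutated.
    opts = dict(options or {})

    # Infer the schema from one literal sample row (must be foldable).
    inferred = sf.schema_of_csv(sf.lit(sample), opts)

    # Reuse the same options so the delimiter matches during parsing.
    return df.select(sf.from_csv(df[column], inferred, opts).alias("parsed"))
```

For instance, given a hypothetical DataFrame `rows` holding pipe-delimited text in a `value` column, calling `parse_with_inferred_schema(rows, "value", "1|a|true", {"sep": "|"})` should yield a single struct column named `parsed`. Inferring from one sample row is a design shortcut: rows that do not match the sampled types are handled according to the parser options in effect.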