pyspark.sql.GroupedData

class pyspark.sql.GroupedData(jgd, df)

A set of methods for aggregations on a `DataFrame`, created by `DataFrame.groupBy()`.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Methods

`agg`(*exprs)	Computes aggregates and returns the result as a `DataFrame`.
`apply`(udf)	An alias of `pyspark.sql.GroupedData.applyInPandas()`, except that it takes a `pyspark.sql.functions.pandas_udf()` whereas `pyspark.sql.GroupedData.applyInPandas()` takes a native Python function.
`applyInArrow`(func, schema)	Maps each group of the current `DataFrame` using an Arrow UDF and returns the result as a `DataFrame`.
`applyInPandas`(func, schema)	Maps each group of the current `DataFrame` using a pandas UDF and returns the result as a `DataFrame`.
`applyInPandasWithState`(func, ...)	Applies the given function to each group of data, while maintaining a user-defined per-group state.
`avg`(*cols)	Computes the average value of each numeric column for each group.
`cogroup`(other)	Cogroups this group with another group so that cogrouped operations can be run on both.
`count`()	Counts the number of records for each group.
`max`(*cols)	Computes the maximum value of each numeric column for each group.
`mean`(*cols)	Computes the average value of each numeric column for each group; an alias of `avg`.
`min`(*cols)	Computes the minimum value of each numeric column for each group.
`pivot`(pivot_col[, values])	Pivots a column of the current `DataFrame` and performs the specified aggregation.
`sum`(*cols)	Computes the sum of each numeric column for each group.
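
`applyInPandas` hands each group to an ordinary Python function as a `pandas.DataFrame` and expects a `pandas.DataFrame` back that matches the declared `schema`. Because that function is plain pandas code, it can be checked locally before Spark is involved. The sketch below assumes a hypothetical function `subtract_group_mean` and hypothetical columns `id` and `v`; only pandas is needed to run it.

```python
import pandas as pd


# Hypothetical per-group function: it receives all rows of ONE group as a
# pandas DataFrame and returns a DataFrame with the same columns ("id", "v").
def subtract_group_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    # Center "v" on the group's own mean; "id" is passed through unchanged.
    return pdf.assign(v=pdf["v"] - pdf["v"].mean())


# Exercise the function on one group's worth of data, without Spark.
one_group = pd.DataFrame({"id": [1, 1], "v": [1.0, 3.0]})
centered = subtract_group_mean(one_group)
```

With Spark, the same function would be passed as `df.groupBy("id").applyInPandas(subtract_group_mean, schema="id long, v double")`, where `df` is assumed to have a long `id` column and a double `v` column. Every row of a group is loaded into memory at once, so heavily skewed groups carry an out-of-memory risk.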
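
After `cogroup()`, the cogrouped `applyInPandas` calls a function with two `pandas.DataFrame` arguments: the rows from the left and from the right grouped DataFrame that share one key. A minimal sketch, assuming a hypothetical `outer_join_group` function and hypothetical columns `id`, `time`, `v1`, and `v2`, again runnable with pandas alone:

```python
import pandas as pd


# Hypothetical cogroup function: `left` and `right` hold the rows for a single
# key from the first and second grouped DataFrame, respectively.
def outer_join_group(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # Outer-join the two sides on (id, time); unmatched cells become NaN.
    return pd.merge(left, right, on=["id", "time"], how="outer")


# Local check with one key (id == 1) present on both sides.
left = pd.DataFrame({"id": [1, 1], "time": [1, 2], "v1": [1.0, 2.0]})
right = pd.DataFrame({"id": [1], "time": [2], "v2": ["x"]})
joined = outer_join_group(left, right)
```

In Spark this would be wired up roughly as `df1.groupBy("id").cogroup(df2.groupBy("id")).applyInPandas(outer_join_group, schema="id long, time long, v1 double, v2 string")`, with `df1` and `df2` assumed to carry the columns above.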