pyspark.sql.GroupedData

class pyspark.sql.GroupedData(jgd: py4j.java_gateway.JavaObject, df: pyspark.sql.dataframe.DataFrame)[source]

A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().

New in version 1.3.

Methods

agg(*exprs)

Computes aggregates and returns the result as a DataFrame.

apply(udf)

An alias of pyspark.sql.GroupedData.applyInPandas(); however, it takes a pyspark.sql.functions.pandas_udf(), whereas pyspark.sql.GroupedData.applyInPandas() takes a native Python function.

applyInPandas(func, schema)

Maps each group of the current DataFrame using a pandas UDF and returns the result as a DataFrame.

avg(*cols)

Computes the average value for each numeric column for each group.

cogroup(other)

Cogroups this group with another group so that we can run cogrouped operations.

count()

Counts the number of records for each group.

max(*cols)

Computes the maximum value for each numeric column for each group.

mean(*cols)

Computes the average value for each numeric column for each group; an alias of avg().

min(*cols)

Computes the min value for each numeric column for each group.

pivot(pivot_col[, values])

Pivots a column of the current DataFrame and performs the specified aggregation.

sum(*cols)

Computes the sum for each numeric column for each group.