pyspark.pandas.DataFrame.mode

DataFrame.mode(axis=0, numeric_only=False, dropna=True)

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.
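As a rough illustration of this definition, the following pure-Python sketch returns every value tied for the highest count. It uses only the standard library, and the helper name `modes` is hypothetical; it only mirrors the semantics described here and is not how pandas-on-Spark computes the result.

```python
from collections import Counter
import math


def modes(values, dropna=True):
    """Return all values tied for the highest count.

    Hypothetical helper for illustration only; not part of pyspark.pandas.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Map every missing value to None so that NaN values are counted together.
    cleaned = [None if is_missing(v) else v for v in values]
    if dropna:
        cleaned = [v for v in cleaned if v is not None]
    if not cleaned:
        return []
    counts = Counter(cleaned)
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    # Sort the non-missing winners; keep a missing winner (None) at the end.
    present = sorted(v for v in winners if v is not None)
    return present + [None] * (len(winners) - len(present))


print(modes([2, 4, 8, 2]))                                          # [2]
print(modes([2.0, float("nan"), 0.0, float("nan")]))                # [0.0, 2.0]
print(modes([2.0, float("nan"), 0.0, float("nan")], dropna=False))  # [None]
```

The second call shows that a tie produces more than one mode, and the third shows that a missing value can itself be the mode once it is counted.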

New in version 3.4.0.

Parameters

axis ({0 or ‘index’}, default 0): Axis for the function to be applied on.
numeric_only (bool, default False): If True, only apply to numeric columns.
dropna (bool, default True): Don’t consider counts of NaN/NaT.

Returns

DataFrame: The modes of each column or row.

See also

Series.mode: Return the highest frequency value in a Series.
Series.value_counts: Return the counts of values in a Series.

Examples

>>> df = ps.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN

By default, missing values are not considered, and the modes of wings are both 0 and 2. Because the resulting DataFrame has two rows, the second row of species and legs is filled with missing values (None and NaN).

>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1    None   NaN    2.0
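The padding described above can be sketched in plain Python: each column's modes are computed independently, and shorter columns are padded with None until every column is as long as the longest one. The names `column_modes` and `pad_columns` are hypothetical, and the sketch only mimics the shape of the result, not the pandas-on-Spark implementation.

```python
from collections import Counter


def column_modes(values):
    """Sorted list of the most frequent non-missing values (hypothetical helper)."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    counts = Counter(present)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def pad_columns(columns):
    """Pad each list of modes with None up to the length of the longest list."""
    height = max((len(m) for m in columns.values()), default=0)
    return {name: m + [None] * (height - len(m)) for name, m in columns.items()}


# Same data as the example above, with None standing in for NaN.
data = {
    "species": ["bird", "mammal", "arthropod", "bird"],
    "legs": [2, 4, 8, 2],
    "wings": [2.0, None, 0.0, None],
}
result = pad_columns({name: column_modes(vals) for name, vals in data.items()})
print(result)
# {'species': ['bird', None], 'legs': [2, None], 'wings': [0.0, 2.0]}
```

In the DataFrame output, legs is displayed as 2.0 rather than 2 because a column that holds NaN is stored as floating point.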

With dropna=False, NaN values are considered, and they can be the mode (as for wings).

>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN

With numeric_only=True, only the modes of numeric columns are computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0