pyspark.sql.functions.mask
pyspark.sql.functions.mask(col: ColumnOrName, upperChar: Optional[ColumnOrName] = None, lowerChar: Optional[ColumnOrName] = None, digitChar: Optional[ColumnOrName] = None, otherChar: Optional[ColumnOrName] = None) → pyspark.sql.column.Column
Masks the given string value by replacing its upper-case, lower-case, digit, and other characters with the given replacement characters. This can be useful for creating copies of tables with sensitive information removed.
New in version 3.5.0.
- Parameters
- col : pyspark.sql.Column or str
target column to compute on.
- upperChar : pyspark.sql.Column or str
character to replace upper-case characters with. Specify NULL to retain the original character. When omitted, 'X' is used, as the first example shows.
- lowerChar : pyspark.sql.Column or str
character to replace lower-case characters with. Specify NULL to retain the original character. When omitted, 'x' is used.
- digitChar : pyspark.sql.Column or str
character to replace digit characters with. Specify NULL to retain the original character. When omitted, 'n' is used.
- otherChar : pyspark.sql.Column or str
character to replace all other characters with. Specify NULL to retain the original character. When omitted, other characters are left unchanged.
- Returns
pyspark.sql.column.Column
a column containing the masked string.
Examples
>>> from pyspark.sql.functions import lit, mask
>>> df = spark.createDataFrame([("AbCD123-@$#",), ("abcd-EFGH-8765-4321",)], ['data'])
>>> df.select(mask(df.data).alias('r')).collect()
[Row(r='XxXXnnn-@$#'), Row(r='xxxx-XXXX-nnnn-nnnn')]
>>> df.select(mask(df.data, lit('Y')).alias('r')).collect()
[Row(r='YxYYnnn-@$#'), Row(r='xxxx-YYYY-nnnn-nnnn')]
>>> df.select(mask(df.data, lit('Y'), lit('y')).alias('r')).collect()
[Row(r='YyYYnnn-@$#'), Row(r='yyyy-YYYY-nnnn-nnnn')]
>>> df.select(mask(df.data, lit('Y'), lit('y'), lit('d')).alias('r')).collect()
[Row(r='YyYYddd-@$#'), Row(r='yyyy-YYYY-dddd-dddd')]
>>> df.select(mask(df.data, lit('Y'), lit('y'), lit('d'), lit('*')).alias('r')).collect()
[Row(r='YyYYddd****'), Row(r='yyyy*YYYY*dddd*dddd')]
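
The per-character rules described in the parameter list can be modelled in plain Python, which may help when reasoning about what a given combination of arguments will produce. The sketch below is not Spark code: `mask_reference` is a hypothetical helper, its default replacement characters are inferred from the example outputs above, `None` stands in for NULL, and Python's `str.isupper`, `str.islower`, and `str.isdigit` are assumed to classify characters the way Spark does, which should hold for ASCII input but is not guaranteed for all of Unicode.

```python
from typing import Optional


def mask_reference(
    value: Optional[str],
    upper_char: Optional[str] = "X",
    lower_char: Optional[str] = "x",
    digit_char: Optional[str] = "n",
    other_char: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical plain-Python model of the masking rules (not Spark code)."""
    if value is None:
        return None  # assumption: a NULL input stays NULL

    def replace(ch: str) -> str:
        # Pick the replacement that matches the character's class.
        if ch.isupper():
            repl = upper_char
        elif ch.islower():
            repl = lower_char
        elif ch.isdigit():
            repl = digit_char
        else:
            repl = other_char
        # None stands in for NULL: keep the original character.
        return ch if repl is None else repl

    return "".join(replace(ch) for ch in value)


# Inputs taken from the examples above.
masked_default = mask_reference("AbCD123-@$#")
masked_all = mask_reference("abcd-EFGH-8765-4321", "Y", "y", "d", "*")
print(masked_default)
print(masked_all)
```

With the default arguments only letters and digits are replaced, so separators such as `-` survive and the overall shape of the value stays recognisable. Passing a replacement for `other_char` hides that structure as well.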