Pandas DataFrame drop_duplicates() Method

Example

Remove duplicate rows from the DataFrame:

import pandas as pd

data = {
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False]
}

df = pd.DataFrame(data)

# Rows 1 and 3 are identical; only the first occurrence (row 1) is kept
newdf = df.drop_duplicates()

print(newdf)


Definition and Usage

The drop_duplicates() method removes duplicate rows.

By default, two rows count as duplicates only if they match in every column. Use the subset parameter to limit the comparison to specific columns.
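The effect of subset can be sketched as follows. The data is a variation of the example above, with one age changed (an assumption made purely for illustration) so that the two "Mary" rows are no longer identical across all columns:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 41],  # illustrative change: the second Mary is 41
    "qualified": [True, False, False, False],
})

# Comparing all columns: the two "Mary" rows differ in age, so nothing is dropped
all_columns = df.drop_duplicates()

# Comparing on "name" only: the second "Mary" row (index 3) is dropped
by_name = df.drop_duplicates(subset=["name"])

print(len(all_columns))
print(by_name)
```

A single column label can also be passed as a plain string, for example subset="name".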

Syntax

dataframe.drop_duplicates(subset, keep, inplace, ignore_index)

Parameters

All parameters are optional. The subset parameter can be passed positionally; pass keep, inplace, and ignore_index as keyword arguments.

Parameter	Value	Description
subset	column label(s)	Optional, default None (all columns). A string, or a list, naming the columns to compare when looking for duplicates
keep	`'first'`, `'last'` or `False`	Optional, default 'first'. Specifies which duplicate to keep: 'first' keeps the first occurrence, 'last' keeps the last occurrence. If False, drops ALL rows that have a duplicate
inplace	`True` or `False`	Optional, default False. If True: the rows are removed from the current DataFrame. If False: returns a copy with the rows removed and leaves the original unchanged.
ignore_index	`True` or `False`	Optional, default False. If True: the rows of the result are relabeled 0, 1, 2, etc. If False: the original index labels are kept
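The keep and ignore_index options in the table can be compared side by side. This is a minimal sketch using a two-column version of the example data; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
})

# Rows 1 and 3 are identical ("Mary", 40)
keep_first = df.drop_duplicates(keep="first")  # keeps rows 0, 1, 2
keep_last = df.drop_duplicates(keep="last")    # keeps rows 0, 2, 3
keep_none = df.drop_duplicates(keep=False)     # keeps rows 0, 2

# Same rows as keep_last, but the index is relabeled 0, 1, 2
renumbered = df.drop_duplicates(keep="last", ignore_index=True)

print(keep_last)
print(renumbered)
```

Without ignore_index=True the result keeps the original row labels, so gaps in the index show which rows were removed.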

Return Value

A DataFrame with the result, or None if the inplace parameter is set to True.
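The difference in return value is easy to overlook, so here is a short sketch of what happens with inplace=True, again using the example data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False],
})

# With inplace=True the DataFrame itself is modified and nothing is returned
result = df.drop_duplicates(inplace=True)

print(result)   # None
print(len(df))  # 3
```

Because the method returns None in this case, writing df = df.drop_duplicates(inplace=True) would replace the DataFrame with None.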
