
Filtering

Filtering consists of two steps:

  1. Create a filter mask that records which elements in a data frame meet a certain condition.
  2. Apply the filter mask to the data frame to get all the elements in the positions where the mask is True.
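
The two steps above can be sketched on a small table. This is a minimal illustration only: the name `demo` and all of its values are invented here and are not the lesson's data.

```python
import pandas as pd

# Hypothetical table; names and values are invented for illustration.
demo = pd.DataFrame(
    {"Sneezes": [12, 41, 56], "Temperature": [14.1, 8.2, 7.6]},
    index=["Monday", "Tuesday", "Wednesday"],
)

# Step 1: the comparison yields a boolean Series with one entry per row.
mask = demo["Temperature"] < 10

# Step 2: indexing with the mask keeps only the rows where it is True.
filtered = demo[mask]
print(filtered)
```

The mask has the same index as the table, which is how pandas knows which rows to keep.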

Using filter masks

We want to extract only the data for cold days, which we define as days with a temperature below 10 degrees Celsius. For this purpose we generate a boolean series to use as a filter mask:

cold_days = measurements["Temperature"] < 10
print(cold_days)
Output
Monday       False
Tuesday       True
Wednesday     True
Thursday      True
Friday        True
Saturday     False
Sunday       False
Name: Temperature, dtype: bool

We can apply this filter to our dataframe:

print(measurements[cold_days])
Output
           Sneezes  Temperature  Humidity
Tuesday         41          8.2      76.3
Wednesday       56          7.6      82.4
Thursday        62          7.8      98.2
Friday          30          9.4      77.4

All in one

The two steps are often combined into a single expression:

print(measurements[measurements["Sneezes"] == 56])
Output
           Sneezes  Temperature  Humidity
Wednesday       56          7.6      82.4
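
As a rough check that the combined form is only a shorthand, the sketch below compares it with the two-step form. The table `demo` and its values are hypothetical and are not the lesson's data.

```python
import pandas as pd

# Hypothetical table; names and values are invented for illustration.
demo = pd.DataFrame(
    {"Sneezes": [12, 41, 56], "Temperature": [14.1, 8.2, 7.6]},
    index=["Monday", "Tuesday", "Wednesday"],
)

# Two steps: store the mask under a name, then apply it.
mask = demo["Sneezes"] == 56
two_step = demo[mask]

# One step: build the mask directly inside the square brackets.
one_step = demo[demo["Sneezes"] == 56]

# Both forms select exactly the same rows.
print(one_step.equals(two_step))
```

Keeping the mask in a named variable can still be worthwhile when the same condition is reused or inverted later.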

A filter mask can be inverted by using the ~ prefix operator:

print(~cold_days)
Output
Monday        True
Tuesday      False
Wednesday    False
Thursday     False
Friday       False
Saturday      True
Sunday        True
Name: Temperature, dtype: bool
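
An inverted mask can be applied in the same way as the original one, which selects the rows that do not meet the condition. The following sketch assumes a hypothetical table `demo` with invented values, not the lesson's data.

```python
import pandas as pd

# Hypothetical table; names and values are invented for illustration.
demo = pd.DataFrame(
    {"Temperature": [14.1, 8.2, 7.6, 12.3]},
    index=["Monday", "Tuesday", "Wednesday", "Saturday"],
)

cold = demo["Temperature"] < 10

# ~cold flips every True/False, so this selects the rows that are NOT cold.
warm_rows = demo[~cold]
print(warm_rows)

# The mask and its inverse together cover every row exactly once.
print(len(demo[cold]) + len(warm_rows) == len(demo))
```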

Key Points

  • Combining a selection with a boolean comparison generates a filter mask, which can in turn be used to filter a dataframe.