Modifying Dataframes¶
Incomplete Data¶
We also want to record the cleaning habits of our cat. For this purpose, we create a new series of measurements:
cleaning = Series(
data={"Monday": 2, "Friday": 1, "Saturday": 3},
index=days_of_week,
name="Cleaning"
)
print(cleaning)
Output
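Note that not all weekdays have a value associated with them.
Incomplete data is a common problem in real-world measurements.
pandas represents “no data” as NaN ("not a number"), which can be a pitfall: NaN is a floating-point value, so it silently turns integer data into floats, and it never compares equal to anything, including itself.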
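The following sketch illustrates two of these pitfalls. The contents of `days_of_week` are an assumption here (the lesson defines that list elsewhere), so treat the values as placeholders.

```python
from math import nan

import pandas as pd

# Assumed stand-in for the lesson's list of weekdays
days_of_week = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

cleaning = pd.Series(
    data={"Monday": 2, "Friday": 1, "Saturday": 3},
    index=days_of_week,
    name="Cleaning",
)

# Pitfall 1: NaN is a float, so the integer counts are stored as floats
print(cleaning.dtype)

# Pitfall 2: NaN does not compare equal to anything, not even itself
print(nan == nan)

# Use isna() instead of == to find the missing entries
number_missing = cleaning.isna().sum()
print(number_missing)
```

Because of the second pitfall, a comparison such as `cleaning == nan` would never find the missing days, which is why `isna()` is used instead.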
Calculating with DataFrames¶
Our veterinarian friend wants to help us and asks us to send them the temperatures we measured. Since they live in the US, they would prefer to have the measurements in Fahrenheit:
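A minimal sketch of such a conversion, assuming the temperatures were recorded in degrees Celsius. The series name `temperatures` and its values are hypothetical and may differ from the lesson's data.

```python
import pandas as pd

# Hypothetical temperatures in degrees Celsius
temperatures = pd.Series(
    data=[10.0, 12.5, 15.0],
    index=["Monday", "Tuesday", "Wednesday"],
    name="Temperature",
)

# Arithmetic on a series is applied to every element individually
fahrenheit = temperatures * 9 / 5 + 32
print(fahrenheit)
```

The original series is not modified by the calculation; the result is a new series that can be stored in its own variable.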
Output
Adding another column to a Dataframe¶
To extend our dataframe, we can use
This does not seem to have worked as we expected! The reason is that many dataframe manipulations return a copy with the result instead of modifying the original dataframe. We can assign the result to our original dataframe (or to a new variable).
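The effect can be sketched with `join()`, which is one method that behaves this way. Whether it is the method originally meant here is an assumption, and the small dataframe below is made up for illustration.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday"]

# Hypothetical data standing in for the lesson's measurements
measurements = pd.DataFrame({"Temperature": [10.0, 12.5, 15.0]}, index=days)
cleaning = pd.Series({"Monday": 2}, index=days, name="Cleaning")

# join() returns a new dataframe; the original stays untouched
measurements.join(cleaning)
columns_before = list(measurements.columns)
print(columns_before)

# Assigning the result back keeps the new column
measurements = measurements.join(cleaning)
columns_after = list(measurements.columns)
print(columns_after)
```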
Output
Side Note: Advanced filtering¶
Dataframes offer additional methods to generate filter masks.
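One such method is `isna()`, which marks every missing cell with `True`. The sketch below assumes this is how a mask like `missing_data` could be created; the dataframe contents are hypothetical.

```python
from math import nan

import pandas as pd

# Hypothetical data with gaps in the "Cleaning" column
measurements = pd.DataFrame(
    {"Temperature": [10.0, 12.5, 15.0], "Cleaning": [2.0, nan, nan]},
    index=["Monday", "Tuesday", "Wednesday"],
)

# Boolean dataframe of the same shape: True wherever a value is missing
missing_data = measurements.isna()
print(missing_data)
```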
Output
We can reduce this mask along rows or columns using the any() method:
print(missing_data.any(axis="columns"))
print() # Empty line as separator
print(missing_data.any(axis="index"))
Output
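The axis parameter names the axis that gets collapsed: axis="columns" combines the values across all columns and yields one result per row, while axis="index" combines the values across all rows and yields one result per column.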
These masks can be combined into handy filters like:
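As a sketch of one possible filter, the example below selects only the rows that contain at least one missing value. The data and the name `rows_with_gaps` are hypothetical.

```python
from math import nan

import pandas as pd

# Hypothetical data with gaps in the "Cleaning" column
measurements = pd.DataFrame(
    {"Temperature": [10.0, 12.5, 15.0], "Cleaning": [2.0, nan, nan]},
    index=["Monday", "Tuesday", "Wednesday"],
)
missing_data = measurements.isna()

# One boolean per row: True if any column in that row is missing
rows_with_gaps = missing_data.any(axis="columns")

# A boolean series used as an index keeps only the rows marked True
incomplete = measurements[rows_with_gaps]
print(incomplete)

# The ~ operator inverts the mask, leaving only the complete rows
complete = measurements[~rows_with_gaps]
print(complete)
```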
Output
Changing singular data¶
We now learn from our roommate that our cat was observed cleaning itself only once on Tuesday. Let’s update our table:
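A single cell can be addressed by its row and column label with `.loc`. The sketch below assumes a column named "Cleaning" and uses made-up data.

```python
from math import nan

import pandas as pd

# Hypothetical data; Tuesday's cleaning count is still missing
measurements = pd.DataFrame(
    {"Temperature": [10.0, 12.5, 15.0], "Cleaning": [2.0, nan, nan]},
    index=["Monday", "Tuesday", "Wednesday"],
)

# Write directly to the cell at row "Tuesday", column "Cleaning"
measurements.loc["Tuesday", "Cleaning"] = 1
print(measurements)
```

Unlike the methods that return a copy, this assignment changes the dataframe itself.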
Output
Replacing Multiple values¶
The replace method of dataframes is a very powerful tool.
For example, we may want to replace the NaN values with the more appropriate None:
from math import nan # to get the constant that encodes NaN
measurements.replace(to_replace={nan: None}, inplace=True)
print(measurements)
Output
We set inplace=True here to modify the dataframe directly instead of getting back a modified copy.
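The dictionary passed as `to_replace` can hold several replacements at once. As a sketch, assume a hypothetical dataframe `readings` in which the placeholder values -1 and 999 both stand for "no measurement":

```python
from math import nan

import pandas as pd

# Hypothetical data: -1 and 999 are placeholders for "no measurement"
readings = pd.DataFrame(
    {"Cleaning": [2.0, -1.0, 999.0]},
    index=["Monday", "Tuesday", "Wednesday"],
)

# Each key is replaced by its value; without inplace=True a copy is returned
cleaned = readings.replace(to_replace={-1: nan, 999: nan})
print(cleaned)

# The original dataframe still contains the placeholder values
print(readings)
```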
Key Points
- Calculations can be done directly on dataframes or series; the calculation is applied to each cell
- Single values can be changed by writing directly to their location