Task 05: Advanced Exploration

This is a suggested solution. It is meant to help you out if you struggle with a certain aspect of the exercise. Your own solution may differ widely and can still be perfectly valid.

Extrapolating Daily Statistics

Since different columns need different treatment (mean, sum, or a logical "any"), it helps to collect their labels in a few lists first:

columns_mean = [LABEL_TEMP, LABEL_DEW, LABEL_PRES, LABEL_SPEED]
columns_sum = [LABEL_RAIN_1H, LABEL_RAIN_6H]
columns_any = [LABEL_RAIN_TRACE_1H, LABEL_RAIN_TRACE_6H]

To get a range covering all the days in question, we can take an approach similar to the one we used when filling in the hours.

days = pandas.date_range(start=timestamp_start, end=timestamp_end, freq="D")
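
As a quick illustration, here is a minimal sketch with made-up values standing in for timestamp_start and timestamp_end (the concrete dates are assumptions, not taken from the exercise data). It also shows that date_range keeps the time of day of its start value, so the day labels only fall on midnight if the start does; normalize() is one way to ensure that.

```python
import pandas

# Hypothetical stand-ins for the timestamps computed earlier in the exercise
timestamp_start = pandas.Timestamp("2020-01-01 00:00")
timestamp_end = pandas.Timestamp("2020-01-03 23:00")

# One entry per day between start and end
days = pandas.date_range(start=timestamp_start, end=timestamp_end, freq="D")
print(list(days))

# If the start were not at midnight, every entry would carry that time of day.
# Normalizing the start first keeps the labels at 00:00.
late_start = pandas.Timestamp("2020-01-01 05:00")
days_from_late_start = pandas.date_range(
    start=late_start.normalize(), end=timestamp_end, freq="D"
)
print(list(days_from_late_start))
```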

Note how we can re-use the timestamps from before. Now we have everything we need to set up an empty DataFrame for our daily statistics.

daily_weather = DataFrame(
    index=days,
    columns=columns_mean + columns_sum + columns_any
)
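
One detail worth knowing: a DataFrame created with only an index and columns holds NaN values in columns of dtype object, and values assigned later do not change that dtype on their own. The following sketch uses hypothetical column names and values to show this, and how infer_objects() could be applied once the frame has been filled. Whether you need the conversion depends on what you do with the statistics afterwards.

```python
import pandas

days = pandas.date_range(start="2020-01-01", end="2020-01-02", freq="D")

# Hypothetical column names, standing in for the LABEL_* constants
frame = pandas.DataFrame(index=days, columns=["temperature", "rain_trace"])
print(frame.dtypes)  # both columns start out as object

# Fill the rows with made-up values
frame.loc[days[0]] = [1.5, True]
frame.loc[days[1]] = [2.5, False]

# Let pandas pick more specific dtypes for the filled columns
converted = frame.infer_objects()
print(converted.dtypes)
```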

Using the groupby(…) method of the DataFrame, we can slice the weather data into one chunk per day.

grouped = weather_data.groupby(weather_data.index.floor("D"))

This gives us a grouped DataFrame (a DataFrameGroupBy object), which we can query for individual groups (i.e. individual days in our case). Finally, we iterate over these individual days and assign the combined data from the respective group.
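
To make the grouping step more tangible, here is a small sketch on made-up hourly data (the column name and values are assumptions). It prints the label of each group and shows that get_group(…) accepts a Timestamp at midnight. For contrast, it also groups by the index's date attribute, where the labels turn out to be plain datetime.date objects.

```python
import datetime

import pandas

# Hypothetical hourly data for two days, standing in for weather_data
hours = pandas.date_range(
    start="2020-01-01", periods=48, freq=pandas.Timedelta(hours=1)
)
weather = pandas.DataFrame({"temperature": range(48)}, index=hours)

# Grouping by the floored index yields Timestamp labels at midnight
grouped = weather.groupby(weather.index.floor("D"))
for label, group in grouped:
    print(type(label).__name__, label, len(group))

first_day = grouped.get_group(pandas.Timestamp("2020-01-01"))

# Grouping by the date attribute yields datetime.date labels instead
grouped_by_date = weather.groupby(weather.index.date)
date_label = next(iter(grouped_by_date.groups))
print(type(date_label).__name__, date_label)
```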

for day in days:
    # Fetch the data for the day to be processed
    current_day_data = grouped.get_group(day)  # (1)

    # Summarize the data accordingly and insert them into our new DataFrame
    # Our shortcuts from earlier come in very handy here
    daily_weather.loc[day, columns_mean] = current_day_data[columns_mean].mean()
    daily_weather.loc[day, columns_sum] = current_day_data[columns_sum].sum()
    daily_weather.loc[day, columns_any] = current_day_data[columns_any].any()

Notes:

  1. Using day to query for a group works because we used the floor(…) method when grouping, so the group labels have the same data type as the entries of days. With other approaches (e.g. using date() instead), we would have to account for the group labels and day having different data types. Other approaches with similar effort are feasible, as long as they are consistent about the data type used for labelling and querying the groups.
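
As one example of such an alternative, the per-day loop could be replaced by a single agg(…) call that maps each column to its aggregation. This is only a sketch under assumptions: the column names and data below are made up, and the real labels would come from the LABEL_* constants. Since groupby produces the day labels itself, no lookup by label is needed.

```python
import pandas

# Hypothetical column names; the exercise uses the LABEL_* constants instead
columns_mean = ["temperature", "wind_speed"]
columns_sum = ["rain_1h"]
columns_any = ["rain_trace_1h"]

# Made-up hourly data for two days, standing in for weather_data
hours = pandas.date_range(
    start="2020-01-01", periods=48, freq=pandas.Timedelta(hours=1)
)
weather_data = pandas.DataFrame(
    {
        "temperature": [10.0] * 24 + [20.0] * 24,
        "wind_speed": [2.0] * 48,
        "rain_1h": [0.5] * 24 + [0.0] * 24,
        "rain_trace_1h": [False] * 23 + [True] + [False] * 24,
    },
    index=hours,
)

# Map every column to the aggregation it should receive
aggregations = {
    **{column: "mean" for column in columns_mean},
    **{column: "sum" for column in columns_sum},
    **{column: "any" for column in columns_any},
}

# One row per day, one column per aggregated value
daily_weather = weather_data.groupby(weather_data.index.floor("D")).agg(aggregations)
print(daily_weather)
```

Unlike the loop over days, this result only contains days that actually occur in the data. If gaps are possible, calling reindex(days) on the result would add the missing days as rows of NaN.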