Best Moment Ever

For the following task, put your solution in a separate script exercise_best_moment.py. Import the already created functions as needed. Use the same data as in the episodes.

We are looking for hours with really nice weather! Here are the conditions they should fulfill:

| Measurement | Desired values (x) | Comment |
|---|---|---|
| Air Temperature | 15°C ≤ x ≤ 25°C | Comfortable temperature |
| Precipitation (1h) | x = 0mm | No rain |
| Wind speeds | 0 m/s < x ≤ 2 m/s | Slight breeze |
| Sky Condition | x in {2, 3, 4} | Few clouds |
| Time of day | 08:00 ≤ x ≤ 18:00 | Daytime |

1. The Data Part

Find all moments that fulfill the above conditions. Print the timestamps (i.e. date and hour, as in the index) of these moments.

Hints
  • Consider building up the filter masks one by one. Combine them later when they all individually work.
  • For the sky condition, check out the .isin(…) method, which creates a filter mask from a sequence of allowed values.
  • The time of day is part of the index. Fortunately, pandas has the .between_time(…) filter. Note that this filter applies directly and does not produce a filter mask first.
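
The hints above can be sketched as follows. This is only an illustration on a few made-up rows: the column names (air_temperature, precipitation, wind_speed, sky_condition) and the variable names are assumptions, so adapt them to the names used in the episode data.

```python
import pandas as pd

# Made-up sample rows; column names are assumptions, not the episode data.
weather = pd.DataFrame(
    {
        "air_temperature": [14.0, 20.0, 22.0, 21.0, 19.0],
        "precipitation": [0.0, 0.0, 0.0, 0.5, 0.0],
        "wind_speed": [1.0, 1.5, 3.0, 1.0, 2.0],
        "sky_condition": [2, 3, 4, 2, 4],
    },
    index=pd.to_datetime(
        [
            "2020-06-21 07:00",
            "2020-06-21 09:00",
            "2020-06-21 12:00",
            "2020-06-21 15:00",
            "2020-06-21 20:00",
        ]
    ),
)

# One mask per condition, so each can be inspected on its own.
nice_temperature = weather["air_temperature"].between(15, 25)  # both ends inclusive
no_rain = weather["precipitation"] == 0
slight_breeze = (weather["wind_speed"] > 0) & (weather["wind_speed"] <= 2)
few_clouds = weather["sky_condition"].isin([2, 3, 4])

# Combine the masks, then restrict to daytime.
# between_time(...) filters the rows directly instead of returning a mask.
result = weather[nice_temperature & no_rain & slight_breeze & few_clouds]
result = result.between_time("08:00", "18:00")

for timestamp in result.index:
    print(timestamp)
```

In this toy data only the 09:00 row survives: the other rows fail on temperature, wind, rain, or time of day respectively.
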
Expected Result
Output of the Data Part
| 2020-04-25 17:00:00 | 2020-08-05 13:00:00 |
| 2020-06-21 14:00:00 | 2020-09-03 11:00:00 |
| 2020-06-26 09:00:00 | 2020-09-06 12:00:00 |
| 2020-06-29 08:00:00 | 2020-09-07 15:00:00 |
| 2020-07-01 18:00:00 | 2020-09-27 17:00:00 |
| 2020-07-05 14:00:00 | 2020-09-29 09:00:00 |
| 2020-07-08 12:00:00 | 2020-09-30 17:00:00 |
| 2020-07-11 08:00:00 | 2020-10-02 17:00:00 |
| 2020-07-23 13:00:00 | 2020-10-05 17:00:00 |
| 2020-08-05 11:00:00 |

There should be 19 entries in the final result. Pretty-printing the output is optional; any formatting will do.

2. The Plotting Part

For the following task, create a new module timeline.py.

Since creating timelines that work with any sequence of dates can become rather tricky, we will break this down into steps.

The Foundation

To get started, we want to plot some markers along a line, one at each date on which an event happened. We do not care about any axis or decoration (yet).

Define a function _plot_timeline(dates, y_offset=0). The function name starts with an _ to signify that it is a helper that normally should not be used on its own.

| Parameter Name | Type | Comment |
|---|---|---|
| dates | DatetimeIndex | For each of the timestamps a dot should appear on the timeline. |
| y_offset | int | The position on the y-axis that the timeline should be plotted at. |

The y-offset becomes important when we later want to plot multiple timelines at once, so they don’t get plotted on top of each other. If we only want to plot one timeline, using a default value of 0 should be fine.

Inside the function, the following happens:

  1. An axhline(…) is drawn at the y_offset. Suggested parameters for a nice look are
    • linewidth=10 for a thick line
    • color="lightgray" so it does not take away the attention from the markers
    • alpha=0.75 to add transparency and blend it better with the background
  2. plot(…) the dates.
    • Each of them has a y-position according to the y_offset.
    • Use marker="|" and markersize=10 to get a nice bar-shaped marker at every position.
    • Further use linewidth=0 to make the lines connecting the individual data points invisible.
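
One possible way to write the helper from the two steps above is sketched here. The example dates at the bottom are made up for illustration; in the exercise you would pass result.index instead.

```python
from matplotlib import pyplot
import pandas as pd


def _plot_timeline(dates, y_offset=0):
    """Draw one timeline: a gray base line with a bar marker per date."""
    # Thick, semi-transparent base line at the height of this timeline.
    pyplot.axhline(y_offset, linewidth=10, color="lightgray", alpha=0.75)
    # Every date gets the same y-position; linewidth=0 hides the
    # line that would otherwise connect the individual markers.
    pyplot.plot(
        dates,
        [y_offset] * len(dates),
        marker="|",
        markersize=10,
        linewidth=0,
    )


# Made-up dates, for illustration only.
example_dates = pd.to_datetime(
    ["2020-04-25 17:00", "2020-06-21 14:00", "2020-07-01 18:00"]
)
_plot_timeline(example_dates)
```

Calling pyplot.show() afterwards displays the figure.
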
How it could look now

Suppose your final data frame in part 1 was called result.

>>> from matplotlib import pyplot
>>> from timeline import _plot_timeline
>>> _plot_timeline(result.index)
>>> pyplot.show()

Should produce something like this:

Building on top

We now want to build on top of what we already have. Usually we might want to plot multiple timelines to compare them.

Let’s define a second function in timeline.py which we call plot_timelines_at_year(…). This is the function people are actually supposed to use to plot a timeline.

The function needs two parameters:

| Parameter Name | Type | Comment |
|---|---|---|
| year | int | The year for which to draw the timeline. |
| labelled_dataframes | dict[str, DataFrame] | (See below) |

The idea of the labelled data frames is as follows:

We let the user specify data frames associated with a text label. These text labels are what we will plot onto the y-axis for the respective timeline.

Although we only need the index of the data frames for the timeline, we accept the whole data frame for the users’ convenience. We assume that the index is made up of timestamps (otherwise creating a timeline would make little sense).


Here are the steps you want to take within the function:

  1. Get the number of entries in the labelled_dataframes dictionary and store it in a variable. You are going to need this quite often.
  2. Create a Timestamp for the first and last moment in the year that is to be processed.
    • The Timestamp class can be imported from pandas.
  3. Use the pandas date_range(…) function to generate a sequence of timestamps from the start of the year to the end.
    • You will later use them for the x-axis tick marks.
    • To get one timestamp at the beginning of each month, use the parameter freq="MS"
  4. Since a timeline is usually wide rather than tall, start your plot with pyplot.figure(…). Recommended values for the figsize parameter are
    • 12 in the x-direction
    • the number of timelines to plot in the y-direction.
    • Note that figsize expects a tuple of the shape
      (width along x-axis, height along y-axis)
  5. Plot all the timelines using the helper function from before.
    • Since our intention is to plot the timelines above each other, each of them uses a different y_offset.
    • If you combine a for loop with Python’s enumerate(…) function and the dictionary’s .items() method, you get a nested tuple of the shape (y_offset, (label, dataframe)) as the loop variable, which works nicely with our _plot_timeline(…) function.
  6. During the loop also build up a list with the used y-offsets and a list with the labels that you can use later to set the y-axis tick marks.
  7. Use the lists collected in step 6 to set the y-axis ticks accordingly.
  8. Set the limits of the y-axis to -1 (bottom) and the number of timelines (top) so we have some nice space around our timelines.
  9. Set the limits of the x-axis from the year’s start to its end.
  10. Set the ticks on the x-axis to the values defined in step 3. You can refine the tick labels by using the month_name()-method of the Timestamp to obtain the month name as a text.
  11. Enable the grid for the x-axis only.
  12. Use pyplot.tight_layout() to achieve a much nicer formatting of your plot.
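
The twelve steps above could be put together roughly like this. It is a sketch, not the reference solution: the helper is repeated so the snippet runs on its own, the local variable names are free choices, and the two small data frames at the bottom contain made-up dates under made-up labels.

```python
from matplotlib import pyplot
from pandas import DataFrame, Timestamp, date_range, to_datetime


def _plot_timeline(dates, y_offset=0):
    pyplot.axhline(y_offset, linewidth=10, color="lightgray", alpha=0.75)
    pyplot.plot(dates, [y_offset] * len(dates),
                marker="|", markersize=10, linewidth=0)


def plot_timelines_at_year(year, labelled_dataframes):
    # Step 1: number of timelines to draw.
    timeline_count = len(labelled_dataframes)
    # Step 2: first and last moment of the year.
    year_start = Timestamp(year=year, month=1, day=1)
    year_end = Timestamp(year=year, month=12, day=31,
                         hour=23, minute=59, second=59)
    # Step 3: one timestamp at the beginning of each month.
    month_starts = date_range(year_start, year_end, freq="MS")

    # Step 4: wide figure, one unit of height per timeline.
    pyplot.figure(figsize=(12, timeline_count))

    # Steps 5 and 6: plot each timeline and remember offset and label.
    y_offsets = []
    labels = []
    for y_offset, (label, dataframe) in enumerate(labelled_dataframes.items()):
        _plot_timeline(dataframe.index, y_offset)
        y_offsets.append(y_offset)
        labels.append(label)

    pyplot.yticks(y_offsets, labels)            # step 7
    pyplot.ylim(-1, timeline_count)             # step 8
    pyplot.xlim(year_start, year_end)           # step 9
    pyplot.xticks(month_starts,                 # step 10
                  [month.month_name() for month in month_starts])
    pyplot.grid(True, axis="x")                 # step 11
    pyplot.tight_layout()                       # step 12


# Made-up data frames with only an index, for illustration.
perfect = DataFrame(index=to_datetime(["2020-04-25 17:00", "2020-08-05 13:00"]))
rain = DataFrame(index=to_datetime(["2020-02-10 09:00", "2020-11-03 16:00"]))
plot_timelines_at_year(2020, {"Perfect": perfect, "Rain": rain})
```

Since Python dictionaries keep their insertion order, the first entry ends up at y-offset 0, which is the bottom timeline.
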
How it should look now

Suppose your final data frame in part 1 was called result.

>>> from matplotlib import pyplot
>>> from timeline import plot_timelines_at_year
>>> plot_timelines_at_year(2020, {"Perfect": result})
>>> pyplot.show()

It should look something like this:

Plot them all!

Now that we have all the pieces in place it has become feasible to apply more filters and also add their timelines to compare.

Filter the weather data for the following conditions and add them to the timeline plot:

| Label | Condition |
|---|---|
| Perfect | (Already did this) |
| Can’t be sure | (See note below) |
| Too cold | Air Temperature < 15°C |
| Too hot | Air Temperature > 25°C |
| Rain | Precipitation (1h) > 0mm |

The Can’t be sure case is a bit special. If we investigate the columns for wind speed and sky condition a bit further, we notice that they have many missing values. This reduces the number of truly perfect moments we find, since in many cases we are missing crucial information. To counter this, we create new filter rules to explore additional candidates:

| Measurement | Desired values (x) | Additional rule |
|---|---|---|
| Air Temperature | 15°C ≤ x ≤ 25°C | |
| Precipitation (1h) | x = 0mm | |
| Wind speeds | 0 m/s < x ≤ 2 m/s | or undefined |
| Sky Condition | x in {2, 3, 4} | or undefined |
| Time of day | 08:00 ≤ x ≤ 18:00 | |

In all cases, limit the results to the time between 08:00 and 18:00.
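
The "or undefined" rule can be expressed by combining the original mask with .isna() using the | operator. The sketch below only shows this for wind speed and sky condition on made-up rows; the column names are again assumptions, and the temperature and precipitation masks would be combined in as before.

```python
import numpy as np
import pandas as pd

# Made-up sample rows with missing values; column names are assumptions.
weather = pd.DataFrame(
    {
        "wind_speed": [1.0, np.nan, 3.0, np.nan],
        "sky_condition": [3.0, 2.0, np.nan, np.nan],
    },
    index=pd.to_datetime(
        [
            "2020-06-21 08:00",
            "2020-06-21 09:00",
            "2020-06-21 10:00",
            "2020-06-21 11:00",
        ]
    ),
)

wind = weather["wind_speed"]
sky = weather["sky_condition"]

# Comparisons with NaN evaluate to False, so missing values have to be
# let through explicitly with isna().
breeze_or_unknown = ((wind > 0) & (wind <= 2)) | wind.isna()
clouds_or_unknown = sky.isin([2, 3, 4]) | sky.isna()

candidates = weather[breeze_or_unknown & clouds_or_unknown]
candidates = candidates.between_time("08:00", "18:00")
print(candidates)
```

Only the 10:00 row is dropped here, because its wind speed is known and too high. Note that the parentheses around each comparison are required, since & and | bind more tightly than the comparison operators.
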

How it may look

For comparison, here is the number of rows in each of the resulting data frames that are plotted:

| Label | Rows |
|---|---|
| Perfect | 19 |
| Can’t be sure | 237 |
| Too cold | 2272 |
| Too hot | 389 |
| Rain | 568 |

If you want a bigger challenge

Instead of taking the year as a parameter, derive it from the provided indexes. Note that the covered time span then influences the resolution of your grid, so you will have to adapt the tick marks and labels dynamically. This could, for example, happen if you suddenly have data that spans a whole decade or if various data sets cover time spans half a century apart.
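
As a possible starting point for the challenge, the sketch below derives the covered time span from the indexes and picks a tick frequency from it. The function names derive_time_span and choose_ticks are hypothetical, the rule "monthly ticks for up to one year, yearly ticks otherwise" is just one simple choice, and all data frames are assumed to be non-empty.

```python
import pandas as pd


def derive_time_span(labelled_dataframes):
    """Return the start of the earliest year and the start of the year
    following the latest one found in any of the indexes."""
    first = min(df.index.min() for df in labelled_dataframes.values())
    last = max(df.index.max() for df in labelled_dataframes.values())
    start = pd.Timestamp(year=first.year, month=1, day=1)
    end = pd.Timestamp(year=last.year + 1, month=1, day=1)
    return start, end


def choose_ticks(start, end):
    """Monthly ticks for a span of one year, yearly ticks for longer spans."""
    freq = "MS" if end.year - start.year <= 1 else "YS"
    return pd.date_range(start, end, freq=freq)


# Made-up data frames roughly a decade apart, for illustration.
old = pd.DataFrame(index=pd.to_datetime(["2011-03-01 10:00"]))
new = pd.DataFrame(index=pd.to_datetime(["2020-09-06 12:00"]))

span_start, span_end = derive_time_span({"Old": old, "New": new})
ticks = choose_ticks(span_start, span_end)
print(span_start, span_end, len(ticks))
```

The tick labels would need the same treatment, for example showing the year instead of the month name once the ticks are yearly.
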