Task 2: Loading the Data
To load the data you can use the pandas.read_csv() function (see the read_csv() documentation).
Hints
- In these data sets the separator for the data fields is not a comma but a run of whitespace characters. You can use the regular expression "\s+" to express this in Python.
- Note the parameter parse_dates of the read_csv() function, which can come in extremely handy.
- Note that the data set as provided has no header.
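As a minimal sketch of the first and third hint, the snippet below reads whitespace-separated rows without a header. The sample rows are made up for illustration and are not taken from the actual data set; the number and meaning of the columns are assumptions. The regular expression is written as a raw string (r"\s+") so that Python does not interpret the backslash itself.

```python
import io

import pandas as pd

# Made-up sample rows (assumption): year, month, day, hour, then two values.
sample = io.StringIO(
    "2020 01 01 00   -11   -56\n"
    "2020 01 01 01   -14   -60\n"
    "2020 01 01 02    -9   -52\n"
)

# sep=r"\s+" splits fields on runs of whitespace.
# header=None tells pandas that the first row is data, not column names.
df = pd.read_csv(sample, sep=r"\s+", header=None)

print(df.shape)
print(df.head())
```

Without header=None, pandas would treat the first data row as column names, and the frame would silently lose one row.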
As noted previously, the downloaded data is compressed in a gz archive.
You could decompress it before working with it
(especially useful if you want to inspect the data beforehand with a plain text editor or other tools),
but the read_csv() function itself can handle such an archive just fine.
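To illustrate reading a gz archive directly, the following sketch writes a tiny gzip-compressed file to a temporary directory and loads it again. The file name sample.gz and its contents are hypothetical stand-ins for the downloaded data. By default read_csv() infers the compression from the .gz file extension, so no extra argument should be needed here.

```python
import gzip
import os
import tempfile

import pandas as pd

# Create a small gzip-compressed file as a stand-in for the real download.
# File name and contents are made up for this example.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.gz")
with gzip.open(path, "wt") as fh:
    fh.write("2020 01 01 00   -11\n")
    fh.write("2020 01 01 01   -14\n")

# The compression is inferred from the ".gz" extension (compression="infer"
# is the default); it could also be set explicitly with compression="gzip".
df_gz = pd.read_csv(path, sep=r"\s+", header=None)

print(df_gz)
```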
Tasks
- Consider first what the loaded data should look like.
- Load the data set using the read_csv() function from pandas. Combine the year, month, day and hour columns into one single column for the timestamp.
- Set the timestamp to be the index of your dataframe.
- Display the loaded data and compare the result with your expectations.
- Do a plausibility check:
  - Check the number of rows and columns.
  - Check if the data inside the rows is displayed correctly (i.e. no columns got joined or torn apart), especially the date column.
- Assign a proper header based on the information from the data documentation.
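One possible way to approach these steps is sketched below on made-up sample rows. The column names value_a and value_b are placeholders, not the names from the data documentation. Note that this sketch builds the timestamp with pandas.to_datetime() after loading instead of using the parse_dates parameter mentioned in the hints; this is a deliberate substitution, since combining several columns through parse_dates is deprecated in recent pandas releases.

```python
import io

import pandas as pd

# Made-up sample rows (assumption): year, month, day, hour, two values.
sample = io.StringIO(
    "2020 01 01 00   -11   -56\n"
    "2020 01 01 01   -14   -60\n"
    "2020 01 01 02    -9   -52\n"
)

# Placeholder header; the real names should come from the data documentation.
columns = ["year", "month", "day", "hour", "value_a", "value_b"]
df = pd.read_csv(sample, sep=r"\s+", header=None, names=columns)

# to_datetime() can assemble timestamps from columns named
# year, month, day and (optionally) hour.
date_parts = ["year", "month", "day", "hour"]
df["timestamp"] = pd.to_datetime(df[date_parts])

# Replace the four date columns by a single timestamp index.
df = df.drop(columns=date_parts).set_index("timestamp")

# Plausibility check: row/column count and the covered time range.
print(df.shape)
print(df.index.min(), df.index.max())
print(df.head())
```

Comparing df.shape with the line count of the raw file, and the first and last index values with the expected time range, gives a quick indication of whether columns were joined or torn apart during parsing.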
Hints for Solving the Task
If you are seriously stuck, you can take a look at the solution hints.