Section 2 Data Cleaning and Processing
This page covers Section 2 of the methodology described in my report.
First, load all of the libraries required for this code:
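The library-loading chunk is not shown on this page; based on the functions used below (xmlParse/xmlToDataFrame from XML, df2move from moveVis, and data.table's rolling join), a plausible set is:

```r
# packages inferred from the calls below; install with install.packages() if needed
library(XML)        # xmlParse(), getNodeSet(), xmlToDataFrame()
library(moveVis)    # df2move()
library(data.table) # as.data.table(), setkey(), rolling joins
```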
2.1 Heart rate data
To read in all of the .TCX files, first create a list of all filenames in the ‘data’ directory with the appropriate extension.
# get all files with a .tcx extension from the 'data' folder in our working directory
filenames <- list.files("data/", pattern = "\\.tcx$", full.names = TRUE)
This function performs the necessary processing for each .TCX file: it reads the file as XML, converts it to a data frame, coerces each column to the appropriate data type, and then converts the data frame to a MoveStack object.
# input filename and outputs a movestack object
process <- function(data){
# read in the tcx data as xml
doc <- xmlParse(data)
# convert xml format to dataframe
df <- xmlToDataFrame(nodes = getNodeSet(doc, "//ns:Trackpoint", "ns"))
# convert from factors to characters
i <- sapply(df, is.factor) # find columns that are factors
df[i] <- lapply(df[i], as.character) # convert column to character
# convert heart rate from char to number
df$HeartRateBpm <- sapply(df$HeartRateBpm, as.numeric)
# separate the Position field into X and Y columns
# NOTE: splitting on '-' assumes the longitude is negative (west of Greenwich)
for (i in seq_along(df$Position)){
a <- unlist(strsplit(df$Position[i], '-'))
df$Y_coord[i] <- a[1]
df$X_coord[i] <- paste0('-', a[2])
}
# convert to numeric values
df$Y_coord <- sapply(df$Y_coord, as.numeric)
df$X_coord <- sapply(df$X_coord, as.numeric)
# convert time format
df$TimeConverted <- as.POSIXct(df$Time, tz="UTC", format="%Y-%m-%dT%H:%M:%S")
# remove duplicate timestamps
new <- df[!duplicated(df$TimeConverted), ]
# convert the deduplicated data frame to a move object;
# passing it again as 'data' retains the extra columns (e.g. HeartRateBpm)
movement_format <- df2move(new,
proj = '+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0',
x = 'X_coord',
y = 'Y_coord',
time = 'TimeConverted',
data = new)
# return the move stack object
return(movement_format)
}
We apply this function to each data file, then add the resulting objects together to create a single object.
# process all of the data with the above function
list <- lapply(filenames, process)
# combine all of the data into one
# NOTE: brute force method - needs to be adapted if different number of files
test3 <- list[[1]] + list[[2]] + list[[3]] + list[[4]] + list[[5]] + list[[6]] + list[[7]] + list[[8]] + list[[9]] + list[[10]] + list[[11]] + list[[12]] + list[[13]] + list[[14]] + list[[15]]
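The hard-coded sum above breaks if the number of files changes. Assuming `+` combines these objects pairwise (as the line above relies on), `Reduce()` folds it over a list of any length; note also that the variable name `list` masks the base function and is worth renaming. The same folding mechanism, demonstrated on plain numbers:

```r
# generalizes the brute-force sum: fold `+` over the whole list of tracks
# test3 <- Reduce(`+`, list)

# the same mechanism on plain numbers
total <- Reduce(`+`, list(1, 2, 3, 4))
total
# [1] 10
```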
Next, select the columns of interest from our data and write them to a .CSV file to inspect manually.
# select columns
points_out <- as.data.frame(test3$TimeConverted)
points_out['X_coord'] <- test3$X_coord
points_out['Y_coord'] <- test3$Y_coord
points_out['hr'] <- test3$HeartRateBpm
points_out['time'] <- test3$TimeConverted
# write to CSV
write.csv(points_out, 'data/all_hr.csv')
Here is a subset of what the data now looks like:
## test3$TimeConverted X_coord Y_coord hr time
## 1 2019-12-05 23:11:59 -0.1273531 51.50747 130 2019-12-05 23:11:59
## 2 2019-12-05 23:12:02 -0.1273531 51.50747 129 2019-12-05 23:12:02
## 3 2019-12-05 23:12:03 -0.1273531 51.50747 129 2019-12-05 23:12:03
## 4 2019-12-05 23:12:04 -0.1273501 51.50747 129 2019-12-05 23:12:04
## 5 2019-12-05 23:12:05 -0.1273423 51.50747 127 2019-12-05 23:12:05
## 6 2019-12-05 23:12:06 -0.1273263 51.50747 127 2019-12-05 23:12:06
2.2 Creating a continuous heart rate layer
We will now process the cleaned heart rate points to create a continuous layer of anxiety levels. This processing is done using QGIS, and so a model with predefined steps and parameters has been provided: Link to model
The input for this model is a vector layer with points corresponding to each heart rate measurement. The QGIS ‘Points Layer from Table’ function can be used to read in the .CSV file generated in the previous step and create this vector layer.
The QGIS processing model performs the following functions:
- Interpolates the points to create a continuous surface of heart rate values.
- Creates a buffer around all point locations.
- Clips the interpolated layer by the buffer to enclose the region of heart rate measurement.
- Smooths the interpolated layer.
The model outputs a raster layer of continuous heart rate values. This file should be saved as a GeoTIFF for future visualization.
2.3 Personal comments
We now need to read in the personal comments CSV file, stored in the same folder as our heart rate data.
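The chunk that reads this file is not shown; a plausible version, with a hypothetical filename, is `notes <- read.csv("data/comments.csv", stringsAsFactors = FALSE)`. The `ï..` prefix on the `Note` column in the output below is a UTF-8 byte-order mark leaking into the header; passing `fileEncoding = "UTF-8-BOM"` avoids it, as this self-contained sketch shows:

```r
# write a small CSV with a UTF-8 BOM, mimicking the comments file
f <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("Note,Date_time\nhello,2019-12-11 17:22\n")), f)

# without fileEncoding the BOM corrupts the first header name (e.g. "ï..Note");
# declaring the encoding strips it
notes_demo <- read.csv(f, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
names(notes_demo)[1]  # "Note"
```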
Here is what this data looks like:
## ï..Note
## 1 Wow, very crowded sidewalk.
## 2 Got stuck in the middle of the intersection.
## 3 Tried to cross the street but I didn't see that car coming.
## 4 I think this is the area where people have reported getting harassed.
## 5 I always get stuck in the middle of the intersection on this street.
## 6 There are some bikes moving very quickly here
## Date_time
## 1 2019-12-11 17:22
## 2 2019-12-11 17:23
## 3 2019-12-11 17:32
## 4 2019-12-11 17:33
## 5 2019-12-13 13:52
## 6 2019-12-04 13:58
Our next task is to attach coordinates to each note by joining on time with the heart rate data processed above.
# convert time format to POSIXct
notes$TimeConverted <- as.POSIXct(notes$Date_time, tz='UTC')
# need to add coordinates by joining with heart rate data
# help from: https://stackoverflow.com/questions/39282749/r-how-to-join-two-data-frames-by-nearest-time-date
# convert to data tables
table_coords <- as.data.table(points_out)
table_notes <- as.data.table(notes)
# set join key
setkey(table_coords, time)
setkey(table_notes, TimeConverted)
# combine with nearest time stamp
combined <- table_coords[table_notes, roll = "nearest"]
# output as a CSV to take a look at
write.csv(combined, 'data/comments_w_coords.csv')
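The `roll = "nearest"` join matches each note to the heart rate row whose timestamp is closest, even when no timestamps agree exactly. The same idea in base R, on toy timestamps (illustration only, not part of the pipeline):

```r
# two heart-rate timestamps and one note that falls between them
hr_times  <- as.POSIXct(c("2019-12-04 13:58:00", "2019-12-04 14:04:53"), tz = "UTC")
note_time <- as.POSIXct("2019-12-04 14:03:00", tz = "UTC")

# pick the heart-rate reading with the smallest absolute time gap
nearest <- which.min(abs(as.numeric(difftime(hr_times, note_time, units = "secs"))))
nearest  # 2: the 14:04:53 reading is closest to the 14:03:00 note
```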
Now we have our coordinates:
## test3$TimeConverted X_coord Y_coord hr time
## 1: 2019-12-04 13:58:00 -0.1263700 51.52482 130 2019-12-04 13:58:00
## 2: 2019-12-04 13:59:00 -0.1274354 51.52488 129 2019-12-04 13:59:00
## 3: 2019-12-04 14:04:53 -0.1323954 51.52330 124 2019-12-04 19:11:00
## 4: 2019-12-04 14:04:53 -0.1323954 51.52330 124 2019-12-04 19:15:00
## 5: 2019-12-05 11:31:00 -0.1249467 51.52490 140 2019-12-05 11:31:00
## 6: 2019-12-05 18:52:00 -0.1184381 51.52405 109 2019-12-05 18:52:00
## ï..Note
## 1: There are some bikes moving very quickly here
## 2: This park looks very peaceful
## 3: Still haven't gotten the hang of looking a different way when crossing the street
## 4: It's nice not to see any cars in here
## 5: Almost didn't see that car coming around the corner
## 6: The pedestrian scale of this street is really comfortable
## Date_time
## 1: 2019-12-04 13:58
## 2: 2019-12-04 13:59
## 3: 2019-12-04 19:11
## 4: 2019-12-04 19:15
## 5: 2019-12-05 11:31
## 6: 2019-12-05 18:52