Data Wrangling and Processing
Harsha Ramachandran edited this page Jul 14, 2022 · 2 revisions
We have the following data from the Transport for London APIs.
To start with, we load the bike station data into a pandas DataFrame and add a demand column, initialised to 0, for each station. We also convert the id from a string of the form "stationid_x" to just x for easier processing later.
import numpy as np
import pandas as pd

url = "https://api.tfl.gov.uk/bikepoint"
bikeStations = pd.read_json(url)
bikeStations["demand"] = 0

def idGenerator(x):
    # Keep only the part after the underscore, e.g. "stationid_x" -> "x"
    return x.split("_")[1]

bikeStations.id = np.vectorize(idGenerator)(bikeStations.id)

Next, we load the trip data. The important columns are the end station id, the start station id, and the start and end dates. We convert the start and end dates to pandas datetime objects. Some of the end station id values are empty, meaning the data was lost in some way. For these entries we set the id to -1.
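import glob
import os

# Every directory under the data folder, as returned by os.walk
folders = [x[0] for x in os.walk(os.getcwd() + "/_TfL Cycling Data")]
# Drop the first entry, which is the top-level data folder itself
folders.pop(0)
li = []
for path in folders:
    all_files = glob.glob(path + "/*.csv")
    for filename in all_files:
        df = pd.read_csv(filename, index_col=None, header=0)
        li.append(df)
trips = pd.concat(li, axis=0, ignore_index=True)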
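The same loading step can be sketched with pathlib and wrapped in a function so that it can be tried on a small folder first. This is only an illustrative variant under stated assumptions: `load_trip_csvs` is a hypothetical helper name, and the files and column values in the demonstration are made up.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_trip_csvs(root):
    """Concatenate every CSV found in the subfolders of root (hypothetical helper)."""
    root = Path(root)
    # Skip CSVs sitting directly in root, since only its subfolders are scanned
    csv_paths = sorted(p for p in root.rglob("*.csv") if p.parent != root)
    if not csv_paths:
        raise FileNotFoundError(f"no CSV files found under {root}")
    frames = [pd.read_csv(p) for p in csv_paths]
    return pd.concat(frames, axis=0, ignore_index=True)


# Small self-contained demonstration with made-up files
with tempfile.TemporaryDirectory() as tmp:
    for year, rows in [("2015", "1,14\n2,\n"), ("2016", "3,303\n")]:
        folder = Path(tmp) / year
        folder.mkdir()
        (folder / "trips.csv").write_text("Rental Id,EndStation Id\n" + rows)
    demo_trips = load_trip_csvs(tmp)
```

Raising an error when no files are found makes a wrong working directory visible immediately, rather than surfacing later as a less obvious failure inside `pd.concat`.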
## Bike Trip data Wrangling
# End station ids have some empty values; fill them with -1
trips[['EndStation Id']] = trips[['EndStation Id']].fillna(value=-1)
# Cast to int, since a column with empty values defaults to float
trips['EndStation Id'] = trips['EndStation Id'].astype(np.int64)
# Convert the date columns to datetime objects
trips[['End Date', 'Start Date']] = trips[['End Date', 'Start Date']].apply(lambda col: pd.to_datetime(col, format="%d/%m/%Y %H:%M"))
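To see what these wrangling steps do, here is a minimal sketch on a few made-up rows. The column names follow the trip data above, but the values and the `trips_demo` name are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; one end station id is missing
trips_demo = pd.DataFrame({
    "EndStation Id": [14.0, np.nan, 303.0],
    "StartStation Id": [6, 14, 101],
    "Start Date": ["10/01/2015 14:42", "11/01/2015 08:05", "12/01/2015 23:59"],
    "End Date": ["10/01/2015 15:01", "11/01/2015 08:30", "13/01/2015 00:20"],
})

# Fill missing end stations with the sentinel -1, then cast from float to int
trips_demo["EndStation Id"] = trips_demo["EndStation Id"].fillna(-1).astype(np.int64)

# Parse day/month/year timestamps into datetime objects
for column in ["Start Date", "End Date"]:
    trips_demo[column] = pd.to_datetime(trips_demo[column], format="%d/%m/%Y %H:%M")

# Trips whose end station was lost can now be selected via the sentinel
missing_end = trips_demo[trips_demo["EndStation Id"] == -1]
# With datetime columns, trip durations are a simple subtraction
duration = trips_demo["End Date"] - trips_demo["Start Date"]
```

Giving the format string explicitly matters here because the dates are day-first. Without it, a value such as 10/01/2015 could be read as October 1st instead of January 10th.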