Scraping and Mapping SMU Football Travel 2024
Install Dependencies
%pip install selenium folium geopy pandas beautifulsoup4
This notebook was run with selenium 4.20.0, folium 0.16.0, geopy 2.4.1 and pandas 2.2.0 under Python 3.9 (Anaconda). The beautifulsoup4 package provides the bs4 module imported below.
import pandas as pd
import os
import folium
from geopy.geocoders import Nominatim
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import re
from datetime import datetime
import random
import csv
Define Global Variables
# URL of the website to scrape
url = 'https://fbschedules.com/2024-smu-football-schedule/'
# The csv file that we're saving all the games to, and reading from
file_name = "smuFootballSchedule2024.csv"
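Only the 2024 URL is confirmed here. If fbschedules.com uses the same URL pattern for other seasons (an assumption that has not been checked), both globals could be derived from the season year. `season_paths` is a hypothetical helper name, not part of the notebook:

```python
def season_paths(year: int) -> tuple[str, str]:
    """Build (url, file_name) for one season.

    Assumes other seasons follow the 2024 URL pattern; only 2024 is confirmed.
    """
    url = f"https://fbschedules.com/{year}-smu-football-schedule/"
    file_name = f"smuFootballSchedule{year}.csv"
    return url, file_name


url, file_name = season_paths(2024)
print(url)
print(file_name)
```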
Scrape the Website
from selenium.common.exceptions import NoSuchElementException, TimeoutException
# Initialize Safari WebDriver
driver = webdriver.Safari()
# Each element lookup waits up to 1 second before raising NoSuchElementException
driver.implicitly_wait(1)
# Exceptions that mean "this cell is missing from this row"
LOOKUP_ERRORS = (NoSuchElementException, TimeoutException)

# Minimal container for one schedule row; the other fields are attached in the loop
class Game:
    def __init__(self, game_card):
        self.game_card = game_card

# List of Game objects, in schedule order
games = []
try:
    # Open the URL in the browser
    driver.get(url)
    # Find every row of the schedule table by XPath
    game_cards = driver.find_elements(By.XPATH, '//*[@id="main"]/div[1]/div/div/div/div[6]/div/div[2]/table/tbody/tr')
    for game_card in game_cards:
        # Create the object for this row
        game = Game(game_card)
        # Get the date, collapsing newlines and repeated spaces
        try:
            date = game_card.find_element(By.CLASS_NAME, 'cfb1').text
            game.date = ' '.join(date.split())
        except LOOKUP_ERRORS:
            game.date = ""
        # Get the opponent: linked name, else bold name, else mark the row as START
        try:
            opponent = game_card.find_element(By.CLASS_NAME, 'cfb2').find_element(By.TAG_NAME, 'a').text
        except LOOKUP_ERRORS:
            try:
                opponent = game_card.find_element(By.CLASS_NAME, 'cfb2').find_element(By.TAG_NAME, 'strong').text
            except LOOKUP_ERRORS:
                opponent = "START"
        if opponent.startswith(" "):
            opponent = opponent[4:]  # drop the 4-character leading prefix
        game.opponent = opponent
        # Get the location: stadium span, else the cell text minus the bold opponent name
        try:
            location = game_card.find_element(By.CLASS_NAME, 'cfb2').find_element(By.CLASS_NAME, 'stadium-txt-span').text
        except LOOKUP_ERRORS:
            try:
                cell = game_card.find_element(By.CLASS_NAME, 'cfb2')
                location = cell.text.replace(cell.find_element(By.TAG_NAME, 'strong').text, '').strip()
            except Exception:
                location = ""
        game.location = location
        # Get the ticket link
        try:
            game.ticketLink = game_card.find_element(By.CLASS_NAME, 'cfb4').find_element(By.TAG_NAME, 'a').get_attribute('href')
        except LOOKUP_ERRORS:
            game.ticketLink = ""
        # Coordinates are filled in by the geocoding step
        game.latitude = None
        game.longitude = None
        # Test print (rows without a location print a blank line)
        print(game.location)
        games.append(game)
finally:
    # Close the browser even if scraping fails part-way
    driver.quit()
Mackay Stadium, Reno, NV
Gerald J. Ford Stadium, Dallas, TX
Gerald J. Ford Stadium, Dallas, TX
Gerald J. Ford Stadium, Dallas, TX
Gerald J. Ford Stadium, Dallas, TX
L&N Stadium, Louisville, KY
Stanford Stadium, Stanford, CA
Wallace Wade Stadium, Durham, NC
Gerald J. Ford Stadium, Dallas, TX
Gerald J. Ford Stadium, Dallas, TX
Scott Stadium, Charlottesville, VA
Gerald J. Ford Stadium, Dallas, TX
Bank of America Stadium, Charlotte, NC
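The `Game` class above stores only the WebElement and has every other attribute assigned later in the loop. A `dataclass` (a swap-in, not what the notebook uses) declares all fields in one place with defaults, and leaves out the WebElement, which cannot be used after `driver.quit()` anyway. The hypothetical `normalize` helper applies the same `' '.join(text.split())` clean-up the scraper already uses for dates; applied to opponent names it would also remove the trailing spaces visible in the coordinates printout (for example `Nevada Wolf Pack ,`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Game:
    """One schedule row; field names mirror the attributes the notebook sets."""
    opponent: str
    date: str = ""
    location: str = ""
    ticketLink: str = ""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    color: str = ""
    icon: str = ""


def normalize(text: str) -> str:
    """Collapse runs of whitespace (including newlines) and trim both ends."""
    return " ".join(text.split())


game = Game(opponent=normalize("Nevada Wolf Pack "),
            location=normalize("Mackay Stadium,\n Reno, NV"))
print(game)
```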
Get Most Common Location for OFF Weeks
# Get default location (one that occurs most often)
from collections import Counter
# Extract the locations from the list of game objects, excluding empty strings
locations = [game.location for game in games if game.location]
# Count the frequency of each location
location_counts = Counter(locations)
# Find the location with the maximum frequency
most_common_location = location_counts.most_common(1)[0]
print(f"The most common location is '{most_common_location[0]}' with {most_common_location[1]} occurrences.")
The most common location is 'Gerald J. Ford Stadium, Dallas, TX' with 7 occurrences.
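`most_common(1)[0]` raises `IndexError` if no game has a location, and when two venues tie, `Counter.most_common` returns the one encountered first (documented behavior). A guarded variant, with the hypothetical name `home_location` and made-up inputs:

```python
from collections import Counter


def home_location(locations, default=""):
    """Return the most frequent non-empty location, or `default` if there is none."""
    counts = Counter(loc for loc in locations if loc)
    top = counts.most_common(1)  # either [] or [(location, count)]
    return top[0][0] if top else default


print(home_location(["", "Gerald J. Ford Stadium, Dallas, TX", "Mackay Stadium, Reno, NV",
                     "Gerald J. Ford Stadium, Dallas, TX"]))
print(repr(home_location([])))                          # empty string, no IndexError
print(home_location(["Stadium A, X", "Stadium B, Y"]))  # tie: the first one seen wins
```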
Locate the Opponents on Map
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent="palmercjones@comcast.net")
# Nominatim's usage policy allows at most one request per second
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
for game in games:
    if game.latitude is None or game.longitude is None:
        # START and OFF rows have no venue, so place them at the home stadium
        if game.location == "":
            game.location = most_common_location[0]
        coordinates_query = game.location
        coordinates = geocode(coordinates_query)
        if coordinates is None:
            # Retry with the stadium name dropped, e.g. "Dallas, TX"
            coordinates_query = game.location.split(', ', 1)[1] if ', ' in game.location else game.location
            coordinates = geocode(coordinates_query)
            print(coordinates_query)
        if coordinates is None:
            raise ValueError(f"Could not geocode {game.location!r}")
        new_latitude = coordinates.latitude
        new_longitude = coordinates.longitude
        # Nudge the marker south-east until no other game sits on the same spot
        while any(other.latitude == new_latitude and other.longitude == new_longitude for other in games):
            new_latitude -= 0.01
            new_longitude += 0.01
        game.latitude = new_latitude
        game.longitude = new_longitude
    # Pick the marker color and Font Awesome icon
    if 'Champion' in game.opponent:  # championship game
        game.color = "darkpurple"
        game.icon = "trophy"
    elif 'START' in game.opponent:  # start of the season
        game.color = "green"
        game.icon = "play"
    elif game.opponent == "OFF":  # bye week
        game.color = "lightgray"
        game.icon = "moon"
    elif game.location == most_common_location[0]:  # home game
        game.color = "darkred"
        game.icon = "home"
    else:  # away game
        game.color = "darkblue"
        game.icon = "plane"
    print(f"Coordinates for {game.opponent}, Latitude = {game.latitude}, Longitude = {game.longitude}")
Coordinates for START, Latitude = 32.838058700000005, Longitude = -96.78345776612298
Coordinates for Nevada Wolf Pack , Latitude = 39.5468915, Longitude = -119.81739798009238
Coordinates for HCU Huskies , Latitude = 32.82805870000001, Longitude = -96.77345776612297
Coordinates for BYU Cougars , Latitude = 32.81805870000001, Longitude = -96.76345776612297
Coordinates for OFF, Latitude = 32.80805870000001, Longitude = -96.75345776612296
Coordinates for TCU Horned Frogs , Latitude = 32.79805870000001, Longitude = -96.74345776612296
Coordinates for Florida State Seminoles , Latitude = 32.788058700000015, Longitude = -96.73345776612295
Coordinates for Louisville Cardinals , Latitude = 38.206016149999996, Longitude = -85.75877338041424
Coordinates for OFF, Latitude = 32.77805870000002, Longitude = -96.72345776612295
Coordinates for Stanford Cardinal , Latitude = 37.43453005, Longitude = -122.16116296732366
Coordinates for Duke Blue Devils , Latitude = 35.995445849999996, Longitude = -78.94188924076411
Coordinates for Pitt Panthers , Latitude = 32.76805870000002, Longitude = -96.71345776612294
Coordinates for OFF, Latitude = 32.75805870000002, Longitude = -96.70345776612294
Coordinates for Boston College Eagles , Latitude = 32.74805870000002, Longitude = -96.69345776612293
Coordinates for Virginia Cavaliers , Latitude = 38.029306, Longitude = -78.4766781
Coordinates for California Golden Bears , Latitude = 32.738058700000025, Longitude = -96.68345776612293
Coordinates for ACC Championship, Latitude = 35.22579505, Longitude = -80.85385877910787
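The geocoding cell does two separable things per game: it nudges duplicate coordinates apart and picks a marker style. Pulled out as pure functions (`nudge_until_unique` and `marker_style` are hypothetical names, not part of the notebook), that logic can be checked without calling Nominatim. Keeping the occupied points in a set also avoids rescanning every game on each nudge, a small gain for a 17-row schedule. The coordinates below are made up:

```python
def nudge_until_unique(lat, lon, taken, step=0.01):
    """Move (lat, lon) south-east by `step` degrees until it is not in `taken`.

    Uses exact float equality, as the notebook does; identical geocoder
    results compare equal, so stacked home games get separated.
    """
    while (lat, lon) in taken:
        lat -= step
        lon += step
    return lat, lon


def marker_style(opponent, location, home_location):
    """Return (folium icon color, Font Awesome icon name), checked in the notebook's order."""
    if "Champion" in opponent:
        return "darkpurple", "trophy"
    if "START" in opponent:
        return "green", "play"
    if opponent == "OFF":
        return "lightgray", "moon"
    if location == home_location:
        return "darkred", "home"
    return "darkblue", "plane"


home = "Gerald J. Ford Stadium, Dallas, TX"
taken = set()
placed = []
# Made-up raw coordinates: the first and third points are identical
for raw in [(32.8, -96.8), (39.5, -119.8), (32.8, -96.8)]:
    point = nudge_until_unique(*raw, taken)
    taken.add(point)
    placed.append(point)
print(placed)
print(marker_style("OFF", home, home), marker_style("TCU Horned Frogs ", home, home))
```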
Open Existing File
# Initialize an empty list to store the data (csv_data is not used by the cells below)
csv_data = []
# Check if the file exists
if os.path.exists(file_name):
    # Open the file in read mode; the csv module expects newline=''
    with open(file_name, 'r', newline='') as csv_file:
        # Read each row from the CSV file into the list
        csv_data = list(csv.reader(csv_file))
    print("Data imported successfully:")
else:
    print(f"The file {file_name} does not exist.")
Data imported successfully:
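`csv_data` holds raw lists of strings. If a table is more convenient, pandas (already imported as `pd`) can read the same file. The Save the CSV cell writes no header row, so the column names have to be supplied. A sketch with two illustrative rows standing in for the real file (the dates are placeholders):

```python
import io

import pandas as pd

# Illustrative stand-in for smuFootballSchedule2024.csv; dates are placeholders.
sample = io.StringIO(
    'Nevada Wolf Pack ,DATE-1,"Mackay Stadium, Reno, NV",\n'
    'OFF,DATE-2,"Gerald J. Ford Stadium, Dallas, TX",\n'
)

columns = ["opponent", "date", "location", "ticketLink"]
# header=None: the file has no header row; keep_default_na=False keeps blanks as "".
schedule = pd.read_csv(sample, header=None, names=columns, keep_default_na=False)
print(schedule[["opponent", "location"]])
```

With the real file, pass `file_name` in place of `sample`.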
Make the Map
# Save output next to the notebook: in Jupyter the working directory is the notebook's folder
current_dir = os.getcwd()
# Create a map centered at the geographic center of the contiguous US
m = folium.Map(location=[39.8283, -98.5795], zoom_start=4)
# Add a marker for each game; the tooltip is HTML, so '<br>' starts a new line
for game in games:
    folium.Marker(
        location=[game.latitude, game.longitude],
        tooltip=game.opponent + '<br>' + game.location + '<br>' + game.date,
        icon=folium.Icon(color=game.color, prefix='fa', icon=game.icon),
    ).add_to(m)
# Connect the games in schedule order to trace the travel route
travel_coordinates = [(game.latitude, game.longitude) for game in games if game.location]
folium.PolyLine(travel_coordinates, tooltip="Travel").add_to(m)
# Specify the path to save the HTML file
html_file_path = os.path.join(current_dir, 'map_smuFootball2024.html')
# Save the map to an HTML file in the current directory
m.save(html_file_path)
print(f"Map saved to: {html_file_path}")
m
Map saved to: /Users/palmerjones/Downloads/map_smuFootball2024.html
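The marker tooltips are passed as HTML (that is why `'<br>'` produces line breaks), so characters such as `&` and `<` in scraped text are read as markup. Browsers usually tolerate a bare `&` like the one in "L&N Stadium", but escaping each field with the standard library's `html.escape` is the safer habit. `tooltip_html` is a hypothetical helper, not part of folium:

```python
import html


def tooltip_html(opponent: str, location: str, date: str) -> str:
    """Escape each field, trim stray spaces, and join with <br> line breaks."""
    return "<br>".join(html.escape(part.strip()) for part in (opponent, location, date))


print(tooltip_html("Louisville Cardinals ", "L&N Stadium, Louisville, KY", ""))
```

In the marker loop it would replace the string concatenation: `tooltip=tooltip_html(game.opponent, game.location, game.date)`.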
Save the CSV
# Save the games to CSV, one row per game with no header row:
# opponent, date, location, ticketLink
# newline='' prevents extra blank lines between rows
with open(file_name, 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    for game in games:
        csv_writer.writerow([game.opponent, game.date, game.location, game.ticketLink])
print(f"Data saved to {file_name}")
Data saved to smuFootballSchedule2024.csv
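The geocoding loop only queries Nominatim for games whose latitude or longitude is `None`, so saving the coordinates too would let a future version reload them and skip those lookups. A sketch using `csv.DictWriter` with a header row; the six-column layout and the `write_games`/`read_games` names are assumptions, not the notebook's format:

```python
import csv
import io

# Hypothetical layout: the notebook's four fields plus the geocoded coordinates.
FIELDS = ["opponent", "date", "location", "ticketLink", "latitude", "longitude"]


def write_games(rows, fh):
    """Write dicts keyed by FIELDS, preceded by a header row (None is written as '')."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_games(fh):
    """Read rows back; coordinates become floats, blank ones become None."""
    rows = []
    for row in csv.DictReader(fh):
        for key in ("latitude", "longitude"):
            row[key] = float(row[key]) if row[key] else None
        rows.append(row)
    return rows


buffer = io.StringIO()
write_games([
    {"opponent": "Nevada Wolf Pack", "date": "", "location": "Mackay Stadium, Reno, NV",
     "ticketLink": "", "latitude": 39.5468915, "longitude": -119.81739798009238},
    {"opponent": "OFF", "date": "", "location": "", "ticketLink": "",
     "latitude": None, "longitude": None},
], buffer)
buffer.seek(0)
restored = read_games(buffer)
print(restored[0]["latitude"], restored[1]["latitude"])
```

With the notebook's objects, each row could be built as `{name: getattr(game, name) for name in FIELDS}`, and the file opened with `open(file_name, 'w', newline='')` instead of the in-memory buffer.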