| Title: | Tidy Access to Women's Tennis Association (WTA) Data |
|---|---|
| Description: | Scrapes and tidies publicly available data from the Women's Tennis Association website (<https://www.wtatennis.com>). Provides helpers to retrieve player biographies, singles and doubles career overviews, match histories, live rankings and aggregate statistics. Dynamic pages are rendered through a headless 'Chrome' session so 'JavaScript'-generated content is fully captured, and all outputs are returned as tidy data frames suitable for downstream analysis or visualisation. |
| Authors: | Alejandro Navas González [aut, cre] (alias: Angnar) |
| Maintainer: | Alejandro Navas González <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 0.1.0 |
| Built: | 2026-05-16 09:11:17 UTC |
| Source: | https://github.com/angnar-97/matchpointr |

Parses the profile header of a WTA player page and returns a one-row tibble with name, nationality, birth date, birth place, height and handedness. The bulk of the data is read from the page's JSON-LD (schema.org Person) block, which is more stable than the visual markup; height is read from the profile bio block as a fallback.
wta_get_player_basics(player_url, download_images = TRUE)
| player_url | Character. Full URL to a player page. Build it with |
|---|---|
| download_images | Logical. When |
A one-row tibble::tibble() with columns:

| player_id | Numeric WTA id parsed from @id. |
|---|---|
| name, given_name, family_name | Name fields. |
| birth_date | Date of birth (ISO 8601 character). |
| nationality, birth_place, birth_country | Geography fields. |
| height | Height string as shown on the bio (e.g. 5' 9" (1.74m)). |
| handedness | Dominant hand ("Right-Handed" / "Left-Handed"). |
| nationality_code | 3-letter IOC/ISO code extracted from the flag image (e.g. "CZE", "USA"). |
| player_image_url, nationality_flag_url | Headshot and flag URLs. |
| player_image | magick-image of the headshot, when download_images = TRUE. |
| nationality_flag | magick-image of the flag SVG, when download_images = TRUE and the suggested package rsvg is installed (otherwise NA). |

wta_get_player_basics(wta_player_url(320301, "katerina-siniakova"))
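
Because height is returned as a display string rather than a number, downstream analysis usually needs a numeric value. The following is a minimal sketch in base R, assuming the string always follows the documented example format 5' 9" (1.74m); parse_height_m is a hypothetical helper and not part of the package.

```r
# Hypothetical helper (not part of the package): pull the metre value out of
# a height string shaped like the documented example, 5' 9" (1.74m).
parse_height_m <- function(height) {
  # Match a decimal number that sits directly before "m)" inside parentheses.
  pattern <- ".*\\(([0-9]+(\\.[0-9]+)?)m\\).*"
  matched <- grepl(pattern, height)

  # Start from NA so missing or unexpected strings stay missing.
  out <- rep(NA_real_, length(height))
  out[matched] <- as.numeric(sub(pattern, "\\1", height[matched]))
  out
}

heights_m <- parse_height_m(c("5' 9\" (1.74m)", "unknown", NA))
```

Strings that do not match the assumed format come back as NA instead of raising an error, which keeps the helper safe to apply to a whole column.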
Walks the dynamic "Matches" page of a player profile, clicking the "Show more" button until the full history is loaded, and returns one row per match with tournament, round, opponent, score and result.
wta_get_player_matches(player_url, max_clicks = 50L)
| player_url | Character. URL to the player page; the function normalises to the |
|---|---|
| max_clicks | Integer. Safety cap for the "Show more" click loop. Defaults to 50. |
A tibble::tibble() with one row per match and columns:
tournament, tournament_date, round, opponent, opponent_seed,
opponent_country, opponent_rank, score, result.
url <- wta_player_url(320301, "katerina-siniakova", "matches")
wta_get_player_matches(url)
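
The role of max_clicks can be illustrated with a capped loop. This is only a sketch of the general pattern, not the package's implementation: click_until_done, has_more and click_more are hypothetical names standing in for whatever browser calls are actually made.

```r
# Hypothetical sketch of a capped "Show more" loop. `has_more` reports whether
# the button is still present; `click_more` performs one click.
click_until_done <- function(has_more, click_more, max_clicks = 50L) {
  clicks <- 0L
  # Stop when the button disappears or the safety cap is reached.
  while (clicks < max_clicks && has_more()) {
    click_more()
    clicks <- clicks + 1L
  }
  clicks
}

# Simulated page that needs three clicks to reveal its full history.
remaining <- 3L
n_clicks <- click_until_done(
  has_more   = function() remaining > 0L,
  click_more = function() remaining <<- remaining - 1L
)

# With a low cap the loop stops early even though more content remains.
remaining <- 10L
n_capped <- click_until_done(
  has_more   = function() remaining > 0L,
  click_more = function() remaining <<- remaining - 1L,
  max_clicks = 2L
)
```

The cap guarantees termination even if the button never disappears, at the cost of a possibly truncated history for players with very long careers.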
Returns the structured "additional properties" block from the page's JSON-LD: current singles and doubles rank, career titles and career prize money. This is supplemented with the career-high singles rank, which is read from the bio side panel.
wta_get_player_overview(player_url)
| player_url | Character. URL to the player overview page. |
|---|---|
A long-format tibble::tibble() with columns metric and
value. Rows include singles_rank, doubles_rank,
singles_career_titles, doubles_career_titles,
career_prize_money, career_high.
wta_get_player_overview(wta_player_url(320301, "katerina-siniakova"))
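
A long metric/value table is convenient to store but awkward to query for a single number. The sketch below shows one way to look up a metric or reshape to a one-row wide layout in base R. The data frame is mock data with placeholder values, and it assumes value is a character column, which the documentation does not state.

```r
# Mock output in the documented long format. The values are placeholders for
# illustration only and are NOT real WTA data; `value` being character is an
# assumption.
overview <- data.frame(
  metric = c("singles_rank", "doubles_rank", "career_high"),
  value  = c("10", "1", "5"),
  stringsAsFactors = FALSE
)

# Named vector lookup: metric names become element names.
lookup <- setNames(overview$value, overview$metric)
singles_rank <- as.integer(lookup[["singles_rank"]])

# One-row wide data frame with one column per metric.
wide <- as.data.frame(as.list(lookup), stringsAsFactors = FALSE)
```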
Scrapes the rankings table at
https://www.wtatennis.com/rankings/singles (or /doubles) and returns
a tidy tibble. The initial page renders the first 50 rows; increase the
browser dwell time with wait if the widget hasn't hydrated yet.
wta_get_rankings(type = c("singles", "doubles"), top = NULL, wait = 12)
| type | Character. One of |
|---|---|
| top | Integer. Limit the output to the top |
| wait | Numeric. Seconds to wait for the rankings widget to hydrate after navigation. Defaults to 12. |
A tibble::tibble() with one row per player and columns:
rank, player_id, player, country, age,
tournaments_played, points.
wta_get_rankings("singles", top = 50)
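
When the widget is slow to hydrate, one option is to retry with progressively longer wait values. The wrapper below is a hedged sketch: fetch_with_longer_waits is a hypothetical helper, and it assumes that an unhydrated page shows up as either an error or an empty result, which the documentation does not confirm. A stand-in fetcher is used so the example runs without a browser.

```r
# Hypothetical retry wrapper (not part of the package). `fetch` is any
# function that takes a `wait` argument and returns a data frame, for example
# function(wait) wta_get_rankings("singles", wait = wait).
fetch_with_longer_waits <- function(fetch, waits = c(12, 20, 30)) {
  for (wait in waits) {
    # Treat errors like empty results and move on to the next wait value.
    result <- tryCatch(fetch(wait), error = function(e) NULL)
    if (is.data.frame(result) && nrow(result) > 0L) {
      return(result)
    }
  }
  stop("No rows returned after trying waits: ", paste(waits, collapse = ", "))
}

# Stand-in fetcher that only "hydrates" when given at least 20 seconds.
fake_fetch <- function(wait) {
  if (wait < 20) data.frame() else data.frame(rank = 1:3)
}

rankings <- fetch_with_longer_waits(fake_fetch)
```

Starting from the default of 12 seconds keeps the common case fast and only pays for longer dwell times when the first attempt comes back empty.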
Convenience wrapper to assemble a canonical player URL from a numeric id and an optional slug.
wta_player_url(id, slug = NULL, section = c("overview", "matches"))
| id | Character or integer. The WTA numeric player id (e.g. |
|---|---|
| slug | Optional character. Player slug (e.g. |
| section | Optional character. Page section to append as a path segment, one of |
A single character string with the full URL.
wta_player_url(320301, "katerina-siniakova")
wta_player_url(320301, "katerina-siniakova", "matches")