Efficient Estimation of Bid-Ask Spreads from Open, High, Low, and Close Prices

This vignette illustrates how to estimate bid-ask spreads from open, high, low, and close prices using the efficient estimator described in Ardia, Guidotti, & Kroencke (JFE, 2024): https://doi.org/10.1016/j.jfineco.2024.103916.

library(bidask)

The function edge computes a single bid-ask spread estimate from vectors of open, high, low, and close prices. The functions edge_rolling and edge_expanding are optimized for fast calculations over rolling and expanding windows, respectively. The function spread provides additional functionality for xts objects and implements other estimators as well. For all functions, an output value of 0.01 corresponds to a spread of 1%.
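To check the units, the sketch below simulates daily OHLC prices from a random walk observed with a known 1% spread and passes them to edge(). The estimate should come out close to 0.01. The simulator `simulate_ohlc()` and its parameters (trades per day, daily volatility) are hypothetical and not part of bidask. It assumes that each trade hits the bid or the ask with equal probability.

```r
library(bidask)

# Hypothetical helper (not part of bidask): simulate daily OHLC prices.
# The efficient log-price follows a random walk with daily volatility `vol`.
# Each of the `trades` trades per day executes at the ask or at the bid with
# equal probability, i.e. at the efficient log-price plus or minus spread / 2.
simulate_ohlc <- function(days, trades = 100, spread = 0.01, vol = 0.01) {
  n <- days * trades
  p <- cumsum(rnorm(n, sd = vol / sqrt(trades)))
  q <- sample(c(-1, 1), n, replace = TRUE)
  # one column per day, one row per trade (matrix() fills column by column)
  m <- matrix(100 * exp(p + q * spread / 2), nrow = trades)
  data.frame(
    open  = m[1, ],
    high  = apply(m, 2, max),
    low   = apply(m, 2, min),
    close = m[trades, ]
  )
}

set.seed(1)
ohlc <- simulate_ohlc(days = 2000, spread = 0.01)
est <- edge(ohlc$open, ohlc$high, ohlc$low, ohlc$close)
est  # expected to be close to 0.01, i.e. a spread of about 1%
```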

Functions edge, edge_rolling, edge_expanding

These functions can be easily used with tidy data. For instance, download daily prices for Bitcoin and Ethereum using the crypto2 package:

library(dplyr)
library(crypto2)
df <- crypto_list(only_active = TRUE) %>%
  filter(symbol %in% c("BTC", "ETH")) %>%
  crypto_history(start_date = "20200101", end_date = "20221231")
head(df)
#> # A tibble: 6 × 17
#>      id slug    name    symbol timestamp           ref_cur_id ref_cur_name
#>   <int> <chr>   <chr>   <chr>  <dttm>              <chr>      <chr>       
#> 1     1 bitcoin Bitcoin BTC    2020-01-01 23:59:59 2781       USD         
#> 2     1 bitcoin Bitcoin BTC    2020-01-02 23:59:59 2781       USD         
#> 3     1 bitcoin Bitcoin BTC    2020-01-03 23:59:59 2781       USD         
#> 4     1 bitcoin Bitcoin BTC    2020-01-04 23:59:59 2781       USD         
#> 5     1 bitcoin Bitcoin BTC    2020-01-05 23:59:59 2781       USD         
#> 6     1 bitcoin Bitcoin BTC    2020-01-06 23:59:59 2781       USD         
#> # ℹ 10 more variables: time_open <dttm>, time_close <dttm>, time_high <dttm>,
#> #   time_low <dttm>, open <dbl>, high <dbl>, low <dbl>, close <dbl>,
#> #   volume <dbl>, market_cap <dbl>

Estimate the spread for each coin in each year:

df %>%
  mutate(yyyy = format(timestamp, "%Y")) %>%
  group_by(symbol, yyyy) %>%
  arrange(timestamp) %>%
  summarise("EDGE" = edge(open, high, low, close))
#> # A tibble: 6 × 3
#> # Groups:   symbol [2]
#>   symbol yyyy      EDGE
#>   <chr>  <chr>    <dbl>
#> 1 BTC    2020  0.00319 
#> 2 BTC    2021  0.00376 
#> 3 BTC    2022  0.000200
#> 4 ETH    2020  0.00223 
#> 5 ETH    2021  0.00628 
#> 6 ETH    2022  0.00262

Estimate the spread using a rolling window of 30 days for each coin and plot the results:

library(ggplot2)
df %>%
  group_by(symbol) %>%
  arrange(timestamp) %>%
  mutate("EDGE (rolling)" = edge_rolling(open, high, low, close, width = 30)) %>%
  ggplot(aes(x = timestamp, y = `EDGE (rolling)`, color = symbol)) +
  geom_line() +
  theme_minimal()
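The sketch below spells out what a rolling window means here. The estimate at each date applies edge() to the most recent `width` observations, and the first `width - 1` dates are padded with NA. The helper `edge_rolling_naive()` and the toy data are hypothetical and for illustration only. The helper is far slower than edge_rolling(), and its treatment of window boundaries is an assumption, not a description of how edge_rolling() is implemented.

```r
library(bidask)

# Hypothetical, slow reference version of a rolling EDGE estimate:
# the value at position t applies edge() to observations (t - width + 1), ..., t.
edge_rolling_naive <- function(open, high, low, close, width) {
  n <- length(close)
  out <- rep(NA_real_, n)
  for (t in seq_len(n)) {
    if (t < width) next  # fewer than `width` observations available so far
    i <- (t - width + 1):t
    out[t] <- edge(open[i], high[i], low[i], close[i])
  }
  out
}

# Toy daily OHLC data: 50 trades per day on a random walk, 1% bid-ask bounce
set.seed(42)
days <- 200; trades <- 50
p <- cumsum(rnorm(days * trades, sd = 0.01 / sqrt(trades)))
q <- sample(c(-1, 1), days * trades, replace = TRUE)
m <- matrix(100 * exp(p + q * 0.005), nrow = trades)  # one column per day
o <- m[1, ]; h <- apply(m, 2, max); l <- apply(m, 2, min); cl <- m[trades, ]

roll <- edge_rolling_naive(o, h, l, cl, width = 30)
```

In practice, edge_rolling() is the function to use. A loop like this one is mainly useful as a transparent cross-check on small samples.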

Estimate the spread using an expanding window for each coin and plot the results:

df %>%
  group_by(symbol) %>%
  arrange(timestamp) %>%
  mutate("EDGE (expanding)" = edge_expanding(open, high, low, close)) %>%
  ggplot(aes(x = timestamp, y = `EDGE (expanding)`, color = symbol)) +
  geom_line() +
  theme_minimal()
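An expanding window grows with each new observation, so the estimate at each date uses all data up to that date. The sketch below computes this directly by calling edge() on observations 1 to t. The helper `edge_expanding_naive()`, its `min_obs` argument, and the toy data are hypothetical. The helper is quadratic in the sample size and is not how edge_expanding() works internally.

```r
library(bidask)

# Hypothetical, slow reference version of an expanding EDGE estimate:
# the value at position t applies edge() to observations 1, ..., t.
# `min_obs` (an assumption of this sketch) leaves the earliest values as NA.
edge_expanding_naive <- function(open, high, low, close, min_obs = 10) {
  n <- length(close)
  out <- rep(NA_real_, n)
  for (t in seq_len(n)) {
    if (t < min_obs) next
    out[t] <- edge(open[1:t], high[1:t], low[1:t], close[1:t])
  }
  out
}

# Toy daily OHLC data: 50 trades per day on a random walk, 1% bid-ask bounce
set.seed(42)
days <- 200; trades <- 50
p <- cumsum(rnorm(days * trades, sd = 0.01 / sqrt(trades)))
q <- sample(c(-1, 1), days * trades, replace = TRUE)
m <- matrix(100 * exp(p + q * 0.005), nrow = trades)  # one column per day
o <- m[1, ]; h <- apply(m, 2, max); l <- apply(m, 2, min); cl <- m[trades, ]

expa <- edge_expanding_naive(o, h, l, cl)
```

The last value equals a single edge() call on the full sample, which is why expanding estimates tend to stabilize as the window grows.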

In general, using intraday data instead of daily data improves estimation accuracy, especially when the spread is expected to be small (see the example below).
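The point about intraday data can be illustrated with a small simulation under assumed parameters: 250 days, 1,950 trades per day, 2% daily volatility, and a true spread of 0.1%. The same simulated trades are aggregated once into daily bars and once into intraday bars of 10 trades, and edge() is applied to each. This is a single hypothetical draw, not evidence from the paper, and the helper `edge_bars()` is illustrative. With a spread this small relative to daily volatility, the daily estimate is expected to be dominated by noise. The intraday estimate should land much closer to 0.001.

```r
library(bidask)

# Hypothetical setup: 250 days, 1950 trades per day, 2% daily volatility,
# and a small true spread of 0.1% (trades at the efficient log-price +/- spread / 2)
set.seed(123)
days <- 250; trades <- 1950; spread <- 0.001; vol <- 0.02
n <- days * trades
p <- cumsum(rnorm(n, sd = vol / sqrt(trades)))
q <- sample(c(-1, 1), n, replace = TRUE)
price <- 100 * exp(p + q * spread / 2)

# Aggregate consecutive blocks of k trades into OHLC bars and estimate the spread
edge_bars <- function(price, k) {
  m <- matrix(price, nrow = k)  # one column per bar
  edge(m[1, ], apply(m, 2, max), apply(m, 2, min), m[k, ])
}

daily    <- edge_bars(price, trades)  # 250 daily bars
intraday <- edge_bars(price, 10)      # 195 bars per day, 48750 in total
c(daily = daily, intraday = intraday)
```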

Function spread

The function spread() provides additional functionality for xts objects and implements additional estimators. For instance, download daily data for Microsoft (MSFT) using the quantmod package, which returns an xts object:

library(quantmod)
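# getSymbols() sets the date range with `from` and `to` (e.g., from = "2019-01-01");
# when they are omitted, data are requested from 2007-01-01 to today, as in the output below
x <- getSymbols("MSFT", auto.assign = FALSE)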
head(x)
#>            MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
#> 2007-01-03     29.91     30.25    29.40      29.86    76935100      21.23530
#> 2007-01-04     29.70     29.97    29.44      29.81    45774500      21.19974
#> 2007-01-05     29.63     29.75    29.45      29.64    44607200      21.07885
#> 2007-01-08     29.65     30.10    29.53      29.93    50220200      21.28508
#> 2007-01-09     30.00     30.18    29.73      29.96    44636600      21.30642
#> 2007-01-10     29.80     29.89    29.43      29.66    55017400      21.09307
class(x)
#> [1] "xts" "zoo"

Estimate the spread with:

spread(x)
#>                   EDGE
#> 2025-02-25 0.005399479

or, equivalently:

edge(open = x[,1], high = x[,2], low = x[,3], close = x[,4])
#> [1] 0.005399479

Estimate the spread for each month and plot the estimates:

sp <- spread(x, width = endpoints(x, on = "months"))
plot(sp)
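Passing a vector of endpoints as width yields one estimate per period, each computed on a non-overlapping block of rows. The sketch below reproduces that idea by hand with xts::endpoints() on a simulated daily series. The data and the loop are illustrative assumptions, and spread() may differ in details such as how it labels each period.

```r
library(xts)
library(bidask)

# Toy daily OHLC series (hypothetical data): 50 trades per day, 1% bid-ask bounce
set.seed(7)
days <- 500; trades <- 50
p <- cumsum(rnorm(days * trades, sd = 0.01 / sqrt(trades)))
q <- sample(c(-1, 1), days * trades, replace = TRUE)
m <- matrix(100 * exp(p + q * 0.005), nrow = trades)  # one column per day
x <- xts(
  cbind(Open = m[1, ], High = apply(m, 2, max), Low = apply(m, 2, min), Close = m[trades, ]),
  order.by = seq(as.Date("2020-01-01"), by = "day", length.out = days)
)

# endpoints() returns 0 followed by the row index of the last observation of each month
ep <- endpoints(x, on = "months")

# One estimate per month, computed on the non-overlapping rows (ep[i] + 1):ep[i + 1]
est <- sapply(seq_len(length(ep) - 1), function(i) {
  y <- coredata(x[(ep[i] + 1):ep[i + 1], ])
  edge(y[, "Open"], y[, "High"], y[, "Low"], y[, "Close"])
})
monthly <- xts(est, order.by = index(x)[ep[-1]])
```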

Estimate the spread using a rolling window of 21 observations:

sp <- spread(x, width = 21)
plot(sp)

To illustrate higher-frequency estimates, download intraday data from Alpha Vantage. You must register with Alpha Vantage in order to download their data, but the one-time registration is fast and free. Register at https://www.alphavantage.co/ to receive your key. You can set the API key globally as follows:

setDefaults(getSymbols.av, api.key = "<API-KEY>")

Download minute data for Microsoft:

x <- getSymbols(
  Symbols = "MSFT", 
  auto.assign = FALSE, 
  src = "av", 
  periodicity = "intraday", 
  interval = "1min", 
  output.size = "full")

Keep only prices during regular market hours:

x <- x["T09:30/T16:00"]
head(x)
#>                     MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
#> 2023-08-17 09:30:00   320.540   321.870  320.405     321.75      364230
#> 2023-08-17 09:31:00   321.780   321.781  320.890     321.04       66948
#> 2023-08-17 09:32:00   321.080   321.330  320.805     321.16       61487
#> 2023-08-17 09:33:00   321.220   321.220  320.450     320.63       51775
#> 2023-08-17 09:34:00   320.625   320.920  320.480     320.60       57119
#> 2023-08-17 09:35:00   320.570   320.860  320.455     320.71       90454
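The string "T09:30/T16:00" uses xts time-of-day subsetting, which applies the same intraday window to every date in the index. The sketch below checks this on a hypothetical one-day series of minute timestamps in UTC. The data are made up, and only the subsetting syntax comes from the example above.

```r
library(xts)

# Hypothetical minute-level series covering a full calendar day (UTC index)
idx <- seq(as.POSIXct("2023-08-17 00:00", tz = "UTC"), by = "min", length.out = 24 * 60)
x <- xts(seq_along(idx), order.by = idx)

# Time-of-day subsetting: keep 09:30 through 16:00 on every day in the index
regular <- x["T09:30/T16:00"]
range(format(index(regular), "%H:%M"))
```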

Estimate the spread for each day and plot the estimates:

sp <- spread(x, width = endpoints(x, on = "days"))
plot(sp, type = "b")

Use multiple estimators and plot the estimates:

sp <- spread(x, width = endpoints(x, on = "days"), method = c("EDGE", "AR", "CS", "ROLL"))
plot(sp, type = "b", legend.loc = "topright")
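Of the other estimators, the Roll (1984) estimator is the simplest. In Roll's model, the bid-ask bounce makes consecutive price changes negatively autocorrelated, with an autocovariance equal to minus the squared spread divided by four. The sketch below implements that textbook formula on log close-to-close returns from simulated data. It illustrates the idea and assumes this is what the "ROLL" method measures; it is not the code behind spread(method = "ROLL"). The helper `roll_spread()` is hypothetical.

```r
# Hypothetical sketch of the textbook Roll (1984) estimator:
# spread = 2 * sqrt(-cov(r_t, r_{t-1})), where r_t are log close-to-close returns
roll_spread <- function(closes) {
  r <- diff(log(closes))
  gamma <- cov(r[-1], r[-length(r)])  # first-order autocovariance of returns
  if (gamma >= 0) return(NA_real_)    # undefined when the autocovariance is non-negative
  2 * sqrt(-gamma)
}

# Simulated closing prices: 50 trades per day on a random walk with 1% daily
# volatility; each close is a trade at the bid or the ask of a 1% spread
set.seed(2024)
days <- 5000; trades <- 50; spread <- 0.01
n <- days * trades
p <- cumsum(rnorm(n, sd = 0.01 / sqrt(trades)))
q <- sample(c(-1, 1), n, replace = TRUE)
price <- 100 * exp(p + q * spread / 2)
closes <- price[seq(trades, n, by = trades)]  # last trade of each day

est <- roll_spread(closes)
est
```

Unlike EDGE, this estimator uses only closing prices. When sampling noise pushes the autocovariance above zero, it returns no estimate.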

GitHub

If you find this package useful, please star the repo! The repository also contains implementations for Python, C++, MATLAB, and more, as well as open data containing bid-ask spread estimates for crypto pairs on Binance and for U.S. stocks in CRSP.

Cite as

Ardia, D., Guidotti, E., Kroencke, T.A. (2024). Efficient Estimation of Bid-Ask Spreads from Open, High, Low, and Close Prices. Journal of Financial Economics, 161, 103916. doi: 10.1016/j.jfineco.2024.103916

A BibTeX entry for LaTeX users is:

@article{edge,
  title = {Efficient estimation of bid–ask spreads from open, high, low, and close prices},
  journal = {Journal of Financial Economics},
  volume = {161},
  pages = {103916},
  year = {2024},
  doi = {10.1016/j.jfineco.2024.103916},
  author = {David Ardia and Emanuele Guidotti and Tim A. Kroencke},
}