The Money in Scientific Publishing

9 May 2024

Introduction
How Science Works
The Article Processing Fee
Profit Margins
Lobbying
What Can Universities Do?
So What Do We Do Next?
FAQ

Introduction

Despite how hard I worked on it, I still don’t believe it’s worth $40 to read

Recently the last article of my PhD was published. The article ended up in a scientific journal called Nature Mental Health and is available online here. Now, unless you have a university affiliation that has a subscription you’ll have to pay $40 to download the thing. If you just want to read it once you can rent it for 48 hours for a tenner. If you can’t or just don’t want to pay, it is available here for free with the same text, same images, same review process. So, why does the first one cost $40 while the second one is free? This is the rabbit hole that is scientific publishing, and we’re about to have some tea with a Mad Hatter.

This post originates from a few conversations I’ve had with colleagues at my industry job who have been lucky enough to be oblivious to the academic publishing industry. I struggled to explain the process to them, since most have a master’s degree but have not spent considerable time in academia. I’ll try to sum up how publishing academic articles works, where the money comes from and where it goes.

I’ll try to back up my arguments with data wherever I can and of course I’ll share the code I used. I hid the code by default but you can unfold the code chunks if you want to have a look. A final disclosure before the deep dive: my PhD research has mostly focused on psychiatric genetics and statistics, so I will approach this from the perspective of someone who published mostly in the medical/psychology field. The phenomena and tendencies described here might be different for academics working in physics, ecology, political science, or philosophy.

Show Python setup

import pandas as pd
from openbb_terminal.sdk import openbb

Show R setup

library(tidyverse)
library(ggtext)
library(showtext)

font_add_google("Oswald", family = "custom")
showtext_auto()

How Science Works

For a full list of acknowledgements, see the dedicated section in each of my articles, e.g. here

Now, my PhD was funded by an EU Commission grant (specifically ERA-NET Neuron SYNSCHIZ, grant no #283798) and supported by a few Research Council of Norway (RCN), or Norges forskningsråd (NFR) grants (most importantly #276082, #323961, #223273, and to a lesser degree #249795, #298646, and #300767 for those interested). Both the EU Commission and the Research Council of Norway are entirely publicly funded. This means that any output generated by the research projects that received funding from these agencies should belong to the taxpayers that made it possible, as per both common sense and regulation. I believe this is a completely sensible approach. If there is a collaboration between publicly funded research and the industry, this usually comes with some restrictions on how the company involved can market the eventual product (see for example the initial agreement on the Oxford/AstraZeneca COVID-19 vaccine and the study it references).

Open access (OA) is a common term used to describe articles that are freely available and not behind a paywall

So the main physical output of my PhD project is three peer-reviewed scientific articles which form the basis for my PhD thesis, and so these articles should be freely available to anyone wishing to read them. This is something I believe to be important, and for a few years now the Research Council of Norway has also mandated that all research funded by their grants should be accessible to anyone (see their guidelines). As such, all researchers using these grants will need to ensure open access to their articles.

So the grant gives researchers money to fund their research (step 1 in the schematic below). The researchers perform their scientific experiments and hopefully discover new knowledge that may benefit the field of science they’re working in, so they write an article detailing their findings and submit it to a scientific journal (step 2). Following the proper scientific procedure, to ensure some quality control, the journal editor ensures that each article is peer reviewed before publication. This means that the editor finds other scientists to read over the article and provide feedback on other experiments to include, methods to adapt, or text to rephrase etc. (step 3) before they give a thumbs-up that the article is ready for publication. These peer reviewers are always other scientists at other institutions who essentially never get compensated for the time they spend reviewing other people’s articles since it’s considered part of the job. The author of the manuscript revises the manuscript (step 4) and when the peer reviewers are satisfied, the article is ready to publish in the scientific journal (step 5). Anyone with a computer can host a PDF online, but I think basically everyone agrees that it’s preferable to collect articles from different research groups in a place where other scientists can easily find them, and that does this better than Google Scholar or Scopus.

schematic of the scientific publishing process

So publishers (such as Elsevier and Springer-Nature) enable the scientific process by facilitating peer review, publishing articles in their journals, and hosting the articles on their websites. At the moment, apart from a special few, all publishers and their journals are run on a for-profit basis. I’m yet to meet anyone who doesn’t believe that facilitating the peer review process and hosting the servers for the articles should be fairly compensated, but this is where it becomes a bit more complicated. Historically, journals used to get their compensation by selling subscriptions to these journals to universities across the world. However, since the advent of the internet and the mandated open access of articles funded with public money, the publishers started demanding compensation up front for removing the paywall for these articles. We’ll get back to that.

In short, academics need to publish in big journals to increase their chances of getting funded in the future so they can continue their research

Each publisher maintains (usually) multiple journals (up to several hundreds in the case of Elsevier). Some of these journals are very widely read (such as Cell) and others are perhaps not as widely read (such as Partial Differential Equations in Applied Mathematics). For scientists, it’s beneficial to publish in journals that are widely read for two reasons. Firstly, this means a larger audience will get to hear about the important finding you have made, and secondly, the scientist can now use this publication in a high-profile journal to show in future grant applications: “Look, I am capable of doing important research that has a wide audience, please let me continue my research and fund my next project” (step 6 in the schematic above). Since grant agencies (including the Research Council of Norway) have been defunded in recent years, competition for grants has increased and applications from research groups that have not published their articles in big journals are (usually) less likely to receive funding (back to step 1).

At this point, many people with some knowledge of supply-and-demand forces will feel a tingle in their bones, because scientists are entirely dependent on publications in higher-profile journals for their livelihood. So spots in these journals are in high demand. Since a considerable subset of these scientists need to make their articles open access and the publisher cannot include these articles in the subscriptions they sell, the publishers found that they can demand higher and higher compensation for publication in these high-profile journals. This is the situation we’re in right now. The open access fees (or article processing fees, or APCs) have skyrocketed since publishers found that scientists were willing to pay considerable portions of their grant funding to get their articles published in high-impact journals, and so secure continuity for their own research and for the work of the students and researchers under their supervision.

The Article Processing Fee (APC)

Okay, enough text, let’s start looking at some data. So we established that scientists are dependent on publishing in (high-impact) journals for their own livelihood and that of their research group. And since these articles need to be open access, publishers cannot sell subscriptions with these articles and will demand a fee up front in compensation. In the past decade, publishers have ridden this trend of increased competition in grant applications by demanding a higher cut from researchers to publish their articles in their bigger journals. Let’s see where that has led us.

So the big publishers will put a list of the open access fees on their websites.

Here are the lists for a few of the biggest publishers:

Elsevier, go to the APC Price List
Springer-Nature, for this post I used the list for 2023
Nature Portfolio which is part of Springer-Nature, but I guess they forgot to include a subset of their journals in the list above. Also, the APCs aren’t listed in a comprehensive list, but on each journal website individually. The website above says “APCs vary from title to title, starting from €2,290 in Scientific Reports to €10,290 in Nature”. As it turns out, only Scientific Reports has an APC of €2290, Nature Communications is €5690, and the other journals Nature Portfolio refers to as Transformative Journals are all €10290. Who knows why they forgot to include these journals in the Springer-Nature list. 🤷

The code chunk below reads the two Excel files and creates the data frame for the Nature Portfolio journals.

Show code

data_apcs_springer <- readxl::read_excel(
  "./data/springer_nature_2023_apcs.xlsx",
  skip = 2
) |>
  janitor::clean_names() |>
  rename(
    title = journal_title,
    issn = e_issn
  ) |>
  mutate(
    eur = parse_integer(eur_23),
    usd = parse_integer(usd_23),
    gbp = parse_integer(gbp_23),
  )

data_apcs_elsevier <- readxl::read_excel(
  "./data/elsevier_apcs.xlsx",
  skip = 2
) |>
  janitor::clean_names() |>
  drop_na(title) |>
  rename(
    usd = list_price,
    eur = x5,
    gbp = x6,
    jpy = x7
  ) |>
  slice(-1) |>
  mutate(
    across(usd:last_col(), ~ parse_integer(.x)),
    imprint = "Elsevier"
  )

data_apcs_natureportolio <- tibble(
  title = c(
    "Nature", "Nature Aging", "Nature Astronomy", "Nature Biomedical Engineering",
    "Nature Biotechnology", "Nature Cancer", "Nature Cardiovascular Research",
    "Nature Catalysis", "Nature Cell Biology", "Nature Chemical Biology",
    "Nature Chemistry", "Nature Climate Change", "Nature Computational Science",
    "Nature Ecology & Evolution", "Nature Electronics", "Nature Energy",
    "Nature Food", "Nature Genetics", "Nature Geoscience", "Nature Human Behaviour",
    "Nature Immunology", "Nature Machine Intelligence", "Nature Materials",
    "Nature Medicine", "Nature Mental Health", "Nature Metabolism", "Nature Methods",
    "Nature Microbiology", "Nature Nanotechnology", "Nature Neuroscience",
    "Nature Photonics", "Nature Physics", "Nature Plants",
    "Nature Structural & Molecular Biology", "Nature Sustainability",
    "Nature Synthesis", "Nature Water"
  ),
  eur = 10290,
  usd = 12290,
  gbp = 8890,
  imprint = "Nature Portfolio"
)

data_apcs <- bind_rows(data_apcs_springer, data_apcs_elsevier, data_apcs_natureportolio)

Now let’s see the distribution of APCs across a few of the biggest publishers. Springer-Nature has a bunch of subsidiaries like BioMed Central and Nature Portfolio, so we’ll split it out across the biggest of these subsidiaries. In the plot I’ll highlight some of each publisher’s best-known journals (at least in the biomedical field).

Show code for the plot

joi <- c(
  "Cell", "Nature", "Nature Genetics", "Nature Communications",
  "Scientific Reports", "Neuron", "Immunity", "BMC Cancer",
  "Journal of Cardiology", "NeuroImage", "The Lancet", "BMC Biology",
  "Reproductive Health", "Schizophrenia", "Journal of Big Data",
  "Molecular and Cellular Pediatrics"
)

data_joi <- data_apcs |>
  filter(title %in% joi)

data_apcs |>
  filter(str_detect(imprint, "Nature|^Springer$|Springer Nature|Elsevier|BioMed Central")) |>
  drop_na(usd) |>
  ggplot(aes(x = usd, y = imprint, fill = imprint)) +
  ggdist::geom_swarm(
    color = "transparent"
  ) +
  ggrepel::geom_text_repel(
    data = data_joi, aes(label = title),
    min.segment.length = 0,
    size = 3, color = "black", family = "custom"
  ) +
  labs(
    title = "Open access publication fees for<br>some of the biggest publishers",
    subtitle = "Fees range all the way from $200 to $12 290",
    caption = "**Source**: Elsevier, Springer-Nature publishing group"
  ) +
  scale_x_continuous(
    limits = c(0, NA),
    labels = scales::label_dollar(big.mark = " "),
    expand = expansion(add = c(0, 600))
  ) +
  ggthemes::scale_fill_tableau() +
  ggthemes::theme_fivethirtyeight(base_size = 16, base_family = "custom") +
  theme(
    legend.position = "none",
    plot.title = element_markdown(),
    plot.title.position = "plot",
    plot.caption = element_markdown(),
    plot.caption.position = "plot",
    panel.grid.major.y = element_blank(),
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white")
  )

The median APC across the journals in the dataset is $2900. For reference, the APC for the Transformative Journals from Nature Portfolio ($12290) is higher than the annual salary of an assistant professor in India (source). At this point I should mention that APCs are occasionally waived for researchers in low-income countries, but I’m not sure how often this actually happens if they have international collaborators.

It’s also important to reiterate that scientists don’t pay these APCs personally from their own bank accounts, usually a budget is reserved in a project’s funding plan to pay these fees. On occasion, the funding agency may have an agreement with the publishers where it will cover these fees directly for all of the projects the agency funds.

The 450 Movement has advocated for paying reviewers. Shockingly, publishers were not happy and have published critical responses

So what’s this money for? It’s supposedly for the cost of facilitating the peer review process, editing, typesetting, publishing and hosting, and marketing. And it’s expected that marketing and hosting cost more for bigger journals that have a wider audience than for smaller ones with a niche audience. However, the biggest expenses are the typesetting and editing, both of which need to happen for both big and small journals. One might even argue that for bigger journals the cost per article might be lower given they can operate more efficiently. It’s also important to note that peer reviewers don’t get paid by the publishers for their work. It’s considered part of their academic duty. Physically printing articles has become very rare since the advent of the internet. Much of the editing is done by the authors themselves, who follow a strict set of guidelines (like these from the Journal of Genetics and Genomics). And either way, I don’t think any accountant who would break down all the costs associated with peer reviewing and publishing a single scientific article (including a profit margin) could justify the cost without including some major premium for the fame of a journal’s name.

That leads us to the next question. Where does this money go?

Profit Margins

So the biggest scientific publishers are for-profit companies. Elsevier is a subsidiary of RELX PLC, a company that resulted from a merger between Reed International and Dutch publisher Elsevier. Currently, RELX PLC says its business is split into four business areas: Risk; Scientific, Technical & Medical (which includes scientific publishing); Legal; and Exhibitions. Since RELX is a publicly traded company, financial statements are available to the public through their website (e.g. through investor presentations) or an aggregator like Yahoo Finance. Ideally, these two sources would show identical numbers. I couldn’t reproduce the Adjusted operating profit number from the RELX presentations in Yahoo Finance, but using the Operating Income in the Yahoo Finance financials I got the same profit margin as RELX reports: the Total Revenue numbers are identical and the profit numbers differ only slightly. According to the investor presentation about the year 2023, the adjusted operating profit for the “Scientific, Technical & Medical” division of RELX amounted to 1.165 billion GBP, which was roughly 1.475 billion USD on the 31st of December 2023.

Yahoo Finance doesn’t break down the financials for RELX’s distinct business areas, but since the company-wide profit margin matches between the two sources, we can use the numbers from the RELX presentations to calculate the profit margins for RELX’s scientific publishing subsidiary (which I’ll simply refer to as Elsevier from now on). For this part of the analysis, I can use a module in Python called openbb, which is an open source emulator of the Bloomberg Terminal widely used in finance. It provides access to Yahoo Finance data through an API implementation. We’ll pull the financials for RELX, a few other publishers, and a few other big companies to compare to.

Show code

def _clean_column_names(df):
  """
  Lower-case the column names, drop apostrophes, and replace
  spaces and hyphens with underscores
  """

  df.columns = (
    df.columns.str.lower()
    .str.replace("'", "")
    .str.replace(" ", "_")
    .str.replace("-", "_")
  )

  return df

def get_income_stats(tickers):
  """
  Get the income statements and profit margin for a list of tickers
  """

  company_cols = ["symbol", "name", "country"]
  fin_cols = ["ticker", "year", "total_revenue", "net_income", "gross_profit", "profit_margin"]

  collected = []
  for i in tickers:
    # Look up the company name and country for this ticker
    df_company = openbb.stocks.search(i).reset_index()
    df_company_small = df_company.loc[df_company["symbol"] == i, company_cols]

    # Income statement: one row per reporting period after transposing
    df_inc_raw = openbb.stocks.fa.income(i)
    df_inc = _clean_column_names(df_inc_raw.transpose())
    df_inc = df_inc.reset_index().rename(columns={"index": "year"})
    df_inc["ticker"] = i
    df_inc["profit_margin"] = df_inc["net_income"] / df_inc["total_revenue"]

    # Drop the trailing-twelve-months row and keep only the year
    df_inc_clean = df_inc.loc[df_inc["year"] != "ttm", fin_cols].copy()
    df_inc_clean["year"] = pd.to_datetime(df_inc_clean["year"]).dt.year
    # Statements dated in the current calendar year are shifted back one year
    if df_inc_clean["year"].max() == pd.to_datetime("now").year:
      df_inc_clean["year"] = df_inc_clean["year"] - 1

    df_info = df_inc_clean.merge(df_company_small, left_on="ticker", right_on="symbol")
    collected.append(df_info)

  # Combine all companies once, after the loop has finished
  df_out = pd.concat(collected).reset_index(drop=True)
  df_out = df_out[company_cols + fin_cols]

  return df_out

df_income = get_income_stats(["RELX", "PSO", "WLY", "AAPL", "META", "GOOGL", "AMZN", "KO", "IBM", "MSFT"])

df_income.to_csv("./data/income_data.csv", index=False)
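
The column names that `get_income_stats` relies on (`total_revenue`, `net_income`) come out of a simple renaming rule. Here is a minimal stand-alone sketch of that rule on plain strings, without pandas or openbb. The helper `clean_name` is hypothetical and only for illustration, and I’m assuming the raw labels look like the “Total Revenue” and “Operating Income” fields mentioned earlier (“Net Income” is a guess at the third label).

```python
def clean_name(name: str) -> str:
    """Lower-case a label, drop apostrophes, and turn spaces/hyphens into underscores."""
    return name.lower().replace("'", "").replace(" ", "_").replace("-", "_")

# Assumed raw labels; the real ones depend on what the data provider returns
raw_labels = ["Total Revenue", "Operating Income", "Net Income"]
cleaned = [clean_name(label) for label in raw_labels]

print(cleaned)
```

If the provider ever changes its labels, the cleaned names change with them, so it’s worth checking the cleaned columns before computing anything from them.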

The data in the table extracted above is a mix of the currencies of the origin countries of the companies (e.g. MSFT reports in USD, RELX in GBP). Since we’ll only focus on profit margins (which are relative numbers) we can skip converting the currencies to a common one. Springer-Nature is also a for-profit company, but it is not publicly traded since it’s privately owned by the Holtzbrinck Publishing Group (also private) and BC Partners (a private equity firm). So raw data for Springer-Nature is unfortunately not available through Yahoo Finance, but they do report their profit margin on their website, so we can manually add it from there. They haven’t released their statement for 2023 yet, so the latest data we have is from 2022. This showed that in 2022 Springer-Nature made a profit of 487.4 million euros, or about 522.5 million USD. We can also manually add the Elsevier subsidiary with data we got from the RELX website.
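
Why the currency doesn’t matter for a margin can be checked in a couple of lines. The sketch below uses the Springer-Nature 2022 figures from this post (a revenue of 1821.8 million euros and a profit of 487.4 million euros) and the EUR-to-USD rate implied by the 522.5 million USD figure quoted above. The function name `profit_margin` is hypothetical and the exchange rate is only an approximation derived from those two numbers.

```python
import math

def profit_margin(profit: float, revenue: float) -> float:
    """Profit as a fraction of revenue (a unitless ratio)."""
    return profit / revenue

# Springer-Nature 2022, in millions of euros (figures quoted in this post)
revenue_eur = 1821.8
profit_eur = 487.4

# Rate implied by the "about 522.5 million USD" conversion; not an official rate
eur_to_usd = 522.5 / 487.4

margin_eur = profit_margin(profit_eur, revenue_eur)
margin_usd = profit_margin(profit_eur * eur_to_usd, revenue_eur * eur_to_usd)

# The exchange rate appears in both numerator and denominator, so it cancels
print(f"Margin in EUR: {margin_eur:.1%}, margin in USD: {margin_usd:.1%}")
```

This only holds when profit and revenue are converted at the same rate, which is the case when both come from the same annual statement.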

Show code

data_springer_nature <- tribble(
  ~year, ~total_revenue, ~net_income,
  2022, 1821.8, 487.4,
  2021, 1700.9, 443.4,
  2020, 1626.7, 396.2,
) |>
  mutate(
    profit_margin = net_income / total_revenue,
    symbol = "Springer-Nature",
    name = "Springer-Nature",
    name_clean = name
  )

data_elsevier <- tribble(
  ~year, ~total_revenue, ~net_income,
  2023, 3.062, 1.165,
  2022, 2.909, 1.100,
  2021, 2.649, 1.001,
  2020, 2.692, 1.021,
) |>
  mutate(
    profit_margin = net_income / total_revenue,
    symbol = "Elsevier",
    name = "RELX subsidiary",
    name_clean = name
  )

Let’s see how the profit margins compare for our selected companies (and one subsidiary).

Show code for the plot

data_income <- read_csv("./data/income_data.csv") |>
  mutate(
    name_clean = str_replace_all(name, "Corporation", "Corp."),
    name_clean = str_replace_all(name_clean, "John Wiley & Sons Inc. Common Stock", "John Wiley & Sons Inc."),
    name_clean = str_replace_all(name_clean, "International Business Machines", "IBM")
  ) |>
  bind_rows(data_springer_nature, data_elsevier)

pub_cols <- c(
  "RELX PLC" = "darkgreen",
  "RELX subsidiary" = "darkgreen",
  "Springer-Nature" = "#113249",
  "Pearson plc" = "#151515",
  "John Wiley & Sons Inc." = "pink"
)

data_income |>
  group_by(name_clean) |>
  slice_max(year) |>
  ungroup() |>
  mutate(
    company_label = str_glue("**{symbol}**<br>{name_clean}"),
    perc_label = str_glue(
      "{str_replace(symbol, 'Springer-Nature', '\n\nSpringer-\nNature\n(2022)')}\n{100 * round(profit_margin, 2)}%"
    ),
    company_label = fct_reorder(company_label, -profit_margin)
  ) |>
  ggplot(aes(x = company_label, y = profit_margin)) +
  geom_col(fill = "grey40") +
  geom_col(
    data = . %>% filter(str_detect(name_clean, "RELX|Springer|Pearson|Wiley")),
    aes(fill = name_clean),
    width = 0.9,
  ) +
  geom_hline(yintercept = 0, linewidth = 1.5) +
  geom_text(
    aes(
      y = abs(profit_margin),
      label = perc_label,
      color = ifelse(profit_margin > 0.04, "white", "black")
    ),
    size = 3,
    nudge_y = ifelse(data_income |>
                       group_by(name_clean) |>
                       slice_max(year) |>
                       ungroup() |>
                       pull(profit_margin) > 0.04, -0.025, 0.025),
    lineheight = 1, fontface = "bold", family = "custom"
  ) +
  labs(
    title = "Elsevier and Springer-Nature had profit margins<br>
      in 2023 that rival those of large tech companies",
    subtitle = "Springer-Nature financials are from 2022 since they haven't released their 2023 results yet",
    fill = NULL,
    caption = "**Source**: Yahoo Finance, RELX Investor Presentations,
      Springer Nature Annual Reports"
  ) +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  scale_y_continuous(
    limits = c(NA, 0.4),
    labels = scales::label_percent(),
    expand = expansion(add = c(0, NA))
  ) +
  scale_color_identity() +
  scale_fill_manual(values = pub_cols) +
  ggthemes::theme_fivethirtyeight(base_family = "custom") +
  theme(
    legend.position = "none",
    plot.title = element_markdown(),
    plot.subtitle = element_markdown(size = 10, margin = margin(b = 20)),
    plot.title.position = "plot",
    plot.caption = element_markdown(margin = margin(t = 20)),
    plot.caption.position = "plot",
    axis.text.x = element_markdown(),
    panel.grid.major.x = element_blank(),
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
  )

These numbers were not exceptional for 2023; the profit margins have remained fairly stable over at least the past few years.
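
As a rough check on that stability, here is a small sketch that recomputes the yearly margins from the two hand-entered tables in this post and reports the gap between the best and worst year. The names `figures`, `margins_by_year`, and `spread` are hypothetical, and keep in mind that the “profit” used for Elsevier is the adjusted operating profit from the RELX presentations, so it is not the same definition as the net income used for the listed companies.

```python
# (revenue, profit) per year as entered by hand in this post:
# Springer-Nature in millions of EUR, Elsevier in billions of GBP
figures = {
    "Springer-Nature": {
        2020: (1626.7, 396.2),
        2021: (1700.9, 443.4),
        2022: (1821.8, 487.4),
    },
    "Elsevier": {
        2020: (2.692, 1.021),
        2021: (2.649, 1.001),
        2022: (2.909, 1.100),
        2023: (3.062, 1.165),
    },
}

def margins_by_year(yearly):
    """Map each year to profit divided by revenue, in chronological order."""
    return {year: profit / revenue for year, (revenue, profit) in sorted(yearly.items())}

def spread(margins):
    """Gap between the highest and lowest margin, in percentage points."""
    values = list(margins.values())
    return 100 * (max(values) - min(values))

for name, yearly in figures.items():
    margins = margins_by_year(yearly)
    formatted = ", ".join(f"{year}: {m:.1%}" for year, m in margins.items())
    print(f"{name}: {formatted} (spread: {spread(margins):.1f} pp)")
```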

Show code for the plot

data_income |>
  mutate(
    company_label = str_glue("**{symbol}** - {name_clean}"),
    company_label = str_replace(company_label, "Elsevier$", "Elsevier (part of **RELX**)")
  ) |>
  ggplot(aes(x = year, y = profit_margin, group = symbol)) +
  geom_hline(yintercept = 0, linewidth = 1) +
  geom_path(linewidth = 2, alpha = 0.33) +
  geom_path(
    data = . %>% filter(str_detect(name_clean, "RELX|Springer|Pearson|Wiley")),
    aes(color = name_clean),
    linewidth = 2, lineend = "round"
  ) +
  geom_point(
    data = . %>% group_by(symbol) %>% slice_max(year),
    aes(fill = company_label),
    shape = NA
  ) +
  ggrepel::geom_text_repel(
    data = . %>% group_by(symbol) %>% slice_max(year),
    aes(label = symbol),
    direction = "y", nudge_x = 0.1, min.segment.length = 0,
    hjust = 0, size = 3, family = "custom",
    seed = 42
  ) +
  labs(
    title = "2023 was no outlier, scientific publishers have had<br>consistent profit margins for the past few years",
    fill = NULL,
    caption = "**Source**: Yahoo Finance, RELX Investor Presentations, Springer Nature Annual Reports"
  ) +
  scale_x_continuous(
    expand = expansion(add = c(0, 0.2))
  ) +
  scale_y_continuous(
    limits = c(NA, 0.45),
    labels = scales::label_percent()
  ) +
  scale_color_manual(values = pub_cols, guide = "none") +
  scale_fill_discrete(
    guide = guide_legend(
      override.aes = list(color = "transparent"),
      nrow = 3
    )
  ) +
  ggthemes::theme_fivethirtyeight(base_family = "custom") +
  theme(
    legend.position = "bottom",
    legend.text = element_markdown(),
    legend.key = element_blank(),
    plot.title = element_markdown(),
    plot.title.position = "plot",
    plot.caption = element_markdown(),
    plot.caption.position = "plot",
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    legend.background = element_rect(fill = "white")
  )

Some institutions like the Karolinska Institutet even publish a list counting the number of publications in high-impact journals (the threshold at Karolinska is an impact factor of 15.255 or higher). Institutions are very happy with high-impact publications since they can use them to boast about their successes. So how do journals deal with this? Do high-impact journals also feel comfortable asking for a higher processing fee?

Show code for the plot

jif_joi <- c("Nature", "Nature Medicine", "Nature Mental Health", "Nature Genetics",
             "Nature Communications", "Nature Geoscience", "Cell", "Immunity",
             "Scientific Reports", "Neuron", "BMC Biology", "The Lancet", "Science",
             "NeuroImage", "Artificial Intelligence", "Cell Systems", "Engineering Structures")

data_jif <- read_csv("./data/journal_impact_factors_2022.csv") |>
  mutate(
    journal_name = str_to_lower(journal_name)
  ) |>
  inner_join(data_apcs |>
               mutate(title_lower = str_to_lower(title)),
             by = c("journal_name" = "title_lower")) |>
  select(journal_name, title, jif_2022, usd) |>
  arrange(-usd, -jif_2022)

data_jif |>
  drop_na(jif_2022, usd) |>
  ggplot(aes(x = usd, y = jif_2022)) +
  geom_jitter(alpha = 1, stroke = 0) +
  geom_rug(alpha = 0.2) +
  ggpmisc::stat_poly_line(
    formula = y ~ poly(x, 3),
    color = "maroon", fill = "maroon"
  ) +
  ggpmisc::stat_poly_eq(
    formula = y ~ poly(x, 3),
    ggpmisc::use_label("eq"),
    label.x.npc = 0.05, label.y.npc = 0.85,
    color = "grey30", size = 4, family = "custom"
  ) +
  ggpmisc::stat_poly_eq(
    formula = y ~ poly(x, 3),
    ggpmisc::use_label(c("r2", "p")),
    label.x.npc = 0.05, label.y.npc = 0.9,
    color = "grey30", size = 4, family = "custom"
  ) +
  ggrepel::geom_label_repel(
    data = data_jif |> filter(title %in% jif_joi),
    aes(label = title),
    min.segment.length = 0, point.padding = 0.3,
    size = 3, color = "grey30",
    family = "custom", max.overlaps = 12,
    seed = 42,
  ) +
  labs(
    title = "Association between Journal Impact Factor and Article Processing Fee",
    x = "Open Access Article Processing Fee (APC)",
    y = "Journal Impact Factor (JIF)"
  ) +
  scale_x_continuous(
    limits = c(0, NA),
    labels = scales::label_dollar(big.mark = " "),
    expand = expansion(add = c(0, 600))
  ) +
  ggthemes::theme_fivethirtyeight(base_family = "custom") +
  theme(
    legend.position = "bottom",
    legend.text = element_markdown(),
    legend.key = element_blank(),
    plot.title = element_markdown(),
    plot.title.position = "plot",
    plot.caption = element_markdown(),
    plot.caption.position = "plot",
    axis.title = element_markdown(size = 14),
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    legend.background = element_rect(fill = "white")
  )

So it seems there is an association. In particular the Nature Portfolio journals seem to drive a lot of the values up. It is perhaps important to note as a caveat that a journal’s impact factor is also affected by how niche it is. You’ll note that a journal with quite a wide target audience like Nature Medicine is all the way in the top right of the graph, while a journal that is very important in neuroimaging research but has little reach outside of that group, like NeuroImage, appears further to the left.

So, on to the more speculative area of the rabbit hole. How did we get here?

Lobbying

I’m yet to meet an academic (I know they’re out there, I just haven’t met them) who doesn’t agree that the publishing system as it is now has major flaws, so why does it persist? Surely it shouldn’t be impossible for a group of scientists to collectively call on politicians and funding agencies to prevent public money flowing to private companies. Apart from the most powerful and high-ranking scientists in universities who have made their careers successfully riding the vicious cycle in the first diagram, I would imagine efforts like these could rely on wide support on all sides of the political spectrum. The people that benefit the most from the current system are the publishing companies and their shareholders, and in a capitalist society where political power correlates with the size of your pocketbook, companies and wealthy shareholders can tilt the scales of public policy through heavy lobbying. At this point I will say that the EU has recently, on occasion, done a decently good job standing up to some of the wealthiest and most powerful tech and finance companies, but still falls short in a number of other areas. We should probably forgive them for not making scientific publishers a larger target given the niche electorate aware of the issues in this space, but conversely, the EU is for sure a prime target for lobbying efforts from scientific publishing companies. And they’ll have a good story to tell too about “open science” and access for the people, conveniently forgetting to mention that their approach to “open science” will benefit them a great deal more than the average taxpayer, so I feel we’re entitled to question the honesty of their intentions here.

So let’s have a look at the lobbying efforts from these companies. How much do they lobby, and how do they stack up against other companies and organizations that lobby the EU? For this the EU has an open registry where lobbyists have to register the clients they represent and log their interests and meetings with European Commission members.

For comparison, the Red Cross (IFRC) had 27 registered meetings since 2015

Here are the entries for scientific publishers I could find in the EU Commission lobby registry:

RELX (23 meetings since 2015)
Springer Nature AG & Co. KGaA (12 meetings since 2015)
Pearson (2 meetings since 2015)
Wiley (no meetings since 2015)

In 2022, the Norwegian parliament (Stortinget) voted against implementing a lobby register

In 2022, the RELX Group (OpenSecrets.org source) was the biggest US lobbying client in the Printing & Publishing sector by money spent (source), on par with other companies like JPMorgan Chase (source), Ford (source), and Intuit (source), which is known for its TurboTax service (see here for a profile by Hasan Minhaj on Intuit’s lobbying efforts to get an idea of what this amount of lobbying can get you). Other publishers such as Springer-Nature, Pearson, and Wiley have also spent several hundreds of thousands of dollars on lobbying in the past years, but much less than RELX.

Of course, RELX is a large company that includes more than “just” Elsevier. RELX deals in data analytics and risk management (for example through fraud detection and anti-money laundering tools) and a number of other business areas. My assumption is that a large proportion of these millions will go into these other areas as well. Nonetheless, RELX is registered in the “Printing & Publishing” category.

What Can Universities Do?

So far we’ve looked mostly at big capitalist corporations and their influence, but they wouldn’t make these profit margins without research to publish. While the big publishers might hold the monopoly on the scientific journals that to some degree control many scientists’ careers, a monopoly also requires a consumer. In 2019 the Norwegian universities and research units collectively decided not to renew a contract with Elsevier over concerns about Elsevier’s stance towards open science. This breach followed a series of other negotiation efforts that led to canceled contracts in Germany, Hungary, and the University of California (source). Representatives of the Norwegian universities later reached a 9 million euro agreement that reinstated access to Elsevier articles for Norwegian researchers and included that a few thousand articles would be published open access. Ease of use is definitely something that big publishers have made possible, and without them scientists will have to work harder to ensure that the latest versions and peer reviews are published freely and in a manner that makes them easy to find. However, it’s not like the current publication process saves researchers any time or energy, given the extensive requirements of a large number of journals.

Currently, for a number of reasons, academics still evaluate the quality of research to a large degree by the prominence of the journal the article appeared in. I think it’s only natural to search for easy metrics to judge an article in the absence of sufficient time or energy to read entire articles (although I think everyone agrees that reading the article should be the default). And especially in a competitive context like deciding who gets funding, using metrics like impact factors and number of publications is an easy way to justify funding one project over another. But this serves only the bureaucracy and shows a certain cynicism from the funding agencies about the trust they place in the boards that evaluate funding applications. While I do agree that accountability is lacking in current funding decisions, using cheap metrics does not address this issue. It only serves to let the funding agencies off the hook, since they can always refer to the metrics when their decisions are questioned. The same applies to the hiring of new researchers and the allocation of professorships. Combined, these factors play heavily into the monopoly that currently allows these profit margins. We are not innocent here.

So What Do We Do Next?

I think we have established that there is a co-dependent relationship between scientists and the publishing industry. So what do we need to do to end this toxic relationship and practice some self-love? We could require all articles to be submitted to non-profit preprint servers (like arXiv or OSF), which would ensure that the research is publicly available. This could then mean that the articles published in the scientific journals could sit behind a paywall. But this solution would depend on all scientists updating their preprints to the accepted version of the article, for which the copyright often still lies with the author. Given the work academics currently do in preparation for publication in (or even submission to) big journals, I don’t believe it’s a big ask to also post their articles on a preprint server. More importantly, it also means that the preprint is easy for other scientists to find, read, and cite. Unfortunately, having articles permanently behind a paywall may give publishers the power to ask scientists for higher fees to access these articles. In this system, universities would either pay extra for the open access fees or the subscription fees, and usually both.

If and when I can find some good data on this, I’ll include some analyses here.

Perhaps a more useful target is the use of impact factors in deciding the “quality” and “importance” of a researcher. The number of publications in high-impact journals is an easy way to deduce the importance and quality of a scientist’s work, but these publications often rely on projects with abundant resources (both money and time), which are not commonly available to early career scientists. Hence these projects are often attached to senior researchers who have successfully navigated the publish-or-perish cycle. It is not unreasonable to expect these more senior scientists to not want to introduce fractures in a system that has worked quite well for them so far. My personal preference would be for funding agencies to start their own non-profit journals, where the cost of publication is budgeted as part of the cost of doing science. Now, this plan is not immune to criticism. While these journals would not (and should not) need an impact factor, by virtue of the value of money a shadow impact factor would arise anyway. The “Journal from the Norwegian Research Council”, funded in part by Norway’s oil profits, would have more resources to spend on research, which in turn would allow for more ambitious projects with larger samples and more expensive technologies. This would almost inevitably mean that the articles in this journal and in the “Journal of NIH-funded Research” would (presumably) be more ambitious than those in the journals of less endowed nations. Perhaps a common EU repository of science may help here, but having such a large institution would come with its own set of advantages and disadvantages. I don’t have an immediate suggestion on how to fix or circumvent this issue. However, a solution doesn’t need to be perfect to be better than the current situation. Le mieux est le mortel ennemi du bien (the best is the mortal enemy of the good).
The system will likely not change anytime soon without a serious uprising of academics that are willing to force active changes to how science is valued and evaluated or force political powers to enforce stricter rules on how much public money can flow to private interests (a boy can dream).

All in all, I hope this was a useful summary of the toxic relationship between academia and the publishing industry, of how money flows between public institutions and private companies, and of how big a change is necessary to break this relationship. And finally, that academia needs an intervention and should break up with its toxic boyfriend that is RELX.

FAQ

Q - How much have you personally paid in APCs?

We published freely accessible preprints for each of the articles on the preprint server medRxiv.

A - I have published three articles where I was the lead author, one in Translational Psychiatry (APC of €3170 at the time), one in BMC Psychiatry (APC of $3190 in 2023), and one in Nature Mental Health (APC of $11690 in 2023). Scientists don’t pay for the APCs personally (I seriously hope this is true everywhere), the project does. Of these three articles, the project I was hired on only paid the APC of the first article (€3170). The third article (published second) was covered by a collective agreement where the funding agency paid for a collection of articles. The second article (published last) did not fall under such an agreement, but we ensured that the preprint (which we published for all three articles) was up to date with the version published in the journal, so we would argue that the content of our research is publicly available, and we saved the money for more important expenses. The expense could pay the salary of a student for a few months, which seems like a better use of public money to me.

Q - Aren’t there any good journals that are run by universities or non-profits directly?

A - eLife is a journal published by registered non-profit eLife Sciences Publications Ltd. Initial funding came from a group of public institutions like the Howard Hughes Medical Institute, the Max Planck Institute, and the Wellcome Trust (Wiki). All research is published open access and their APC is currently $2000 (source). This journal is fairly successful so far, but isn’t immune from controversies, both about a new publishing plan and the firing of their Editor-in-Chief after he called out the indifference to Palestinian lives and praised a satirical article from The Onion on the genocide in Gaza.

Q - Perhaps unrelated, but what is the total compensation for the director of the Society for Neuroscience (SfN)?

A - In 2022, Martin Saggese earned a little over $950 000 in his role as Executive Director (ProPublica source) for the non-profit organization, which is about 4% of SfN’s total revenue, which seems fair… 🙃

Acknowledgements

I would like to thank my friends and (former) colleagues for providing feedback on various drafts of this blogpost. They have received compensation in the form of a beverage at our next dinner.