Highlight the Pipe. Pkgdown

2017-10-29

rstats pkgdown

Practical advice about customizing code highlighting on web pages created with pkgdown.

Prologue

It felt really nice to achieve custom code highlighting on this site with highlight.js (see this post). After that, I found myself working with pkgdown, one of Hadley Wickham's many great packages. It is “designed to make it quick and easy to build a website for your package”: it converts all package documentation into appropriate HTML pages. Naturally, I had the same question as before: is there an easy way to highlight the pipe operator %>% separately? This time the best answer I could come up with was “Yes, if you don’t mind some hacking.”

This post is about adding custom code highlighting rules to a pkgdown site, using the string %>% as an example.

Overview

After looking into the HTML code of a site built with pkgdown, I noticed the following key features of its code highlighting:

  • Text is already parsed, with appropriate strings wrapped in <span></span>. This is done while building the site with pkgdown::build_site(). The class attribute of <span> is used to customize highlighting.
  • Code from reference pages is processed differently. For example, function mean is wrapped as <span class="kw">mean</span> on the Home page but as <span class='fu'>mean</span> on Reference pages.
  • The most valuable feature of code preprocessing is the creation of links to the appropriate help pages for R functions. This is done by adding an <a> tag inside the <span> for certain function names.

So the default way of customising code highlighting in pkgdown is to define CSS styles for the classes that are present (and these classes differ across the site).
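A quick way to see which classes a site actually uses, and hence which CSS rules are needed, is to collect the class attribute of every <span> inside <pre>. The sketch below uses a hypothetical helper span_classes() (not part of pkgdown or highdown) on two tiny fragments that imitate the Home and Reference markup described above; on a real site one would call xml2::read_html() on the files in docs/ instead.

```r
library(xml2)

# Hypothetical helper: class attribute of every <span> inside <pre>
span_classes <- function(page) {
  xml_attr(xml_find_all(page, "//pre//span"), "class")
}

# Simplified fragments imitating the two markups (assumed, not real pkgdown output)
home <- read_html('<pre><span class="kw">mean</span>(x)</pre>')
ref <- read_html("<pre><span class='fu'>mean</span>(x)</pre>")

# Count how often each class occurs across both "pages"
class_counts <- table(c(span_classes(home), span_classes(ref)))
class_counts
```

Each name in the resulting table is a class that may deserve its own rule in the site's CSS.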

To highlight certain strings, such as %>%, one should parse the HTML for certain <span> tags inside <pre> nodes (the tag for preformatted text used for separate code blocks) and add an appropriate class for further CSS customisation. This path is described in the section With adding tag class.

Although this method solves the problem of highlighting %>%, it is somewhat constrained: one can’t customize the parsing rules. For example, there is no easy way to highlight <- differently because it is not wrapped in <span>. I thought it would be better to reuse the existing solution with highlight.js, but for some time I didn’t consider this path because of the preformatted nature of the code (unlike in my previous experience) and concerns that function links would disappear. However, after manually adding the necessary JavaScript code, it worked! Well, kind of: reference pages were not highlighted. The good news was that the links stayed in place. How to add the appropriate JavaScript code to a pkgdown site and deal with reference pages is described in the section With highlight.js.

All the code, along with a short guide on how to use it, is in my highdown package.

With adding tag class

The plan is pretty straightforward:

  • Find all HTML pages to which tag classes should be added.
  • On each page, find the appropriate tags, i.e. <span> inside <pre> whose text satisfies the desired condition.
  • Add a certain class to those tags.
  • Modify the CSS file.

Add class

The following functions add a class to the appropriate tags. Package xml2 must be installed.

Arguments of the main function, xml_add_class_pattern(), are:

  • xpath - String containing an XPath 1.0 expression (use "//pre//span" for code highlighting tags).
  • pattern - Regular expression for the text of the tags of interest.
  • new_class - String with the class to add.
  • path - Path to the folder with HTML files (defaults to "docs").

xml_add_class_pattern <- function(xpath, pattern, new_class, path = "docs") {
  # Find HTML pages
  html_files <- list.files(
    path = path,
    pattern = "\\.html$",
    recursive = TRUE,
    full.names = TRUE
  )

  lapply(html_files, function(file) {
    page <- xml2::read_html(file, encoding = "UTF-8")

    matched_nodes <- xml_find_all_patterns(page, xpath, pattern)
    if (length(matched_nodes) == 0) {
      return(NA)
    }

    xml_add_class(matched_nodes, new_class)

    xml2::write_html(page, file, format = FALSE)
  })

  invisible(html_files)
}

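# Add class `new_class` to nodes, keeping any classes they already have
xml_add_class <- function(x, new_class) {
  old_class <- xml2::xml_attr(x, "class")
  # Nodes without a class attribute get `new_class` alone (not "NA new_class")
  output_class <- ifelse(is.na(old_class), new_class, paste(old_class, new_class))
  mapply(xml2::xml_set_attr, x, output_class, MoreArgs = list(attr = "class"))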

  invisible(x)
}

# Find appropriate tags: nodes matching `xpath` whose text matches `pattern`
# To find <span> inside <pre> use `xpath = "//pre//span"`.
xml_find_all_patterns <- function(x, xpath, pattern, ns = xml2::xml_ns(x)) {
  res <- xml2::xml_find_all(x, xpath, ns)
  is_matched <- grepl(pattern, xml2::xml_text(res))

  res[is_matched]
}
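To see what the helpers do without touching any files, here is a minimal sketch that repeats their two steps (find matching <span> nodes inside <pre>, then append a class) on an in-memory snippet. The snippet is an assumed, simplified imitation of pkgdown output, in which %>% is stored as %&gt;%; it also contains a pipe outside <pre> that should be left alone.

```r
library(xml2)

# Hypothetical fragment: one pipe inside <pre>, one in ordinary text
page <- read_html(paste0(
  '<pre><span class="kw">library</span>(dplyr)\n',
  'mtcars <span class="op">%&gt;%</span> <span class="kw">head</span>()</pre>',
  '<p><span>%&gt;% in ordinary text</span></p>'
))

# Same logic as xml_find_all_patterns(): <span> inside <pre> whose text matches
spans <- xml_find_all(page, "//pre//span")
matched <- spans[grepl("%>%", xml_text(spans))]

# Same idea as xml_add_class(): append "pp" to whatever class is already there
for (node in matched) {
  old_class <- xml_attr(node, "class")
  new_class <- if (is.na(old_class)) "pp" else paste(old_class, "pp")
  xml_set_attr(node, "class", new_class)
}

xml_attr(matched, "class")
```

Only the <span> with the pipe inside <pre> is modified; the keyword spans and the paragraph outside <pre> keep their original attributes.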

For convenience, one can define a function high_pipe() that adds class pp to all <span> tags inside <pre> whose text contains %>%:

high_pipe <- function(path = "docs", new_class = "pp") {
  xml_add_class_pattern("//pre//span", "%>%", new_class, path)
}

So the typical usage is as follows:

  • Run pkgdown::build_site().
  • Run highdown::high_pipe() (with the working directory set to the package root).
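Before running this on a real site, the whole file-level loop (find pages, edit them, write them back) can be rehearsed on a throwaway folder. This sketch builds a hypothetical one-page docs/ folder in a temporary directory and repeats the steps of xml_add_class_pattern() inline, so it does not depend on highdown being installed.

```r
library(xml2)

# Throwaway "docs" folder with one hypothetical article page
docs <- file.path(tempfile("site"), "docs")
dir.create(file.path(docs, "articles"), recursive = TRUE)
page_file <- file.path(docs, "articles", "intro.html")
writeLines(paste0(
  '<html><body><pre><span class="no">x</span> ',
  '<span class="op">%&gt;%</span> <span class="kw">sum</span>()</pre></body></html>'
), page_file)

# The file-level loop of xml_add_class_pattern(), applied to `docs`
html_files <- list.files(docs, pattern = "\\.html$", recursive = TRUE, full.names = TRUE)
for (file in html_files) {
  page <- read_html(file, encoding = "UTF-8")
  spans <- xml_find_all(page, "//pre//span")
  for (node in spans[grepl("%>%", xml_text(spans))]) {
    xml_set_attr(node, "class", paste(xml_attr(node, "class"), "pp"))
  }
  write_html(page, file)
}

# Re-read the page from disk to confirm that the new class was saved
saved <- xml_attr(
  xml_find_all(read_html(page_file), "//span[contains(@class, 'pp')]"),
  "class"
)
saved
```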

Add custom CSS rules

To add custom CSS rules to a pkgdown site, create the file pkgdown/extra.css in the package root and edit it. For example, to make %>% bold, write the following:

.pp {font-weight: bold;}
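If you prefer to script this step too, a small helper can append the rule to pkgdown/extra.css only when it is not already there, so re-running it does not duplicate the rule. The names add_css_rule() and pkg_root are hypothetical, not part of pkgdown or highdown, and the duplicate check only catches exact, line-for-line repeats.

```r
# Hypothetical helper: append `rule` to pkgdown/extra.css under `pkg_root`,
# creating the folder and file if needed and skipping exact duplicates
add_css_rule <- function(rule, pkg_root = ".") {
  css_dir <- file.path(pkg_root, "pkgdown")
  css_file <- file.path(css_dir, "extra.css")
  dir.create(css_dir, showWarnings = FALSE, recursive = TRUE)

  existing <- if (file.exists(css_file)) readLines(css_file, warn = FALSE) else character()
  if (!rule %in% existing) {
    writeLines(c(existing, rule), css_file)
  }

  invisible(css_file)
}

# Usage on a throwaway package root
root <- tempfile("pkg")
add_css_rule(".pp {font-weight: bold;}", pkg_root = root)
add_css_rule(".pp {font-weight: bold;}", pkg_root = root)  # no duplicate added
readLines(file.path(root, "pkgdown", "extra.css"))
```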

With highlight.js

Highlight.js enables more flexible code highlighting. For an overview of it and its customization, see my previous post.

Add custom JavaScript

To add custom JavaScript code to a pkgdown site, create and edit the file pkgdown/extra.js in the package root. Go here for code that initializes highlight.js and registers the default R language parsing rules.

Tweak reference page

For highlight.js to work, code should be wrapped in <pre><code class="r"> tags. However, reference pages use only <pre>. To tweak these pages, use the following functions (with the working directory set to the package root):

tweak_ref_pages <- function() {
  # Find all reference pages
  ref_files <- list.files(
    path = "docs/reference/",
    pattern = "\\.html$",
    recursive = TRUE,
    full.names = TRUE
  )

  lapply(ref_files, add_code_node)

  invisible(ref_files)
}

add_code_node <- function(x) {
  page <- paste0(readLines(x, encoding = "UTF-8", warn = FALSE), collapse = "\n")

  # Regular expression magic for adding <code class = "r"></code> inside
  # every <pre>; `[^>]*` keeps the match within the opening <pre ...> tag
  page <- gsub('(<pre[^>]*>)', '\\1<code class = "r">', page)
  page <- gsub('</pre>', '</code></pre>', page)

  invisible(writeLines(page, x))
}
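To check what the two substitutions produce, they can be applied to a single string. The fragment below is an assumed, simplified imitation of a reference-page code block; the pattern uses `[^>]*`, which cannot run past the end of the opening <pre ...> tag even though the content contains other `>` characters.

```r
# A bare <pre>, similar to what reference pages contain (simplified, assumed)
page <- paste0(
  '<pre class="examples"><span class="fu">mean</span>',
  '(<span class="fl">1</span>:<span class="fl">3</span>)</pre>'
)

# Insert <code class = "r"> right after <pre ...> and close it before </pre>
page <- gsub("(<pre[^>]*>)", '\\1<code class = "r">', page)
page <- gsub("</pre>", "</code></pre>", page)
page
```

The spans, and any <a> links inside them, are left untouched; they are simply wrapped in the <code class = "r"> node that highlight.js looks for.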

Note that, as of 2017-10-27, this can still cause incorrect highlighting if actual code is placed right after a comment.

Add highlight.js CSS rules

Edit pkgdown/extra.css to style the highlight.js classes. For a template with the Idea style along with the default R classes, look here.

Conclusions

  • It is confirmed once again that asking a question about a seemingly simple task can lead to a long journey of code exploration and hacking.
  • First, try to find a way to reuse existing solutions if they satisfy your needs. This can save a considerable amount of time in the future.
  • With highdown, it is straightforward to customise code highlighting of pkgdown sites.

sessionInfo()
sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=ru_UA.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=ru_UA.UTF-8        LC_COLLATE=ru_UA.UTF-8    
##  [5] LC_MONETARY=ru_UA.UTF-8    LC_MESSAGES=ru_UA.UTF-8   
##  [7] LC_PAPER=ru_UA.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=ru_UA.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.2  backports_1.1.1 bookdown_0.5    magrittr_1.5   
##  [5] rprojroot_1.2   tools_3.4.2     htmltools_0.3.6 yaml_2.1.14    
##  [9] Rcpp_0.12.13    stringi_1.1.5   rmarkdown_1.7   blogdown_0.2   
## [13] knitr_1.17      stringr_1.2.0   digest_0.6.12   evaluate_0.10.1
