explore.Rd (documentation source: R/explore_rdname.R, R/explore.R, R/explore0.R)
Launch a Shiny app for exploring a text collection. To stop the application, interrupt R (usually by pressing Ctrl+C or Esc).
explore() explores a 'corporaexplorerobject' created with the prepare_data() function. App settings can optionally be specified in the arguments to explore().
explore0() is a convenience function for exploring a data frame or character vector directly, without first creating a corporaexplorerobject with prepare_data(). Instead, the object is created on the fly as the app launches. It is functionally equivalent to explore(prepare_data(dataset, use_matrix = FALSE)).
A corporaexplorerobject created by prepare_data().
search_options: List. Specifies how search operations in the app are carried out. Available options:
use_matrix: Logical. If the corporaexplorerobject contains a document term matrix, should it be used for searches? (See prepare_data().) Defaults to TRUE.
regex_engine: Character. Specifies the regular expression engine to be used (defaults to "default"). Available options:
  "default": use the re2 package (https://github.com/girishji/re2) for simple searches and the stringr package (https://github.com/tidyverse/stringr) for complex regexes (i.e. when special regex characters are used).
  "stringr": use stringr for all searches.
  "re2": use re2 for all searches.
optional_info: Logical. If TRUE, the app displays information about the search method: the regex engine used, and whether the search was conducted in the document term matrix or in the full text documents.
allow_unreasonable_patterns: Logical. If FALSE (the default), the app will not allow patterns that would produce an enormous number of hits or lead to a very slow search. Examples of such patterns include '.' and '\b'.
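The sketch below is a minimal search_options list with all four documented options set explicitly. The values are examples rather than recommendations, and the two short texts are made up. It assumes the corporaexplorer package is installed. The app launch is wrapped in interactive(), so a non-interactive run only builds the list.

```r
# All four documented search options, set explicitly (illustrative values)
search_options <- list(
  use_matrix = FALSE,                  # search the full texts, not the document term matrix
  regex_engine = "stringr",            # one of "default", "stringr", "re2"
  optional_info = TRUE,                # show which search method was used
  allow_unreasonable_patterns = FALSE  # keep the guard against patterns such as '.'
)

if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  corpus <- corporaexplorer::prepare_data(c("First text.", "Second text."))
  corporaexplorer::explore(corpus, search_options = search_options)
}
```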
ui_options: List. Specifies custom app settings (see example below). Currently available:
font_size: Character string specifying the font size in document view, e.g. "10px".
search_input: List. Pre-populates the following sidebar fields (see example below):
search_terms: The 'Term(s) to chart and highlight' field. Character vector of maximum length 5.
highlight_terms: The 'Additional terms for text highlighting' field. Character vector.
filter_terms: The 'Filter corpus?' field. Character vector.
case_sensitivity: Should the 'Case sensitive search' box be checked? Logical.
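The sketch below pre-populates the sidebar fields. The terms are illustrative, and the ui_options font size is included only to show both lists passed in a single call. It assumes corporaexplorer is installed, and the app launches only in an interactive session.

```r
# Pre-populated sidebar fields (illustrative values)
search_input <- list(
  search_terms = c("January", "February"),  # charted and highlighted; at most 5 terms
  highlight_terms = "document",             # highlighted in the text only
  filter_terms = "about",                   # the 'Filter corpus?' field
  case_sensitivity = FALSE                  # leave 'Case sensitive search' unchecked
)
ui_options <- list(font_size = "12px")

if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  texts <- c("A document about January.", "A document about February.")
  corpus <- corporaexplorer::prepare_data(texts)
  corporaexplorer::explore(corpus, ui_options = ui_options, search_input = search_input)
}
```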
plot_options: List. Specifies custom plot settings (see example below). Currently available:
max_docs_in_wall_view: Integer specifying the maximum number of documents to be rendered in the 'document wall' view. Default value is 12000.
plot_size_factor: Numeric. Adjusts the height of the corpus map plot. A value > 1 increases the height; a value < 1 decreases it. Ignored if value <= 0.
documents_per_row_factor: Numeric. Adjusts the number of documents included in each row of the 'document wall' view. A value > 1 increases the number of documents; a value < 1 decreases it. Ignored if value <= 0.
document_tiles: Integer specifying the number of tiles used in the tile chart representing occurrences of terms in a document. Ignored if value < 1 or value > 50.
colours: Character vector of length 1 to 6. Specifies the order of the colours used to represent search (and highlight) terms in plots and documents. The default order and the available colours are defined by the character vector c("red", "blue", "green", "purple", "orange", "gray"). Passing e.g. plot_options = list(colours = c("gray", "green")) changes that order to c("gray", "green", "red", "blue", "purple", "orange"). Arguments with duplicated colours, or with colours not present in the default character vector, are ignored.
tile_length: Either "scaled" or "uniform". With "scaled" (the default), the length of the tiles in document wall view and day corpus view varies with the length of each document (see the tile_length_range argument in prepare_data()). With "uniform", all tiles have equal length.
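The hypothetical helpers below make concrete the documented rules for ignored or reordered plot_options values. They are not part of corporaexplorer and do not reproduce its internal code. They only restate the rules above, including the colour-reordering example, so that the expected behaviour can be checked in plain R.

```r
# Hypothetical helpers (not part of corporaexplorer) restating the documented
# plot_options rules. NULL means "the app would ignore this value".
default_colours <- c("red", "blue", "green", "purple", "orange", "gray")

resolve_colours <- function(colours) {
  # Ignored (defaults kept) if wrong length, duplicated, or not a default colour
  if (length(colours) < 1 || length(colours) > 6 ||
      anyDuplicated(colours) > 0 || !all(colours %in% default_colours)) {
    return(default_colours)
  }
  # Requested colours first, then the remaining defaults in their default order
  c(colours, setdiff(default_colours, colours))
}

# plot_size_factor and documents_per_row_factor: ignored if value <= 0
keep_positive <- function(x) {
  if (is.numeric(x) && length(x) == 1 && !is.na(x) && x > 0) x else NULL
}

# document_tiles: ignored if value < 1 or value > 50
keep_tiles <- function(x) {
  if (is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 1 && x <= 50) x else NULL
}

resolve_colours(c("gray", "green"))
keep_tiles(51)
```

In the app itself, you simply pass e.g. plot_options = list(colours = c("gray", "green"), document_tiles = 20). The helpers only show what the documentation says will happen to such values.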
...: Other arguments passed to runApp() in the shiny package.
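Because explore() forwards extra arguments to shiny::runApp(), runApp options such as port and launch.browser can be given directly. The sketch below assumes corporaexplorer and shiny are installed. The port number is arbitrary, and the texts are made up.

```r
# Options for shiny::runApp(), forwarded through explore()'s ... argument
runapp_args <- list(port = 8080, launch.browser = FALSE)

if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  corpus <- corporaexplorer::prepare_data(c("First text.", "Second text."))
  # Same as explore(corpus, port = 8080, launch.browser = FALSE)
  do.call(corporaexplorer::explore, c(list(corpus), runapp_args))
}
```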
dataset: Data frame or character vector as specified in prepare_data().
List. Arguments to be passed to prepare_data() in order to override this function's default argument values.
arguments_explore: List. Arguments to be passed to explore() in order to override this function's default argument values.
Launches a Shiny app.
For explore0(): by default, no document term matrix is generated. The data are therefore prepared for exploration faster than with the default settings of prepare_data(), but searches in the app are likely to be slower.
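The sketch below illustrates this trade-off with two ways of launching the app on the same made-up texts. It assumes corporaexplorer is installed. Which route is faster overall depends on corpus size and on how many searches are run, so no timings are claimed here.

```r
# Made-up corpus: 100 short documents
texts <- rep(c("A document about January.", "A document about February."), 50)

if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  # Faster preparation, no document term matrix
  # (same as explore(prepare_data(texts, use_matrix = FALSE)))
  corporaexplorer::explore0(texts)

  # prepare_data() with its default settings builds a document term matrix:
  # slower preparation, but searches in the app are likely to be faster
  corpus <- corporaexplorer::prepare_data(texts)
  corporaexplorer::explore(corpus)
}
```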
library(corporaexplorer)
# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
"This is a document about ", month.name[1:10], ". ",
"This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)
# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")
#> Starting.
#> Document data frame done.
#> Calendar data frame done.
#> Document term matrix: text processed.
#> Document term matrix: tokenising completed.
#> Document term matrix: word list created.
#> Document term matrix done.
#> Done.
if (interactive()) {
# Running exploration app:
explore(corpus)
explore(corpus,
search_options = list(optional_info = TRUE),
ui_options = list(font_size = "10px"),
search_input = list(search_terms = c("Tottenham", "Spurs")),
plot_options = list(max_docs_in_wall_view = 12001,
colours = c("gray", "green")))
# Running app to extract documents:
run_document_extractor(corpus)
}
if (interactive()) {
explore0(rep(sample(LETTERS), 10))
explore0(rep(sample(LETTERS), 10),
arguments_explore = list(search_input = list(search_terms = "Z"))
)
}