R/run_document_extractor.R
run_document_extractor.Rd
This function will be removed in a future version of corporaexplorer.
run_document_extractor(corpus_object, max_html_docs = 400, ...)
corpus_object: A corporaexplorer object created by prepare_data.
max_html_docs: The maximum number of documents allowed in one HTML report.
...: Other arguments passed to runApp in the Shiny package.
Shiny app for simple retrieval/extraction of documents from a "corporaexplorer" object in a reading-friendly format. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
"This is a document about ", month.name[1:10], ". ",
"This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)
# Converting to corporaexplorer object:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")
#> Starting.
#> Document data frame done.
#> Calendar data frame done.
#> Document term matrix: text processed.
#> Document term matrix: tokenising completed.
#> Document term matrix: word list created.
#> Document term matrix done.
#> Done.
if (interactive()) {
  # Running exploration app:
  explore(corpus)
  # Running app to extract documents:
  run_document_extractor(corpus)
}
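The optional arguments described above can be combined in one call. The following is a minimal sketch, not taken from the package documentation: the cap of 100 documents and the port number 4321 are arbitrary illustrative values, and `port` and `launch.browser` are arguments of `shiny::runApp()` that are assumed to be forwarded unchanged through `...`.

```r
library(corporaexplorer)

# Small corpus built the same way as the test data above.
df <- tibble::tibble(
  Date = as.Date(paste(2011:2020, 1:10, 21:30, sep = "-")),
  Text = paste0("This is a document about ", month.name[1:10], "."),
  Title = paste("Text", 1:10)
)
corpus <- prepare_data(df, corpus_name = "Test corpus")

if (interactive()) {
  run_document_extractor(
    corpus,
    max_html_docs = 100,    # illustrative cap, lower than the default of 400
    port = 4321,            # assumed to be passed on to shiny::runApp() via ...
    launch.browser = FALSE  # likewise a shiny::runApp() argument
  )
}
```

With `launch.browser = FALSE`, the app is not opened automatically; it would be reached by pointing a browser at the local address that Shiny prints when it starts listening.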