Skip to contents

A list of predefined collections of named entity types for use with spaCy's named entity recognition. These collections group entity types by semantic categories, making it easier to extract related types of entities.

Usage

nlp_entity_sets

Format

A named list with the following elements:

all

All available entity types: "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC", "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT", "QUANTITY", "TIME", "WORK_OF_ART", "PROPN"

cultural_artifacts

Language, laws, products, and works of art: "LANGUAGE", "LAW", "PRODUCT", "WORK_OF_ART"

date_and_time

"DATE", "TIME"

named

Named entities: "EVENT", "FAC", "GPE", "LOC", "NORP", "ORG", "PERSON", "PRODUCT", "PROPN"

numbers

Numeric and quantitative entities: "CARDINAL", "MONEY", "ORDINAL", "PERCENT", "QUANTITY"

organizations

Organizational entities: "NORP", "ORG"

places

Location and place entities: "FAC", "GPE", "LOC"

Details

The entity types follow the OntoNotes 5.0 annotation scheme:

  • CARDINAL: Numerals that do not fall under another type

  • DATE: Absolute or relative dates or periods

  • EVENT: Named hurricanes, battles, wars, sports events, etc.

  • FAC: Buildings, airports, highways, bridges, etc.

  • GPE: Countries, cities, states (geopolitical entities)

  • LANGUAGE: Any named language

  • LAW: Named documents made into laws

  • LOC: Non-GPE locations, mountain ranges, bodies of water

  • MONEY: Monetary values, including unit

  • NORP: Nationalities or religious or political groups

  • ORDINAL: "first", "second", etc.

  • ORG: Companies, agencies, institutions, etc.

  • PERCENT: Percentage, including "%"

  • PERSON: People, including fictional

  • PRODUCT: Objects, vehicles, foods, etc. (not services)

  • QUANTITY: Measurements, as of weight or distance

  • TIME: Times smaller than a day

  • WORK_OF_ART: Titles of books, songs, etc.

  • PROPN: Proper nouns

References

OntoNotes 5.0 Annotation Scheme: https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf spaCy Named Entity Recognition: https://spacy.io/usage/linguistic-features#named-entities

Examples

# View all available entity sets
names(nlp_entity_sets)
#> [1] "all"                "cultural_artifacts" "date_and_time"     
#> [4] "named"              "numbers"            "organizations"     
#> [7] "places"            

# View the contents of specific sets
nlp_entity_sets$places
#> [1] "FAC" "GPE" "LOC"
nlp_entity_sets$numbers
#> [1] "CARDINAL" "MONEY"    "ORDINAL"  "PERCENT"  "QUANTITY"

# Use in entity extraction
text <- "Apple Inc. was founded in Cupertino on April 1, 1976."
nlp_get_entities(text, nlp_entity_sets$organizations)
#> [1] "Apple Inc."
nlp_get_entities(text, nlp_entity_sets$places)
#> [1] "Cupertino"
nlp_get_entities(text, nlp_entity_sets$date_and_time)
#> [1] "April 1, 1976"