Named Entity Recognition with spaCy
This code demonstrates how to use spaCy, a fast and widely used NLP library, for Named Entity Recognition (NER). The snippet loads a pre-trained spaCy model and uses it to identify and classify named entities within a text.
Installation
Before running the code, you need to install spaCy and download a suitable pre-trained model. The first line installs the spaCy library. The second line downloads the 'en_core_web_sm' model, a small English model optimized for efficiency. Larger models like 'en_core_web_lg' offer higher accuracy but require more resources.
pip install spacy
python -m spacy download en_core_web_sm
Code Implementation
This code snippet loads the 'en_core_web_sm' spaCy model, processes a sample text, and then iterates through the identified entities. For each entity, it prints the text of the entity and its corresponding label (e.g., ORG for organization, GPE for geopolitical entity, MONEY for monetary value).
import spacy
# Load a pre-trained spaCy model
nlp = spacy.load('en_core_web_sm')
text = "Apple is looking at buying U.K. startup for $1 billion"
# Process the text with the spaCy model
doc = nlp(text)
# Iterate through the entities and print their text and label
for ent in doc.ents:
    print(ent.text, ent.label_)
Explanation
- spacy.load('en_core_web_sm') loads the pre-trained English model. This model contains vocabulary, syntax, and entity recognition data.
- nlp(text) applies the model to the input text, performing tokenization, part-of-speech tagging, dependency parsing, and NER. The result is a Doc object containing all the processed information.
- doc.ents holds the recognized entities as a sequence of Span objects, each representing a named entity.
- ent.text gives the text of the entity, and ent.label_ provides its label (e.g., 'ORG', 'GPE', 'MONEY').
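As a rough illustration of the attributes described above, the sketch below collects each entity into a plain dictionary. The helper name entity_records is hypothetical, not part of spaCy; it relies only on doc.ents and the Span attributes text, label_, start_char, and end_char.

```python
def entity_records(doc):
    """Return one dict per entity in a spaCy Doc (or any Doc-like object)."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            # Character offsets of the entity within the original text
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]
```

Called on the doc from the snippet above, entity_records(doc) should yield one record per entity, and slicing the original text with a record's start and end offsets should recover the entity text.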
Output
The output of this code should be as follows (exact entities can vary slightly between model versions):
Apple ORG
U.K. GPE
$1 billion MONEY
This shows that spaCy correctly identified 'Apple' as an organization, 'U.K.' as a geopolitical entity, and '$1 billion' as a monetary value.
Real-Life Use Case
NER has numerous real-world applications. For example, in news article analysis, it can be used to identify key people, organizations, and locations mentioned in an article. In customer service, it can extract product names, dates, and issue types from customer inquiries. In finance, it can extract company names and monetary values from financial reports. It can also be used to improve search engine accuracy by understanding the intent of the user's query.
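For use cases like these, the extracted entities are often grouped by type before further processing. The following is a minimal sketch of that step; group_by_label is a hypothetical helper, and it works on plain (text, label) pairs such as those produced from ent.text and ent.label_.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [texts...]}, preserving order."""
    grouped = defaultdict(list)
    for text, label in entities:
        # Skip repeated mentions of the same entity under the same label
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Pairs as printed by the snippet above: (ent.text, ent.label_)
entities = [("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]
print(group_by_label(entities))
```

With a real document, the input list could be built as [(ent.text, ent.label_) for ent in doc.ents].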
Best Practices
When to Use It
Use NER when you need to automatically identify and classify named entities within text. This is useful for tasks such as information extraction, text summarization, and question answering.
Memory footprint
The memory footprint depends on the model you use. 'en_core_web_sm' is relatively small and efficient, while larger models like 'en_core_web_lg' bundle a large word-vector table and require significantly more memory.
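If only the entities are needed, one option is to skip loading the other pipeline components. The sketch below assumes spaCy v3 and that the component names listed match your installed 'en_core_web_sm' (check nlp.pipe_names to confirm); load_ner_only is a hypothetical helper, and the actual savings depend on the model.

```python
# Component names below are assumed to match en_core_web_sm in spaCy v3;
# check nlp.pipe_names on your installation before relying on them.
NON_NER_COMPONENTS = ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]


def load_ner_only(model_name="en_core_web_sm"):
    """Load a pipeline without the components that NER does not need."""
    import spacy  # imported here so this module loads even without spaCy

    # exclude= skips loading the listed components entirely
    return spacy.load(model_name, exclude=NON_NER_COMPONENTS)
```

It is worth comparing doc.ents from the reduced pipeline against the full one on a few sample texts, since this only works when the NER component does not depend on the excluded ones.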
Alternatives
Alternatives to spaCy for NER include NLTK, Stanford NER, and Flair. NLTK is a more general-purpose NLP toolkit, Stanford NER is a Java-based tagger built on conditional random fields, and Flair is a PyTorch-based framework with a strong focus on sequence labeling. spaCy is often preferred for its speed and ease of use.
FAQ
- What are the common entity types that spaCy recognizes?
  spaCy's pre-trained English models recognize entity types such as PERSON (people), ORG (organizations), GPE (geopolitical entities), DATE (dates), TIME (times), and MONEY (monetary values), among others.
- How can I train a custom NER model with spaCy?
  You can train a custom NER model with spaCy by preparing annotated training data in spaCy's binary format, generating a training config (for example with `spacy init config`), and running the `spacy train` command. Refer to the spaCy documentation for detailed instructions.
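Training annotations for NER are usually expressed as character offsets plus a label. As a minimal sketch of that preparation step, the hypothetical helper below finds the offsets of known phrases in a text; it uses only plain string search and is an illustration, not spaCy's own tooling.

```python
def find_entity_offsets(text, phrases):
    """Return sorted (start, end, label) tuples for each phrase found in text.

    `phrases` maps an exact substring to its entity label. Only the first
    occurrence of each phrase is annotated; phrases not found are skipped.
    """
    offsets = []
    for phrase, label in phrases.items():
        start = text.find(phrase)
        if start != -1:
            offsets.append((start, start + len(phrase), label))
    return sorted(offsets)


text = "Apple is looking at buying U.K. startup for $1 billion"
print(find_entity_offsets(text, {"Apple": "ORG", "U.K.": "GPE"}))
```

In spaCy v3, offsets like these can be turned into entity spans with Doc.char_span and the resulting Doc objects saved with DocBin for use by `spacy train`; see the spaCy documentation for the exact workflow.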