sm, md and lg refer to the sizes of the models (small, medium and large, respectively).
As the models page you linked to says:
Model differences are mostly statistical. In general, we do expect larger models to be “better” and more accurate overall. Ultimately, it depends on your use case and requirements. We recommend starting with the default models (marked with a star below).
The sm model is the default (as alluded to above).
The difference lies in the accuracy of the predictions.
However, as the comparison in the spaCy documentation shows, that difference is very small.
Scores for en_core_web_lg (788 MB) vs en_core_web_sm (10 MB):
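- LAS (labeled attachment score): 90.07% vs 89.66%
- POS (part-of-speech tagging accuracy): 96.98% vs 96.78%
- UAS (unlabeled attachment score): 91.83% vs 91.53%
- NER (named entity recognition) F-score: 86.62% vs 85.86%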
- NER precision: 87.03% vs 86.33%
- NER recall: 86.20% vs 85.39%
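To put those gaps in perspective, here is a small sketch that takes the figures listed above (lg first, sm second) and computes the difference in percentage points for each metric. The dictionary is only a transcription of the numbers in this answer, not data read from spaCy.

```python
# Scores copied from the comparison above: (en_core_web_lg, en_core_web_sm), in percent.
SCORES = {
    "LAS": (90.07, 89.66),
    "POS": (96.98, 96.78),
    "UAS": (91.83, 91.53),
    "NER F-score": (86.62, 85.86),
    "NER precision": (87.03, 86.33),
    "NER recall": (86.20, 85.39),
}


def gaps_in_points(scores):
    """Return {metric: lg - sm}, rounded to two decimals (percentage points)."""
    return {metric: round(lg - sm, 2) for metric, (lg, sm) in scores.items()}


gaps = gaps_in_points(SCORES)
largest = max(gaps, key=gaps.get)  # metric with the biggest lg-over-sm improvement

for metric, gap in gaps.items():
    print(f"{metric}: +{gap:.2f} points")
print(f"Largest gap: {largest} ({gaps[largest]:.2f} points)")
```

Every metric improves by less than one percentage point; the largest gain is NER recall at 0.81 points.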
At the same time, en_core_web_lg is about 79 times larger (788 MB vs 10 MB), so it loads much more slowly.
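If you want to check the load-time difference on your own machine, one rough approach is to time `spacy.load` for each model. The sketch below keeps the timing in a generic helper, `time_load` (a hypothetical name, not part of spaCy), so it runs without spaCy installed; it also shows where the "79 times" figure comes from (788 MB / 10 MB ≈ 78.8).

```python
import time
from typing import Any, Callable, Tuple

# Download sizes quoted in this answer, in MB.
SIZE_LG_MB = 788
SIZE_SM_MB = 10


def time_load(loader: Callable[[str], Any], name: str) -> Tuple[Any, float]:
    """Call loader(name) once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = loader(name)
    elapsed = time.perf_counter() - start
    return result, elapsed


print(f"Size ratio: {SIZE_LG_MB / SIZE_SM_MB:.1f}x")

# With spaCy and both models installed, pass spacy.load as the loader:
#   import spacy
#   for name in ("en_core_web_sm", "en_core_web_lg"):
#       _, seconds = time_load(spacy.load, name)
#       print(f"{name}: {seconds:.2f} s")
```

A single load is a noisy measurement (the operating system's file cache alone can change it noticeably), so repeat it a few times before drawing conclusions.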
My recommendation is to use en_core_web_sm while developing and then switch to a larger model in production.
You can switch simply by changing the name of the model you load:
```python
import spacy

nlp = spacy.load("en_core_web_lg")  # use "en_core_web_sm" while developing
```
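One way to follow the "sm in development, larger model in production" advice without editing code is to read the model name from configuration. The sketch below assumes a hypothetical environment variable, `SPACY_MODEL` (spaCy does not read it itself), and falls back to `en_core_web_sm`. spaCy is imported only when a model is actually loaded.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "en_core_web_sm"  # small model for development
ENV_VAR = "SPACY_MODEL"  # hypothetical variable name chosen for this sketch


def model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model package name from the environment, or the default."""
    env = os.environ if env is None else env
    return env.get(ENV_VAR, DEFAULT_MODEL)


def load_nlp(env: Optional[Mapping[str, str]] = None):
    """Load the configured model; requires spaCy and that model to be installed."""
    import spacy  # imported lazily so model_name() works without spaCy

    return spacy.load(model_name(env))
```

In production you would set `SPACY_MODEL=en_core_web_lg` before starting the application and call `load_nlp()` as usual; in development, leaving the variable unset gives you the small model.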