# [Solved] What is the difference between TfidfVectorizer and TfidfTransformer?

I know that the formula for `tfidf vectorizer` is

``````
(count of word / total count) * log(number of documents / number of documents where word is present)
``````

I saw there’s also a TfidfTransformer in scikit-learn, and I just wanted to know the difference between them. I couldn’t find anything helpful.

## Solution #1:

`TfidfVectorizer` is used on raw text documents (for example, sentences), while
`TfidfTransformer` is used on an existing count matrix, such as the one returned by `CountVectorizer`.
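As a minimal sketch of the second half of that statement, the count matrix does not have to come from `CountVectorizer` at all. The matrix below is a made-up example (2 documents, 3 terms), not data from the question:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfTransformer

# Hypothetical count matrix: rows are documents, columns are terms.
# It was built by hand, not by CountVectorizer.
counts = csr_matrix(np.array([[2, 1, 0],
                              [0, 1, 3]]))

# TfidfTransformer only needs the counts; it never sees any text.
tfidf = TfidfTransformer().fit_transform(counts)

print(tfidf.shape)
print(tfidf.toarray())
```

With the default settings each row of the result is L2-normalized, and terms with a count of zero keep a weight of zero.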

## Solution #2:

With `TfidfTransformer` you first compute word counts using `CountVectorizer`, then compute the IDF values, and only then compute the TF-IDF scores. With `TfidfVectorizer` you do all three steps at once.
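A rough sketch of those three steps, with the intermediate IDF values made visible. The corpus and the variable names (`docs`, `idf_by_word`, `two_step`, `one_step`) are illustrative assumptions, not part of the original answer:

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Illustrative corpus (an assumption, not taken from the question).
docs = ["the sky is blue", "the sun is bright", "the sun in the sky is bright"]

# Step 1: word counts.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)

# Step 2: learn the IDF values from the counts.
transformer = TfidfTransformer()
transformer.fit(counts)
idf_by_word = {word: transformer.idf_[idx]
               for word, idx in count_vec.vocabulary_.items()}

# Step 3: compute the TF-IDF scores.
two_step = transformer.transform(counts)

# All three steps at once.
one_step = TfidfVectorizer().fit_transform(docs)

print(idf_by_word)
print(np.allclose(two_step.toarray(), one_step.toarray()))
```

The separate route is handy when you want to keep or inspect the raw counts or the learned `idf_` values; if you only need the final matrix, `TfidfVectorizer` is shorter.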

## Solution #3:

Artem’s answer pretty much sums up the difference.
To make things clearer, here is an example.

TfidfTransformer can be used as follows:

``````
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

train_set = ["The sky is blue.", "The sun is bright."]

# Step 1: build the raw term-count matrix (English stop words removed).
vectorizer = CountVectorizer(stop_words='english')
print("Fit Vectorizer to train set")
trainVectorizerArray = vectorizer.fit_transform(train_set)
print(trainVectorizerArray.toarray())

# Step 2: turn the counts into L2-normalized TF-IDF weights.
transformer = TfidfTransformer()
res = transformer.fit_transform(trainVectorizerArray)
print(res.toarray())

## RESULT:
#
# Fit Vectorizer to train set
# [[1 0 1 0]
#  [0 1 0 1]]
# [[0.70710678 0.         0.70710678 0.        ]
#  [0.         0.70710678 0.         0.70710678]]
``````

Extraction of count features, TF-IDF weighting and row-wise Euclidean normalization can be done in one operation with TfidfVectorizer:

``````
from sklearn.feature_extraction.text import TfidfVectorizer

train_set = ["The sky is blue.", "The sun is bright."]

# Counting, IDF weighting and L2 normalization in a single step.
tfidf = TfidfVectorizer(stop_words='english')
res1 = tfidf.fit_transform(train_set)
print(res1.toarray())

## RESULT:
#
# [[0.70710678 0.         0.70710678 0.        ]
#  [0.         0.70710678 0.         0.70710678]]
``````

Both processes produce a sparse matrix containing the same values.
Other useful references are the documentation for `TfidfTransformer.fit_transform`, `CountVectorizer.fit_transform` and `TfidfVectorizer`.
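To tie this back to the formula in the question: with default settings, scikit-learn does not use exactly that textbook formula. It takes the raw term count as TF, uses a smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, and then L2-normalizes each row. The sketch below recomputes the values by hand with NumPy and compares them to `TfidfVectorizer`; the corpus and the variable names are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Illustrative corpus (an assumption, not taken from the question).
docs = ["the sky is blue", "the sun is bright", "the sun in the sky is bright"]

# Raw term counts: one row per document, one column per term.
counts = CountVectorizer().fit_transform(docs).toarray()

n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)  # number of documents containing each term

# Smoothed IDF, as used when smooth_idf=True (the default).
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1

# TF is the raw count; weight by IDF, then L2-normalize each row.
manual = counts * idf
manual = manual / np.linalg.norm(manual, axis=1, keepdims=True)

expected = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(manual, expected))
```

If you need something closer to the formula in the question, the `smooth_idf`, `sublinear_tf` and `norm` parameters of both classes control these choices.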

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.