Commit 578adfd

eq ch 8
1 parent db9cfea commit 578adfd

2 files changed

Lines changed: 51 additions & 1 deletion

File tree

docs/equations/pymle-equations.pdf

4.96 KB
Binary file not shown.

docs/equations/pymle-equations.tex

Lines changed: 51 additions & 1 deletion
@@ -81,7 +81,7 @@ \section{An introduction to the basic terminology and notations}
For the rest of this book, unless noted otherwise, we will use the superscript $(i)$ to refer to the $i$th training sample, and the subscript $j$ to refer to the $j$th dimension of the training dataset.

- We use lower-case, bold-face letters to refer to vectors ($\mathbf{x} \in \mathbb{R}^{n \times 1}$) and upper-case, bold-face letters to refer to matrices, respectively ($\mathbf{X} \in \mathbb{R}^{n \times m}$), where $n$ refers to the number of rows, and $m$ refers to the number of columns, respectively. To refer to single elements in a vector or matrix, we write the letters in italics ($x^{(n)}) or x^{(n)}_{m}$, respectively). For example, $x^{150}_1$ refers to the refers to the first dimension of the flower sample 150, the sepal length. Thus, each row in this feature matrix represents one flower instance and can be written as four-dimensional row vector $\mathbf{x}^{(i)} \in \mathbb{R}^{1 \times 4}$
+ We use lower-case, bold-face letters to refer to vectors ($\mathbf{x} \in \mathbb{R}^{n \times 1}$) and upper-case, bold-face letters to refer to matrices ($\mathbf{X} \in \mathbb{R}^{n \times m}$), where $n$ refers to the number of rows and $m$ to the number of columns. To refer to single elements in a vector or matrix, we write the letters in italics, $x^{(n)}$ or $x^{(n)}_{m}$, respectively. For example, $x^{(150)}_1$ refers to the first dimension of flower sample 150, the sepal length. Thus, each row in this feature matrix represents one flower instance and can be written as a four-dimensional row vector $\mathbf{x}^{(i)} \in \mathbb{R}^{1 \times 4}$

\[ \mathbf{x}^{(i)} = \bigg[x^{(i)}_1 \; x^{(i)}_2 \; x^{(i)}_3 \; x^{(i)}_4 \bigg]. \]
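As a quick illustration of this indexing convention (a sketch; loading the Iris data through scikit-learn's \texttt{load\_iris} is an assumption, any $150 \times 4$ feature matrix works the same way):

\begin{verbatim}
from sklearn.datasets import load_iris

X = load_iris().data   # feature matrix, shape (150, 4)
x_150 = X[149]         # the 150th flower sample, a 4-dimensional row vector
print(x_150[0])        # its 1st dimension: the sepal length (in cm)
\end{verbatim}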

@@ -1461,10 +1461,60 @@ \section{Leveraging weak learners via adaptive boosting}
Thus, each weight that corresponds to a correctly classified sample will be reduced from the initial value of $0.1$ to $0.065 / 0.914 \approx 0.071$ for the next round of boosting. Similarly, the weight of each incorrectly classified sample will increase from $0.1$ to $0.153 / 0.914 \approx 0.167$.
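To make the arithmetic concrete, here is a minimal numpy sketch under an assumed toy setup that reproduces these rounded numbers (ten samples with initial weight $0.1$ each, three of them misclassified, hence weighted error $\varepsilon = 0.3$); the normalizer $0.914$ in the text is the sum of the rounded per-sample weights:

\begin{verbatim}
import numpy as np

# assumed toy setup: 10 samples, weight 0.1 each, 3 misclassified
weights = np.full(10, 0.1)
correct = np.array([True] * 7 + [False] * 3)

eps = weights[~correct].sum()            # weighted error: 0.3
alpha = 0.5 * np.log((1 - eps) / eps)    # coefficient: ~0.424

# decrease weights of correct samples, increase misclassified ones
weights *= np.exp(-alpha * np.where(correct, 1.0, -1.0))
print(weights.round(3))                  # 0.065 (correct), 0.153 (misclassified)

weights /= weights.sum()                 # renormalize to sum to 1
print(weights.round(3))                  # 0.071 and 0.167, as in the text
\end{verbatim}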

\section{Summary}

%%%%%%%%%%%%%%%
% CHAPTER 8
%%%%%%%%%%%%%%%

\chapter{Applying Machine Learning to Sentiment Analysis}

\section{Obtaining the IMDb movie review dataset}
\section{Introducing the bag-of-words model}
\subsection{Transforming words into feature vectors}
\subsection{Assessing word relevancy via term frequency-inverse document frequency}
The \textit{tf-idf} can be defined as the product of the \textit{term frequency} and the \textit{inverse document frequency}:

\[
\text{tf-idf}(t, d) = \text{tf}(t, d) \times \text{idf}(t, d)
\]

Here, $\text{tf}(t, d)$ is the term frequency that we introduced in the previous section, and the inverse document frequency $\text{idf}(t, d)$ can be calculated as:
\[
\text{idf}(t, d) = \log \frac{n_d}{1 + \text{df}(d, t)},
\]

where $n_d$ is the total number of documents, and $\text{df}(d, t)$ is the number of documents $d$ that contain the term $t$. Note that adding the constant 1 to the denominator is optional and serves the purpose of assigning a non-zero value to terms that occur in all training samples; the log is used to ensure that low document frequencies are not given too much weight.
However, if we'd manually calculated the tf-idfs of the individual terms in our feature vectors, we'd have noticed that the \textit{TfidfTransformer} calculates the tf-idfs slightly differently compared to the standard textbook equations that we defined earlier. The equations for the idf and tf-idf that were implemented in scikit-learn are:

\[
\text{idf}(t, d) = \log \frac{1 + n_d}{1 + \text{df}(d, t)}
\]
The tf-idf equation that was implemented in scikit-learn is as follows:

\[
\text{tf-idf}(t, d) = \text{tf}(t, d) \times \big( \text{idf}(t, d) + 1 \big).
\]
While it is also more typical to normalize the raw term frequencies before calculating the tf-idfs, the \textit{TfidfTransformer} normalizes the tf-idfs directly. By default (\texttt{norm='l2'}), scikit-learn's \textit{TfidfTransformer} applies the L2-normalization, which returns a vector of length 1 by dividing an un-normalized feature vector $\mathbf{v}$ by its L2-norm:

\[
\mathbf{v}_{\text{norm}} = \frac{\mathbf{v}}{\lVert \mathbf{v} \rVert_2} = \frac{\mathbf{v}}{\sqrt{v_{1}^{2} + v_{2}^{2} + \cdots + v_{n}^{2}}} = \frac{\mathbf{v}}{ \big( \sum_{i=1}^{n} v_{i}^{2} \big)^{1/2} }
\]
\subsection{Cleaning text data}
\subsection{Processing documents into tokens}
\section{Training a logistic regression model for document classification}
\section{Working with bigger data -- online algorithms and out-of-core learning}
\section{Summary}

\newpage

... to be continued ...
