|
12 | 12 |
|
13 | 13 | \title{Python Machine Learning\\ Equation Reference} |
14 | 14 | \author{Sebastian Raschka \\ \texttt{mail@sebastianraschka.com}} |
15 | | -\date{ \vspace{2cm} 05\slash 04\slash 2015 (last updated: 06\slash 15\slash 2016) \\\begin{flushleft} \vspace{2cm} \noindent\rule{10cm}{0.4pt} \\ Code Repository and Resources:: \href{https://github.com/rasbt/python-machine-learning-book}{https://github.com/rasbt/python-machine-learning-book} \vspace{2cm} \endgraf @book\{raschka2015python,\\ |
| 15 | +\date{ \vspace{2cm} 05\slash 04\slash 2015 (last updated: 06\slash 19\slash 2016) \\\begin{flushleft} \vspace{2cm} \noindent\rule{10cm}{0.4pt} \\ Code Repository and Resources: \href{https://github.com/rasbt/python-machine-learning-book}{https://github.com/rasbt/python-machine-learning-book} \vspace{2cm} \endgraf @book\{raschka2015python,\\
16 | 16 | title=\{Python Machine Learning\},\\ |
17 | 17 | author=\{Raschka, Sebastian\},\\ |
18 | 18 | year=\{2015\},\\ |
@@ -333,12 +333,13 @@ \subsection{Minimizing cost functions with gradient descent} |
333 | 333 | \begin{split} |
334 | 334 | & \frac{\partial J}{\partial w_j} = \frac{\partial}{\partial w_j} \frac{1}{2} \sum_i \bigg( y^{(i)} - \phi \big( z^{(i)} \big) \bigg)^2 \\ |
335 | 335 | & = \frac{1}{2} \frac{\partial}{\partial w_j} \sum_i \bigg( y^{(i)} - \phi \big( z^{(i)} \big) \bigg)^2 \\ |
336 | | -& = \frac{1}{2} \sum_i 2 \bigg( y^{(i)} - \phi \big( z^{(i)} \big) \bigg) \frac{\partial J}{\partial w_j} \Bigg( y^{(i)} - \sum_i \bigg( w_{j}^{(i)} x_{j}^{(i)} \bigg)\Bigg) \\ |
| 336 | +& = \frac{1}{2} \sum_i 2 \big( y^{(i)} - \phi(z^{(i)})\big) \frac{\partial}{\partial w_j} \Big( y^{(i)} - \phi({z^{(i)}}) \Big) \\ |
| 337 | +& = \sum_i \big( y^{(i)} - \phi (z^{(i)}) \big) \frac{\partial}{\partial w_j} \Big( y^{(i)} - \sum_j \big( w_{j} x^{(i)}_{j} \big) \Big) \\
337 | 338 | & = \sum_i \bigg( y^{(i)} - \phi \big( z^{(i)} \big) \bigg) \bigg( - x_{j}^{(i)} \bigg) \\ |
338 | 339 | & = - \sum_i \bigg( y^{(i)} - \phi \big( z^{(i)} \big) \bigg) x_{j}^{(i)} \\ |
339 | 340 | \end{split} |
340 | 341 | \end{equation*} |
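In NumPy, this gradient can be computed for all weights at once with a single matrix-vector product. As a minimal sketch (the names \texttt{X}, \texttt{y}, \texttt{w}, and \texttt{eta} are illustrative assumptions, not the book's implementation):

\begin{verbatim}
import numpy as np

def adaline_step(X, y, w, eta=0.01):
    """One batch gradient-descent step for Adaline.

    X : (n_samples, n_features) feature matrix
    y : (n_samples,) target vector
    w : (n_features,) weight vector (bias handled separately)
    """
    output = X.dot(w)         # phi(z) = z, the identity activation
    errors = y - output       # y - phi(z)
    grad = -X.T.dot(errors)   # dJ/dw_j = -sum_i (y - phi(z)) x_j
    return w - eta * grad     # step along the negative gradient
\end{verbatim}

The \texttt{X.T.dot(errors)} line is exactly the matrix-vector product described next.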
341 | 342 |
342 | 343 | Performing a matrix-vector multiplication is similar to calculating a vector dot product where each row in the matrix is treated as a single row vector. This vectorized approach represents a more compact notation and results in a more efficient computation using NumPy. For example: |
343 | 344 |
|
344 | 345 | \[ |
@@ -409,7 +410,7 @@ \subsection{Logistic regression intuition and conditional probabilities} |
409 | 410 | event. The term positive event does not necessarily mean good, but refers to the event that we want to predict, for example, the probability that a patient has a certain disease; we can think of the positive event as class label $y =1$. We can then further define the logit function, which is simply the logarithm of the odds ratio (log-odds): |
410 | 411 |
|
411 | 412 | \[ |
412 | | -logit(p) = log \frac{p}{1-p} |
| 413 | +\text{logit}(p) = \log \frac{p}{1-p}
413 | 414 | \] |
414 | 415 |
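As a small numerical check (a Python sketch with illustrative names, not code from the book's repository), the logit maps a probability to the real line and the logistic sigmoid maps it back:

\begin{verbatim}
import numpy as np

def logit(p):
    """Log-odds: maps p in (0, 1) onto the real line."""
    return np.log(p / (1.0 - p))

def sigmoid(z):
    """Logistic sigmoid, the inverse of the logit."""
    return 1.0 / (1.0 + np.exp(-z))

p = 0.8
z = logit(p)          # log(4) ~ 1.3863
print(sigmoid(z))     # recovers 0.8
\end{verbatim}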
|
415 | 416 | The logit function takes input values in the range 0 to 1 and transforms them to values over the entire real number range, which we can use to express a linear relationship between feature values and the log-odds: |
@@ -1180,6 +1181,35 @@ \section{Summary} |
1180 | 1181 |
|
1181 | 1182 |
|
1182 | 1183 |
|
| 1184 | +%%%%%%%%%%%%%%% |
| 1185 | +% CHAPTER 6 |
| 1186 | +%%%%%%%%%%%%%%% |
| 1187 | + |
| 1188 | +\chapter{Learning Best Practices for Model Evaluation and Hyperparameter Tuning} |
| 1189 | + |
| 1190 | +\section{Streamlining workflows with pipelines} |
| 1191 | +\subsection{Loading the Breast Cancer Wisconsin dataset} |
| 1192 | +\subsection{Combining transformers and estimators in a pipeline} |
| 1193 | +\section{Using k-fold cross-validation to assess model performance} |
| 1194 | +\subsection{The holdout method} |
| 1195 | +\subsection{K-fold cross-validation} |
| 1196 | +\section{Debugging algorithms with learning and validation curves} |
| 1197 | +\subsection{Diagnosing bias and variance problems with learning curves} |
| 1198 | +\subsection{Addressing overfitting and underfitting with validation curves} |
| 1199 | +\section{Fine-tuning machine learning models via grid search} |
| 1200 | +\subsection{Tuning hyperparameters via grid search} |
| 1201 | +\subsection{Algorithm selection with nested cross-validation} |
| 1202 | +\section{Looking at different performance evaluation metrics} |
| 1203 | +\subsection{Reading a confusion matrix} |
| 1204 | +\subsection{Optimizing the precision and recall of a classification model} |
| 1205 | + |
| 1206 | +\[ |
| 1207 | +ERR = \frac{FP + FN}{FP + FN + TP + TN} |
| 1208 | +\] |
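For instance (a small Python sketch with made-up counts, not code from the book), the error rate and the accuracy follow directly from the four confusion-matrix cells:

\begin{verbatim}
# Hypothetical confusion-matrix counts (illustrative only)
tp, tn, fp, fn = 45, 40, 5, 10

err = (fp + fn) / float(tp + tn + fp + fn)   # error rate ERR
acc = 1.0 - err                              # accuracy ACC = 1 - ERR
print(err, acc)                              # 0.15 0.85
\end{verbatim}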
| 1209 | + |
| 1210 | +\subsection{Plotting a receiver operating characteristic} |
| 1211 | +\subsection{The scoring metrics for multiclass classification} |
| 1212 | +\section{Summary} |
1183 | 1213 |
|
1184 | 1214 |
|
1185 | 1215 | \newpage |
|