Regression predicts numeric values (e.g., house prices, temperatures).
Compare with classification, which predicts discrete values (categories).
Example: House Prices
| House Size (sq ft) | Num. of Bedrooms | Num. of Bathrooms | Age of House (years) | House Price ($) |
|---|---|---|---|---|
| 1360 | 4 | 2 | 27 | 251240 |
| 1794 | 1 | 2 | 42 | 262602 |
| 1630 | 4 | 2 | 28 | 282277 |
| 1595 | 4 | 1 | 16 | 266877 |
| 2138 | 4 | 1 | 15 | 346992 |
| 2669 | 3 | 2 | 47 | 405283 |
| 966 | 2 | 2 | 44 | 143916 |
| 1738 | 1 | 1 | 3 | 278097 |
| 830 | 2 | 1 | 37 | 113612 |
| 1982 | 4 | 1 | 7 | 342283 |

We have 4 features/descriptors: house size ($x_1$), number of bedrooms ($x_2$), number of bathrooms ($x_3$), and age of house ($x_4$). The house price ($y$) is the output (the variable of interest, or dependent variable), and is what we want to predict based on the 4 features.
The idea in regression is that we want to approximate $y$. We write $\hat{y}$ to indicate an approximation of $y$. In this case, we want to find some approximation $\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$. If we can find the parameters $w_0, \dots, w_4$, then we have an estimate of the house price as a linear combination of the input features.
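A linear model of this form can be sketched in a few lines. The parameter values below are made up purely for illustration (they are not fitted to the table above); the point is only the shape of the computation, a weighted sum of the features plus an intercept.

```python
import numpy as np

# Hypothetical parameters: w0 is the intercept, w holds one weight per
# feature (size, bedrooms, bathrooms, age). Real values would come from
# fitting the model to data.
w0 = 50000.0
w = np.array([120.0, 5000.0, 8000.0, -1500.0])

def predict(x):
    """Estimated price: y_hat = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4."""
    return w0 + np.dot(w, x)

# A 1500 sq ft, 3-bedroom, 2-bathroom, 10-year-old house:
print(predict(np.array([1500.0, 3.0, 2.0, 10.0])))  # → 246000.0
```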
Let’s say we obtain the following linear regression equation for the estimated house price ($\hat{y}$):
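One way to obtain such an equation is ordinary least squares. As a sketch, we can fit the data from the table above with `numpy.linalg.lstsq` (the exact coefficients this produces are not claimed to match the equation discussed in the text):

```python
import numpy as np

# Features from the table: size, bedrooms, bathrooms, age.
X = np.array([
    [1360, 4, 2, 27],
    [1794, 1, 2, 42],
    [1630, 4, 2, 28],
    [1595, 4, 1, 16],
    [2138, 4, 1, 15],
    [2669, 3, 2, 47],
    [966,  2, 2, 44],
    [1738, 1, 1, 3],
    [830,  2, 1, 37],
    [1982, 4, 1, 7],
], dtype=float)
y = np.array([251240, 262602, 282277, 266877, 346992,
              405283, 143916, 278097, 113612, 342283], dtype=float)

# Prepend a column of ones so the first coefficient is the intercept,
# then solve the least-squares problem A @ coef ≈ y.
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # intercept first, then one coefficient per feature
```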
By looking at the coefficients, we can understand the direction and magnitude of each feature's effect. For example, the coefficient for the age of the house is negative: if we increase the age (keeping everything else fixed), the predicted house price decreases.
Ideally, we would scale the house size, since its values are much larger than those of the other features. If we don't, it ends up with a much smaller coefficient (as can be seen in the equation) simply because of its units, not because it matters less. Since we want to interpret the relative importance of the 4 features from their coefficients, we should rescale all the features (using a method like normalisation or standardisation) so they are on a similar scale. This allows a fair comparison.
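As a sketch of this rescaling, we can standardise each feature (subtract its mean, divide by its standard deviation) before refitting. The coefficients then express the price change per one standard deviation of each feature, so their magnitudes can be compared directly:

```python
import numpy as np

# Same data as the table above: size, bedrooms, bathrooms, age -> price.
X = np.array([
    [1360, 4, 2, 27], [1794, 1, 2, 42], [1630, 4, 2, 28],
    [1595, 4, 1, 16], [2138, 4, 1, 15], [2669, 3, 2, 47],
    [966,  2, 2, 44], [1738, 1, 1, 3],  [830,  2, 1, 37],
    [1982, 4, 1, 7],
], dtype=float)
y = np.array([251240, 262602, 282277, 266877, 346992,
              405283, 143916, 278097, 113612, 342283], dtype=float)

# Standardise: each column gets zero mean and unit standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

A = np.hstack([np.ones((len(X_std), 1)), X_std])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# coef[1:] are now on a common scale and can be compared in magnitude.
print(coef[1:])
```

A side effect of centring the features is that the intercept `coef[0]` becomes exactly the mean house price.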