*The following article is a guest post written by **Data Science Investor**.*

**Date of Analysis: 17 January 2020**

**Period of data: Jan 2017 to Jan 2020**

**Number of transactions analyzed: 3951**

(transaction data extracted from the URA website)

District 18 is one of the districts within the OCR (Outside of Central Region) of Singapore. It comprises a few neighbourhoods, such as Pasir Ris, Simei and Tampines. Private properties in this district include Vue 8 Residences, D’Nest and The Santorini, while recent launches in the area include Treasure at Tampines and The Tapestry.

How do the private properties in D18 generally fare? The box plots summarise the $psf of each project in D18.

To help you better understand the data, I will use **The Tapestry** as an example. From the diagram, you can see the following:

Average price: $1374 psf

Median price: $1394 psf

Price at 25th percentile: $1296 psf

Price at 75th percentile: $1456 psf

A box plot is a good way to present this data: you can easily read off the average price, the median price, and the prices at the 25th and 75th percentiles. You can also tell at a glance how wide the spread of prices is for each condominium project.
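The figures a box plot shows are ordinary summary statistics, so they can be computed directly from a list of transaction prices. Below is a minimal sketch using Python's standard `statistics` module. The `summarise_psf` function name and the sample values are hypothetical (they are not The Tapestry's actual transactions), and the `"inclusive"` quantile method is only one of several percentile conventions, so a plotting tool may draw slightly different quartiles.

```python
import statistics

def summarise_psf(psf_values):
    """Return the summary statistics a box plot of $psf values is built from."""
    # method="inclusive" treats the list as the full set of transactions;
    # other tools may interpolate percentiles slightly differently.
    p25, median, p75 = statistics.quantiles(psf_values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(psf_values),
        "median": median,
        "p25": p25,
        "p75": p75,
        "iqr": p75 - p25,  # size of the box: spread of the middle 50% of prices
    }

# Hypothetical $psf values for a single project
print(summarise_psf([1000, 1100, 1200, 1300, 1400]))
```

A wide interquartile range (`iqr`) corresponds to a tall box, i.e. a project whose transacted prices vary a lot.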

Compared with the other OCR districts I have analyzed, such as D19 and D23, prices in D18 certainly seem much more affordable (similar to D22).

The most expensive condominium in D18 is **The Tapestry**, with an average price of $1374 psf, while the most affordable is **Elias Green**, with an average price of $582 psf. **The Tapestry** is a 99-year leasehold property expected to TOP in 2022. Note, however, that **The Tapestry** is not within walking distance of any MRT station, so its location can be considered less than ideal.

How have prices in D18 trended overall, then? Let’s look at several scatter plots to get a better sense of how property prices have performed across 3951 transactions over the past 3 years.

Here is a scatter plot of the $psf against date.

The r coefficient (Pearson correlation coefficient) in a scatter plot measures the strength and direction of the linear relationship between 2 variables. Here, it tells us how consistently $psf moves with time: the closer r is to 1, the more strongly $psf tends to increase with time. Note that r is not the gradient of the line of best fit. The gradient measures how fast $psf changes (e.g. dollars psf per year), while r measures how tightly the transactions cluster around that line; the two always share the same sign. The r coefficient in the scatter plot above is 0.44, a moderate positive correlation, which indicates that $psf in D18 has generally increased over the past 3 years.
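To make the difference between r and the gradient concrete, here is a minimal sketch that computes both for $psf against sale date using the standard least-squares formulas. The `trend_stats` function and the sample transactions are hypothetical; dates are converted to "days since the earliest sale" so they can be used as a numeric x-axis.

```python
import math
from datetime import date, timedelta

def trend_stats(sale_dates, psf):
    """Return (r, gradient, intercept) for $psf plotted against sale date."""
    origin = min(sale_dates)
    x = [(d - origin).days for d in sale_dates]  # days since the earliest sale
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(psf) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, psf))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in psf)
    gradient = sxy / sxx                    # change in $psf per day
    intercept = mean_y - gradient * mean_x  # fitted $psf on the earliest sale date
    r = sxy / math.sqrt(sxx * syy)          # unitless, between -1 and 1
    return r, gradient, intercept

# Hypothetical sales 100 days apart; the second series rises twice as fast.
dates = [date(2017, 1, 1) + timedelta(days=100 * i) for i in range(5)]
slow = trend_stats(dates, [1000, 1050, 1100, 1150, 1200])
fast = trend_stats(dates, [1000, 1100, 1200, 1300, 1400])
print(f"slow: r={slow[0]:.2f}, {slow[1] * 365.25:.0f} $psf/year")
print(f"fast: r={fast[0]:.2f}, {fast[1] * 365.25:.0f} $psf/year")
```

Both series lie exactly on a straight line, so both have r = 1, yet the second gradient is twice the first. Scattered transactions around the same line would lower r without changing how fast prices rise on average.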

The line of best fit can also help you judge whether you are “over-paying” for a property (e.g. if your transaction sits well above the line). From the plot, a transaction at anywhere near $1200 psf in Feb 2019 would be on the high side. Of course, many factors such as location and tenure can influence your buying price, so this is only a rough guide.
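The over-paying check amounts to comparing the $psf you paid with the value the line of best fit predicts for the same date. A minimal sketch, assuming the line is expressed as an intercept (fitted $psf on a chosen start date) plus a daily gradient; the function name and the trendline numbers are hypothetical, not the fitted D18 line.

```python
from datetime import date

def psf_above_trend(sale_date, paid_psf, intercept, slope_per_day, origin):
    """How far a transaction sits above (+) or below (-) the line of best fit."""
    fitted_psf = intercept + slope_per_day * (sale_date - origin).days
    return paid_psf - fitted_psf

# Hypothetical trendline: $1000 psf on 1 Jan 2017, rising $0.50 psf per day
gap = psf_above_trend(date(2017, 4, 11), 1100, 1000, 0.5, date(2017, 1, 1))
print(f"{gap:+.0f} $psf relative to the trend")
```

A large positive gap flags a price on the high side, but as noted above, location and tenure can justify a premium over the district-wide trend.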

Next, which projects have performed remarkably well relative to the others over the past 3 years?

The plot above shows the lines of best fit for the individual projects in D18.
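Per-project trendlines come from fitting a separate line to each project's transactions. The sketch below groups hypothetical `(project, sale_date, psf)` tuples by project, fits a least-squares gradient to each, and ranks projects by that gradient in $psf per year. Ranking by gradient is an assumption about what "top performing" means here, and the `min_sales=10` cut-off is an arbitrary illustrative threshold for skipping projects with too few sales.

```python
from collections import defaultdict
from datetime import date, timedelta

def rank_projects_by_trend(transactions, min_sales=10):
    """Fit $psf vs time per project; return [(project, $psf per year)], steepest first."""
    by_project = defaultdict(list)
    for project, sale_date, psf in transactions:
        by_project[project].append((sale_date, psf))

    ranking = []
    for project, sales in by_project.items():
        if len(sales) < min_sales:
            continue  # too few points for a meaningful trendline
        origin = min(d for d, _ in sales)
        x = [(d - origin).days / 365.25 for d, _ in sales]  # years since first sale
        y = [psf for _, psf in sales]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue  # all sales on the same day: no trend to fit
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        ranking.append((project, sxy / sxx))
    return sorted(ranking, key=lambda item: item[1], reverse=True)

# Hypothetical data: quarterly sales in two projects with different trends
start = date(2017, 1, 1)
sample = []
for q in range(12):
    sale_date = start + timedelta(days=91 * q)
    years = (sale_date - start).days / 365.25
    sample.append(("Project A", sale_date, 1000 + 100 * years))
    sample.append(("Project B", sale_date, 800 + 20 * years))
for project, slope in rank_projects_by_trend(sample):
    print(f"{project}: {slope:+.0f} $psf per year")
```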

Two of the top-performing projects in D18, as seen from the graph above, are **Coco Palms** and **The Alps Residences**. **Coco Palms** is a 99-year leasehold project that has recently obtained its TOP. It is a large development with 944 units, about an 8-minute walk from Pasir Ris MRT. **The Alps Residences** is also a 99-year leasehold project, but it has yet to TOP and is expected to be completed this year. Its location is less ideal, though, with the nearest station (Tampines MRT) a 10-minute walk away.

How have freehold properties performed against leasehold properties in D18 over these 3 years?

D18 has relatively few freehold properties, so the number of freehold transactions in the district is small. The r coefficient of the trendline for freehold transactions pales in comparison with that for all transactions in D18 (0.18 against 0.44), indicating a much weaker link between $psf and time, i.e. a less consistent upward trend. Hence, freehold properties might not be worth the premium in this district, although the small number of freehold transactions makes this comparison less reliable.
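A comparison like freehold versus the whole district can be reproduced by computing r separately for each group. The minimal sketch below takes hypothetical `(label, days_since_start, psf)` tuples; labelling every row with a single label such as "All" gives the district-wide figure. It also returns the number of transactions per group, since r from a handful of sales can be misleading.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    """Pearson correlation; assumes y is not constant (else division by zero)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_by_group(transactions):
    """transactions: (label, days_since_start, psf) tuples. Returns {label: (r, n)}."""
    days_by, psf_by = defaultdict(list), defaultdict(list)
    for label, days, psf in transactions:
        days_by[label].append(days)
        psf_by[label].append(psf)
    # n is reported next to r because a small group can give an unreliable r
    return {label: (pearson_r(days_by[label], psf_by[label]), len(days_by[label]))
            for label in days_by}

# Hypothetical transactions tagged by tenure
sales = [
    ("Freehold", 0, 1000), ("Freehold", 200, 1200), ("Freehold", 400, 1100), ("Freehold", 600, 1300),
    ("Leasehold", 0, 900), ("Leasehold", 300, 1000), ("Leasehold", 600, 1100),
]
for tenure, (r, n) in r_by_group(sales).items():
    print(f"{tenure}: r = {r:.2f} over {n} transactions")
```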

How about apartments of various sizes? How do they perform against each other?

Apartments smaller than 500 sqft perform significantly worse than apartments of other sizes, as seen from the graphs above. This group has the smallest r coefficient, 0.08, which suggests almost no growth in $psf. The trend is similar to that in some of the other OCR districts I have analyzed, such as D22 and D23. Hence, caution might be needed when buying a 1-bedder or studio apartment in districts far away from the central region.
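Comparing apartments by size first requires assigning each transaction to a size band. A minimal sketch using Python's `bisect` module: only the 500 sqft cut-off comes from the article; the other band edges and labels are hypothetical. Each band's `(days, psf)` list can then be fed into the same r or trendline calculation used for the whole district.

```python
import bisect
from collections import defaultdict

# Hypothetical band edges in sqft; only the 500 sqft cut-off is from the article.
BAND_EDGES = [500, 1000, 1500]
BAND_LABELS = ["< 500 sqft", "500-999 sqft", "1000-1499 sqft", ">= 1500 sqft"]

def size_band(area_sqft):
    # bisect_right places an area exactly on an edge (e.g. 500) in the band above it
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, area_sqft)]

def group_by_band(transactions):
    """transactions: (area_sqft, days_since_start, psf). Returns {band: [(days, psf), ...]}."""
    groups = defaultdict(list)
    for area, days, psf in transactions:
        groups[size_band(area)].append((days, psf))
    return dict(groups)

print(group_by_band([(450, 0, 900), (480, 30, 910), (1200, 0, 800)]))
```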

Based on the large set of transactions in D18, I attempted to build machine learning models with various tools, such as random forest and linear regression. Both are regression techniques that model the relationship between price and various other variables (in this case, project name, date of sale, size of flat etc.): linear regression fits a linear relationship, while a random forest can also capture non-linear effects. What we are ultimately trying to construct is a predictive model in which we can have the highest confidence, by reducing prediction errors as much as possible (measured with metrics such as **Mean Absolute Error** and **Root Mean Squared Error**). More on this will be discussed in a separate article on Data Science Investor in the future.

After running all 3951 transactions through several machine learning models, I eventually obtained a model with suitable evaluation results (MAE of 90858, RMSE of 1901759 and R² of 0.985).
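For reference, the three evaluation metrics follow standard definitions, sketched below in plain Python with hypothetical prices. MAE is the average size of a miss; RMSE squares the misses before averaging, so it is always at least as large as MAE and grows faster when a few predictions are far off; R² is the share of the variation in actual prices that the model explains.

```python
import math

def evaluate(actual, predicted):
    """Return (MAE, RMSE, R^2) for a set of price predictions."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalises large misses more
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot  # 1.0 means every price predicted exactly
    return mae, rmse, r2

# Hypothetical actual vs predicted prices
print(evaluate([100, 200, 300], [110, 190, 300]))
```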

I then put this machine learning model into practice, using it to determine a reasonable price for the following property.

Project: Elias Green

Area: 1528 sqft

Floor level: High Floor (assumed to be 11 to 15)
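The article does not describe how these inputs were fed into the model, so the following is only one common way to turn them into numeric features, with hypothetical names throughout: the project is one-hot encoded, the floor range is replaced by its midpoint, and the sale date becomes days since an assumed origin of 1 Jan 2017. The three-project list and the 17 Jan 2020 sale date (the date of analysis) are illustrative assumptions; a real model would one-hot encode every project in the training data.

```python
from datetime import date

def floor_midpoint(floor_range):
    """Convert a floor range such as '11 to 15' into its midpoint (13.0)."""
    low, high = (int(part) for part in floor_range.split(" to "))
    return (low + high) / 2

def encode_unit(project, area_sqft, floor_range, sale_date, known_projects,
                origin=date(2017, 1, 1)):
    """Hypothetical feature vector: one-hot project, area, floor midpoint, days since origin."""
    one_hot = [1.0 if project == p else 0.0 for p in known_projects]
    days = (sale_date - origin).days
    return one_hot + [float(area_sqft), floor_midpoint(floor_range), float(days)]

projects = ["Coco Palms", "Elias Green", "The Tapestry"]
features = encode_unit("Elias Green", 1528, "11 to 15", date(2020, 1, 17), projects)
print(features)
```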

Running the property through the machine learning model I created gives a price of $1,021,468, which is quite close to the asking price. This suggests that the asking price is fair, which can be taken into consideration during price negotiation. Note, however, that further investigation is needed into factors beyond these parameters.
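To relate the model's estimate back to the $psf figures used throughout this article, divide it by the unit size. For the 1528 sqft unit, $1,021,468 works out to exactly $668.50 psf, a figure that can be compared against Elias Green's box plot. The helper name is hypothetical.

```python
def implied_psf(price, area_sqft):
    """Convert a total price into $psf for comparison with the box plots."""
    return price / area_sqft

print(f"${implied_psf(1_021_468, 1528):,.2f} psf")
```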

Now, with these data in mind, go be a data science investor!

Refer to Data Science Investor for more articles like this 🙂