Real Estate Pricing

The methodology

  • Listing data for Ankara, taken from Hurriyet Emlak ads between December 2018 and July 2019, is analyzed

  • The goal is to understand real estate pricing using the other features given in the dataset

  • Data preprocessing is applied to improve the results of the subsequent modeling

  • Different models are applied

    • K-means clustering

    • Multiclass classification

    • Multiple linear regression, and

    • Hypothesis testing, to answer various questions related to pricing

  • The models are evaluated with appropriate metrics and the results are interpreted accordingly
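
As a rough illustration of the clustering step listed above, the sketch below runs a plain K-means (Lloyd's algorithm) on listing coordinates and then compares the mean price of each cluster. The listings, the choice of two clusters, and the use of coordinates as the only clustering features are assumptions made for the example; the source does not state which features or settings were used.

```python
import math

# Hypothetical listings: (latitude, longitude, price). Not real data.
listings = [
    (39.98, 32.85, 300000),  # north
    (39.87, 32.68, 600000),  # south-west
    (39.99, 32.86, 320000),
    (39.88, 32.69, 650000),
    (39.97, 32.84, 310000),
    (39.86, 32.67, 620000),
]
coords = [(lat, lon) for lat, lon, _ in listings]


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids, so runs are repeatable."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        # Euclidean distance on raw degrees is a simplification; it ignores
        # that a degree of longitude is shorter than a degree of latitude.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


labels, centroids = kmeans(coords, k=2)

# Mean price per cluster, the quantity compared across clusters in the analysis.
mean_price = {}
for c in sorted(set(labels)):
    prices = [price for (_, _, price), lab in zip(listings, labels) if lab == c]
    mean_price[c] = sum(prices) / len(prices)

for c, value in mean_price.items():
    print(f"cluster {c}: centroid={centroids[c]}, mean price={value:.0f}")
```

Seeding the centroids with the first points keeps the toy run deterministic; a library implementation would normally use a smarter initialization and several restarts.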

Insights

  • Price and m2 are highly correlated, which matches real-life expectations

  • Latitude and price are negatively correlated. A likely explanation is that the northern part of Ankara is less costly than the southern part of the city

  • It cannot be concluded that real estate prices in Ankara changed during the 7 months studied

  • Five clusters are obtained. Four of them represent the ads in the center of the city, grouped as northwest, northeast, southwest and southeast. The fifth cluster represents the ads in the southwest that are far from the city center. The southwest cluster (cluster 5) has the highest mean price, while cluster 2, which contains the points far from the center, has the lowest mean. The results seem reasonable, since real estate prices decrease as the distance from the city center increases. In addition, the southwest region of Ankara includes new buildings and areas, which leads to higher prices.

  • Mean price per m2 also differs across regions

  • Coordinates and size are the most influential features for the price of a “Daire” (apartment), which makes sense.
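
The correlation insights above can be checked with a Pearson correlation coefficient. The sketch below computes it by hand on a few made-up listings; the field names and values are hypothetical and only chosen so that larger, more southern listings are more expensive.

```python
import math

# Hypothetical listings; the numbers are illustrative, not taken from the dataset.
listings = [
    {"price": 250000, "m2": 90, "latitude": 39.99},
    {"price": 310000, "m2": 110, "latitude": 39.97},
    {"price": 420000, "m2": 130, "latitude": 39.92},
    {"price": 560000, "m2": 160, "latitude": 39.88},
    {"price": 640000, "m2": 175, "latitude": 39.86},
]


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


prices = [row["price"] for row in listings]
r_price_m2 = pearson(prices, [row["m2"] for row in listings])
r_price_lat = pearson(prices, [row["latitude"] for row in listings])

# Price per m2, the quantity whose mean is compared across regions.
price_per_m2 = [row["price"] / row["m2"] for row in listings]

print(f"corr(price, m2)       = {r_price_m2:.3f}")
print(f"corr(price, latitude) = {r_price_lat:.3f}")
```

A positive coefficient for m2 and a negative one for latitude correspond to the two correlation insights; on real data the values would be further from ±1 than in this toy sample.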

Movie Recommender System

The methodology

  • The MovieLens dataset is used to build a movie recommender system and to analyze movie ratings

  • Which movies should be recommended to users?

    • Users with no prior watching record

    • Users with a watching history

  • Which features influence movie ratings?

  • Data preprocessing is applied to improve the results of the analysis

  • Different models are used to gain deeper insights:

    • Association rule mining,

    • Light gradient boosting machine,

    • Multiple linear regression

  • The models are evaluated with appropriate metrics and the results are interpreted accordingly

Insights

  • Drama and comedy are the most watched genres

  • Four types of recommendation are made to users who have no prior watching record: most watched movies, top-rated movies, top-rated movies of each genre, and top-rated movies of each year after 2020.

  • Pulp Fiction, Forrest Gump and The Shawshank Redemption are the top 3 most watched movies.

  • For users who have a watching record, a basket analysis is performed. Movies that a user rated 5 are treated as liked; movies rated below 5 are treated as not liked. All the movies a person has watched and rated 5 form that person's transaction, which makes it possible to extract patterns of movies watched by the same person. In addition, movies rated fewer than 12000 times and users who rated fewer than 850 movies are filtered out. The rating data is then converted into a transaction matrix.

    • An example recommendation: Star Wars Episode V and Indiana Jones: Raiders of the Lost Ark can be recommended to those who watched Toy Story, Pulp Fiction, The Matrix, The Silence of the Lambs, Braveheart, and Star Wars Episode VI.

  • Using a multiple linear regression model with rating as the target, the effect of each parameter on the rating is investigated, and the following equation is obtained:

    • Rating =
      -8.4E-05 - 0.1988 * Action + 0.1025 * Adventure + 0.2762 * Animation
      - 0.3402 * Children - 0.1444 * Comedy + 0.2391 * Crime
      + 0.3220 * Documentary + 0.1555 * Drama + 0.0476 * Fantasy
      + 0.2675 * Film-Noir - 0.1704 * Horror + 0.0222 * IMAX
      + 0.1224 * Musical + 0.1359 * Mystery - 0.0281 * Romance
      + 0.0343 * Sci-Fi - 0.0465 * Thriller + 0.3375 * War
      + 0.0150 * Movie ID + 0.0019 * Movie Year

    • If a movie is an action or children's movie, the rating drops, whereas being a documentary, film-noir or war movie has a positive impact on the rating
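
The basket analysis described above can be sketched in a few lines. The example below builds one transaction per user from the movies rated 5, then computes support, confidence and lift for a single pair rule. The rating rows are made up, the 12000-rating and 850-movie filters are omitted because the toy data is tiny, and only pair rules are counted; a full association rule mining run would also consider larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (user, movie, rating) rows; not taken from MovieLens.
ratings = [
    (1, "Toy Story", 5), (1, "Star Wars Episode V", 5), (1, "Pulp Fiction", 4),
    (2, "Toy Story", 5), (2, "Star Wars Episode V", 5),
    (3, "Toy Story", 5), (3, "Braveheart", 5),
    (4, "Star Wars Episode V", 3), (4, "Braveheart", 5),
]

# A movie counts as liked only when rated 5; each user's liked movies form one transaction.
transactions = {}
for user, movie, rating in ratings:
    if rating == 5:
        transactions.setdefault(user, set()).add(movie)

baskets = list(transactions.values())
n_baskets = len(baskets)

# How often each movie, and each pair of movies, appears across transactions.
item_count = Counter(movie for basket in baskets for movie in basket)
pair_count = Counter(
    frozenset(pair) for basket in baskets for pair in combinations(sorted(basket), 2)
)


def rule_stats(antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = pair_count[frozenset((antecedent, consequent))]
    support = both / n_baskets
    confidence = both / item_count[antecedent]
    lift = confidence / (item_count[consequent] / n_baskets)
    return support, confidence, lift


support, confidence, lift = rule_stats("Toy Story", "Star Wars Episode V")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two movies are liked together more often than independence would suggest, which is the kind of pattern a recommendation rule relies on.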
