Exploration (placeholder)

SIR model

There has been a lot of interest in simulating the development of infectious diseases since the COVID-19 pandemic of 2019/20. Many different models and approaches exist to computationally simulate the spread of infectious diseases.

Evaluation Similarity Search

Using published results for the performance of state-of-the-art AI document-vector embeddings and semantic hashing, I evaluated a canonical document-vector retrieval system boosted by approximate nearest neighbour search.

Teachers, Leave them Kids alone

"Insight sets you free, education shackles, half-education plunges you into slavery." - Wilhelm Raabe

The cosine measure is the prevailing similarity function for the document vector model of IR. We discuss its connection to the intrinsic dimension.
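As a reminder of what the cosine measure computes, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the three example vectors are illustrative assumptions, not taken from the article.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical term-count vectors over a three-word vocabulary.
doc_a = [2, 1, 0]
doc_b = [4, 2, 0]  # same direction as doc_a, twice the length
doc_c = [0, 0, 3]  # shares no terms with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)  # close to 1: same direction
sim_ac = cosine_similarity(doc_a, doc_c)  # 0: orthogonal vectors
```

Because the measure depends only on the angle between the vectors, scaling a document vector (for example by document length) leaves the similarity unchanged.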

URLs of VG-Media Publishers

Some Unix shell commands to get a plain-text list of the web sites of publishers that are represented by VG Media under the Leistungsschutzrecht für Presseverleger (see the Wikipedia entry Ancillary copyright for press publishers).

Google Search Server Statistics

How many servers does Google need for its web search? How many pages are crawled and indexed? Starting from Google's 2009 statement that it uses 1 kJ of energy per search, we estimate that Google used $\approx$ 130,000 servers for its search in 2008. We also speculate that Google only indexes 5% of its crawled pages.

Hierarchical Clustering in IR

Hierarchical agglomerative clustering (HAC) is a family of algorithms for grouping data. HAC starts by merging the two data points with the smallest distance into a new cluster and finishes with one big cluster containing all data points.
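The merge loop can be sketched as follows. This is a deliberately naive illustration under stated assumptions: the points are one-dimensional, the cluster distance is single linkage (smallest pairwise distance), and the function name `single_linkage_hac` is hypothetical. Other members of the HAC family differ mainly in how the distance between clusters is defined.

```python
def single_linkage_hac(points):
    """Naive agglomerative clustering of 1-D points with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(p,) for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = single_linkage_hac([1.0, 2.0, 6.0, 7.5])
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

With n points the loop always performs n - 1 merges, and the last merge produces the single cluster that holds everything.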

Size of German Bing Index

How many pages does Microsoft's search engine Bing.com hold in its index? Following the idea of Maurice de Kunder, we can roughly estimate the size of Bing's index at 300 million pages.

Cluster Modelling for Insurance Portfolios

Recently I came across an article by Milliman (an actuarial and consulting firm) called Cluster Modelling: A practical and robust approach for achieving high improvements in model run-times for SST and Solvency II. Here I will analyse their approach and point out a disadvantage in connection with reasonably large portfolios.

Visualizing k-means++

Initialization of k-means can have a big impact on the performance of the k-means clustering algorithm. Straightforward random initialization can lead to many more iterations compared to a better initialization using k-means++.
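For orientation, the k-means++ seeding step can be sketched in a few lines of plain Python. The function name `kmeans_pp_init`, the sample points, and the fixed seed are assumptions made for this illustration. The first center is drawn uniformly, and every further center is drawn with probability proportional to its squared distance from the nearest center chosen so far.

```python
import random

def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers from `points` (tuples) by k-means++ seeding."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform draw.
            centers.append(rng.choice(points))
            continue
        # Far-away points are proportionally more likely to be picked.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

# Hypothetical 2-D data: three loose groups.
points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.3, 9.8), (-8.0, 7.0), (-7.5, 7.4)]
centers = kmeans_pp_init(points, k=3, rng=random.Random(42))
```

Points that are already centers have weight zero and cannot be drawn again, so the seeds tend to spread across the data instead of piling up in one region.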

GMDB Binomial Tree Pricing

Here I show how to price a simple GMDB (unit-linked insurance product) using installment options and calculate the premium using a binomial tree approach. The analysis shows that non-rational policyholder behaviour leads to strong mispricing.

Option Pricing Using Binomial Trees 3

In the previous article we showed how to price an option using the risk-neutral valuation principle. Following this principle, the arbitrage-free option price is the expected payoff discounted at the risk-free interest rate. What we were missing until now is how to calculate the risk-neutral probability $p$.
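A minimal sketch of the idea for a single time step, assuming continuous compounding and given up and down factors `u` and `d`. The standard one-step formula is p = (exp(r * dt) - d) / (u - d). The function names and the numbers below are illustrative assumptions and may differ from the notation used in the article.

```python
import math

def risk_neutral_probability(u, d, r, dt):
    """Risk-neutral probability of an up move in a one-step binomial tree."""
    growth = math.exp(r * dt)
    if not d < growth < u:
        raise ValueError("no-arbitrage requires d < exp(r*dt) < u")
    return (growth - d) / (u - d)

def one_step_call_price(s0, strike, u, d, r, dt):
    """Discounted risk-neutral expectation of a European call payoff."""
    p = risk_neutral_probability(u, d, r, dt)
    payoff_up = max(s0 * u - strike, 0.0)
    payoff_down = max(s0 * d - strike, 0.0)
    return math.exp(-r * dt) * (p * payoff_up + (1.0 - p) * payoff_down)

# Hypothetical inputs: spot 100, strike 100, +20% / -20% moves, 5% rate, one year.
p = risk_neutral_probability(u=1.2, d=0.8, r=0.05, dt=1.0)
price = one_step_call_price(s0=100.0, strike=100.0, u=1.2, d=0.8, r=0.05, dt=1.0)
```

A quick sanity check on `p` is that the discounted expected stock price under this probability equals today's price, which is exactly the condition the formula is derived from.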

Option Pricing Using Binomial Trees 2

In this part we continue with our one-step, two-state binomial tree model for pricing an option (previous post), which will lead us to the principle of risk-neutral valuation.

Option Pricing Using Binomial Trees 1

The binomial options pricing model provides a numerical method for the valuation of options. Here I will give a short introduction to option pricing using the binomial tree model proposed by Cox, Ross and Rubinstein in 1979 (1).