General architecture

Voikko consists of a set of separately released components that form a stack of layers as illustrated in the picture below.

General architecture of Voikko

In this project we develop the components shown with a blue background, and some of the components shown with a yellow background:

Libvoikko is the high-level library that contains, among other things, algorithms that generate spelling suggestions and perform rule-based hyphenation. It is also capable of caching the results of common spell checking operations to improve performance. All of the grammar checking is also done within libvoikko. Libvoikko supports a few different dictionary formats, some of which require dependencies on other libraries such as hfst-ospell. The different dictionary formats have distinct version numbers so that it is possible to have multiple versions of libvoikko installed on the same computer without problems related to dictionary compatibility.
voikko-fi (previously Suomi-malaga)
Voikko-fi is a description of Finnish morphology based on finite state transducer technology (Foma and VFST). The package contains morphologies that are suitable for both spell checking and text indexing.
Libreoffice-voikko is a LibreOffice extension that uses Voikko to provide Finnish spell checking, hyphenation and grammar checking.
Enchant Voikko plugin
A Voikko provider plugin for the multi-backend Enchant spell checker library is included in Enchant version 1.4 and later.
Tmispell-voikko is an ispell-compatible spell checker that uses Voikko to provide Finnish spell checking and falls back to real ispell for other languages. Tmispell-voikko was originally written by Pauli Virtanen for the freely distributable but closed source spell checker Soikko. Tmispell-voikko also contains an Enchant provider plugin for Enchant version 1.3. Tmispell-voikko is deprecated and no longer actively developed. Developers using ispell to add spell checking capability to their applications should consider switching to Enchant instead.
Joukahainen is our web application used to maintain the vocabulary. Joukahainen is designed to store and provide vocabulary data in an application independent format which should make it easier to use and experiment with the data outside the Voikko project.
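
As a rough illustration of the result caching mentioned for libvoikko in the list above, the following sketch memoizes a spell check so that repeated words do not reach the backend again. It does not use libvoikko's actual API or reflect its internal cache design: the names slow_spell, cached_spell, check_text and the tiny KNOWN_WORDS set are all hypothetical stand-ins chosen for this example.

```python
from functools import lru_cache

# Hypothetical stand-in for a real dictionary; not part of Voikko.
KNOWN_WORDS = frozenset({"kissa", "talo", "koira"})

# Counts how many times the (pretend) expensive backend is actually invoked.
stats = {"backend_calls": 0}


def slow_spell(word: str) -> bool:
    """Pretend backend lookup, assumed to be the expensive step."""
    stats["backend_calls"] += 1
    return word.lower() in KNOWN_WORDS


@lru_cache(maxsize=1024)
def cached_spell(word: str) -> bool:
    """Memoized wrapper: a repeated word is answered from the cache."""
    return slow_spell(word)


def check_text(words):
    """Return (word, is_correct) pairs for every word, in order."""
    return [(w, cached_spell(w)) for w in words]


if __name__ == "__main__":
    for word, ok in check_text(["kissa", "kisssa", "kissa", "talo", "kissa"]):
        print(word, "ok" if ok else "misspelled")
    print("backend calls:", stats["backend_calls"])
```

Because running text repeats common words often, a small bounded cache like this can avoid most backend lookups. The cache size of 1024 is an arbitrary choice for the sketch.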

The reasons for using finite state technology instead of Hunspell

This project started in late 2005 under the name Hunspell-fi, with the aim of creating Finnish vocabulary and affix files for Hunspell. The Hunspell-based implementation was developed for roughly six months. There were no serious problems, but it was also evident that the work progressed rather slowly. In early 2006 Hannu Väisänen published Suomi-malaga, which contained a vocabulary that was (depending on how one defines "word") roughly ten times larger than the Hunspell-fi vocabulary at that time. Additionally, the Hunspell-fi implementation did not support compound words and supported only a few derived word forms, both of which were supported by Suomi-malaga.

Between 2012 and 2015 the morphology was rewritten on top of finite state technology, which has become the most widely used tool for computational linguists to model the morphology of natural languages. The implementation used in Voikko is not a purely finite state solution, but it is very close to one. From the end user's point of view it is very similar to the original Malaga morphology.
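
To give a feel for what modelling morphology with finite state technology means, here is a deliberately tiny sketch of a transducer that maps a few Finnish surface forms to a stem plus tags. It is not the VFST or Foma format and not how voikko-fi is implemented: the state names, the tag names (such as +Sg+Ine) and the two-word lexicon are assumptions made up for this example. Real Finnish also needs consonant gradation and vowel harmony, which this toy ignores.

```python
# Toy transducer: state -> list of (input consumed, output emitted, next state).
# All states, tags and entries here are hypothetical and for illustration only.
TRANSITIONS = {
    "start": [
        ("talo", "talo", "noun"),    # "house"
        ("kissa", "kissa", "noun"),  # "cat"
    ],
    "noun": [
        ("", "+Sg+Nom", "end"),     # bare stem: singular nominative
        ("t", "+Pl+Nom", "end"),    # plural nominative
        ("n", "+Sg+Gen", "end"),    # singular genitive
        ("ssa", "+Sg+Ine", "end"),  # singular inessive
    ],
}
FINAL_STATES = {"end"}


def analyze(word, state="start"):
    """Return every analysis the toy transducer produces for `word`."""
    results = []
    # Accept only if all input is consumed in a final state.
    if state in FINAL_STATES and word == "":
        results.append("")
    for consumed, emitted, next_state in TRANSITIONS.get(state, []):
        if word.startswith(consumed):
            for rest in analyze(word[len(consumed):], next_state):
                results.append(emitted + rest)
    return results


if __name__ == "__main__":
    for form in ["talo", "talossa", "kissat", "talox"]:
        print(form, "->", analyze(form))
```

A form is accepted when some path consumes the whole word and ends in a final state. An unknown form such as "talox" yields an empty list, which is how a spell checker built on such a machine would reject it.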

There are major limitations in Hunspell (or at least there have been; some may have been fixed in recent versions) that make it practically impossible to use as a replacement for Voikko when processing Finnish:

All of the problems above could definitely be solved within Hunspell, but doing so might require a lot of work. Compromising quality just to become compatible with Hunspell is not an option, because Finnish people have come to expect really good results from their spell checkers (we have had advanced compound word checking in commercial text editors for well over ten years).

The latest version of libvoikko is compatible with Hunspell licensing, so it would be possible to merge code between these projects. We are open to considering this, but it should be noted that the dictionary formats of libvoikko are still evolving.