Welcome to the IPI PAN Corpus website!
The IPI PAN Corpus is a large (currently over 250 million segments), morphosyntactically annotated, publicly available corpus of Polish, developed by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS), mainly within projects funded by the State Committee for Scientific Research, as well as within statutory research carried out at ICS PAS.
The texts making up the IPI PAN Corpus are available in a binary format, accessible via Poliqarp, a dedicated, efficient corpus search engine and concordancer. Poliqarp is available both as an online search tool and as a stand-alone program for GNU/Linux and Windows. Since March 2006, the source code of Poliqarp has been available under the GNU General Public License. Current versions of Poliqarp and corpora in Poliqarp's format can be obtained from the Download page.
We would like to sincerely thank everybody who has helped us develop the IPI PAN Corpus and the accompanying corpus tools.