Welcome to the website of the KRYS I Corpus.

The KRYS I Corpus is a collection of over 6300 documents labelled with their genre classes. It was constructed as part of a research initiative, driven by the Digital Curation Centre, to automate document genre classification. The research was carried out at the Humanities Advanced Technology and Information Institute (HATII), University of Glasgow, between 2005 and 2008.

The notion of genre is deeply embedded in the way humans organise information. Identifying the genre of a document helps to characterise the physical and conceptual structure of the text, and to capture the style and location of further information within it. Very few genre-labelled corpora have been available to the research community. Our corpus is made available here to fill this gap and serve as a valuable resource for researchers in:

To access the Corpus, please register first by going to the page Registration/Login. By registering to access the Corpus, you are agreeing to the specified Terms of Use. The KRYS I Corpus is owned by the University of Glasgow. However, the copyrights to the documents within the Corpus are retained by the original copyright owners, and their permission might be required before you copy, use, or distribute any of the content. Please note that documents will be removed upon the request of the copyright owners without prior notice. Also, note that access to the corpus could be withdrawn should any misuse of the Corpus be detected.

All documents within the KRYS I Corpus have been collected from the Internet and are, thus, publicly available. While the authors tried to ensure that no copyright law was violated, not all document owners could be contacted. Should you find that your document is being used without authorisation within this collection, please contact the KRYS I Corpus Manager at , quoting the document ID number within the collection, and the document will be removed immediately. Please be assured that no content within the documents has been altered, except where this was explicitly requested by the copyright owner.

There is more information about the construction method and composition of the corpus on the Information page. Automated experiments we have conducted using the corpus are reported in the published papers listed on this page.

If you would like to contribute further to this corpus-building initiative, please go to our Genre Classification System (PICS), register, and classify and submit your own documents.

If you have any queries please contact

©Copyright HATII, University of Glasgow. Last updated by Yunhyong Kim November 2008.



