Text Analysis Info - Information retrieval software

Last update: 18 August 2008

The programs listed here can be divided into finer groups:


Analysis 2.94
program: Analysis 2.94
author: Giovanni Lo Conti
distributor: Giovanni Lo Conti (gloconti@romascuola.net)
documentation: none
download: free version
operating system: MS-Windows, Digital Unix, Acorn RiscOS
description: Analysis is a program that performs several types of text analysis: concordances, KWIC, KWOC, readability indexes, co-occurrences, lemmatization, sentence statistics, non-intelligent abstracts (summaries), "meaningful and sense", incipit, explicit, and frequency. For many procedures it is possible to restrict the range or to compare the text with an electronic dictionary. It comes with Help, online Help, and a Wimp interface.


AntConc 3.2.1
program: AntConc 3.2.1
author: Laurence Anthony
distributor: Laurence Anthony
documentation: readme file for usage
download: free version
operating system: MS-Windows, MacOS, Linux
description: This is a free concordance program.


AnyText
program: AnyText
author: Linguist's Software
distributor: Linguist's Software
documentation: no
download: no
operating system: Mac OS System 7.1-9.2, or the Classic environment in Mac OS X (you must be able to boot into Classic to install); 2 MB of RAM
description: AnyText is a HyperCard®-based full-proximity Boolean search engine and index generator that lets you create concordances and run fast word searches on ordinary text files in English, Greek, and Russian. AnyText was designed especially for the Greek, English, Cyrillic, and Latin Bible texts, but can be used with any text-only file. The text files can be on diskettes, hard disks, or CD-ROMs, as long as there is disk space for the special index files that AnyText creates and uses during operation.
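The idea behind an index-based Boolean and proximity search engine like AnyText can be illustrated with a positional inverted index. The following is a minimal Python sketch, not AnyText's actual index format. The function names `build_index`, `search_and`, and `search_near` are hypothetical, and tokens are assumed to be lower-cased word characters.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case word tokens (word characters and apostrophes)."""
    return re.findall(r"[\w']+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions]} for every document it occurs in."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def search_and(index, *words):
    """Boolean AND: ids of documents that contain every query word."""
    sets = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*sets) if sets else set()


def search_near(index, w1, w2, max_dist):
    """Proximity search: documents where w1 and w2 occur within max_dist words."""
    hits = set()
    for doc_id in search_and(index, w1, w2):
        p1 = index[w1.lower()][doc_id]
        p2 = index[w2.lower()][doc_id]
        if any(abs(a - b) <= max_dist for a in p1 for b in p2):
            hits.add(doc_id)
    return hits
```

Building the index once and then answering queries from it is what makes repeated searches fast; the cost is the extra disk space for the index, as the description above notes.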


Ask Sam 7.0
program: Ask Sam 7.0
author: Ask Sam Software Development
distributor: Ask Sam Software Development
documentation: overview and quick tour
download: trial version
operating system: MS-Windows
description: AskSam is a fast information retrieval program that can also search e-mails and PDF files. The new Professional version can be programmed (e.g. with Visual Basic).


ATA - Ashton Text Analyser (WinATA Mark 2)
program: ATA - Ashton Text Analyser
author and distributor: Peter Roe
documentation: user's guide
download: no, but it is free for non-commercial use
operating system(s): Win9x, WinNT
description: ATA generates word lists, KWIC concordances, and KWOC indexes.


Collocate
program: Collocate
author: Michael Barlow
distributor: Athelstan
documentation: included in the demo version
download: demo (processes data in the same way as the full version, but the results are limited to the top 5 items)
operating system(s): Win9x
description: Collocate is a program for finding collocations or terms in a corpus. It has three main components:

  • Search for a word or phrase within a set span (e.g. 4 words). The program lists all collocations containing the search word and provides frequency and/or statistical information (log-likelihood, mutual information).
  • Produce an n-gram list for the corpus.
  • Extract collocations from the corpus as a whole.
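The two statistics Collocate reports can be sketched as follows. This is an illustrative Python sketch, not Collocate's own implementation. It assumes a pre-tokenized corpus, a symmetric span, and one common window-based 2×2 contingency table, and it uses the textbook formulas for pointwise mutual information (log2 of observed over expected) and Dunning's log-likelihood G2. The function names are hypothetical.

```python
import math


def window_positions(tokens, node, span):
    """Token positions within `span` words left or right of any `node` occurrence.

    Overlapping windows are merged and the node positions themselves excluded.
    """
    node_pos = {i for i, t in enumerate(tokens) if t == node}
    positions = set()
    for i in node_pos:
        positions.update(range(max(0, i - span), min(len(tokens), i + span + 1)))
    return positions - node_pos


def collocation_stats(tokens, node, collocate, span=4):
    """Return (observed count, MI, log-likelihood G2) for collocate near node."""
    n = len(tokens)
    if n == 0:
        raise ValueError("empty corpus")
    window = window_positions(tokens, node, span)
    o11 = sum(1 for i in window if tokens[i] == collocate)  # collocate in window
    o12 = len(window) - o11                                 # other words in window
    o21 = tokens.count(collocate) - o11                     # collocate elsewhere
    o22 = n - len(window) - o21                             # everything else
    observed = [o11, o12, o21, o22]
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    # Pointwise mutual information for the top-left cell.
    mi = math.log2(o11 / expected[0]) if o11 else float("-inf")
    # G2 = 2 * sum(O * ln(O / E)); empty cells contribute nothing.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o)
    return o11, mi, g2
```

MI favours rare word pairs, while log-likelihood is more reliable for low frequencies. This is why concordancers often report both.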

Concordance 3.2
program: Concordance 3.2
author and distributor: Rob J C Watt
documentation: manual
download: trial version
operating system(s): Win9x, WinNT, WinXP
description: features include:
  • phrase and proximity search, samples, regular expression search, references
  • book-like indexing; treating upper and lower case separately; showing duplicate words separately; analysing characters instead of words
  • sorting headwords by order of occurrence, sorting word endings with a string sort, and sorting contexts by the string before or after the headword
  • language support, including East Asian languages (e.g. Chinese) on Windows 2000/XP
  • user-definable HTML entity translation
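A KWIC (key word in context) concordance sorted by the string after the headword, one of the sort orders listed above, can be sketched in a few lines of Python. This is a minimal illustration, not Concordance's implementation. It assumes whitespace-separated tokens and a context width counted in words, and `kwic` and `print_kwic` are hypothetical names.

```python
def kwic(tokens, keyword, width=3, sort_right=True):
    """Return (left, key, right) context triples for each occurrence of keyword."""
    key = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    if sort_right:
        # Sort by the string after the headword (case-insensitive, stable).
        lines.sort(key=lambda line: line[2].lower())
    return lines


def print_kwic(lines):
    """Print the lines with the keyword aligned in a central column."""
    lw = max((len(left) for left, _, _ in lines), default=0)
    for left, key, right in lines:
        print(f"{left:>{lw}}  {key}  {right}")
```

Usage: `print_kwic(kwic(text.split(), "the", width=4))`. Sorting by the left context instead only requires a key on the reversed left string.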


Corpus Presenter 10.0
program: Corpus Presenter 10.0
author: Raymond Hickey
distributor: Raymond Hickey
documentation: manual
download: free full version
operating system(s): WinXP
description: Corpus Presenter is a suite of programs designed to work with existing corpora and with any files that users might wish to examine for linguistically interesting structures. It offers all the options of standard corpus software: it can generate concordances, word lists, and reverse dictionaries of the words in texts, and it can perform a whole range of text retrieval tasks. Texts do not need to be prepared in any way (e.g. indexed in advance).


Eric Johnson's programs
Note: some pages are 10 years old, and I couldn't find current information. (HK)


KURA 2.2
This program was removed because its web pages were outdated and I couldn't find current information. (HK)


LEXA 7.0 - Corpus Processing Software
program: LEXA 7.0
author: Raymond Hickey, University of Essen, Germany
distributor: University of Bergen, Norway
documentation: similar to a manual
download: test version
operating system(s): DOS
description: LEXA is an open, file-based system. It can perform lemmatisation, generate word lists and lexical density tables, compare files, run global find-and-replace, carry out database and corpus management functions (print, sort), and compute statistics on characters, words, and sentences. It can search groups of files, including databases (DBF files), for strings, also using the wildcards * and ?. It also includes many DOS utilities.
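Searching with the DOS-style wildcards * (any run of characters) and ? (exactly one character) can be sketched with Python's standard `fnmatch` module. This is not how LEXA itself works. The function names are hypothetical, the files are assumed to be UTF-8 text, and note that `fnmatch` also accepts [...] character classes, which go beyond plain * and ?.

```python
import fnmatch


def wildcard_search(words, pattern, case_sensitive=False):
    """Return the words that match a pattern using * and ? wildcards."""
    if case_sensitive:
        return [w for w in words if fnmatch.fnmatchcase(w, pattern)]
    pattern = pattern.lower()
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pattern)]


def search_files(paths, pattern):
    """Yield (path, line number, word) for every matching word in a group of files."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line_no, line in enumerate(f, 1):
                for word in wildcard_search(line.split(), pattern):
                    yield path, line_no, word
```

For example, `lemma?` matches "lemmas" but not "lemma" or "lemmatise", while `lemma*` matches all three.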


Metamorph
program: Metamorph
distributor: Thunderstone Software
documentation: manual
download: none
operating systems: DOS, Win9x, WinNT, Unix
description: Metamorph is a real-time, concept-based search package. It can search any text without preprocessing. Metamorph has an English vocabulary of 250,000 word and phrase concept associations for natural-language queries; Boolean logic (with weights) and wildcards can also be used. It further provides proximity control, fuzzy searches, true regular expression matching, and numeric value searches.
The Metamorph API alone is available for most operating systems.


MicroConcord
program: MicroConcord
author: Mike Scott, Tim Johns
distributor: Mike Scott
documentation: none
download: freeware
operating system(s): DOS
description: MicroConcord is the predecessor of WordSmith. It is faster than Windows programs, but the number of concordance lines is limited to around 1,500, and a concordance can only be saved as a text file.


MicroOCP - Oxford Concordance Package
The program is no longer available. Outdated information on the web may still claim otherwise, but it leads to dead links.


MonoConc Pro 2.2
program: MonoConc 2.2
author: Michael Barlow
distributor: Athelstan
documentation: unknown
download: demo limited to 20 hits
operating system(s): Win9x
description: MonoConc is a concordancer. It can create concordances and word lists (with exclusion lists, case-sensitive or case-insensitive), convert texts, and work with tagged texts and with different languages. Searches can use wildcard characters and a variable context (multi-line or a whole sentence). Concordances can be sorted by the words to the left and right, and collocations can be computed as well.
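A word frequency list with an exclusion list and a case-sensitivity switch, as MonoConc offers, can be sketched as follows. This is an illustrative Python sketch with a hypothetical `word_list` function, not MonoConc's behaviour. It assumes words are runs of word characters, so a form like "don't" is split into two tokens.

```python
import re
from collections import Counter


def word_list(text, exclude=(), case_sensitive=False):
    """Frequency list of words, skipping any word in the exclusion list."""
    words = re.findall(r"\w+", text)
    if case_sensitive:
        excluded = set(exclude)
    else:
        words = [w.lower() for w in words]
        excluded = {e.lower() for e in exclude}
    counts = Counter(w for w in words if w not in excluded)
    # Most frequent first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

With case sensitivity switched on, "The" and "the" are listed as separate entries.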


Phrase Context 1.02
program: Phrase Context
author/distributor: Hans J. Klarskov Mortensen
download: test version
documentation: none
operating systems: Windows (?)
description: Phrase Context is a versatile program that counts words and phrases, creates concordances, calculates type-token ratio (TTR) and lexical density values, accepts regular expressions as search patterns, and writes XML-formatted output files. The author also provides some free utilities, e.g. for extracting text from PDF files.
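The two measures named above are simple ratios. Type-token ratio is the number of distinct words divided by the number of running words. Lexical density is the share of content words among all words. The following Python sketch is not Phrase Context's implementation. It assumes a deliberately tiny, hypothetical function-word list; real tools use much larger lists or part-of-speech tagging to decide which words count as content words.

```python
import re

# Hypothetical, deliberately small list of function words (assumption).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "are", "was", "were", "it", "he", "she", "they", "we", "i",
    "you", "this", "that", "with", "for", "as", "by", "not",
}


def tokenize(text):
    """Lower-case word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Distinct word forms divided by running words."""
    toks = tokenize(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def lexical_density(text, function_words=FUNCTION_WORDS):
    """Share of tokens that are content words (not in the function-word list)."""
    toks = tokenize(text)
    if not toks:
        return 0.0
    content = [t for t in toks if t not in function_words]
    return len(content) / len(toks)
```

Because TTR falls as texts get longer, it is only comparable between texts of similar length.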


SCP 4.0.9 - Simple Concordance Program
program: SCP 4.0.9 - Simple Concordance Program
author/distributor: Alan Reed
download: free software
documentation: none
operating systems: WinXP, MacOSX
description: This free program lets you create word lists and search natural language text files for words, phrases, and patterns. SCP is a concordance and word listing program that is able to read texts written in many languages. There are built-in alphabets for English, French, German, Greek, Russian, etc. SCP contains an alphabet editor which you can use to create alphabets for any other language.
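A user-defined alphabet matters mainly for sorting word lists. The sketch below is an illustration of the idea, not SCP's alphabet file format. It uses a hypothetical `make_sort_key` helper that turns an alphabet string into a sort key and assumes single-character letters, so digraphs such as a traditional Spanish "ll" are not handled. Letters missing from the alphabet sort after all defined letters.

```python
def make_sort_key(alphabet):
    """Build a sort key from a user-defined alphabet given as a string of letters."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    base = len(alphabet)

    def key(word):
        # Unknown characters sort after the alphabet, ordered by code point.
        return [rank.get(ch, base + ord(ch)) for ch in word.lower()]

    return key


spanish = "abcdefghijklmnñopqrstuvwxyz"
words = ["oso", "ñandú", "nube", "nada"]
print(sorted(words, key=make_sort_key(spanish)))
```

Plain code-point sorting would put "ñandú" after "oso". With the Spanish alphabet above, it sorts between "nube" and "oso".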


Sonar 2003.32 Text Retrieval/Document Management Systems
program: Sonar 2003.32
distributor: Virginia Systems
download: demo
documentation: none
operating systems: Win9x, WinNT, MacOS
description: A high-speed program that can process many types of text and word-processing files.


Textalyser
program: Textalyser
author: Bernhard Huber
distributor: none
documentation: self-explanatory
download: none
operating system: web-based (runs on a website)
description: Textalyser is a free online text analysis tool that counts words, sentences, and syllables and measures lexical density. It also computes the Gunning fog readability index. It is a small but useful tool that counts syllables correctly at least for English, French, and German. You can paste in text or specify a web page.
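The Gunning fog index is 0.4 × (average sentence length in words + 100 × share of complex words), where complex words have three or more syllables. The Python sketch below is not Textalyser's code. It uses a rough, English-only syllable heuristic (count vowel groups, drop a silent final "e"), which is an assumption, and it skips Gunning's exceptions for proper nouns and compounds.

```python
import re


def count_syllables(word):
    """Rough English syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text):
    """Gunning fog index: 0.4 * (words per sentence + 100 * complex-word share)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))
```

The result is read as the approximate number of years of schooling needed to understand the text on first reading.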


TextSTAT 2.7
program: TextSTAT 2.7
author: Matthias Hüning
distributor: Matthias Hüning
documentation: manual
download: freeware
operating system: Windows, MacOS, Linux
description: TextSTAT is a simple programme for the analysis of texts. It reads ASCII/ANSI texts and HTML files (also directly from the internet) and produces word frequency lists and concordances from these files. The programme runs on MS Windows and is distributed as freeware; its Python source code is also available for free. The user interface is available in German (default), English, and French.


WordSmith 5.0
program: WordSmith 5.0
author: Mike Scott
distributor: Mike Scott, Liverpool University
documentation: manual in English, French, and German
download: test version (shows only a sample of the results)
operating system: Win9x, WinNT
description: WordSmith is the successor of MicroConcord.


Please send comments and suggestions to