Text Analysis Info - Content: quantitative without category system

Last update: 12. February 2008

The following programs do not work with a category system. Most analyse the co-occurrences of words in the text; some perform multivariate statistical analyses such as factor analysis, cluster analysis, or multi-dimensional scaling (MDS). Others use neural networks, and some companies do not say which technique their software uses.


Alceste 4.7
author: Max Reinert
distributor: Image, Toulouse, France
program: Alceste
documentation: in English, French, and Italian
download: none
operating systems: MS-Windows, MacOS, Unix
description: produces word lists and extracts classes of terms


CatPac II
program: Catpac II
author: Joseph Woelfel
distributor: Galileo Company
operating system: MS-Windows
documentation: manual
download: no
description: none yet


Hamlet
program: Hamlet
author: Alan Brier

download: free for personal use
operating system: DOS, Win3.1, MS-Windows
documentation: manual (MS-Word) and tutorial (self-extracting file)
description: The main idea of HAMLET (c) is to search a text file for words in a given vocabulary list, and to count joint frequencies within any specified context unit, or as collocations within a given span of words.
Individual word frequencies (fi), joint frequencies (fij) for pairs of words (i,j), both expressed in terms of the chosen unit of context, and the corresponding standardised joint frequencies
sij = (fij) / (fi + fj - fij)
are displayed in a similarities matrix, which can be submitted to a simple cluster analysis and multi-dimensional scaling.
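The standardised joint frequency above can be illustrated with a minimal Python sketch. It is not HAMLET's own code: the function names, the sample sentences, and the choice to count a word at most once per context unit are assumptions made for illustration only.

```python
from itertools import combinations


def joint_frequencies(units, vocabulary):
    """Count f_i and f_ij over context units (each unit is a list of words).

    Assumption: a word is counted once per unit, however often it occurs there.
    """
    vocab = sorted(set(vocabulary))
    fi = {w: 0 for w in vocab}
    fij = {pair: 0 for pair in combinations(vocab, 2)}
    for unit in units:
        present = sorted(set(unit) & set(vocab))
        for w in present:
            fi[w] += 1
        for pair in combinations(present, 2):
            fij[pair] += 1
    return fi, fij


def similarity_matrix(fi, fij):
    """s_ij = f_ij / (f_i + f_j - f_ij); 0.0 when neither word occurs."""
    sim = {}
    for (a, b), joint in fij.items():
        denom = fi[a] + fi[b] - joint
        sim[(a, b)] = joint / denom if denom else 0.0
    return sim


# Hypothetical context units (here: sentences split into words).
units = [
    "the king spoke to the queen".split(),
    "the queen left the castle".split(),
    "the king stayed in the castle".split(),
    "the ghost appeared".split(),
]
fi, fij = joint_frequencies(units, ["king", "queen", "castle"])
sim = similarity_matrix(fi, fij)
for pair, value in sim.items():
    print(pair, round(value, 3))
```

Each value lies between 0 (the two words never share a unit) and 1 (they always occur together), so the resulting table can serve as input to clustering or scaling routines.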
A further option allows comparison of the results of applying multi-dimensional scaling to matrices of joint frequencies derived from a number of texts, using Procrustean Individual Differences Scaling (PINDIS).
Further procedures are included to help to determine the broad characteristics of word usage in a text:


Intelligent Miner for Text - Text Analysis Tools 2.3
program: IBM
author and distributor: IBM
download: trial version (60 days)
operating system: OS/390
documentation: fact sheet
description: The text analysis tools can be used to analyse all types of online documentation, from customer requests and technical reports to newspaper and magazine articles.


Semio Taxonomy 2.0
program: Semio 2.0
author: Claude Vogel
distributor: Entrieva
documentation: none
download: live demos
operating system(s): MS-Windows, Solaris 2.5
description: Semio Taxonomy combines unique linguistic analysis technology and statistical clustering with user-defined vocabulary requirements to create an intuitively browsable structure of categories that provides intelligent access to the global information space within a mass of formerly unstructured text.
Important phrases and keywords are extracted from a variety of text sources such as intranet/Internet sites, Lotus Notes, Documentum, ODBC-compliant databases, XML, etc. This process combines language detection, proximity analysis and stemming and normalization rules to produce the cleanest, most informative extraction technology available.
These extracted concepts are then clustered using information theory techniques developed as the result of work over the past twenty years. Once this process has selected the truly relevant information from the original unstructured text, any number of top-level classification structures can be applied to it. These structures extract lexical derivatives from the network of clusters and place them into categories. The result: a browseable category structure that actually provides insights to the user about the search space without resorting to the 'hunt-and-peck' method of keyword searches. Since the only requirement of a classification structure is that it reflects information that can be found within the source text, the configuration and customisation of the structure is virtually unlimited.
The client can configure their taxonomies to reflect a corporate thesaurus or controlled vocabulary. Semio Taxonomy is fully compliant with ISO thesauri, and can be tailored to any client terminology initiative. The power of applying multiple classification structures to the same source text becomes clear when users see for the first time the actual textual evidence that led to those structures in the first place.
Process Steps:
Semio Taxonomy performs a three-step process to classify text contents.

Semio needs 96 MB RAM and a minimum of 500 MB free disk space.


SPAD-T
program: SPAD-T
author and distributor: Decisia
documentation: none
download: no
operating systems: not specified
description: SPAD-T analyses texts automatically by associating numerically coded information with them. Comparisons of texts are made with probabilistic methods. Categorisation can also take external variables (e.g. age, sex, profession) into account using SPAD-N.

SPAD-T counts words and word sequences (phrases) using sort order tables and exclusion criteria such as length or frequency. Probabilistic methods are used to find characteristic words, word sequences, or sentences. KWIC (keyword-in-context) listings with a fixed line length of 132 characters are also possible.
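As a rough illustration of word counting with exclusion criteria and of a fixed-width KWIC listing, here is a minimal Python sketch. It does not reproduce SPAD-T's procedures: the function names, the thresholds `min_length` and `min_freq`, and the sample text are assumptions; only the line length of 132 characters comes from the description.

```python
from collections import Counter


def word_counts(words, min_length=3, min_freq=2):
    """Count words, dropping those that are too short or too rare (assumed criteria)."""
    counts = Counter(w.lower() for w in words)
    return {w: n for w, n in counts.items()
            if len(w) >= min_length and n >= min_freq}


def kwic(words, keyword, width=132):
    """Return keyword-in-context lines, each exactly `width` characters long."""
    side = (width - len(keyword) - 2) // 2  # space left for context on each side
    if side <= 0:
        raise ValueError("width is too small for this keyword")
    lines = []
    for i, w in enumerate(words):
        if w.lower() != keyword.lower():
            continue
        left = " ".join(words[:i])[-side:]       # context before the keyword
        right = " ".join(words[i + 1:])[:side]   # context after the keyword
        line = f"{left:>{side}} {w} {right:<{side}}"
        lines.append(line[:width].ljust(width))
    return lines


# Hypothetical sample text.
text = "the cat sat on the mat and the cat slept".split()
counts = word_counts(text)
lines = kwic(text, "cat")
print(counts)
for line in lines:
    print(line.strip())
```

Padding both sides to the same width keeps the keyword in the same column on every line, which is what makes a KWIC listing easy to scan.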

Comparisons of the vocabularies of texts are performed with different types of factorial and correspondence analyses. External variables can also be included.

Contingency tables of common words or of the segments repeated within the texts are also possible. Hierarchical cluster analyses (using reciprocal neighbors and Ward's method) allow, for example, the automatic classification of responses to open-ended questions.


TextAnalyst 2.3
program: TextAnalyst 2.3 or German version
author: Sergej Ananyan
distributor: Megaputer
download: evaluation
operating system: MS-Windows
documentation: tutorial and white papers
description: TextAnalyst is a unique intelligent text processing tool capable of automated semantic analysis, summarisation, and navigation of unstructured natural language texts. In addition, TextAnalyst can help you perform clustering of documents in your textbase, semantic information retrieval, and focus your text exploration around a certain subject.



T-Lab 5.5 pro

program: T-Lab 5.5 pro
author: Franco Lancia
distributor: T-lab
documentation: in English, Italian, French, and Spanish. Also a quick introduction is available in these languages. The tutorial is only available in English.
download: test version (multilingual)
operating system: MS-Windows
description: T-LAB software is an all-in-one set of linguistic and statistical tools for text analysis which can be used in the following research fields: semantic analysis, content analysis, perceptual mapping, text mining, and discourse analysis.
Available versions are in English, French, Italian and Spanish, each with a dictionary and a knowledge base. Moreover, without automatic lemmatisation, T-LAB can analyse texts in any language that can be stored in ASCII format.
T-LAB has three sub-menus: analyses and maps, lexicon, consultation. All ANALYSES AND MAPS can be done with two kinds of settings: automatic or customised.
File size is limited to 10 MB; most analyses will not exceed this limit.

