This section is currently under development. A discussion recently took
place on the content mailing list, and some of the questions posted there are
listed below. I have tried to answer them; if you disagree with me or
want to contribute yourself, please don't hesitate. All contributions will be
considered and fitted into this section.
- What kind of software do I need to do X, where X is one of:
- scraping stories from a website
You will have to download the pages of the website(s) you wish to analyse.
A website is organised in files, many of them graphics, images, or
video that you usually do not want to analyse. Programs that download a whole
website or part(s) of it are called offline readers or web spiders; most of
them let you restrict the download to files with certain file extensions
(e.g. html) or to a certain file size.
The second step is to prepare (edit) the files for further processing with
text analysis software. TextGrab is
a program that downloads all text files of a website and prepares them
for seamless processing with TextQuest.
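The two steps above (select the files worth downloading, then strip the
markup) can be sketched in Python. This is only a minimal illustration using
the standard library, not a description of how TextGrab works. The names
`wanted`, `html_to_text` and `TEXT_EXTENSIONS` are made up for this example,
and the list of extensions is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath

# Assumption for this sketch: only these extensions are worth downloading.
TEXT_EXTENSIONS = {".html", ".htm", ".txt"}


def wanted(url):
    """Return True if the URL path ends in one of the text extensions."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext in TEXT_EXTENSIONS


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Strip the markup from an HTML string and return the plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

A spider would call `wanted` on every link before fetching it, and
`html_to_text` on every page it has fetched. Note that URLs without a file
extension (e.g. a bare directory) are rejected by this simple filter.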
- simple term-searching
A simple form of term-searching is offered by every editor or word processor.
A file that is already loaded is searched, and the hits are shown. Most
editors can look for character strings; these may be whole words or any part
of a word.
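What an editor does when it searches for a character string can be sketched
in a few lines of Python. The function name `find_all` and its `ignore_case`
parameter are assumptions made for this example.

```python
def find_all(text, term, ignore_case=True):
    """Return the start offsets of every occurrence of term in text."""
    if not term:
        return []
    # Note: lower() is good enough for a sketch; for a few non-English
    # characters it changes the string length and would shift the offsets.
    haystack = text.lower() if ignore_case else text
    needle = term.lower() if ignore_case else term
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits
```

Searching for "cat" in "The cat sat on the mat. Category." finds both the
word "cat" and the beginning of "Category", because a plain string search
does not care about word boundaries.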
- keyword searching
Keyword searching is very similar to term-searching; however, the results
may be different. Concordance programs often allow keyword searching, and
the results are displayed in a results window, not in the original file.
Sometimes the results are also specially formatted; a very popular format is
KWIC (key-word-in-context).
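A KWIC listing can be sketched as follows. This is a minimal example, not the
output format of any particular concordance program; the names `kwic` and
`format_kwic` and the `width` parameter (number of context characters on each
side) are assumptions. Unlike a plain string search, it matches whole words
only.

```python
import re


def kwic(text, keyword, width=20):
    """Return (left context, keyword, right context) for every hit."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    # \b restricts the match to whole words; the search ignores case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()].strip()
        right = flat[m.end():m.end() + width].strip()
        hits.append((left, m.group(0), right))
    return hits


def format_kwic(hits, width=20):
    """Right-align the left context so that the keywords line up."""
    return [f"{left:>{width}}  {kw}  {right}" for left, kw, right in hits]
```

Printing the lines returned by `format_kwic(kwic(text, "strike"))` shows
every occurrence of "strike" in a column of its own, with the surrounding
words to the left and right.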
- clustering analyses
- converting files between formats (the 'Word documents' question)
There are many file formats for text files, and most of them are
proprietary (specific to one product). Content analysis software often
requires the text as a plain text file (often called an ASCII file, which is
incorrect in a Windows environment, since such files usually use a larger
character set than 7-bit ASCII).
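Why "ASCII file" is a misnomer can be shown with a short Python sketch. It
does not read Word documents; it only illustrates the plain-text end of a
conversion. The function names are made up for this example, and the source
encoding Windows-1252 (`cp1252`) is an assumption that has to be checked for
each file.

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0-127)."""
    return all(b < 128 for b in data)


def convert_encoding(data: bytes, source="cp1252", target="utf-8") -> bytes:
    """Re-encode text bytes from the source to the target encoding."""
    return data.decode(source).encode(target)
```

The word "Café" saved as Windows-1252 contains the byte 0xE9, which is not an
ASCII character, so `is_ascii` returns False for that file although it is a
perfectly normal plain text file.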
- converting images/hardcopy to electronic text (OCR)
Content analysis software requires the text to be stored in a file. That
means that if you only have printed material, you must digitise it (in
other words: make it readable for a computer). You can type the text, you
can dictate the text using dictation software (the best known are ViaVoice
from IBM and Dragon NaturallySpeaking), or you can scan the text using
a scanner. The scanner stores the text as an image, and the next step is
to transform the image into text data. The software for this is called OCR
(optical character recognition) software. One might think that 99 % correctly
recognised characters is a good value, but on a page of roughly 1,000 to
2,000 characters this means that there are between 10 and 20 errors per page.
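The arithmetic behind the error estimate is simple. The page size of 1,000 to
2,000 characters is an assumption, chosen because it reproduces the 10 to 20
errors quoted above; the function name is made up for this example.

```python
def expected_errors(chars_per_page, accuracy):
    """Expected number of misrecognised characters on one page."""
    return chars_per_page * (1.0 - accuracy)


# A recognition rate of 99 % on a short and on a long page:
for chars in (1000, 2000):
    print(chars, "characters:", round(expected_errors(chars, 0.99)), "errors")
```

Every one of these errors has to be found and corrected by hand before the
text can be analysed, which is why the recognition rate matters so much.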
- some particular CA-related task (like keywords in context, collocation,
etc.)
- what kind(s) of statistical tests can I use on CA data?
- what statistical packages work with CA data?
- are there any good books on CA?
- how do I CA X, where X is one of:
- web pages
- media or speech transcripts
- focus group or conversational or interview transcripts
- historical documents
- images
Please send comments and suggestions to