Software and Tools


Corpus Tools

You can find tutorials for WordSmith Tools, AntConc and other corpus interfaces (CQPweb, BNCweb, BYU) in our knowledge base!

Registration is free and your account also works for all of our other web services!

WordSmith Tools

WordSmith Tools is an integrated suite of programs for looking at how words behave in texts. You will be able to use the tools to find out how words are used in your own texts, or those of others. The WordList tool lets you see a list of all the words or word-clusters in a text, set out in alphabetical or frequency order. The concordancer, Concord, gives you a chance to see any word or phrase in context – so that you can see what sort of company it keeps. With KeyWords you can find the key words in a text. The tools have been used by Oxford University Press for their own lexicographic work in preparing dictionaries, by language teachers and students, and by researchers investigating language patterns in lots of different languages in many countries world-wide. There are several extras available, such as a BNC word list and a Shakespeare corpus.

http://www.lexically.net/wordsmith/index.html
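To give a rough idea of what a word list and a concordance contain, here is a minimal Python 3 sketch. It is not WordSmith's implementation: the function names (`tokenize`, `word_list`, `concordance`), the simple regex tokenizer and the sample sentence are all assumptions made for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (ASCII letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(tokens, by="frequency"):
    """Return (word, count) pairs in frequency or alphabetical order."""
    counts = Counter(tokens)
    if by == "alphabetical":
        return sorted(counts.items())
    # Most frequent first; ties are broken alphabetically for a stable listing.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def concordance(tokens, node, width=3):
    """Return keyword-in-context lines with `width` tokens on either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Hypothetical sample text, not taken from any corpus.
sample = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(sample)
print(word_list(tokens)[:3])
for line in concordance(tokens, "sat"):
    print(line)
```

Dedicated tools add much more on top of this (sorting by left or right context, clusters, keyness statistics), so the sketch is only meant to show the basic idea.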

Free alternatives:

AntConc

A freeware concordance program for Windows, Macintosh OS X, and Linux.

You can find a good and detailed description of its functions here (German): http://litre.uni-goettingen.de/index.php/AntConc#Keyword_List_Tool

http://www.antlab.sci.waseda.ac.jp/antconc_index.html

CasualConc

A freeware concordance program that runs natively on Mac OS X. Might be worth a try for Apple users.

https://sites.google.com/site/casualconc/

ConcApp

ConcApp provides concordance searches and word frequency text analysis, and includes full editing support and testing activities. ConcApp also supports Unicode and can process not only English, French and probably most other European languages, but also Chinese, Japanese, Thai and Russian texts in Unicode.

http://www.edict.com.hk/pub/concapp/

QUITA: Quantitative Text Analyzer

QUITA is a freeware tool for easy calculation of some basic quantitative indices of a corpus (e.g. type-token ratio, distance between verbs, etc.) and some more advanced index numbers (h-point, entropy, …). It supports automatic tokenization (whitespace or NLTK), lemmatization (NLTK), POS tagging (NLTK treebank tagger) and can output n-grams.

https://code.google.com/p/oltk/

Please note that QUITA requires Python 2.X, NLTK and numpy to be installed. For detailed instructions, see How to install QUITA (a free registration is required to access the knowledge base).
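Two of the indices mentioned above, the type-token ratio and entropy, can be computed by hand in a few lines. The following is a minimal sketch in Python 3 with whitespace tokenization. It does not use QUITA's code or interface; the function names and the sample sentence are assumptions, and entropy is taken here to mean Shannon entropy (in bits) over the word frequency distribution.

```python
import math
from collections import Counter

def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)

def shannon_entropy(tokens):
    """Shannon entropy in bits of the relative word frequencies."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sample, tokenized on whitespace only.
tokens = "the cat sat on the mat".split()
print(round(type_token_ratio(tokens), 3))
print(round(shannon_entropy(tokens), 3))
```

Both values depend strongly on text length and on how the text is tokenized, so figures are only comparable across texts that were processed in the same way.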


Web as Corpus

WebCorp Live

Searching the web for concordances in real time.

http://www.webcorp.org.uk/index.html

KWiCFinder

A tool for building ad hoc corpora from the web based on search engine queries. KWiCFinder conducts your online searches without supervision. It returns Key Word in Context abstracts highlighting your search terms so you can evaluate the usefulness of documents matching your query.

http://kwicfinder.com/

BootCaT

Simple Utilities to Bootstrap Corpora And Terms from the Web. The Perl scripts included in the BootCaT toolkit implement an iterative procedure to bootstrap specialized corpora and terms from the web, requiring only a list of “seeds” (terms that are expected to be typical of the domain of interest) as input.

http://bootcat.sslmit.unibo.it/
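The first step of such a bootstrapping procedure, turning a seed list into search queries, might look roughly like the Python 3 sketch below. This is not BootCaT's Perl code. It assumes that queries are built by randomly combining seeds into small tuples; the function name `build_queries`, its parameters and the seed terms are hypothetical. Sending the queries to a search engine, downloading the pages and extracting new terms for the next round are left out.

```python
import itertools
import random

def build_queries(seeds, tuple_size=3, n_queries=5, rng=None):
    """Randomly pick `n_queries` distinct combinations of `tuple_size` seeds."""
    rng = rng or random.Random()
    # Deduplicate and sort the seeds so the set of candidate tuples is well defined.
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    chosen = rng.sample(combos, min(n_queries, len(combos)))
    return [" ".join(combo) for combo in chosen]

# Hypothetical seed terms for a corpus linguistics domain.
seeds = ["corpus", "concordance", "collocation", "lemma", "tokenization"]
for query in build_queries(seeds, rng=random.Random(0)):
    print(query)
```

In an iterative setup, terms extracted from the pages retrieved in one round would be fed back in as seeds for the next.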

HTTrack

HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.

http://www.httrack.com/

HTMLTidy

When editing HTML it's easy to make mistakes. Wouldn't it be nice if there was a simple way to fix these mistakes automatically and tidy up sloppy editing into nicely laid out markup? Well now there is! Dave Raggett's HTML TIDY is a free utility for doing just that. It also works great on the atrociously hard to read markup generated by specialized HTML editors and conversion tools, and can help you identify where you need to pay further attention to making your pages more accessible to people with disabilities.

http://www.w3.org/People/Raggett/tidy/

Web2Text

Web2Text is an HTML to ASCII text converter. Unlike most others, however, this one not only has an easy-to-use graphical interface but it actually produces a nicely laid out text version, and keeps URLs visible. A minimum of post-conversion editing required!

http://www.web2text.com-about.com/
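If you prefer to script this kind of conversion yourself, the idea of stripping markup while keeping URLs visible can be sketched with Python 3's standard `html.parser` module. This is not how Web2Text works internally; the class name `TextExtractor`, the function `html_to_text`, the list of block-level tags and the sample page are all assumptions for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text and append each link target after its anchor text."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "a":
            self.href = dict(attrs).get("href")
        elif tag in ("p", "br", "div", "h1", "h2", "h3", "li"):
            # Start a new line for common block-level elements.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.href:
            # Keep the URL visible right after the link text.
            self.parts.append(f" <{self.href}>")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def html_to_text(html):
    """Convert an HTML string to plain text with normalized whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)

# Hypothetical sample page.
page = ('<h1>Tools</h1><p>See <a href="http://www.httrack.com/">HTTrack</a> '
        'for mirroring.</p><script>var x = 1;</script>')
print(html_to_text(page))
```

Real web pages usually also contain navigation menus and other boilerplate, so some manual or automatic clean-up is normally still needed before the text goes into a corpus.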


Transcription Tools

Transcriber

Transcriber is also used here at the Chair of English Linguistics for the compilation of ICE Malta and ICE Puerto Rico. Please do contact us if you need help! There are also some tutorials and links in our knowledge base!

Transcriber is a tool for assisting the manual annotation of speech signals. It provides a user-friendly graphical user interface for segmenting long duration speech recordings, transcribing them, and labeling speech turns, topic changes and acoustic conditions. It is more specifically designed for the annotation of broadcast news recordings, for creating corpora used in the development of automatic broadcast news transcription systems, but its features might be found useful in other areas of speech research.

http://trans.sourceforge.net/en/presentation.php

f4

f4 is a keyboard controlled audio and video player that supports your transcription process.

http://www.audiotranskription.de/english/f4.htm

Express Scribe

Express Scribe is free professional audio player software for PC, Mac or Linux designed to assist the transcription of audio recordings.

http://www.nch.com.au/scribe/index.html

ELAN

ELAN is a professional tool for the creation of complex annotations on video and audio resources.

https://tla.mpi.nl/tools/tla-tools/elan/

Praat

Doing phonetics by computer.

http://www.fon.hum.uva.nl/praat/


IPA Tools and Fonts

Doulos SIL and Charis SIL

IPA Character Picker

PhoTransEdit

PhoTransEdit applications have been designed to help those who work with English phonetic transcriptions. Far from providing perfect automatic transcriptions, PhoTransEdit is aimed at just helping you save your time when writing, publishing or sharing English transcriptions.

http://www.photransedit.com/


Other Tools

Notepad++

Notepad++ is a free (as in “free speech” and also as in “free beer”) source code editor and Notepad replacement that supports several languages. It runs in the MS Windows environment, and its use is governed by the GPL license.

http://notepad-plus-plus.org/

PDF-XChange Viewer

Free PDF viewer that allows simple editing (notes, …) of PDF files.

http://www.tracker-software.com/product/pdf-xchange-viewer

PDFCreator

Create PDF files from all Windows applications via the print function.

http://www.pdfforge.org/pdfcreator

WebLicht

WebLicht consists of a collection of web-based linguistic annotation tools, distributed repositories for storing and retrieving information about the tools, and a web application, which allows you to easily create and execute tool chains without downloading or installing any software on your local computer.

https://weblicht.sfs.uni-tuebingen.de/WebLicht-4/

resources/tools.txt · Last modified: 2015/04/29 11:21 by vetterf