Text processing using Debian

Valic, May 13, 2013

Debian is an operating system made up of software packages released as free, open source software, mainly under the GNU General Public License but also under other free software licenses. It is a popular Linux distribution, with repositories full of packages that are ready to use. In this article you will learn how to process text using Debian. We assume that you already know how to create simple text documents with an editor such as vi. Let’s go, then.

What does text processing mean?

Text processing lets you write the content of a file as plain ASCII text, with some additional commands that describe the structure of your document. Running the text processor then converts the source text into a file with a layout, which can include tables, formulas or figures, for example. At the moment, the best known text processors for Debian are TeX and LaTeX.

LaTeX is a powerful macro package for the TeX typesetting system. Besides typesetting, most users rely on a set of text processing tools that work by piping text through chains of commands. When no regular expression is needed, you can concatenate files and output the result with cat(1) or, in the other direction, use tac(1) to output the lines in reverse order. Use cut(1) to select parts of lines, sort(1) to sort the lines, or tr(1) to translate or delete characters.
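As a rough illustration of such a chain, the sketch below runs the tools named above on two small files. The file names and the colon-separated sample data are made up for this example, and tac(1) is assumed to be the GNU coreutils version that Debian ships.

```shell
# Create two small sample files (hypothetical data) in a temporary directory.
workdir=$(mktemp -d)
printf 'carol:30:paris\nalice:25:berlin\n' > "$workdir/a.txt"
printf 'bob:35:rome\n' > "$workdir/b.txt"

# cat concatenates both files in the order given.
combined=$(cat "$workdir/a.txt" "$workdir/b.txt")

# tac prints the same lines, last line first.
reversed=$(printf '%s\n' "$combined" | tac)

# cut keeps the first colon-separated field, sort orders the lines,
# and tr translates lowercase letters to uppercase.
names=$(printf '%s\n' "$combined" | cut -d: -f1 | sort | tr 'a-z' 'A-Z')

printf '%s\n' "$names"
rm -r "$workdir"
```

Each command reads the output of the one before it, so steps can be added or removed without touching the rest of the chain.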

With basic regular expressions (BRE), on the other hand, you can match text against patterns using grep(1), or search and edit text in a screen editor such as vim(1). Extended regular expressions (ERE) allow simple text processing with egrep(1).
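To make the difference between the two flavours concrete, here is a small sketch on made-up sample lines. It uses grep -E, which selects extended regular expressions in the same way as the traditional egrep name; recent GNU grep releases print a warning when egrep is called directly.

```shell
# Hypothetical sample input, one entry per line.
sample=$(printf 'debian 7\nubuntu 12\nfedora 18\ndebian-tutorials\n')

# BRE: ^ anchors the pattern at the start of the line.
bre_hits=$(printf '%s\n' "$sample" | grep '^debian')

# ERE: grouping, alternation (|) and + work without backslashes.
ere_hits=$(printf '%s\n' "$sample" | grep -E '^(debian|fedora) [0-9]+$')

printf 'BRE:\n%s\n' "$bre_hits"
printf 'ERE:\n%s\n' "$ere_hits"
```

In a BRE the same grouping and alternation would need backslashes, which is the main practical difference when typing patterns on the command line.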

If you are not familiar with these commands, you can look any of them up with man followed by the command name, for example ‘man grep’. Below is a list of useful commands for standard text processing on Debian:

  • When no regular expression is used: cat(1), tac(1), diff(1), tr(1), cut(1), head(1), uniq(1), tail(1), sort(1)
  • When basic regular expressions are used: grep(1), emacs(1), ed(1), vim(1), sed(1)
  • When extended regular expressions are used: egrep(1), python(1), awk(1), pcregrep(1), tcl(3tcl), perl(1)
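A few of the tools from the first group are often combined to count repeated lines. The following is only a sketch on invented log-like input, not a recipe from the Debian documentation.

```shell
# Hypothetical input with repeated lines.
log=$(printf 'error\ninfo\nerror\nwarn\ninfo\nerror\n')

# uniq only collapses adjacent duplicates, so the lines are sorted first.
unique=$(printf '%s\n' "$log" | sort | uniq)

# uniq -c prefixes each line with its count; sort -rn puts the largest
# count first and head keeps only that line.
most_common=$(printf '%s\n' "$log" | sort | uniq -c | sort -rn | head -n 1)

# head and tail pick the first and last line of the sorted, unique list.
first=$(printf '%s\n' "$unique" | head -n 1)
last=$(printf '%s\n' "$unique" | tail -n 1)

printf '%s\n' "$most_common"
```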

Many text processing tools accept regular expressions. Think of them as shell globs, but more complex and more powerful. A regular expression is made up of text characters and metacharacters, and it usually describes a pattern to match. Metacharacters are ordinary characters that carry a special meaning. Replacement expressions have special characters of their own.
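The comparison with shell globs, and the special characters of replacement expressions, can be sketched as follows. The file names and the input line are invented for the example; the replacement syntax shown is that of sed(1).

```shell
# Shell glob: * matches any run of characters.
case "notes.txt" in
  *.txt) glob_result="match" ;;
  *)     glob_result="no match" ;;
esac

# Regular expression: . matches any single character and * repeats the
# previous item, so the glob *.txt corresponds roughly to ^.*\.txt$
regex_result=$(printf 'notes.txt\nnotes.text\n' | grep '^.*\.txt$')

# Replacement expression: \1 and \2 stand for the text matched by the
# first and second \( \) group.
swapped=$(printf 'Debian 7\n' | sed 's/\([A-Za-z]*\) \([0-9]*\)/\2 \1/')

# Replacement expression: & stands for the whole matched text.
wrapped=$(printf 'Debian 7\n' | sed 's/[0-9][0-9]*/[&]/')

printf '%s\n%s\n%s\n%s\n' "$glob_result" "$regex_result" "$swapped" "$wrapped"
```

Note that the dot has to be escaped in the regular expression, whereas in the glob it is an ordinary character.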

This article was only an introduction, and this overview is just the beginning of what is to come. We hope these tips on text processing in the Debian operating system give you an idea of how it works. Further documentation is available on debian.org to help you move on with your needs.

Valic

Editor in Chief at Debian-Tutorials, Linux enthusiast.
