Debian is an operating system composed of software packages released as free, open source software, mainly under the GNU General Public License but also under other free software licenses. It is a popular Linux distribution whose repositories provide packages that are ready to use. In this article you will learn how to process text on Debian. We assume that you already know how to create simple text documents with an editor such as vi. Let’s go, then.

What does text processing mean?

Text processing lets you write the content of a file as plain ASCII text, with some additional commands that describe the structure of your document. Running the text processor converts this source text into a file with a layout, which can include tables, formulas or figures, for example. At the moment, some of the best known and most widely used text processors for Debian are LaTeX and TeX.

LaTeX is a powerful macro package for the TeX typesetting system. Apart from typesetting, most users process text by piping it through chains of small tools. When no regular expression is needed, you can concatenate files and output the result with cat(1) or, in the other direction, use tac(1) to output the lines in reverse order. Use cut(1) to select parts of lines, sort(1) to sort lines, or tr(1) to translate or delete characters.
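As a small sketch of such a chain, the following builds a throwaway sample file and pipes it through the tools mentioned above. The file name people.txt and its contents are made up for illustration.

```shell
# Work in a temporary directory so nothing else is touched.
workdir=$(mktemp -d)
printf 'carol:30\nalice:25\nbob:41\n' > "$workdir/people.txt"

# tac outputs the file with the order of its lines reversed.
reversed=$(tac "$workdir/people.txt")

# cut selects the first ':'-separated field, sort orders the lines,
# and tr translates lower case letters to upper case.
names=$(cut -d: -f1 "$workdir/people.txt" | sort | tr 'a-z' 'A-Z')

printf '%s\n' "$names"
rm -r "$workdir"
```

Each tool reads standard input and writes standard output, which is what makes it possible to chain them with pipes.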

With basic regular expressions (BRE) you can match text against patterns using grep(1), or drive a screen editor such as vim(1). Extended regular expressions (ERE) let you do simple text processing with egrep(1), which is equivalent to grep -E.
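To make the difference between BRE and ERE concrete, here is a minimal sketch that runs the same pattern through grep in both modes. The sample lines are invented for the illustration, and grep -E is used in place of egrep.

```shell
# Three sample lines, one of which contains a literal '+'.
sample=$(printf 'abc\nabbc\nab+c\n')

# BRE: '+' is an ordinary character, so only the literal text matches.
bre_matches=$(printf '%s\n' "$sample" | grep 'ab+c')

# ERE: '+' means "one or more of the preceding item".
ere_matches=$(printf '%s\n' "$sample" | grep -E 'ab+c')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre_matches" "$ere_matches"
```

The same pattern text therefore selects different lines depending on which flavour the tool expects.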

If you are not familiar with these commands, you can look any of them up by running man followed by the command name, for example man grep. Below is a list of useful commands for standard text processing on Debian:

  • When no regular expression is used: cat(1), tac(1), diff(1), tr(1), cut(1), head(1), uniq(1), tail(1), sort(1)
  • When basic regular expressions are used: grep(1), emacs(1), ed(1), vim(1), sed(1)
  • When extended regular expressions are used: egrep(1), python(1), awk(1), pcregrep(1), tcl(3tcl), perl(1)

Many text processing tools accept regular expressions. Think of them as shell globs, but more complex and more powerful. A regular expression describes a matching pattern and is made up of text characters and metacharacters. Metacharacters are ordinary characters that carry a special meaning. Replacement expressions likewise contain some characters with special meanings.
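A short sketch with sed may help to tell the two apart. The date string is just sample data: the part before the second slash is the regular expression, and the part after it is the replacement expression, where \1 to \3 and & are the characters with special meaning.

```shell
line='2013-02-05'

# Regular expression: \( \) group parts of the match, [0-9]* matches digits.
# Replacement expression: \1, \2, \3 refer back to the groups.
swapped=$(printf '%s\n' "$line" | sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\3.\2.\1/')

# In the replacement, & stands for the whole matched text.
quoted=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*/<&>/')

printf '%s\n%s\n' "$swapped" "$quoted"
```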

This was only an introduction to the topic, and this overview is just the beginning of what is to come. We hope these tips on text processing in Debian give you an idea of how it works; further documentation will be available to help you move on with your needs.

SED useful commands

Valic —  February 5, 2013

1. How to remove lines ending with ‘baddump’ from a text file:

You’ve got a file with a bunch of lines that end with “baddump” and you need those lines removed completely, without leaving any blank lines behind? This is the command to do it:

sed -i '/baddump$/d' file
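Before editing a real file in place, it can be worth trying the expression on a throwaway copy. The sketch below does that without -i, using a made-up file dump.txt. The second command is an assumption about a common case, lines with trailing whitespace after “baddump”, which the plain $ anchor does not catch.

```shell
workdir=$(mktemp -d)
printf 'keep this\ndrop this baddump\nbaddump in the middle stays\ntrailing space baddump \n' \
    > "$workdir/dump.txt"

# Only lines where 'baddump' is the very last text on the line are deleted.
cleaned=$(sed '/baddump$/d' "$workdir/dump.txt")

# Variant that also tolerates trailing blanks after 'baddump'.
cleaned_loose=$(sed '/baddump[[:space:]]*$/d' "$workdir/dump.txt")

printf '%s\n' "$cleaned"
rm -r "$workdir"
```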

2. You have text files with tons of empty lines and you want to get rid of those in one second?

sed '/^$/d' file > new_file

You may have multiple HTML files to correct at the same time. You can do that with the foreach command of csh or tcsh:

foreach file (*html)
sed '/^$/d' $file > new_files
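foreach is csh/tcsh syntax. In sh or bash the same idea can be sketched with a for loop. The sample files and the cleaned output directory are assumptions made for the illustration; each cleaned file is written under its own name so that one result does not overwrite another.

```shell
workdir=$(mktemp -d)
printf '<p>one</p>\n\n<p>two</p>\n' > "$workdir/a.html"
printf '\n<h1>title</h1>\n\n' > "$workdir/b.html"
mkdir "$workdir/cleaned"

# Strip empty lines from every HTML file, one output file per input file.
for file in "$workdir"/*.html; do
    sed '/^$/d' "$file" > "$workdir/cleaned/$(basename "$file")"
done

a_clean=$(cat "$workdir/cleaned/a.html")
b_clean=$(cat "$workdir/cleaned/b.html")
rm -r "$workdir"
```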

Some SED Commands

Valic —  November 30, 2010

Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream. While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed’s ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

Here are some commonly used sed commands:

1) Print all the lines between 10 and 20 of a file

sed -n '10,20p' <filename>

Similarly, if you want to print from line 10 to the end of the file you can use: sed -n '10,$p' filename

This is especially useful if you are dealing with a large file. Sometimes you just want to extract a sample without opening the entire file.
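On a very large file, sed still reads to the end of the input after printing the range. A possible refinement, sketched here on a generated file of numbers (big.txt is a made-up name), is to add a q command so that sed quits as soon as the last wanted line has been printed.

```shell
workdir=$(mktemp -d)
seq 1 100000 > "$workdir/big.txt"

# Print lines 10 to 20, then quit at line 20 instead of reading the rest.
sample=$(sed -n '10,20p;20q' "$workdir/big.txt")

first=$(printf '%s\n' "$sample" | head -n 1)
last=$(printf '%s\n' "$sample" | tail -n 1)
count=$(printf '%s\n' "$sample" | grep -c '')

printf 'first=%s last=%s count=%s\n' "$first" "$last" "$count"
rm -r "$workdir"
```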

2) Check your unread Gmail from the command line

curl -u username --silent "" | perl -ne 'print "\t" if /<name>/; print "$2\n" if /<(title|name)>(.*)<\/\1>/;'

Checks the Gmail ATOM feed for your account, parses it and outputs a list of unread messages.

3) To print a specific line from a file

sed -n 5p <file>

This prints only the requested line, which is handy in scripts when you already know which line you want.

4) Remove a line from a text file (here, line 8). Useful to fix “ssh host key change” warnings

sed -i 8d ~/.ssh/known_hosts
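Since -i rewrites the file in place, a cautious variant is to give it a suffix so that sed keeps a backup of the original. The sketch below tries this on a generated sample file rather than on a real known_hosts; the file name hosts_sample and the .bak suffix are arbitrary choices.

```shell
workdir=$(mktemp -d)
# Build a ten-line sample file: host1 ... host10.
seq 1 10 | sed 's/^/host/' > "$workdir/hosts_sample"

# Delete line 8 in place, keeping the original as hosts_sample.bak.
sed -i.bak '8d' "$workdir/hosts_sample"

remaining=$(grep -c '' "$workdir/hosts_sample")
new_line8=$(sed -n '8p' "$workdir/hosts_sample")
backup_line8=$(sed -n '8p' "$workdir/hosts_sample.bak")

printf 'lines left: %s, line 8 is now: %s\n' "$remaining" "$new_line8"
rm -r "$workdir"
```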

5) Recursive search and replace old with new string, inside files

grep -rl oldstring . | xargs sed -i -e 's/oldstring/newstring/g'

This recursively traverses the directory structure from . down, looks for the string “oldstring” in all files, and replaces it with “newstring” wherever it is found.


The same can be done with perl, which here keeps a backup of each changed file with a ~ suffix:

grep -rl oldstring . | xargs perl -pi~ -e 's/oldstring/newstring/g'
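A plain grep -rl piped to xargs splits file names on whitespace, so paths containing spaces are not handled correctly. A sketch of a more robust variant, assuming the GNU versions of grep, xargs and sed that Debian ships, passes NUL-separated names instead. The directory layout below is invented for the demonstration.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub dir"
printf 'oldstring and oldstring\n' > "$workdir/sub dir/a.txt"
printf 'nothing here\n' > "$workdir/b.txt"

# -Z makes grep end each file name with a NUL byte; xargs -0 reads them back.
# -r (GNU xargs) skips running sed when grep finds no files at all.
grep -rlZ 'oldstring' "$workdir" | xargs -0 -r sed -i 's/oldstring/newstring/g'

a_after=$(cat "$workdir/sub dir/a.txt")
b_after=$(cat "$workdir/b.txt")
printf '%s\n' "$a_after"
rm -r "$workdir"
```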