
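sed, short for stream editor, is most often used to search for and replace patterns in text. It works as a text filter: it reads its input line by line, applies the editing commands to each line, and writes the result to standard output.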
Syntax:

sed [options] 'command' input-file

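Options:
-n Suppress automatic printing. By default sed prints every line, so lines selected with the p command appear twice; with -n, only lines printed explicitly (for example, with p) are output.
-e The next argument is an editing command; repeat -e to apply multiple edits.
-f Read the sed commands from a file.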
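The three options can be tried together in a short script. This is a minimal sketch, assuming a POSIX shell; the files sample.txt and fix.sed are hypothetical and are written to the current directory.

```shell
#!/bin/sh
# Hypothetical sample file: one comment line and two lines with a typo
printf '# a comment\nteh cat\nteh dog\n' > sample.txt

# -e: chain two edits in one run (fix the typo, then delete comment lines)
sed -e 's/teh/the/g' -e '/^#/d' sample.txt

# -f: read the same two commands from a script file instead
printf 's/teh/the/g\n/^#/d\n' > fix.sed
sed -f fix.sed sample.txt

# -n: suppress automatic printing, so only lines printed by p appear
sed -n '/dog/p' sample.txt
```

Both the -e and -f runs print "the cat" and "the dog"; the -n run prints only "teh dog".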

FINDING TEXT USING SED
Lines can be selected (addressed) by line number, singly or in a range, or by regular expression.

Examples:
x Line number x
x,y Lines x through y
/pattern/ Lines matching pattern, where pattern is a regex
/pattern1\|pattern2/ Lines matching either pattern (a GNU sed extension)
/pattern/,x From the first line matching pattern through line x
x,/pattern/ From line x through the next line matching pattern
x,y! All lines except x through y
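The range and negation forms are easiest to understand by running them on a numbered file. A minimal sketch, assuming a POSIX shell; lines.txt is a hypothetical sample file.

```shell
#!/bin/sh
# Hypothetical five-line sample file
printf '1 one\n2 two\n3 three\n4 four\n5 five\n' > lines.txt

# x,/pattern/: from line 2 through the next line that matches "four"
sed -n '2,/four/p' lines.txt     # lines 2, 3 and 4

# /pattern/,x: from the first line matching "two" through line 3
sed -n '/two/,3p' lines.txt      # lines 2 and 3

# x,y!: every line EXCEPT lines 2 to 4
sed -n '2,4!p' lines.txt         # lines 1 and 5
```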

Basic sed editing commands

p – Print the matched lines
= – Print the current line number
a\ – Append new text after the addressed line
i\ – Insert new text before the addressed line
d – Delete the addressed lines
c\ – Change (replace) the addressed lines with new text
s – Substitute a replacement for a pattern
r – Read the contents of another file and append them after the addressed line
w – Write the addressed lines to a file
q – Quit; with an address, quit after the first line that matches
l – List the line, showing control characters as octal escapes
{} – Group a series of commands to be performed only on the addressed lines
n – Print the current line (unless -n is used) and read the next input line into the pattern space
g – Copy the contents of the hold space into the pattern space
y – Translate characters, one for one
N – Append the next input line to the pattern space; this allows pattern matching across two lines
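A few of the less obvious commands (y, {} and N) are shown below. This is a minimal sketch, assuming a POSIX shell; words.txt is a hypothetical sample file with an even number of lines, so that N always finds a next line.

```shell
#!/bin/sh
# Hypothetical sample file with four words, one per line
printf 'alpha\nbeta\ngamma\ndelta\n' > words.txt

# y: translate characters one for one (a, b, c become A, B, C)
sed 'y/abc/ABC/' words.txt           # AlphA, BetA, gAmmA, deltA

# {}: group = and p so both run only on lines matching /beta/
sed -n '/beta/{=;p;}' words.txt      # 2, then beta

# N: append the next line to the pattern space, then replace the newline
sed 'N;s/\n/ + /' words.txt          # alpha + beta, gamma + delta
```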

Examples:

Replace every teh with the: sed 's/teh/the/g' sample.txt
Replace only the first teh on each line: sed 's/teh/the/' sample.txt
Print line 3: sed -n '3p' foo.txt
Print lines 5 to 8: sed -n '5,8p' foo.txt
Print lines with the word four: sed -n '/four/p' foo.txt
Print from line 4 through the next line containing four: sed -n '4,/four/p' foo.txt
Print lines containing $100: sed -n '/\$100/p' foo.txt
Print the 10th line to the last: sed -n '10,$p' foo.txt
Print lines containing ing: sed -n '/ing/p' foo.txt
Print the line number of a match: sed -n '/funny/=' foo.txt
Print the line and then its line number: sed -n -e '/fun/p' -e '/fun/=' foo.txt
Delete all commented lines: sed '/^#/d' sample

More on sed
sed can also append text. Suppose foo.txt contains the text below and my.script contains sed commands:

cat foo.txt
1 one
2 two
3 three
4 four
cat  my.script
/four/p
/four/a\
5 five

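Running my.script on foo.txt appends 5 five after the line containing four:

sed -n -f my.script foo.txt
Output:
4 four
5 five

Because -n suppresses automatic printing, only the line printed by /four/p and the appended text appear. To see the whole file with 5 five appended, drop -n and the /four/p line from the script.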

Writing a sed script
#!/bin/sed -f
/two/ a\
2.1
This script appends 2.1 after each line containing two. Save it as insert, then make it executable with chmod +x insert (or chmod u+x insert to give execute permission to the owner only).
Running the sed script:

./insert foo.txt
1 one
2 two
2.1
3 three
4 four

The script writes its result to standard output; foo.txt itself is not changed.

Take note of the dot-slash (./) prefix: it tells the shell to run the executable from the current directory.

a\ appends a line of text after each line that matches the address. i\ works the same way in a sed script, except that it inserts the text before the matching line.
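Here a\ and i\ produce the same result by anchoring on neighbouring lines. A minimal sketch, assuming a POSIX shell; foo.txt is recreated with the same four lines used earlier in this post.

```shell
#!/bin/sh
# Recreate the four-line foo.txt from the examples above
printf '1 one\n2 two\n3 three\n4 four\n' > foo.txt

# i\ inserts the text BEFORE each line matching /three/
sed '/three/i\
2.5' foo.txt

# a\ appends the text AFTER each line matching /two/ (same output here)
sed '/two/a\
2.5' foo.txt
```

Both commands print 1 one, 2 two, 2.5, 3 three, 4 four.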

You can search text for one or more words with grep. The search pattern is written as a regular expression.

Main regular expressions:
\<KEY – Words beginning with 'KEY'
WORD\> – Words ending with 'WORD'
^ – Beginning of a line
$ – End of a line
[range] – Any one character in the enclosed set or range, such as [a-z]
[^c] – Any character except 'c'
\[ – Interpret the character '[' literally
"ca*t" – Strings containing 'c' followed by zero or more 'a's followed by 't'
"." – Match any single character

Main extended regular expressions:
"A1|A2|A3" – Strings containing 'A1', 'A2', or 'A3'
"ca+t" – Strings containing 'c' followed by one or more 'a's followed by 't'
"ca?t" – Strings containing 'c' followed by zero or one 'a' followed by 't'
"ca*t" – Strings containing 'c' followed by zero or more 'a's followed by 't'
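The difference between *, + and ? shows up clearly on a handful of short strings. A minimal sketch, assuming a POSIX shell with grep; cats.txt is a hypothetical sample file, and grep -E is used for the extended expressions (egrep behaves the same).

```shell
#!/bin/sh
# Hypothetical sample file: c + zero, one or two a's + t, plus "cut"
printf 'ct\ncat\ncaat\ncut\n' > cats.txt

grep 'ca*t' cats.txt         # zero or more a's: ct, cat, caat
grep -E 'ca+t' cats.txt      # one or more a's: cat, caat
grep -E 'ca?t' cats.txt      # zero or one a: ct, cat
grep -E 'cat|cut' cats.txt   # alternation: cat, cut
grep '^c.t$' cats.txt        # exactly c, any one character, t: cat, cut
```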

THE GREP FAMILY

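grep: supports basic regular expressions
egrep: supports extended regular expressions (same as grep -E)
fgrep: searches for fixed strings, with no regular expression support (same as grep -F)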

Syntax:

grep [options] pattern file

Options:
-c count the lines matching PATTERN
-f read PATTERN from a file
-i ignore case
-n prefix each output line with its line number in the input file
-v output all lines except those matching PATTERN
-w match PATTERN only as a whole word
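The options combine freely. A minimal sketch, assuming a POSIX shell with grep; app.log is a hypothetical log file chosen so that case and whole-word matching make a visible difference.

```shell
#!/bin/sh
# Hypothetical log file
printf 'Error: disk full\nok\nerror: retry\nerrors: 0\n' > app.log

grep -c 'error' app.log         # 2: only lowercase matches count
grep -i -c 'error' app.log      # 3: case is ignored
grep -n -i 'disk' app.log       # 1:Error: disk full
grep -v -i 'error' app.log      # ok
grep -w -i -c 'error' app.log   # 2: "errors" is not the whole word "error"
```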

Text filters read text, transform it, and write the result to standard output. They are helpful when you search for a specific keyword or want to view a file in a different format.

The cat command displays the contents of a file. Its most commonly used options are:
-n: number every output line
-b: number only non-blank output lines
-A: show nonprinting characters, such as carriage returns (^M), tabs (^I), and line ends ($)

cat can also serve as a rudimentary text editor when its output is redirected to a file with the greater-than sign (>). Here's a sample:

cat > sample-file
I'm typing the content of the sample-file.

Press Ctrl+D to end the interactive input and save the content. Note that > overwrites an existing file; use >> to append to it instead.

The tac command prints a file's lines in reverse order, from the last line to the first; it is the opposite of cat.

When analyzing log files, the head and tail commands are helpful. By default, both output 10 lines of text.

head file: outputs the first 10 lines
head -n 20 file: outputs the first 20 lines
tail -n 20 file: outputs the last 20 lines
tail -n +25 file: outputs text starting at line 25
tail -f file: keeps reading a file as it grows, for real-time monitoring
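A numbered file makes it easy to check which lines head and tail select. A minimal sketch, assuming a POSIX shell; seq is used only to generate sample numbers, and numbers.txt and app.log are hypothetical names.

```shell
#!/bin/sh
# Hypothetical 30-line file containing the numbers 1 to 30
seq 1 30 > numbers.txt

head numbers.txt           # lines 1-10 (the default)
head -n 20 numbers.txt     # lines 1-20
tail -n 5 numbers.txt      # lines 26-30
tail -n +25 numbers.txt    # lines 25-30: output starts AT line 25

# tail -f app.log          # follow a growing file; stop with Ctrl+C
```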

The wc (word count) utility counts the lines, words, and bytes in files. Options for wc are:
-l: count lines
-w: count words
-c or -m: count bytes (-c) or characters (-m)
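Counting a tiny file by hand confirms what each option reports. A minimal sketch, assuming a POSIX shell; wc.txt is a hypothetical two-line file. Reading from a redirect keeps wc from printing the file name, though some systems pad the number with spaces.

```shell
#!/bin/sh
# Hypothetical sample: 2 lines, 3 words, 14 bytes ("one two\n" + "three\n")
printf 'one two\nthree\n' > wc.txt

wc -l < wc.txt   # 2
wc -w < wc.txt   # 3
wc -c < wc.txt   # 14
```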

Another utility is nl, which numbers lines much like cat -n. By default, however, nl numbers only non-blank lines.

nl -ba file: number lines including blanks
nl -bt file: number lines with text

The expand utility replaces tabs with spaces; the unexpand utility does the reverse.

To view binary files, use the hexdump utility; the od (octal dump) utility also works.

To split a file into smaller files, use the split utility.

split -l 5 file: create files xaa, xab, and so on, with 5 lines each (xaa to xae for a 25-line file)
split -l 5 file name-: create files name-aa, name-ab, and so on (name-aa to name-ae for a 25-line file)
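With a 25-line input, split -l 5 produces exactly five pieces. A minimal sketch, assuming a POSIX shell; big.txt is a hypothetical file and the pieces are written to the current directory.

```shell
#!/bin/sh
# Hypothetical 25-line file containing the numbers 1 to 25
seq 1 25 > big.txt

split -l 5 big.txt          # writes xaa, xab, xac, xad, xae (5 lines each)
split -l 5 big.txt name-    # writes name-aa through name-ae
ls name-*                   # name-aa name-ab name-ac name-ad name-ae
```

The third piece, name-ac, holds lines 11 to 15.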

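To collapse consecutive identical lines into a single line, use the uniq utility. Because uniq compares only adjacent lines, sort the input first to remove every duplicate.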
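The adjacency rule is the usual surprise with uniq. A minimal sketch, assuming a POSIX shell; dup.txt is a hypothetical sample file.

```shell
#!/bin/sh
# Hypothetical sample: the second "a" is adjacent to the first, the last is not
printf 'a\na\nb\na\n' > dup.txt

uniq dup.txt           # a, b, a (only adjacent duplicates collapse)
sort dup.txt | uniq    # a, b    (sorting first removes every duplicate)
uniq -c dup.txt        # prefixes each output line with its repeat count
```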

To extract a range of characters or fields from each line of text, use the cut utility.

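cut -c range1,range2 file: select characters by position
cut -d delimiter -f fields --output-delimiter=" " file: select fields separated by delimiter and join them with a space on output (--output-delimiter is a GNU extension)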
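Character and field selection look like this on colon-separated data. A minimal sketch, assuming a POSIX shell; users.txt is a hypothetical passwd-style file, and the last command needs GNU cut.

```shell
#!/bin/sh
# Hypothetical passwd-style file with colon-separated fields
printf 'root:x:0:0\nalice:x:1000:1000\n' > users.txt

cut -c 1-4 users.txt                                   # root, alic
cut -d ':' -f 1,3 users.txt                            # root:0, alice:1000
cut -d ':' -f 1,3 --output-delimiter=' ' users.txt     # GNU only: root 0, alice 1000
```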

The paste utility merges files side by side, joining corresponding lines with a tab.

paste file1 file2

The join utility joins lines from two files that share a common field. Both files must be sorted on that field.

join -1 field_num -2 field_num file1 file2
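The paste and join commands above can be compared on the same two files. A minimal sketch, assuming a POSIX shell; names.txt and roles.txt are hypothetical files, already sorted on their first field as join requires.

```shell
#!/bin/sh
# Two hypothetical files, both sorted on their first field (an ID)
printf '1 alice\n2 bob\n3 carol\n' > names.txt
printf '1 admin\n3 dev\n' > roles.txt

join names.txt roles.txt              # 1 alice admin, 3 carol dev
join -1 1 -2 1 names.txt roles.txt    # same result, naming the join fields explicitly

# paste simply puts line N of each file side by side, tab-separated
paste names.txt roles.txt             # first line: 1 alice<TAB>1 admin
```

join keeps only IDs present in both files, while paste pairs lines by position regardless of content.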

The sort utility arranges lines in alphabetical order; use the -n option for a numerical sort.
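Alphabetical and numerical order differ as soon as numbers have different lengths. A minimal sketch, assuming a POSIX shell; nums.txt is a hypothetical sample file.

```shell
#!/bin/sh
# Hypothetical sample of numbers with different lengths
printf '10\n9\n100\n' > nums.txt

sort nums.txt       # text order: 10, 100, 9
sort -n nums.txt    # numeric order: 9, 10, 100
```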

The fmt utility reformats text to a given line width. Its options include:
-w: maximum number of characters per line
-s: split long lines but do not refill short ones
-u: uniform spacing, with one space between words and two spaces after each sentence

The pr utility paginates a file for printing.

The tr utility translates one set of characters into another. It reads only standard input, so the file is supplied with a redirect:

tr 'set1' 'set2' < file
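Translating, replacing and deleting characters with tr. A minimal sketch, assuming a POSIX shell; greet.txt is a hypothetical sample file.

```shell
#!/bin/sh
# Hypothetical one-line sample file
printf 'hello world\n' > greet.txt

tr 'a-z' 'A-Z' < greet.txt   # HELLO WORLD
tr ' ' '_' < greet.txt       # hello_world
tr -d 'l' < greet.txt        # heo word (-d deletes the listed characters)
```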

The sed utility searches for and replaces patterns in text.

sed 's/search_pattern/new_pattern/' file: substitute the first match on each line
sed 's/search_pattern/new_pattern/g' file: substitute every match on each line
 s – substitute
 g – global: replace every match on the line, not just the first
sed '/key/s/search_pattern/new_pattern/g' file: substitute only on lines that contain key
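The three substitution forms differ only in how many matches they replace and on which lines. A minimal sketch, assuming a POSIX shell; pets.txt is a hypothetical sample file with two matches per line.

```shell
#!/bin/sh
# Hypothetical sample: each line contains "cat" twice
printf 'key: cat cat\nnote: cat cat\n' > pets.txt

sed 's/cat/dog/' pets.txt          # first match per line: key: dog cat / note: dog cat
sed 's/cat/dog/g' pets.txt         # every match: key: dog dog / note: dog dog
sed '/key/s/cat/dog/g' pets.txt    # only lines containing "key": key: dog dog / note: cat cat
```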

Options for sed:
-e: the next argument is an editing command
-f: read commands from a file
-n: do not print lines automatically
Commands of sed:
d – delete the entire line
r – read a file and append its contents to the output
s – substitute
w – write the line to a file