Missing Semester - Lecture 4

# journalctl run remotely, grep run locally
ssh myserver journalctl | grep sshd
# both commands run remotely
ssh myserver "journalctl | grep sshd"

`sed`

sed (stream editor) is a command-line tool on Unix/Linux for parsing and transforming text. It is typically used to find, replace, or delete content in a file or input stream.

Common sed Commands:

Substitution (s):

sed 's/pattern/replacement/' file

s: The substitution command.
pattern: The string or pattern to search for.
replacement: The string to replace the pattern with.

Example:

sed 's/apple/orange/' fruits.txt

This replaces the first occurrence of the word “apple” with “orange” on each line.
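To see the "first occurrence on each line" behavior next to the g (global) flag, here is a small sketch. The sample input is made up for illustration and is fed through a pipe, so no fruits.txt file is assumed to exist.

```shell
# Hypothetical sample input; the fruits.txt file from the text is not assumed to exist.
input='apple pie and apple juice
banana bread
apple'

# No flag: only the first "apple" on each line is replaced.
first_only=$(printf '%s\n' "$input" | sed 's/apple/orange/')

# g flag: every "apple" on each line is replaced.
all_matches=$(printf '%s\n' "$input" | sed 's/apple/orange/g')

printf '%s\n' "$first_only"
printf '%s\n' '---'
printf '%s\n' "$all_matches"
```

The first line of the input is where the two results differ: without g, the second "apple" on that line is left alone.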

Delete (d):

sed '/pattern/d' file

Deletes all lines that match a given pattern.
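A minimal sketch of the delete command, using a few invented log lines instead of a real file:

```shell
# Hypothetical log lines used only for illustration.
log='ok: service started
error: disk full
ok: service finished'

# Lines matching /error/ are dropped; all other lines pass through unchanged.
kept=$(printf '%s\n' "$log" | sed '/error/d')

printf '%s\n' "$kept"
```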

Print (p):

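sed -n '/pattern/p' file

Prints only the lines that match the pattern. The -n option suppresses sed's default behavior of printing every line, so only the lines selected by p are shown.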

Example:

# This will print lines 2 and 3
sed -n '2,3p' example.txt
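The effect of -n is easiest to see by running the same p command with and without it. This sketch uses a made-up four-line input in place of example.txt:

```shell
# Hypothetical four-line input standing in for example.txt.
lines='one
two
three
four'

# With -n, only the lines selected by the p command are printed.
with_n=$(printf '%s\n' "$lines" | sed -n '2,3p')

# Without -n, sed also prints every line by default, so lines 2 and 3 appear twice.
without_n=$(printf '%s\n' "$lines" | sed '2,3p')

printf '%s\n' "$with_n"
printf '%s\n' '---'
printf '%s\n' "$without_n"
```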

Basic, Common, and Most-Used Regular Expressions (Regex)

Basic Regex Symbols and Their Meaning

Apart from the dot, which matches a single character, the symbols below are quantifiers: they specify how many instances of the preceding character, group, or character class must be present in the input for a match.

. (dot): Matches any single character except a newline.
- Regex: a.b
- Matches: "a1b", "axb", "a b"
- Does not match: "ab", "a\nb"
* (asterisk): Matches zero or more occurrences of the preceding character.
- Regex: ca*t
- Matches: "ct", "cat", "caat", "caaaaat"
+ (plus): Matches one or more occurrences of the preceding character.
- Regex: ca+t
- Matches: "cat", "caat"
- Does not match: "ct"
? (question mark): Matches zero or one occurrence of the preceding character.
- Regex: ca?t
- Matches: "cat", "ct"
- Does not match: "caat"
{} (braces): Matches a specific number of occurrences.
- {n}: Exactly n occurrences.
  - Regex: a{3}
  - Matches: "aaa"
- {n,}: At least n occurrences.
  - Regex: a{2,}
  - Matches: "aa", "aaa", "aaaa"
- {n,m}: Between n and m occurrences.
  - Regex: a{2,4}
  - Matches: "aa", "aaa", "aaaa"
  - Does not match: "a", "aaaaa"
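The quantifier examples above can be checked with grep. This is a sketch that assumes whole-string matching, which is why it uses -x (the entire line must match) together with -E (extended regex syntax, needed for +, ?, and {}).

```shell
# Candidate strings, one per line (taken from the examples above).
candidates='ct
cat
caat
caaaaat'

# -E enables extended regex syntax; -x requires the whole line to match.
star=$(printf '%s\n' "$candidates" | grep -Ex 'ca*t')
plus=$(printf '%s\n' "$candidates" | grep -Ex 'ca+t')
optional=$(printf '%s\n' "$candidates" | grep -Ex 'ca?t')
braces=$(printf '%s\n' a aa aaaa aaaaa | grep -Ex 'a{2,4}')

# Print a label followed by the matching lines joined with spaces.
show() {
  printf '%s -> %s\n' "$1" "$(printf '%s' "$2" | tr '\n' ' ')"
}

show 'ca*t' "$star"
show 'ca+t' "$plus"
show 'ca?t' "$optional"
show 'a{2,4}' "$braces"
```

Without -x, grep reports a line as soon as any part of it matches, so "aaaaa" would be reported for a{2,4} because it contains a run of four a's.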

Anchors (for Position Matching)

Anchors don’t match characters, but rather positions in the string.

^ (caret): Matches the start of a string.
- Regex: ^hello
- Matches: "hello world"
- Does not match: "world hello"
$ (dollar sign): Matches the end of a string.
- Regex: world$
- Matches: "hello world"
- Does not match: "world hello"
\b (word boundary): Matches the position between a word character (\w) and a non-word character.
- Regex: \bcat\b
- Matches: "cat is here", "I have a cat"
- Does not match: "caterpillar", "catch"
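A short sketch of the anchors with grep, where each input line plays the role of the string. For word boundaries it uses grep's -w option rather than \b; -w is assumed to be available (it is in GNU and BSD grep) and has a similar effect for a simple word like "cat".

```shell
phrases='hello world
world hello'

# ^ anchors the match to the start of the line, $ to the end.
starts_with_hello=$(printf '%s\n' "$phrases" | grep -E '^hello')
ends_with_world=$(printf '%s\n' "$phrases" | grep -E 'world$')

words='cat is here
caterpillar
I have a cat'

# -w keeps only lines where "cat" appears as a whole word.
whole_word_cat=$(printf '%s\n' "$words" | grep -w 'cat')

printf '%s\n' "$starts_with_hello" "$ends_with_world" "$whole_word_cat"
```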

Grouping and Alternation

Parentheses (): Used to group part of the regex for applying quantifiers or capturing matches.
- Regex: (ab)+
- Matches: "ab", "abab", "ababab"
- Does not match: "a", "b", "aab"
Pipe | (Alternation): Acts like a logical OR, matching either of the patterns.
- Regex: cat|dog
- The alternation applies to the whole pattern on each side, so the choice is between "cat" and "dog", not between "t" and "d"
- Matches: "cat" or "dog"
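Grouping and alternation can be tried out the same way. In this sketch the inputs are invented; note that without anchors or word boundaries, cat|dog also matches inside a longer word such as "concatenate".

```shell
# (ab)+ with -x: the whole line must consist of one or more "ab" pairs.
repeats=$(printf '%s\n' ab abab a aab | grep -Ex '(ab)+')

# cat|dog without anchors: any line containing either alternative matches.
either=$(printf '%s\n' 'my cat' 'my dog' 'my bird' 'concatenate' | grep -E 'cat|dog')

printf '%s\n' "$repeats"
printf '%s\n' '---'
printf '%s\n' "$either"
```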

Summary of Most-Used Regex Elements

| Regex Symbol | Meaning |
| --- | --- |
| `.` | Any single character (except newline) |
| `*` | Zero or more of the previous element |
| `+` | One or more of the previous element |
| `?` | Zero or one of the previous element |
| `{n}` | Exactly `n` occurrences |
| `^` | Start of a string |
| `$` | End of a string |
| `[]` | Character class (match any character inside) |
| `\d` | Any digit (0-9) |
| `\w` | Any word character (alphanumeric + underscore) |
| `\s` | Any whitespace character |
| `\|` | Alternation (either pattern) |
| `()` | Grouping |

Example Use of `sed` with Regex

Command:

sed 's/[0-9]/#/g'

Explanation:

This replaces every digit ([0-9]) in the input with #. The g flag applies the substitution to every match on a line rather than only the first.
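A quick way to try the command is to pipe a single line into it. The input string here is made up; the second command is a variation that uses -E and the + quantifier so that each run of digits collapses into a single #.

```shell
sample='Order 66 shipped on 2024-01-05'

# Each digit is replaced individually.
masked=$(printf '%s\n' "$sample" | sed 's/[0-9]/#/g')

# With -E and +, each run of consecutive digits becomes one #.
collapsed=$(printf '%s\n' "$sample" | sed -E 's/[0-9]+/#/g')

printf '%s\n' "$masked"
printf '%s\n' "$collapsed"
```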

ssh myserver journalctl |
 grep sshd |
 grep "Disconnected from" |
 sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/' |
 sort | uniq -c |
 sort -nk1,1 | tail -n10 |
 awk '{print $2}' | paste -sd,

sort -nk1,1: The -n option sorts numerically, and the -k option specifies the sort key, which determines which part of the line is used for sorting.

1,1 means the sort is done based on the first field (column).
Fields are separated by whitespace by default.
1,1 restricts the sorting to the first field only, and no other part of the line is used for sorting.
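The difference between a plain key sort and a numeric one shows up as soon as the numbers have different lengths. This sketch uses invented data and sets LC_ALL=C so the text comparison is byte-by-byte and the result does not depend on the local locale.

```shell
data='10 banana
9 apple
100 cherry'

# Plain key sort compares the first field as text, so "10" and "100" come before "9".
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1)

# Adding -n compares the first field as a number.
as_number=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k1,1)

printf '%s\n' "$as_text"
printf '%s\n' '---'
printf '%s\n' "$as_number"
```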

paste -sd,: This command combines lines of input.

-s: The -s option tells paste to merge all the input lines into a single line (instead of pasting them side by side).
-d,: The -d option specifies the delimiter, which in this case is a comma (,). It tells paste to join the items using a comma.
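The whole pipeline can be tried locally without a server by feeding it a few log lines. This is only a sketch: the log lines below are invented in the style of sshd journal entries (hosts, users, and addresses are hypothetical), the ssh step is dropped, and paste is given an explicit - operand because some paste implementations do not read stdin without it.

```shell
# Hypothetical log lines; hosts, users, and addresses are made up for illustration.
log='Jan 01 00:00:01 host sshd[100]: Disconnected from invalid user admin 203.0.113.5 port 4242 [preauth]
Jan 01 00:00:02 host sshd[101]: Disconnected from authenticating user root 203.0.113.6 port 4243 [preauth]
Jan 01 00:00:03 host sshd[102]: Disconnected from invalid user admin 203.0.113.7 port 4244 [preauth]
Jan 01 00:00:04 host sshd[103]: Disconnected from user alice 203.0.113.8 port 4245
Jan 01 00:00:05 host sshd[104]: Disconnected from invalid user admin 203.0.113.9 port 4246 [preauth]
Jan 01 00:00:06 host sshd[105]: Disconnected from authenticating user root 203.0.113.10 port 4247 [preauth]'

# Extract the username, count occurrences, sort by count, and join the names with commas.
top_users=$(printf '%s\n' "$log" |
  grep sshd |
  grep "Disconnected from" |
  sed -E 's/.*Disconnected from (invalid |authenticating )?user (.*) [^ ]+ port [0-9]+( \[preauth\])?$/\2/' |
  sort | uniq -c |
  sort -nk1,1 | tail -n10 |
  awk '{print $2}' | paste -sd, -)

printf '%s\n' "$top_users"
```

Because the numeric sort is ascending, the most frequent username ends up last in the list.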

`awk`

awk is a powerful command-line tool used for text processing and data extraction in Unix/Linux environments. It operates on files or input streams, typically treating each line as a record, and each part of the line as a field. awk is ideal for extracting specific fields, performing operations on data, and formatting output.

Basic `awk` Command Format:

awk 'pattern { action }' [file]

pattern: The condition or pattern that determines which lines awk will process.
action: The operation to perform on the lines that match the pattern.
file: The input file (or input from stdin if no file is provided).

Basic Examples

1. Print Every Line of a File

awk '{ print $0 }' filename

This command prints every line of the file ($0 refers to the entire line).

2. Print a Specific Field

awk '{ print $2 }' filename

This prints the second field ($2) of each line in the file.

Example Input (`example.txt`)

John 25 Manager
Jane 30 Developer
Tom 22 Designer

awk '{ print $1 }' example.txt

Output:

John
Jane
Tom

This prints only the first field (name) from each line.

Common `awk` Use Cases

1. Print Specific Fields

You can specify which fields (columns) to print using $1, $2, etc.

awk '{ print $1, $3 }' example.txt

Output:

John Manager
Jane Developer
Tom Designer

2. Specify a Field Separator

By default, awk assumes fields are separated by whitespace. You can change the field separator using the -F option.

Example with a CSV file:

John,25,Manager
Jane,30,Developer
Tom,22,Designer

To print the first and third fields of a CSV file:

awk -F',' '{ print $1, $3 }' example.csv

Output:

John Manager
Jane Developer
Tom Designer

Here, -F',' tells awk to use a comma as the field separator.

3. Conditional Processing

You can apply conditions to control which lines are processed.

Example: Print lines where the second field (age) is greater than 25.

awk '$2 > 25 { print $1, $2 }' example.txt

Output:

Jane 30

This prints only the lines where the second field (age) is greater than 25.

4. Perform Arithmetic Operations

awk can perform arithmetic on fields.

Example: Add 10 to each person’s age.

awk '{ print $1, $2 + 10 }' example.txt

Output:

John 35
Jane 40
Tom 32

5. Pattern Matching

You can use regular expressions to match patterns.

Example: Print lines that contain the word “Developer”:

awk '/Developer/ { print $0 }' example.txt

Output:

Jane 30 Developer

You can use awk to process lines that match (or don’t match) specific patterns.
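For the "don't match" case, a pattern can be negated with !, and a regex can be limited to a single field with the ~ operator. A small sketch using the same three records as example.txt, passed in through a pipe:

```shell
example='John 25 Manager
Jane 30 Developer
Tom 22 Designer'

# ! negates the pattern: print lines that do NOT contain "Developer".
non_developers=$(printf '%s\n' "$example" | awk '!/Developer/ { print $0 }')

# ~ applies the regex to one field only: names whose third field starts with "D".
d_titles=$(printf '%s\n' "$example" | awk '$3 ~ /^D/ { print $1 }')

printf '%s\n' "$non_developers"
printf '%s\n' '---'
printf '%s\n' "$d_titles"
```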

Advanced Features of `awk`

1. BEGIN and END Blocks

awk allows you to define special actions at the start and end of processing.

BEGIN: Executes before processing the input.
END: Executes after processing all input.

Example: Calculate the sum of ages.

awk 'BEGIN { sum = 0 } { sum += $2 } END { print "Total age:", sum }' example.txt

Output:

Total age: 77

Here:

The BEGIN block initializes the sum variable to 0.
The main block { sum += $2 } adds the second field (age) to the sum for each line.
The END block prints the total after processing all lines.
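As a variation on the sum, the END block can also use NR (the number of records read) to compute an average. This sketch reuses the example.txt records through a pipe and omits the BEGIN block, since awk treats an uninitialized variable as 0 in arithmetic.

```shell
example='John 25 Manager
Jane 30 Developer
Tom 22 Designer'

# Accumulate the ages, then divide by the record count once all input is read.
average=$(printf '%s\n' "$example" |
  awk '{ sum += $2 } END { if (NR > 0) printf "Average age: %.2f\n", sum / NR }')

printf '%s\n' "$average"
```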

2. Built-in Variables

awk has several built-in variables:

NR: Current record number (line number).
NF: Number of fields in the current record.
$0: The entire line.

Example: Print each line with its line number.

awk '{ print NR, $0 }' example.txt

Output:

1 John 25 Manager
2 Jane 30 Developer
3 Tom 22 Designer
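NF is not shown above, so here is a short sketch of it. $NF refers to the last field of a record, and NR can also be used as a condition to select a specific line. The input is the same three records as example.txt.

```shell
example='John 25 Manager
Jane 30 Developer
Tom 22 Designer'

# NF is the field count; $NF is the value of the last field.
field_info=$(printf '%s\n' "$example" | awk '{ print NF, $NF }')

# NR as a condition: print only the second record.
second_line=$(printf '%s\n' "$example" | awk 'NR == 2 { print $0 }')

printf '%s\n' "$field_info"
printf '%s\n' '---'
printf '%s\n' "$second_line"
```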

3. String Manipulation

awk provides functions to manipulate strings, like length(), substr(), tolower(), and toupper().

Example: Print the length of the first field.

awk '{ print $1, length($1) }' example.txt

Output:

John 4
Jane 4
Tom 3
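The other functions mentioned above work the same way. A minimal sketch with toupper(), tolower(), and substr(), where substr(s, 1, 3) takes three characters starting at position 1 (awk strings are 1-indexed):

```shell
example='John 25 Manager
Jane 30 Developer
Tom 22 Designer'

# Upper-case the name, lower-case the title, and abbreviate the title to 3 characters.
formatted=$(printf '%s\n' "$example" |
  awk '{ print toupper($1), tolower($3), substr($3, 1, 3) }')

printf '%s\n' "$formatted"
```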

Summary of `awk` Usage

| Command | Description |
| --- | --- |
| `awk '{ print $1 }' file` | Print the first field (column) of each line. |
| `awk -F',' '{ print $2 }' file` | Specify a field separator (comma in this case). |
| `awk '$2 > 25 { print $1 }' file` | Print the first field if the second field is greater than 25. |
| `awk 'BEGIN { action } { action } END { action }' file` | Use `BEGIN` and `END` blocks for initialization and final actions. |
| `awk '{ print NR, $0 }' file` | Print the line number along with the entire line. |

awk is a versatile tool for processing and extracting data from text files or input streams. It’s great for manipulating structured data, performing arithmetic, and applying conditions based on patterns.

Exercises

https://regexone.com/lesson/introduction_abcs -> Great tutorial!
