uniq Command Linux: Remove Duplicate Lines from Text Files Efficiently

August 25, 2025

The uniq command is a powerful Linux utility designed to remove duplicate lines from text files or input streams. Whether you’re cleaning up log files, processing data, or filtering output from other commands, uniq provides an efficient solution for handling duplicate content in your text processing workflows.

What is the uniq Command?

The uniq command filters repeated lines in a file or input stream, collapsing each run of identical adjacent lines into a single line. It’s particularly useful for data processing, log analysis, and text manipulation tasks where duplicate entries need to be eliminated or counted.

Key characteristics of the uniq command:

  • Only removes consecutive duplicate lines by default
  • Requires sorted input for complete duplicate removal
  • Can count occurrences of duplicate lines
  • Offers various options for customized filtering
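
A quick way to see the first point in action, before the fuller examples below:

$ printf 'apple\napple\nbanana\napple\n' | uniq
apple
banana
apple

The two adjacent apple lines collapse into one, but the final apple survives because it is not next to another apple line.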

Basic Syntax

uniq [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Where:

  • OPTIONS: Various flags to modify behavior
  • INPUT_FILE: Source file to process (stdin if omitted)
  • OUTPUT_FILE: Destination file (stdout if omitted)
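
For example, the following two invocations are equivalent (sample.txt and deduped.txt are just placeholder file names):

$ uniq sample.txt deduped.txt
$ uniq < sample.txt > deduped.txt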

Common Options and Flags

Option   Description
-c       Count occurrences of each line
-d       Show only duplicate lines
-u       Show only unique lines (no duplicates)
-i       Case-insensitive comparison
-f N     Skip first N fields when comparing
-s N     Skip first N characters when comparing
-w N     Compare only first N characters
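
The -w option is not covered by the examples below, so here is a quick illustration with inline input, comparing only the first 5 characters of each line:

$ printf 'error: disk full\nerror: out of memory\nwarning: low battery\n' | uniq -w 5
error: disk full
warning: low battery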

Basic Examples

Example 1: Simple Duplicate Removal

Let’s create a sample file with duplicate lines (type the lines below, then press Ctrl+D on a new line to save):

$ cat > sample.txt
apple
banana
apple
cherry
banana
apple

Using uniq on this file:

$ uniq sample.txt
apple
banana
apple
cherry
banana
apple

Output: No change because duplicates aren’t consecutive. The uniq command only removes consecutive duplicate lines.

Example 2: Using uniq with sort

To remove all duplicates, first sort the file:

$ sort sample.txt | uniq
apple
banana
cherry

Output: All duplicate lines are removed, showing only unique entries.

Example 3: Counting Occurrences

Use the -c option to count duplicate occurrences:

$ sort sample.txt | uniq -c
      3 apple
      2 banana
      1 cherry

Output: Each line is prefixed with its occurrence count.
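
To rank the results by frequency, pipe the counts through sort -nr:

$ sort sample.txt | uniq -c | sort -nr
      3 apple
      2 banana
      1 cherry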

Advanced Usage Examples

Example 4: Show Only Duplicate Lines

Use -d to display only lines that appear more than once:

$ sort sample.txt | uniq -d
apple
banana

Output: Only lines with duplicates are shown.

Example 5: Show Only Unique Lines

Use -u to display only lines that appear exactly once:

$ sort sample.txt | uniq -u
cherry

Output: Only unique lines (no duplicates) are displayed.
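
These filters can be combined with -c; for example, counting only the duplicated lines:

$ sort sample.txt | uniq -cd
      3 apple
      2 banana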

Example 6: Case-Insensitive Comparison

Create a file with mixed case:

$ cat > mixed_case.txt
Apple
apple
APPLE
Banana
banana

Using a case-insensitive sort followed by a case-insensitive comparison:

$ sort -f mixed_case.txt | uniq -i
apple
banana

Output: Case variations are treated as duplicates, leaving one line per word; the exact casing that survives depends on your locale’s sort order. The -f flag on sort groups the case variants together, since a plain sort may not place them next to each other in every locale.

Example 7: Skip Fields When Comparing

Create a file with multiple fields:

$ cat > fields.txt
1 apple red
2 apple green
3 banana yellow
4 banana yellow
5 cherry red

Skip the first field when comparing:

$ sort -k2 fields.txt | uniq -f 1
2 apple green
1 apple red
3 banana yellow
5 cherry red

Output: The first field is ignored during comparison, so lines 3 and 4 ("banana yellow") collapse into one. "apple red" and "apple green" still differ after the skipped field, so both remain.
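
To treat every apple line as a duplicate regardless of its color, combine -f with -w so that only a short prefix after the skipped field is compared (6 characters is enough to cover the fruit name in this sample data):

$ sort -k2 fields.txt | uniq -f 1 -w 6
2 apple green
3 banana yellow
5 cherry red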

Example 8: Skip Characters When Comparing

Skip first 2 characters when comparing:

$ cat > chars.txt
xxapple
yyapple
zzbanana
wwbanana
$ uniq -s 2 chars.txt
xxapple
zzbanana

Output: The first two characters are ignored, so yyapple and wwbanana are dropped as duplicates of the lines above them. Note that sorting this file by the whole line would move wwbanana to the top and separate the two banana lines, so uniq is run directly on the file here (the matching lines are already adjacent).

Practical Use Cases

Use Case 1: Log File Analysis

Find unique IP addresses in access logs:

$ cat access.log | cut -d' ' -f1 | sort | uniq -c | sort -nr
    245 192.168.1.100
    123 10.0.0.1
     89 172.16.0.50

This command extracts IP addresses, counts their occurrences, and sorts by frequency.

Use Case 2: Email List Cleanup

Remove duplicate email addresses:

$ sort email_list.txt | uniq > clean_email_list.txt

This creates a clean list without duplicate entries.

Use Case 3: Configuration File Validation

Find duplicate configuration entries:

$ sort config.conf | uniq -d

This shows only duplicate configuration lines that might cause issues.

Combining uniq with Other Commands

Pipeline Example 1: Process Command Output

$ ps aux | grep '[a]pache' | awk '{print $1}' | sort | uniq -c

Count unique users running Apache processes. (The [a]pache pattern keeps the grep process itself out of the results.)

Pipeline Example 2: Text Processing

$ cat document.txt | tr ' ' '\n' | sort | uniq -c | sort -nr | head -10

Find the 10 most common words in a document.

Pipeline Example 3: Network Analysis

$ netstat -tn | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr

Analyze network connections by remote IP address.

Performance Considerations

Memory Usage: The uniq command processes input line by line, making it memory-efficient even for large files.

File Size Impact: For best performance with large files, consider using:

$ sort -u large_file.txt > unique_file.txt

This combines sorting and duplicate removal in a single operation.

Preprocessing: Always sort data before using uniq for complete duplicate removal, unless you specifically need to remove only consecutive duplicates.
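
A simple way to compare the two approaches on your own data (large_file.txt is a placeholder name):

$ time sort large_file.txt | uniq > /dev/null
$ time sort -u large_file.txt > /dev/null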

Common Pitfalls and Solutions

Pitfall 1: Not Sorting Input

Problem: Non-consecutive duplicates aren’t removed.

Solution: Always use sort before uniq for complete duplicate removal.
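
Using sample.txt from Example 1, the difference is easy to verify by counting output lines:

$ uniq sample.txt | wc -l
6
$ sort sample.txt | uniq | wc -l
3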

Pitfall 2: Whitespace Issues

Problem: Lines with different whitespace are treated as different.

Solution: Normalize whitespace before processing:

$ cat file.txt | sed 's/[[:space:]]\+/ /g' | sort | uniq

Pitfall 3: Case Sensitivity

Problem: Similar lines with different cases aren’t recognized as duplicates.

Solution: Use the -i flag for case-insensitive comparison.
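
Reusing mixed_case.txt from Example 6, pairing a case-insensitive sort with -i collapses the five lines down to two:

$ sort -f mixed_case.txt | uniq -i | wc -l
2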

Alternative Approaches

Using awk

$ awk '!seen[$0]++' file.txt

This removes duplicates without requiring sorted input.
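
Unlike sort | uniq, this keeps the first occurrence of each line in its original order. With sample.txt from the earlier examples:

$ awk '!seen[$0]++' sample.txt
apple
banana
cherry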

Using sort with -u

$ sort -u file.txt

Combines sorting and duplicate removal in one command.

Best Practices

  • Always sort first: Use sort | uniq for complete duplicate removal
  • Use appropriate options: Choose -c, -d, or -u based on your specific needs
  • Consider performance: For large files, sort -u might be more efficient
  • Handle whitespace: Normalize whitespace when necessary
  • Test with sample data: Verify results with small datasets before processing large files

Conclusion

The uniq command is an essential tool for Linux text processing, offering flexible options for handling duplicate content. Whether you’re cleaning data, analyzing logs, or processing configuration files, mastering uniq and its various options will significantly improve your command-line efficiency. Remember to combine it with sort for complete duplicate removal and explore its various flags to match your specific use cases.

By understanding both basic usage and advanced techniques, you can leverage uniq to handle complex text processing tasks efficiently and effectively in your Linux workflow.