You have two data files and you need to compare them and find lines that exist in one file but not in the other.
Sort the files and isolate the data of interest using cut or awk if necessary, and then use comm, diff, grep, or uniq depending on your needs.
comm is designed for just this type of problem:
    $ cat left
    record_01
    record_02.left only
    record_03
    record_05.differ
    record_06
    record_07
    record_08
    record_09
    record_10

    $ cat right
    record_01
    record_02
    record_04
    record_05
    record_06.differ
    record_07
    record_08
    record_09.right only
    record_10

    # Only show lines in the left file
    $ comm -23 left right
    record_02.left only
    record_03
    record_05.differ
    record_06
    record_09

    # Only show lines in the right file
    $ comm -13 left right
    record_02
    record_04
    record_05
    record_06.differ
    record_09.right only

    # Only show lines common to both files
    $ comm -12 left right
    record_01
    record_07
    record_08
    record_10
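comm expects both inputs to be sorted, and the data of interest may be only one field of each line. The following is a minimal sketch of that preparation step, assuming hypothetical comma-separated files named left.csv and right.csv whose first field is the key; the file names, the delimiter, and the sample data are illustrative and not part of the recipe above.

```shell
# Use one collation order so sort and comm agree on what "sorted" means.
export LC_ALL=C

# Hypothetical sample data: the key is the first comma-separated field.
workdir=$(mktemp -d)
printf '%s\n' 'record_03,x' 'record_01,a' 'record_02,b' > "$workdir/left.csv"
printf '%s\n' 'record_02,zz' 'record_04,q' 'record_01,a' > "$workdir/right.csv"

# Isolate the key field with cut and sort it into temporary files.
cut -d, -f1 "$workdir/left.csv"  | sort > "$workdir/left.keys"
cut -d, -f1 "$workdir/right.csv" | sort > "$workdir/right.keys"

# -23 suppresses columns 2 and 3, leaving keys found only in left.csv.
only_left=$(comm -23 "$workdir/left.keys" "$workdir/right.keys")
echo "$only_left"

rm -rf "$workdir"
```

Comparing only the key field means that lines differing elsewhere (here, record_02) are treated as present in both files.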
diff will quickly show you all the differences between the two files, but its output is not terribly pretty and you may not need to know all the differences. GNU diff's -y and -W options can be handy for readability, but you can get used to the regular output as well. Some systems (e.g., Solaris) may use sdiff instead of diff -y, or have a separate binary such as bdiff to process very large files.
    $ diff -y -W 60 left right
    record_01                     record_01
    record_02.left only         | record_02
    record_03                   | record_04
    record_05.differ            | record_05
    record_06                   | record_06.differ
    record_07                     record_07
    record_08                     record_08
    record_09                   | record_09.right only
    record_10                     record_10

    $ diff -y -W 60 --suppress-common-lines left right
    record_02.left only         | record_02
    record_03                   | record_04
    record_05.differ            | record_05
    record_06                   | record_06.differ
    record_09                   | record_09.right only

    $ diff left right
    2,5c2,5
    < record_02.left only
    < record_03
    < record_05.differ
    < record_06
    ---
    > record_02
    > record_04
    > record_05
    > record_06.differ
    8c8
    < record_09
    ---
    > record_09.right only
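If you only want one side of the regular diff output, the "< " and "> " markers can be filtered with sed. This is a rough sketch rather than part of the recipe itself; it recreates the left and right files from the example in a temporary directory, and the variable names are made up for illustration.

```shell
export LC_ALL=C

# Recreate the sample files from the example in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' record_01 'record_02.left only' record_03 record_05.differ \
    record_06 record_07 record_08 record_09 record_10 > "$workdir/left"
printf '%s\n' record_01 record_02 record_04 record_05 record_06.differ \
    record_07 record_08 'record_09.right only' record_10 > "$workdir/right"

# Normal diff output prefixes lines from the first file with "< " and
# lines from the second file with "> ". sed -n prints only the lines
# where the substitution succeeded, with the marker stripped.
only_left=$(diff "$workdir/left" "$workdir/right" | sed -n 's/^< //p')
only_right=$(diff "$workdir/left" "$workdir/right" | sed -n 's/^> //p')

echo "Only in left:";  echo "$only_left"
echo "Only in right:"; echo "$only_right"

rm -rf "$workdir"
```

Keep in mind that diff exits with status 1 when the files differ, which matters if the script runs with error checking such as set -e together with pipefail.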
grep can show you when lines exist only in one file and not the other, and you can figure out which file if necessary. But since it’s doing regular expression matches, it will not be able to handle differences within the line unless you edit the file that becomes the pattern file, and it will also get very slow as the file sizes grow.
This example shows all the lines that exist in the file left but not in the file right:
    $ grep -vf right left
    record_03
    record_06
    record_09
Note that only “record_03” is really missing; the other two lines are simply different. If you need to detect such variations, you’ll need to use diff. If you need to ignore them, use cut or awk as necessary to isolate the parts you need into temporary files.
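When the goal is an exact, whole-line comparison, the substring matching described above can be avoided with grep's -F (fixed strings) and -x (match the entire line) options. The sketch below is an illustration and not the recipe's own method; it rebuilds the sample files in a temporary directory and uses a made-up variable name.

```shell
# Recreate the sample files from the example in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' record_01 'record_02.left only' record_03 record_05.differ \
    record_06 record_07 record_08 record_09 record_10 > "$workdir/left"
printf '%s\n' record_01 record_02 record_04 record_05 record_06.differ \
    record_07 record_08 'record_09.right only' record_10 > "$workdir/right"

# -F treats each pattern as a literal string, not a regular expression.
# -x requires a pattern to match the whole line.
# -v inverts the match; -f reads the patterns from the file "right".
only_left=$(grep -Fxvf "$workdir/right" "$workdir/left")
echo "$only_left"

rm -rf "$workdir"
```

With whole-line matching, the result lists every line of left that has no identical line in right, so the "differ" and "left only" variants show up as well. Fixed-string matching also tends to be faster than regular expression matching, though large pattern files can still be slow.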
uniq -u can show you only the lines that are unique across the two files, but it will not tell you which file a line came from (if you need to know that, use one of the previous solutions). uniq -d will show you only the lines that exist in both files. Both forms assume that neither file contains duplicate lines of its own:
    $ sort right left | uniq -u
    record_02
    record_02.left only
    record_03
    record_04
    record_05
    record_05.differ
    record_06
    record_06.differ
    record_09
    record_09.right only

    $ sort right left | uniq -d
    record_01
    record_07
    record_08
    record_10
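One way to make uniq -u report lines from a single file is to feed the other file to sort twice, so that none of its lines can ever be unique. This is a sketch of that trick under the assumption that the left file has no internal duplicates; it is not part of the recipe above, and it rebuilds the sample files in a temporary directory.

```shell
export LC_ALL=C

# Recreate the sample files from the example in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' record_01 'record_02.left only' record_03 record_05.differ \
    record_06 record_07 record_08 record_09 record_10 > "$workdir/left"
printf '%s\n' record_01 record_02 record_04 record_05 record_06.differ \
    record_07 record_08 'record_09.right only' record_10 > "$workdir/right"

# Listing "right" twice means each of its lines occurs at least twice
# in the sorted stream, so uniq -u drops them. What remains are lines
# that occur exactly once, that is, lines found only in "left"
# (assumption: "left" contains no duplicate lines of its own).
only_left=$(sort "$workdir/left" "$workdir/right" "$workdir/right" | uniq -u)
echo "$only_left"

rm -rf "$workdir"
```

Swapping the roles of the two files (listing left twice) gives the lines found only in right.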