How to remove duplicate files in a directory

372
find-duplicate-files-on-linux-hd

How to find and delete duplicate files within the same directory?

Method-1

Bash script to find and remove duplicate file using checksum

#!/bin/bash
declare -A arr
shopt -s globstar

for file in **; do
  [[ -f "$file" ]] || continue

  read cksm _ < <(md5sum "$file")
  if ((arr[$cksm]++)); then 
    echo "rm $file"
  fi
done

This is both recursive and handles any file name.

Method-2

One line command to find and remove all duplicate files in Linux

  1. Finding Duplicate Files (based on size first, then MD5 hash)

find -not -empty -type f -printf “%s\n” | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 –all-repeated=separate

2. Delete found duplicate files

find -not -empty -type f -printf “%s\n” | sort -rn | uniq -d |  xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 –all-repeated=separate | cut -f3-100 -d ‘ ‘ | tr ‘\n.’ ‘\t.’ | sed ‘s/\t\t/\n/g’ | cut -f2-100 | tr ‘\t’ ‘\n’ | perl -i -pe ‘s/([ (){}-])/\\$1/g’ | perl -i -pe ‘s/’\”/\\’\”/g’ | xargs -pr rm -v

If you want to delete files without asking permission, remove the -p after last xargs in the above command.

Method-3

1. How to test files having unique content?

if diff "$file1" "$file2" > /dev/null; then
    ...

2. How can we get list of files in directory?

files="$( find ${files_dir} -type f )"

We can get any 2 files from that list and check if their names are different and content are same.

Script

#!/bin/bash
# removeDuplicates.sh

files_dir=$1
if [[ -z "$files_dir" ]]; then
    echo "Error: files dir is undefined"
fi

files="$( find ${files_dir} -type f )"
for file1 in $files; do
    for file2 in $files; do
        # echo "checking $file1 and $file2"
        if [[ "$file1" != "$file2" && -e "$file1" && -e "$file2" ]]; then
            if diff "$file1" "$file2" > /dev/null; then
                echo "$file1 and $file2 are duplicates"
                rm -v "$file2"
            fi
        fi
    done
done

For example, we have some dir:

$> ls .tmp -1
all(2).txt
all.txt
file
text
text(2)

So there are only 3 unique files.

Lets run that script:

$> ./removeDuplicates.sh .tmp/
.tmp/text(2) and .tmp/text are duplicates
removed `.tmp/text'
.tmp/all.txt and .tmp/all(2).txt are duplicates
removed `.tmp/all(2).txt'

And we get only 3 files leaved.

$> ls .tmp/ -1
all.txt
file
text(2)

 

Facebook Comments