How to find and delete duplicate files within the same directory?
Method-1
Bash script to find and remove duplicate files using checksums
#!/bin/bash
# Index files by MD5 checksum; any repeated checksum marks a duplicate.
declare -A arr
shopt -s globstar   # make ** match files in subdirectories too
for file in **; do
    [[ -f "$file" ]] || continue
    read -r cksm _ < <(md5sum "$file")
    if ((arr[$cksm]++)); then
        echo "rm $file"   # dry run: prints the command instead of deleting
    fi
done
This is both recursive and handles any file name. It only prints the rm commands; once you trust the output, replace echo "rm $file" with rm "$file" to actually delete.
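The read cksm _ line works because md5sum prints the hex digest followed by the file name, and read splits on whitespace, keeping the first field and discarding the rest into the placeholder variable. A quick demonstration (the file name demo.txt is just for illustration):

```shell
# Create a throwaway file
printf 'hello\n' > demo.txt

# md5sum prints "<32-char hex digest>  <filename>"
md5sum demo.txt

# read keeps the first whitespace-separated field in cksm,
# the remainder lands in the throwaway variable _
read -r cksm _ < <(md5sum demo.txt)
echo "$cksm"   # the hex digest alone, no file name
```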
Method-2
One line command to find and remove all duplicate files in Linux
1. Finding duplicate files (based on size first, then MD5 hash)
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
2. Delete found duplicate files
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -f3-100 -d ' ' | tr '\n.' '\t.' | sed 's/\t\t/\n/g' | cut -f2-100 | tr '\t' '\n' | perl -i -pe 's/([ (){}-])/\\$1/g' | perl -i -pe 's/'\''/\\'\''/g' | xargs -pr rm -v
If you want to delete the files without being asked for confirmation, remove the -p option after the last xargs in the command above.
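All the perl escaping in that one-liner exists only because file names travel through whitespace-splitting xargs. A NUL-delimited rewrite avoids escaping entirely. A minimal sketch (not the original one-liner: it hashes every file rather than pre-filtering by size, and it only reports duplicates rather than deleting them):

```shell
#!/bin/bash
# Map each MD5 digest to the first file that produced it; any later
# file with the same digest is reported as a duplicate.
declare -A seen
while IFS= read -r -d '' file; do
    read -r sum _ < <(md5sum "$file")
    if [[ -n "${seen[$sum]}" ]]; then
        echo "duplicate: $file (same content as ${seen[$sum]})"
        # rm -- "$file"   # uncomment to actually delete
    else
        seen[$sum]=$file
    fi
done < <(find . -type f -print0)
```

Because find -print0 and read -d '' delimit names with NUL bytes, spaces, quotes, and even newlines in file names pass through untouched.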
Method-3
1. How do we test whether two files have the same content?
if diff "$file1" "$file2" > /dev/null; then
...
2. How can we get the list of files in a directory?
files="$( find "${files_dir}" -type f )"
We can then take any two files from that list and check whether their names differ while their content is the same.
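The if above keys on diff's exit status: 0 when the files are identical, 1 when they differ. A quick demonstration (the x1/x2/x3 names are just for illustration):

```shell
printf 'same\n'  > x1
printf 'same\n'  > x2
printf 'other\n' > x3

# Identical content: diff exits 0, so the then-branch runs
if diff x1 x2 > /dev/null; then
    echo "x1 and x2 have the same content"
fi

# Different content: diff exits 1
if ! diff x1 x3 > /dev/null; then
    echo "x1 and x3 differ"
fi
```

For a pure same-or-different test, cmp -s file1 file2 behaves equivalently and is typically faster, since it compares bytes without computing an actual diff.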
Script
#!/bin/bash
# removeDuplicates.sh
# Note: relying on word-splitting of $files means this breaks on file
# names containing whitespace; it is fine for simple names like below.
files_dir=$1
if [[ -z "$files_dir" ]]; then
    echo "Error: files dir is undefined"
    exit 1
fi
files="$( find "${files_dir}" -type f )"
for file1 in $files; do
    for file2 in $files; do
        # echo "checking $file1 and $file2"
        if [[ "$file1" != "$file2" && -e "$file1" && -e "$file2" ]]; then
            if diff "$file1" "$file2" > /dev/null; then
                echo "$file1 and $file2 are duplicates"
                rm -v "$file2"
            fi
        fi
    done
done
For example, suppose we have this directory:
$> ls .tmp -1
all(2).txt
all.txt
file
text
text(2)
So there are only 3 unique files.
Let's run the script:
$> ./removeDuplicates.sh .tmp/
.tmp/text(2) and .tmp/text are duplicates
removed `.tmp/text'
.tmp/all.txt and .tmp/all(2).txt are duplicates
removed `.tmp/all(2).txt'
And only the 3 unique files are left:
$> ls .tmp/ -1
all.txt
file
text(2)
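Because the script word-splits $files, it breaks on file names containing whitespace. A sketch of the same pairwise comparison that survives such names, reading the file list NUL-delimited into a bash array (requires bash 4.4+ for mapfile -d ''; cmp -s is swapped in for diff as a faster same-or-different test, and the script name is hypothetical):

```shell
#!/bin/bash
# removeDuplicatesSafe.sh -- same pairwise idea as above, but file
# names are read NUL-delimited into an array, so whitespace is safe.
files_dir=${1:-.}
mapfile -d '' files < <(find "$files_dir" -type f -print0)
for file1 in "${files[@]}"; do
    for file2 in "${files[@]}"; do
        # skip self-comparison and files already removed this run
        if [[ "$file1" != "$file2" && -e "$file1" && -e "$file2" ]]; then
            if cmp -s -- "$file1" "$file2"; then
                echo "$file1 and $file2 are duplicates"
                rm -v -- "$file2"
            fi
        fi
    done
done
```

Quoting the array expansion as "${files[@]}" is what preserves each file name as a single word; the -- after cmp and rm guards against names that begin with a dash.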