Duplicates are not necessarily rejected, only detected. For every document we process, we generate a hash: a string of letters and numbers that looks random and, by itself, contains no reference to the original document. These hashes help us detect duplicates, but we cannot retrieve any information about the original document from them.
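
To make the idea concrete, here is a minimal sketch of hash-based duplicate detection. It is only an illustration: the choice of SHA-256 and the names `document_hash` and `DuplicateDetector` are assumptions for this example, not a description of the actual system.

```python
import hashlib


def document_hash(content: bytes) -> str:
    """Return a fixed-length hex digest of the document's bytes.

    SHA-256 is an assumption here; the real system may use another algorithm.
    """
    return hashlib.sha256(content).hexdigest()


class DuplicateDetector:
    """Hypothetical helper that remembers hashes, never the documents."""

    def __init__(self) -> None:
        self._seen = set()

    def check(self, content: bytes) -> bool:
        """Return True if an identical document was seen before."""
        digest = document_hash(content)
        if digest in self._seen:
            return True  # detected, but the caller decides whether to reject
        self._seen.add(digest)
        return False


detector = DuplicateDetector()
first = detector.check(b"invoice 2024-001")   # new document
second = detector.check(b"invoice 2024-001")  # same bytes again
third = detector.check(b"invoice 2024-002")   # different document
```

Only the digests are stored, so the same bytes always produce the same hash and a duplicate can be spotted, while the stored value cannot be turned back into the document.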
