Find Duplicate Facts

Run this plugin to find Facts of the same type with the same Date within any one record, optionally requiring their sub-fields to be completely identical.

The minimum details that must match are the Fact name, including its Value if an attribute, and the Date.
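
As a rough illustration of the minimal match, the sketch below groups the Facts of one record by Fact name, Value (attributes only) and Date. It is not the plugin's own code: the dictionary keys `name`, `value`, `date` and `is_attribute` and both function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict

def minimal_key(fact):
    """Minimal match key: Fact name, its Value if an attribute, and the Date.

    `fact` is a hypothetical dict; these keys are assumptions for this sketch.
    """
    value = fact.get("value") if fact.get("is_attribute") else None
    return (fact["name"], value, fact["date"])

def find_candidate_duplicates(facts):
    """Group the Facts of one record by minimal key; keep groups of 2 or more."""
    groups = defaultdict(list)
    for fact in facts:
        groups[minimal_key(fact)].append(fact)
    return [group for group in groups.values() if len(group) > 1]

# Example data for a single record (invented for illustration).
facts = [
    {"name": "Birth", "date": "1 Jan 1900", "is_attribute": False},
    {"name": "Birth", "date": "1 Jan 1900", "is_attribute": False},
    {"name": "Occupation", "value": "Farmer", "date": "1920", "is_attribute": True},
    {"name": "Occupation", "value": "Miner", "date": "1920", "is_attribute": True},
]
duplicates = find_candidate_duplicates(facts)
```

Here only the two Birth events form a group; the two Occupation attributes share a Date but differ in Value, so they are not treated as duplicates.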

Use the tick options to select which sub-fields to check. The Remove/Enable Ticks button toggles between minimal and maximum checks.

For any ticked option, all associated subsidiary fields are also checked, e.g. Source Citation (SOUR) checks that all citations are identical including the Source record, Assessment, Date, Where Within Source, Text From Source, Notes, Media, and Template Fields.

All white-space and control characters are disregarded when comparing text fields such as Notes and Text From Source, i.e. only printable characters are checked.
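
A minimal sketch of that kind of comparison follows, assuming "control characters" means Unicode category `Cc` and "white-space" means anything Python's `str.isspace()` accepts. The plugin's exact rules may differ, and the function names are hypothetical.

```python
import unicodedata

def strip_unprintable(text):
    """Drop white-space and control characters, keeping printable characters.

    Assumption: control characters are those in Unicode category "Cc".
    """
    return "".join(
        ch for ch in text
        if not ch.isspace() and unicodedata.category(ch) != "Cc"
    )

def same_text(a, b):
    """Compare two text fields while ignoring white-space and control characters."""
    return strip_unprintable(a) == strip_unprintable(b)
```

With this approach, `same_text("Born at  home.\r\n", "Born at home.")` is true, so Notes that differ only in line breaks or spacing still count as identical. One consequence is that differences in word spacing are ignored entirely.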

Click the X Close icon to abort the Plugin. The Find Facts button runs the search and, if any duplicates are found, lists them in a Result Set.

For further support, please post in the Family Historian User Group forums or the Family Historian at Groups.io email forum, where the plugin author will answer your questions. Posts can be supplemented with screenshots if necessary.

Download Find Duplicate Facts V1.3

  • Family Historian Version(s): V6 V7
  • Plugin Type: Standard
  • Written by Mike Tate
  • View Source Code
  • Downloaded 669 times
  • No additional help available

Version History

  • V1.3: Correctly handle long text fields greater than 500 characters;
    Disregard white-space and control characters when comparing text fields;
    Include FH V7 Citation-specific Template Fields (_FIELD);
  • V1.2: Disregard the Fact order; Exclude the Place field from minimal match;
    Add checks for tags _SDATE, _FLGS, FAX, RESI, RESN, FONE, ROMN, MAP, LATI, LONG, ADR1-3, CITY, STAE, POST, CTRY;
  • V1.1: FH V7 Lua 5.3 IUP 3.28 compatible; Always produces Result Set;
  • V1.0: First published version;

9 thoughts on “Find Duplicate Facts”

  1. Hi Mike, re the description and the point that Facts must be consecutive for the Plug-In to have any chance of finding duplicates. I have two questions:
    1) How would you sort ALL facts such that duplicates are consecutive? Is there a method for this?
    2) I guess once you have sorted All facts you have then lost any manually applied Fact sort logic. Unless you copy the Project and make the analysis in the copy – correct?

    BR
    David

    • 1) If the Facts are duplicates they must have the same Date, so you can use all the usual methods to sort information into Date order such as the Tools > Re-order Out-of-Sequence Data… command as explained in the FHUG Knowledge Base ~ Sorting Children, Spouses & Facts into Date Order.
      2) Yes, if you have applied some customised manual sorting of Facts then a Copy of the Project would be needed.

  2. Would it be possible to have a Find and Delete Duplicate Facts plugin? I have a database of 252,000+ individuals & 57,000 marriages. Your plugin reported over 4500 duplicates. It will take a very long time to manually delete every duplicate record. I programmed for 14 years back in 1980-1994 (several different languages) so I know what I am asking.

    Perhaps a box to the right to check for deleting the duplicate with a Select All box at the top?

    Of course, I would make a complete copy of the database before executing to make sure everything worked, but it would definitely be faster than editing each one.

    Thanks for your consideration

      • Thanks for the quick response. I gave this more of a think and I see there are issues with a mass delete.

        You would have to see both duplicates separately (some have a citation for 1 but not the other) and be able to select the one without the citation. Unless the one with citation always ends up in the Original Fact column.

        So, not as easy as all that. Guess I will be “plugging” away at this for a few days to clean up my database.

        Great Plugin BTW

        • I assume you left all the tick options selected, especially the Source Citation (SOUR) option.
          In that case, all the Citations must be identical on each pair of Original Fact and Duplicate Fact.
          The Citations will be the same in number, and with the same details, and in the same order.
          So the Duplicate Fact is perfectly safe to delete.

          BTW: It would be better to hold this kind of discussion on one of the FH Forums, either the https://www.fhug.org.uk/forum/ website or the family-historian@groups.io Email forum.

          • Tried to create registration on FHUG w/o success. Your plugin definitely finds the duplicate facts. I was able to clean up hundreds.

            However, when I uncheck Sour/Sour2 I get 4500 duplicate facts where either the original or duplicate fact actually has a Citation (Text:Footnote) but not both. I want to save the fact with the citation. Since the citation can be in either column (original/duplicate), but is not in both, I cannot do a mass deletion of one column.

            I’ve looked for plugins that will allow me to bulk merge facts (retaining the citation) without success.

            Any suggestions?

  3. I’m sorry but I don’t have any suggestions except for you to write your own Fact Merge plugin.
    Bear in mind that the duplicate Facts may both have Citations but with different details or different numbers of Citations.
    So the merge process must include Citations from both Facts but avoid duplicated Citations.
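
    The merge rule described above could be sketched as follows. This is only an illustration of combining two Citation lists without repeats, not code from this plugin; the dict keys `source` and `where` and the function name are hypothetical, and a real plugin would work on Family Historian records instead.

```python
def merge_citations(original, duplicate):
    """Return Citations from both Facts, keeping order and dropping exact repeats.

    Each Citation is a hypothetical dict of its details; two Citations count
    as the same only if every detail matches.
    """
    merged = []
    for citation in list(original) + list(duplicate):
        if citation not in merged:
            merged.append(citation)
    return merged

# Invented example: one shared Citation, one found only on the duplicate Fact.
original = [{"source": "S1", "where": "p.1"}]
duplicate = [{"source": "S1", "where": "p.1"}, {"source": "S2", "where": ""}]
merged = merge_citations(original, duplicate)
```

    In this example the merged list holds the shared S1 Citation once, followed by the S2 Citation that appeared only on the duplicate Fact.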
