flowchart BT subgraph File2 db1-1["id=1"] db1-2["id=2"] db1-3["id=3"] end subgraph File1 db2-1["id=1"] db2-2["id=2"] db2-3["id=3"] end db1-1-->db2-1 db1-2-->db2-2 db1-3-->db2-3
Merging 1:1
When you can uniquely identify all observations in each dataset using a common identifier viariable.
Merging m:1
When you are trying to transfer family level data to each family member. The family member file is uniquely identified
flowchart BT subgraph members ["Family Members"] db1-1["hid=1,pid=1"] db1-2["hid=1,pid=2"] db1-3["hid=2,pid=1"] db1-4["hid=2,pid=2"] db1-5["hid=3,pid=1"] db1-6["hid=3,pid=2"] db1-7["hid=3,pid=3"] end subgraph Family db2-1["hid=1"] db2-2["hid=2"] db2-3["hid=3"] end db2-1-->db1-1 db2-1-->db1-2 db2-2-->db1-3 db2-2-->db1-4 db2-3-->db1-5 db2-3-->db1-6 db2-3-->db1-7
Merging 1:m
When you try to merge together data from various family members to a single family file. Each member is merged to its family.
flowchart BT subgraph members ["Family Members"] db1-1["hid=1,pid=1"] db1-2["hid=1,pid=2"] db1-3["hid=2,pid=1"] db1-4["hid=2,pid=2"] db1-5["hid=3,pid=1"] db1-6["hid=3,pid=2"] db1-7["hid=3,pid=3"] end subgraph Family db2-1["hid=1"] db2-2["hid=2"] db2-3["hid=3"] end db1-1-->db2-1 db1-2-->db2-1 db1-3-->db2-2 db1-4-->db2-2 db1-5-->db2-3 db1-6-->db2-3 db1-7-->db2-3
Merging m:m
Something you NEVER want to do. It tries to merge to datasets based on a single set identification variable. This variable does not identify unique observations in either file. The merge is done by “id” based on the order they appear in the data. It will typically provide very odd results.
flowchart BT subgraph members ["Family Members 1"] db1-1["hid=1,pid=1"] db1-2["hid=1,pid=2"] db1-3["hid=2,pid=1"] db1-4["hid=2,pid=2"] db1-5["hid=3,pid=1"] db1-6["hid=3,pid=2"] db1-7["hid=3,pid=3"] end subgraph Family ["Family Members 2"] db2-1["hid=1,pid=2"] db2-2["hid=1,pid=3"] db2-3["hid=1,pid=4"] db2-4["hid=2,pid=1"] db2-5["hid=2,pid=3"] db2-6["hid=3,pid=2"] db2-7["hid=3,pid=4"] end db2-1-->db1-1 db2-2-->db1-2 db2-4-->db1-3 db2-5-->db1-4 db2-6-->db1-5 db2-7-->db1-6
True Merge m x m
If you were trying m:m
merge (which is probably wrong), means you want to joinby
. This means merge both files using all combinations of individuals that have the same id. This will create a very large dataset, unless other restrictions are applied.
flowchart BT subgraph members ["Family Members 1"] db1-1["hid=1,pid=1"] db1-2["hid=1,pid=2"] db1-3["hid=2,pid=1"] db1-4["hid=2,pid=2"] db1-5["hid=3,pid=1"] db1-6["hid=3,pid=2"] db1-7["hid=3,pid=3"] end subgraph Family ["Family Members 2"] db2-1["hid=1,pid=2"] db2-2["hid=1,pid=3"] db2-3["hid=1,pid=4"] db2-4["hid=2,pid=1"] db2-5["hid=2,pid=3"] db2-6["hid=3,pid=2"] db2-7["hid=3,pid=4"] end db2-1-->db1-1 db2-2-->db1-1 db2-3-->db1-1 db2-1-->db1-2 db2-2-->db1-2 db2-3-->db1-2 db2-4-->db1-3 db2-4-->db1-4 db2-5-->db1-4 db2-5-->db1-3 db2-6-->db1-5 db2-6-->db1-6 db2-6-->db1-7 db2-7-->db1-5 db2-7-->db1-6 db2-7-->db1-7