Data Fusion
Data fusion, also known as statistical matching, involves combining the data from two data files whose samples do not overlap. For example, if one study looks at customer satisfaction and a completely separate study looks at brand attitudes, data fusion can be used to combine the data.
Creating a fusion
- Create a document containing multiple data files.
- In each of the files that you wish to fuse, create a micro-segment variable which has the following properties:
  - It is a Nominal or Ordinal variable.
  - It has the same Name, Label, and Variable Set Structure in each data file.
  - It has the same unique values in each file. For example, if in one file the respondents have values of 1, 2, 3, ..., 100, then the same must be true in the other file. Importantly, a value cannot appear in one file but not in the other.
  - The unique values represent small segments. The assumptions of the analysis are that:
    - The people in a given segment in one data file are broadly similar to the people in the segment with the same value in the other data file.
    - The segments explain differences between people in both data files. For example, when fusing brand attitudes with customer satisfaction data, if age is the key determinant of both brand attitudes and customer satisfaction, then age could be used as the micro-segment variable. More commonly, it will be appropriate to create an index representing multiple variables.
- Specify a Many to many relationship in Edit Data File Relationships.
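The requirements on the micro-segment variable can be illustrated outside the software. The following is a minimal Python sketch, not part of the product: the function names `micro_segment` and `unmatched_values`, the age bands, and the sample records are all hypothetical. It builds an index from two variables (age and gender) and checks that no value appears in one file but not the other.

```python
def micro_segment(age, gender):
    """Combine an age band and gender into one nominal code.

    The banding and coding scheme is an illustrative assumption.
    """
    if age < 35:
        band = 0
    elif age < 55:
        band = 1
    else:
        band = 2
    return band * 2 + (1 if gender == "Male" else 2)


def unmatched_values(segments_a, segments_b):
    """Return the values that appear in only one of the two files."""
    return sorted(set(segments_a) ^ set(segments_b))


# Hypothetical respondents as (age, gender) pairs, one list per data file.
satisfaction = [(23, "Male"), (41, "Female"), (67, "Male")]
attitudes = [(30, "Male"), (50, "Female"), (45, "Female"), (70, "Male")]

seg_a = [micro_segment(age, gender) for age, gender in satisfaction]
seg_b = [micro_segment(age, gender) for age, gender in attitudes]

# An empty list means every segment value is present in both files.
problems = unmatched_values(seg_a, seg_b)
```

If `problems` is not empty, the segments are too fine for the available samples, and coarser bands or fewer component variables would be needed before fusing.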
How it works
- The sample size of the combined data will be that of the Recipient data file specified in Edit Data File Relationships.
- All of the respondents in the recipient sample are kept and used in analyses.
- The respondents in the other data file are probabilistically selected to match the number of respondents in the recipient data file, for each matching value of the micro-segment variable. For example, if the micro-segment variable is "Gender" and there are 10 Males in the recipient data file and 20 Males in the other data file, 10 of the 20 Males are probabilistically selected from the other data file to be used in analyses.
- The other data file's Weights, if any, are an input to probabilistically selecting its respondents.
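The selection described above can be sketched in Python. This is an illustration under stated assumptions rather than the software's actual algorithm: the function `fuse` and its record layout are hypothetical, and the sketch assumes that each recipient respondent receives one respondent drawn from the same segment of the other file, with replacement, with selection probability proportional to that file's weights.

```python
import random
from collections import defaultdict


def fuse(recipient, donor, donor_weights=None, seed=0):
    """Pair every recipient record with a donor record from the same segment.

    recipient and donor are lists of (segment, record) tuples.
    Sampling with replacement, proportional to donor_weights, is an
    assumption of this sketch.
    """
    rng = random.Random(seed)
    if donor_weights is None:
        donor_weights = [1.0] * len(donor)

    # Group donor records and their weights by micro-segment value.
    pools = defaultdict(list)
    for (segment, record), weight in zip(donor, donor_weights):
        pools[segment].append((record, weight))

    fused = []
    for segment, record in recipient:
        candidates = pools.get(segment)
        if not candidates:
            raise ValueError(f"segment {segment!r} is missing from the donor file")
        records = [r for r, _ in candidates]
        weights = [w for _, w in candidates]
        chosen = rng.choices(records, weights=weights, k=1)[0]
        fused.append((segment, record, chosen))
    return fused


# 10 Males in the recipient file, 20 Males in the other (donor) file.
recipient = [("Male", f"r{i}") for i in range(10)]
donor = [("Male", f"d{i}") for i in range(20)]
fused = fuse(recipient, donor)
```

The result has one row per recipient respondent, so the fused sample size equals that of the recipient file. A donor respondent with a weight of zero is never selected in this sketch.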