TY  - GEN
TI  - Data Integration Benchmark Suite v1
T1  - Data Integration Benchmark Suite v1
AU  - Cabrera, Anthony M
AU  - Faber, Clayton
AU  - Cepeda, Kyle
AU  - Deber, Robert
AU  - Epstein, Cooper
AU  - Zheng, Jason
AU  - Cytron, Ron K
AU  - Chamberlain, Roger
AD  - Washington University in St. Louis
AD  - Washington University in St. Louis
AD  - Washington University in St. Louis
AD  - Washington University in St. Louis
AD  - Washington University in St. Louis
AD  - Washington University in St. Louis
AD  - Washington University in St. Louis
AD  - Washington University in St. Louis
AB  - Analyzing big data is a task encountered across disciplines. Addressing the challenges inherent in dealing with big data necessitates solutions that cover its three defining properties: volume, variety, and velocity. Less well understood, however, is the treatment the data must undergo before any analysis can begin. Specifically, a non-trivial amount of time and resources is often spent retrieving and preprocessing big data. This problem, known collectively as data integration, covers the general task of taking data in some initial form and transforming it into a desired form. Examples include rearranging fields, changing the form of expression of one or more fields, altering the boundary notation of records and/or fields, encrypting or decrypting records and/or fields, and parsing non-record data to organize it into a record-oriented form. In this work, we present our progress in creating a benchmarking suite that characterizes a diverse set of data integration applications.
N2  - Analyzing big data is a task encountered across disciplines. Addressing the challenges inherent in dealing with big data necessitates solutions that cover its three defining properties: volume, variety, and velocity. Less well understood, however, is the treatment the data must undergo before any analysis can begin. Specifically, a non-trivial amount of time and resources is often spent retrieving and preprocessing big data. This problem, known collectively as data integration, covers the general task of taking data in some initial form and transforming it into a desired form. Examples include rearranging fields, changing the form of expression of one or more fields, altering the boundary notation of records and/or fields, encrypting or decrypting records and/or fields, and parsing non-record data to organize it into a record-oriented form. In this work, we present our progress in creating a benchmarking suite that characterizes a diverse set of data integration applications.
DA  - 2018-02-18
PY  - 2018-02-18
Y1  - 2018-02-18
DO  - 10.7936/K7NZ8715
ID  - 9
LA  - eng
KW  - Computer and information sciences
KW  - data integration
KW  - format normalization
KW  - Application-Specific Instruction Processor
UR  - https://data.library.wustl.edu/record/9/files/DIBS.zip
UR  - https://data.library.wustl.edu/record/9/files/README.txt
ER  -