A Bayesian Sampling Method for Product Feature Extraction From Large-Scale Textual Data

Lim, Sunghoon; Tucker, Conrad S.

doi:10.1115/1.4033238


Related Researcher

임성훈

Lim, Sunghoon: Industrial Intelligence Lab.



Author(s): Lim, Sunghoon; Tucker, Conrad S.

Issued Date: 2016-06

DOI: 10.1115/1.4033238

URI: https://scholarworks.unist.ac.kr/handle/201301/24676

Fulltext: http://mechanicaldesign.asmedigitalcollection.asme.org/article.aspx?articleid=2511302

Citation: JOURNAL OF MECHANICAL DESIGN, v.138, no.6, pp.061403

Abstract: The authors of this work propose an algorithm that determines optimal search keyword combinations for querying online product data sources in order to minimize identification errors during the product feature extraction process. Data-driven product design methodologies based on acquiring and mining online product-feature-related data face two fundamental challenges: (1) determining optimal search keywords that return relevant product-related data and (2) determining how many search keywords are sufficient to minimize identification errors during the product feature extraction process. These challenges exist because online data, which is primarily textual in nature, may violate the statistical assumption that the samples returned for a query are independent and identically distributed (i.i.d.). Existing design methodologies use predetermined search terms to acquire textual data online, which makes the quality of the acquired data a function of the quality of the search terms themselves. Furthermore, the lack of independence and identical distribution of text data from online sources impacts the quality of the acquired data. For example, a designer may search for a product feature using the term "screen," which may return relevant results such as "the screen size is just perfect," but may also return irrelevant noise such as "researchers should really screen for this type of error." A text mining algorithm is introduced that determines, without labeled training data, the optimal search terms that maximize the veracity of the acquired data so that valid conclusions can be drawn. A case study involving real-world smartphones is used to validate the proposed methodology.

Publisher: ASME

ISSN: 1050-0472

Keyword (Author): product design, product feature extraction, information retrieval, online, Bayesian, text mining, training data, non-i.i.d., keyword

Keyword: DESIGN, OPTIMIZATION, SIZE
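
The "screen" example in the abstract can be made concrete with a small sketch. The code below is not the authors' Bayesian sampling algorithm; it only illustrates, on a hypothetical five-sentence corpus, why a single search keyword can return noisy results and how a keyword combination can reduce that noise. The corpus, the `retrieve` and `precision` helpers, and the second keyword "phone" are all assumptions made for illustration. The hand-assigned relevance labels are used only to score the toy example, whereas the abstract states that the proposed algorithm works without labeled training data.

```python
import re

# Hypothetical mini-corpus. The first two sentences come from the abstract's
# example; the rest are invented. The boolean marks whether the sentence is
# actually about the product's screen (hand-labeled, for scoring only).
CORPUS = [
    ("the screen size is just perfect", True),
    ("researchers should really screen for this type of error", False),
    ("the phone screen cracked after one drop", True),
    ("we screen every applicant before the interview", False),
    ("the screen on my phone is too dim", True),
]


def tokens(text):
    """Split a sentence into a set of lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(keywords, corpus):
    """Return the (sentence, label) pairs that contain every keyword."""
    wanted = set(keywords)
    return [(s, label) for s, label in corpus if wanted <= tokens(s)]


def precision(hits):
    """Fraction of retrieved sentences that are labeled relevant."""
    if not hits:
        return 0.0
    return sum(label for _, label in hits) / len(hits)


single = retrieve(["screen"], CORPUS)             # one ambiguous keyword
combined = retrieve(["screen", "phone"], CORPUS)  # keyword combination

print(len(single), precision(single))      # 5 hits, precision 0.6
print(len(combined), precision(combined))  # 2 hits, precision 1.0
```

In this toy corpus the combined query removes both irrelevant sentences but also misses one relevant sentence, which hints at the trade-off behind the abstract's second challenge: deciding how many keywords are sufficient.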
