Description
PyData SF 2016 | Taposh Roy, Austin Powell | A hybrid approach to model randomness and fuzziness using the Dempster-Shafer method
Current mainstream data science work focuses mainly on prediction, segmentation, and data analysis. Much of it involves supervised learning, where we learn from historical data. A predicted observation usually comes with a probability score, which measures randomness. We believe there are situations where, along with randomness, we also need to account for fuzziness in order to attach a meaningful confidence to the observation.
Fuzziness and randomness are two very different concepts. We say that fuzziness occurs when there is no clear boundary between outcomes, whereas we commonly refer to randomness as the uncertainty associated with the variability among alternative outputs. Although the two are different, it is easy to find real-world scenarios where randomness and fuzziness act together. One example is the stock market, where some situations, such as new-high patterns, have no historical precedent to use in modeling. Here, there are no well-defined boundaries for forecasting event probabilities over the next few days. Another example is the emergency room, where an early-detection algorithm produces a risk score, but the environmental factors affecting the patient are not predictable. In both examples, we can model the randomness fairly easily, but there is also fuzziness that needs to be dealt with.
We will discuss a model we are researching that improves how well we can predict both this fuzziness and the randomness. The heart of this improvement lies in how we approach variability. Variability among alternative outputs describes randomness and is readily modeled using regression or other predictive analytic methods. Fuzziness introduces a second source of variability, individualized per observation: the vagueness with which the attributes are selected. This type of uncertainty stems from a variety of environmental factors that depend on the use case.
For the stock market, this uncertainty comes from economic conditions and policy reforms that have not been seen before, while in the emergency room it might come from the variety of circumstances that preceded the patient's hospitalization. Prior research suggests several approaches for simulating choice behavior in different choice contexts. Our focus is on designing choice models that can take into account either of these two sources of uncertainty.
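The abstract names the Dempster-Shafer method but does not describe the authors' model. As a minimal, hypothetical sketch of how Dempster-Shafer theory can fuse two sources of evidence, the Python below implements Dempster's rule of combination over a two-hypothesis frame (`high_risk`, `low_risk`), loosely echoing the emergency-room example. The function names, the frame, and all mass values are illustrative assumptions, not the authors' implementation or data. Mass placed on the whole frame `THETA` stands for uncommitted belief, and the resulting interval between belief and plausibility expresses uncertainty that a single probability score does not capture.

```python
from itertools import product

# Focal elements are frozensets of hypotheses; a mass function maps each
# focal element to a mass, and the masses of one function sum to 1.


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            # The two sources back disjoint hypotheses: count it as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the non-conflicting mass sums to 1 again.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def belief(m, h):
    """Lower bound: total mass committed to subsets of h."""
    return sum(v for k, v in m.items() if k <= h)


def plausibility(m, h):
    """Upper bound: total mass on focal elements that intersect h."""
    return sum(v for k, v in m.items() if k & h)


HIGH = frozenset({"high_risk"})
LOW = frozenset({"low_risk"})
THETA = HIGH | LOW  # whole frame: mass here is uncommitted ("don't know")

# Hypothetical source 1: a predictive model's risk score, 10% left uncommitted.
m_model = {HIGH: 0.7, LOW: 0.2, THETA: 0.1}
# Hypothetical source 2: vague environmental evidence, mostly uncommitted.
m_env = {HIGH: 0.4, THETA: 0.6}

fused = combine(m_model, m_env)
print(f"Bel(high) = {belief(fused, HIGH):.3f}")
print(f"Pl(high)  = {plausibility(fused, HIGH):.3f}")
```

With these illustrative masses the script prints a belief of 0.804 and a plausibility of 0.870 for high risk. In this two-hypothesis frame the gap between the two equals the fused mass left on `THETA`, so it narrows as the sources commit more mass to specific outcomes.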