| Year |
Author |
Title |
Journal / Conference |
| 2012 |
Adrian Benton, John H. Holmes, S. Hill, Annie Chung, Lyle Ungar |
|
Bioinformatics 28(5): 743 (2012) (software release note) |
| 2011 |
Shawndra Hill, Jun Mao, Lyle Ungar, Sean Hennessy, Charles E. Leonard, John H. Holmes |
|
Journal of Medical Internet Research, 14(2) |
| 2011 |
Adrian Benton, Lyle Ungar, Shawndra Hill, Sean Hennessy, Jun Mao, Annie Chung, Charles E. Leonard, John H. Holmes |
|
Journal of Biomedical Informatics, 44(6): 989-996 |
| 2011 |
Shawndra Hill, Noah Ready-Campbell |
|
International Journal of Electronic Commerce, 15(3): 73 |
| 2011 |
Getachew Berhan, Tsegaye Tadesse, Solomon Atnafu, Shawndra Hill |
|
Journal of Strategic Innovation and Sustainability, Vol. 7, No. 1, Spring 2011 |
| 2011 |
Adrian Benton, Shawndra Hill, Lyle Ungar, Annie Chung, John Holmes |
- A System for De-Identifying Medical Message Board Text
There are millions of public posts to medical message boards by users seeking support and information on a wide range of medical conditions. It has been shown that these posts can be used to gain a greater understanding of patients' experiences and concerns. As investigators continue to explore large corpora of medical discussion board data for research purposes, protecting the privacy of the members of these online communities becomes an important challenge that needs to be met. Extant entity recognition methods used for more structured text are not sufficient because message posts present additional challenges: the posts contain many typographical errors, a larger variety of possible names, terms and abbreviations specific to Internet posts or a particular message board, and mentions of the authors' personal lives. The main contribution of this paper is a system to de-identify the authors of message board posts automatically, taking into account the aforementioned challenges. We demonstrate our system on two different message board corpora, one on breast cancer and another on arthritis. We show that our approach significantly outperforms other publicly available named entity recognition and de-identification systems, which have been tuned for more structured text like operative reports, pathology reports, discharge summaries, or newswire.
|
BMC Bioinformatics, 12(3):73 |
| 2006 |
Shawndra Hill, Deepak Agarwal, Robert Bell, Chris Volinsky |
|
Journal of Computational & Graphical Statistics, Vol. 15, No. 3, pp. 584-608 |
| 2006 |
Shawndra Hill, Foster Provost, Chris Volinsky |
|
Statistical Science, Vol. 21, No. 2, pp. 256-276 |
| 2005 |
Shawndra Hill, Abraham Bernstein, Foster Provost |
- Toward Intelligent Assistance for a Data Mining Process: An Ontology-Based Approach for Cost-Sensitive Classification
A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype Intelligent Discovery Assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition.
|
IEEE Transactions on Knowledge and Data Engineering, Vol. 17, Iss. 4, pp. 503-518 |
| 2003 |
Shawndra Hill, Foster Provost |
|
SIGKDD Explorations, Vol. 5, Iss. 2, pp. 179-184 |
| 2012 |
Adrian Benton, Shawndra Hill |
|
CIST in Conjunction with INFORMS |
| 2012 |
Shawndra Hill, Adrian Benton |
|
The Second SOMA Workshop: Social Media Analytics |
| 2012 |
Shawndra Hill, Aman Nalavade, Adrian Benton |
|
ADKDD 2012: The Sixth International Workshop on Data Mining for Online Advertising and Internet Economy |
| 2012 |
Shawndra Hill, Adrian Benton |
|
Conditionally Accepted ICIS (International Conference on Information Systems) |
| 2011 |
Shawndra Hill, Benton A, Ungar L, Macskassy S, Chung A, Holmes JH |
- A Cluster-based method for isolating influence on Twitter
This paper demonstrates a cluster-based method to isolate influence in social network-based observational data, where "influence" is defined to mean that one person posts about a topic online and a second person posts about the same topic because he or she read the first post. Isolating influence in observational data is difficult, because we may observe that connected people discuss the same topic in proximate periods for reasons other than influence, including homophily (connected people are similar) and exogenous shock (they may have learned of the topic from some external source). We employ a matched sample estimation technique that has been used in the past to measure influence by controlling for demographic and usage-based homophily, and add to the matching scheme a cluster ID. Our contribution is two-fold: First, we provide preliminary evidence that social network-based clusters capture homophily, indicating that a network-based attribute approach may not only capture homophily but also may be used in lieu of using demographic attributes for matching similar users in scenarios when privacy preservation is a concern. Second, we show that by adding a network position attribute, a cluster ID, when matching similar users, we can isolate influence better. We believe that our approach to isolate influence can have a broad impact on problems where social networks and associated behaviors can be observed over time.
|
Workshop on Information Technologies and Systems |
| 2011 |
Getachew Berhan, Shawndra Hill, Tsegaye Tadesse, Solomon Atnafu |
Geographic Information Systems and Geo-statistics for Modeling and Mapping Endangered Species: A Case Study in Bonga Forest of Ethiopia |
Global Information Technology Management Association |
| 2010 |
Adrian Benton, Shawndra Hill, Lyle Ungar, Annie Chung, John Holmes |
|
International Conference on Machine Learning Applications |
| 2010 |
Getachew Berhan, Tsegaye Tadesse, Solomon Atnafu, Shawndra Hill |
Knowledge Discovery from Satellite Images for Drought Monitoring: A Case Study in Ethiopia |
Global Information Technology Management Association World Conference |
| 2010 |
Justin Vastola, Kobi Abayomi, Shawndra Hill |
Statistics for Re-Identification Signatures in Networked Data |
Workshop on Information Networks |
| 2010 |
Shawndra Hill, Getachew Berhan, Anita Banser, Nathan Eagle |
|
AAAI Symposium on Artificial Intelligence for Development |
| 2010 |
Mariye Yigzaw, Shawndra Hill, Anita Banser, Lemma Lessa |
|
AAAI Symposium on Artificial Intelligence for Development |
| 2010 |
Tibebe Besheh, Shawndra Hill |
|
AAAI Symposium on Artificial Intelligence for Development |
| 2009 |
Shawndra Hill and Akash Nagle |
- Social Network Signatures: A Framework for Re-Identification in Networked Data
Data on large dynamic social networks, such as telecommunications networks and the Internet, are pervasive. However, few methods conducive to efficient large-scale analysis exist. In this paper, we focus on the task of re-identification. Re-identification in the context of dynamic networks is a matching problem that involves comparing the behavior of networked entities across two time periods. Prior research has reported success in the domains of e-mail alias detection, author attribution, and identifying fraudulent consumers in the telecommunications industry. In this work, we address the question of "why are we able to re-identify entities on real world dynamic networks?" Our contribution is two-fold. First, we address the challenge of scale with a framework for matching that does not require pairwise comparisons to ascertain the similarity scores between networked entities. Second, we show our method is robust against missing links but less tolerant to noise. Using our framework, we provide a performance estimate for re-identification on networks based solely on their degree distribution and dynamics. This work has significant implications for re-identification problems where scale is a challenge as well as for problems where false negatives (e.g., when fraudulent consumers are not labeled as fraudulent) cannot be observed.
|
International Conference on Computational Aspects of Social Networks |
| 2007 |
Shawndra Hill, Foster Provost, Chris Volinsky |
- Learning and Inference in Massive Social Networks
Researchers and practitioners increasingly are gaining access to data on explicit social networks. For example, telecommunications and technology firms record data on consumer networks (via phone calls, emails, voice-over-IP, instant messaging), and social-network portal sites such as MySpace, Friendster and Facebook record consumer-generated data on social networks. Inference for fraud detection [5, 3, 8], marketing [9], and other tasks can be improved with learned models that take social networks into account and with collective inference [12], which allows inferences about nodes in the network to affect each other. However, these social-network graphs can be huge, comprising millions to billions of nodes and one or two orders of magnitude more links.
|
5th International Workshop on Mining and Learning with Graphs |
| 2005 |
Shawndra Hill, Deepak Agarwal, Robert Bell, and Chris Volinsky |
|
3rd International Workshop on Link Discovery |
| 2003 |
Shawndra Hill |
|
International Joint Conference on Artificial Intelligence |
| 2002 |
Abraham Bernstein, Scott Clearwater, Shawndra Hill, Claudia Perlich, Foster Provost |
- Discovering Knowledge from Relational Data Extracted from Business News
Thousands of business news stories (including press releases, earnings reports, general business news, etc.) are released each day. Recently, information technology advances have partially automated the processing of documents, reducing the amount of text that must be read. Current techniques (e.g., text classification and information extraction) for full-text analysis for the most part are limited to discovering information that can be found in single documents. Often, however, important information does not reside in a single document, but in the relationships between information distributed over multiple documents. This paper reports on an investigation into whether knowledge can be discovered automatically from relational data extracted from large corpora of business news stories. We use a combination of information extraction, network analysis, and statistical techniques. We show that relationally interlinked patterns distributed over multiple documents can indeed be extracted, and (specifically) that knowledge about companies' interrelationships can be discovered. We evaluate the extracted relationships in several ways: we give a broad visualization of related companies, showing intuitive industry clusters; we use network analysis to ask who are the central players, and finally, we show that the extracted interrelationships can be used for important tasks, such as for classifying companies by industry membership.
|
8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |