MSR (Mining Software Repositories) School in Asia 2011

November 2, 2011, Nara Prefectural New Public Hall, Nara, Japan

The tutorial was a great success! Thanks to all 75 active participants.

The goal of the tutorial is to give participants an overview of the Mining Software Repositories (MSR) field and an opportunity to learn the techniques used in it. MSR is a rapidly growing field, with a tutorial and a co-located working conference held at ICSE every year. This year, we have invited Dr. Tao Xie and Dr. Tien N. Nguyen, who are top researchers in the field.


09:00-09:45 registration
09:45-10:00 opening
10:00-12:00 Graph-Centric Software Mining (Tien)
12:00-13:00 lunch
13:00-14:30 Mining Software Engineering Data Part 1 (Tao)
14:30-14:45 coffee break
14:45-16:15 Mining Software Engineering Data Part 2 (Tao)


          Early (by 8 Oct.)   Late (by 1 Nov.)   On-site (2 Nov.)
Regular   JPY 10,000          JPY 20,000         JPY 20,000
Student   JPY 5,000           JPY 10,000         JPY 20,000

The registration fee must be paid to participate in the tutorial. Each registration includes a tutorial handout, lunch, and a coffee break. Students may be asked to show their student ID at the registration desk to validate their registration. The cut-off for early registration is 23:59 JST on October 8th, 2011. All registrants of IWSM/MENSURA 2011 can participate in the tutorial without charge, but pre-registration is required.

We have already received over 70 registrations (as of 20 Oct.)! We can only accept about 10 more because of the room size. Please register as soon as possible to secure your seat!

Registration page:


We have booked a Japanese-style bar for dinner on 2 Nov., but only 40 seats are available. If you would like to join the dinner party, please reserve your seat using this form by 28 Oct. (Fri).

Graph-Centric Software Mining

Tien N. Nguyen (Iowa State University)


In the past decade, the Software Engineering (SE) research community has witnessed several successes from the Mining Software Repositories (MSR) area. MSR research has been able to exploit a rich set of knowledge from large-scale open-source software data, including programs and their life-long histories. Mining and prediction approaches have been able to provide developers with important knowledge about their software to guide them in their development and maintenance activities and to help them make informed decisions. Recently, an important direction in MSR has been the use of graph mining techniques, which have proven effective in several SE applications. Graph-based representations are more expressive and better capture the high-level semantics of software artifacts than mining representations based on pairs, sets, and sequences. In this tutorial, we will examine a wide range of graph-based mining models and approaches that have been applied to various SE problems and applications, such as clone detection, programming and specification pattern mining, API adaptation and migration, pattern query, inter-framework pattern recovery, language translation, program comprehension, defect detection, debugging and fault location, fix propagation, recommendation systems, feature detection and tracking, and structural testing. We will visit graph-based mining algorithms that have been successful in MSR research, including frequent subgraph mining, maximum common subgraph, subgraph isomorphism, tree matching, frequent subsets/subsequences, and graph reachability patterns. Finally, we will outline promising directions for graph-based software mining research.

Dr. Tien N. Nguyen

Dr. Tien N. Nguyen is currently an Associate Professor in the Electrical and Computer Engineering Department at Iowa State University. His research interests include program analysis, mining software repositories, and software maintenance and evolution. He received an ACM SIGSOFT Distinguished Paper Award at the ACM SIGSOFT Conference on Foundations of Software Engineering (ESEC/FSE 2009) on the topic “Graph-based Mining of Multiple Object Usage Patterns”. In 2008, he was awarded the Litton Industries Professorship Medallion Award from Iowa State University, which is given to young faculty who exhibit excellent leadership and a recognized commitment to electrical and computer engineering research and teaching. His research has been supported by the US National Science Foundation (NSF), Iowa State University’s ICUBE Center, ECE’s Palmer Challenge Program, and the Agile Alliance Academic Program. He served as the Chair of the Research Demonstration Track at the ACM SIGSOFT Conference on Foundations of Software Engineering (FSE 2010).

Mining Software Engineering Data

Tao Xie (North Carolina State University)


Software engineering data (such as code bases, execution traces, historical code changes, mailing lists, and bug databases) contains a wealth of information about a project’s status, progress, and evolution. Using well-established data mining techniques, practitioners and researchers can explore the potential of this valuable data in order to better manage their projects and to produce higher quality software systems that are delivered on time and on budget.

This talk presents the latest research in mining Software Engineering (SE) data, discusses challenges associated with mining SE data, highlights SE data mining success stories, and outlines future research directions. More information on related materials can be found at

Dr. Tao Xie

Tao Xie is an Associate Professor in the Department of Computer Science of the College of Engineering at North Carolina State University. He received his Ph.D. in Computer Science from the University of Washington in 2005, advised by David Notkin. Before that, he received an M.S. in Computer Science from the University of Washington in 2002, an M.S. in Computer Science from Peking University in 2000, advised by Hong Mei, and a B.S. in Computer Science from Fudan University in 1997. He worked as a visiting researcher at Microsoft Research Redmond and Microsoft Research Asia. His research interests are in software engineering, focusing on automated software testing and mining software engineering data. He leads the Automated Software Engineering Research Group at North Carolina State University.

Besides doing research, he has contributed to understanding the software engineering research community. He has served as the ACM SIGSOFT History Liaison in the SIGSOFT Executive Committee as well as on the ACM History Committee. He received a National Science Foundation Faculty Early Career Development (CAREER) Award in 2009. He received IBM Faculty Awards in 2008, 2009, and 2010, and an IBM Jazz Innovation Award in 2008. He received the 2010 North Carolina State University Sigma Xi Faculty Research Award. He received the ASE 2009 Best Paper Award and an ACM SIGSOFT Distinguished Paper Award. He is an ACM Distinguished Speaker and an IEEE Computer Society Distinguished Visitor (2011-2013). His research has been supported by NSF, NIST, ARO, IBM, Microsoft Research, and ABB Research. He was Program Co-Chair of the 2009 IEEE International Conference on Software Maintenance (ICSM) and is Program Co-Chair of the 2011 and 2012 International Working Conference on Mining Software Repositories (MSR). He has served on the program committees of various conferences, including ICSE, ASE, ISSTA, and WWW.

This tutorial is hosted and supported by JSSST (Japan Society for Software Science and Technology) as a co-located event of IWSM/MENSURA 2011.

Tutorial Chair: Masao Ohira (Nara Institute of Science and Technology)
Student Volunteer: Daisuke Nakano (Nara Institute of Science and Technology)

Contact: jssst-tutorial-kansai dot is dot naist dot jp