General |
I am currently a third-year graduate student majoring in Computer Software and Theory at Fudan University. In September 2003, I joined the Shanghai International Database Research Center, led by Professor Baile Shi. My research topics include Database Systems, Data Mining, and Information Retrieval. Most of my research is supported in part by the National Natural Science Foundation of China. My studies and research at Fudan are advised by Professor Wei Wang. I received my B.S. degree in Computer Science and Technology from Fudan University in 2005.
|
Research Interests |
- Data Mining
Clustering, Classification, and Privacy Preserving Data Mining
- Management of Data
Spatial Databases, Data Warehousing and OLAP, Data Privacy, and Novel Database Applications
- Information Retrieval
Web-based Search and Optimization
|
Projects and Research |
- Keyword Clustering
Summer Intern, Google Inc., Summer'07
I worked on the problem of clustering keywords according to their semantics. The central problem is to define a novel similarity metric that captures the semantic relationship between keywords. A second concern is which clustering algorithm to apply once the similarity between keywords is available, and how that algorithm can generate better, more meaningful keyword clusters. This work was advised by Pengjun Lu and Yang Liu.
- Product Data Management System of Shanghai Aircraft Manufactory
Software Engineer, Shanghai International Database Research Center, Winter'06 - Present
This project aims to improve the management of product data so that it better supports frequently posed queries as well as on-site assembly work. Because of the specific data and data structures of products in aircraft manufacture, the project poses some interesting challenges for analysis and design. I am primarily responsible for (1) extracting an abstract model of the domain so that the system can adapt to different kinds of products and (2) performing detailed analysis and design to guide the implementation of the system.
- Data Privacy and Security (Supported by NSF, China)
Research Assistant, Shanghai International Database Research Center, Spring'05 - Present
Guaranteeing the privacy and security of data is an increasingly serious concern in many applications. In this project, I mainly focus on the problem of Privacy Preserving Data Mining, where the aim is to ensure that privacy is protected during the data mining process. My recent research is on anonymizing data while keeping the data as useful as possible. The research is in collaboration with Dr. Jian Pei at Simon Fraser University and Dr. Ada Wai-Chee Fu at the Chinese University of Hong Kong.
- A Data Mining Platform Based on Web Service (Supported by the 863 High-Tech Project, China)
Research Assistant, Shanghai International Database Research Center, Summer'03 - Spring'05
I worked on the problem of designing a uniform data structure for answering various aggregate queries. Traditional data cubing methods can only answer a particular type of query, yet people may pose various queries on the same dataset. This motivated us to develop a novel tree structure based on probability density that can answer multiple types of queries. This work was done in close collaboration with Tianyi Wu, now at the University of Illinois at Urbana-Champaign.
- Advanced Applications of Clustering Algorithms to the Web (Supported by NSF, China)
Research Assistant, Shanghai International Database Research Center, Fall'02 - Fall'03
In this project, I became familiar with basic Data Mining concepts and techniques. I focused on clustering, which is an unsupervised learning problem, and implemented several classical clustering algorithms, including K-MEANS, BIRCH, DBSCAN, and OPTICS. Reading papers and implementing these algorithms gave me a deeper understanding of the clustering problem.
|
Publications |
|
|
Selected Readings |
- The Great Game: The Emergence of Wall Street as a World Power, by John Steele Gordon.
- Fortress Besieged, by Zhongshu Qian.
- Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley, 2005.
- The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, by Michael C. Daconta, Leo J. Obrst, Kevin T. Smith, Wiley Press, 2003.
- Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Pearson Education, 1999.
- Matrix Analysis, by Roger A. Horn, Charles R. Johnson, Cambridge University Press, 1986.
- Introduction to Algorithms (2nd Edition), by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, The MIT Press, 2001.
- Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2000.
- Machine Learning, by Tom M. Mitchell, McGraw-Hill Science/Engineering/Math, 1997.
- Concrete Mathematics: A Foundation for Computer Science (2nd Edition), by Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, Addison-Wesley, 1994.
A list of some interesting papers I read in the past is available here.
|
Teaching |
|
Useful Links |
Google|DBLP|Citeseer|Google Scholar|DB World
SIGMOD/PODS|SIGKDD|SIGIR|VLDB|ICDE|ICDM|ICDT|EDBT|WWW|SDM|SSDBM|CIKM
DASFAA|PAKDD|ICML|ECML|PKDD
Hao Du|Dong Guo|Jian Huang|Yiyi Huang|Zhenhua Lin|Xi Liu|Tianyi Wu|Zhijun Yin|Xiaohui Yu
Bin Zhou|Changyin Zhou|Shile Zhang|Ding Zhou|Jiajun Zhu
|
Misc |
- My Album
- My beloved hometown - Suzhou, China
- I studied in ...
- I worked in ...
|