Abstract
In the Internet age, malware (such as viruses, trojans, ransomware, and bots) poses serious and evolving security threats to Internet users. To protect legitimate users from these threats, anti-malware software products from companies including Comodo, Kaspersky, Kingsoft, and Symantec provide the major defense against malware. Unfortunately, driven by economic benefits, the number of new malware samples has increased explosively: anti-malware vendors are now confronted with millions of potential malware samples per year. To keep pace with this growth, there is an urgent need for intelligent methods that detect malware effectively and efficiently in the large, real-world collections of samples gathered daily. In this article, we first provide a brief overview of malware and the anti-malware industry, and present the industrial needs for malware detection. We then survey intelligent malware detection methods. In these methods, the detection process is usually divided into two stages: feature extraction and classification/clustering. The performance of such intelligent malware detection approaches critically depends on the extracted features and on the classification/clustering methods. We provide a comprehensive investigation of both the feature extraction and the classification/clustering techniques. We also discuss additional issues and challenges of malware detection using data mining techniques, and finally forecast the trends of malware development.
- Tony Abou-As saleh, Nick Cercone, Vlado Keselj, and Ray Sweidan. 2004. N-gram-based detection of new malicious code. In Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC). Google ScholarDigital Library
- David W. Aha, Dennis Kibler, and Marc K. Albert. 1991. Instance-based learning algorithms. Machine Learning 6, 1 (1991), 37--66. Google ScholarDigital Library
- Blake Anderson, Daniel Quist, Joshua Neil, Curtis Storlie, and Terran Lane. 2011. Graph based malware detection using dynamic analysis. Journal in Computer Virology 4 (2011), 247--258. Google ScholarDigital Library
- Blake Anderson, Curtis Storlie, and Terran Lane. 2012. Improving malware classification: Bridging the static/dynamic gap. In Proceedings of 5th ACM Workshop on Security and Artificial Intelligence (AISec). Google ScholarDigital Library
- Anubis. 2010. Anubis: Analyzing Unknown Binaries. Retrieved from http://anubis.iseclab.org/.Google Scholar
- Michael Bailey, Jon Oberheide, Jon Andersen, Z. Morley Mao, Farnam Jahanian, and Jose Nazario. 2007. Automated classification and analysis of internet malware. In Proceedings of the 10th International Conference on Recent Advances in Intrusion Detection. Google ScholarDigital Library
- Ulrich Bayer, Paolo Milani Comparetti, Clemens Hlauschek, Christopher Kruegel, and Engin Kirda. 2009. Scalable, behavior-based malware clustering. In Proceedings of the 16th Annual Network and Distributed System Security Symposium.Google Scholar
- Ulrich Bayer, Christopher Kruegel, and Engin Kirda. 2006a. TTAnalyze: A tool for analyzing malware. In EICAR.Google Scholar
- Ulrich Bayer, Andreas Moser, Christopher Kruegel, and Engin Kirda. 2006b. Dynamic analysis of malicious code. Journal in Computer Virology 2(1) (2006), 67--77. Google ScholarCross Ref
- Zahra Bazrafshan, Hashem Hashemi, Seyed Mehdi Hazrati Fard, and Ali Hamzeh. 2013. A survey on heuristic malware detection techniques. In Proceedings of the 5th Conference on Information and Knowledge Technology (IKT). Google ScholarCross Ref
- Philippe Beaucamps and ric Filiol. 2007. On the possibility of practically obfuscating programs towards a unified perspective of code protection. Journal in Computer Virology 3, 1 (2007), 3--21.Google ScholarCross Ref
- Yoshua Bengio. 2009. Learning deep architectures for AI. Foundations and Trends in Machine Learning 2, 1 (2009), 1--127. Google ScholarDigital Library
- Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. 2007. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems 19 (2007).Google Scholar
- Christopher M. Bishop. 1995. Neural networks for pattern recognition. Oxford, Clarendon Press. Google ScholarDigital Library
- Bizjournals. 2011. McAfee: Trends in a decade of cybercrime. Retrieved from http://www.bizjournals.com/sanjose/news/2011/01/25/mcafee-trends-in-a-decade-of-cybercrime.html?page=all.Google Scholar
- Kevin Borders and Atul Prakash. 2004. Web tap: Detecting covert web traffic. In Proceedings of the 11th ACM Conference on Computer and Communications Security. Google ScholarDigital Library
- Leo Breiman. 1996. Bagging predicators. Machine Learning 24, 2 (1996), 123--140. Google ScholarDigital Library
- Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5--32. Google ScholarDigital Library
- Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn Song. 2007. Polyglot: Automatic extraction of protocol message format using dynamic binary analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS). Google ScholarDigital Library
- Duen Horng Chau, Carey Nachenberg, Jeffrey Wilhelm, Adam Wright, and Christos Faloutsos. 2011. Polonium: Tera-scale graph mining for malware detection. In Proceedings of the SIAM International Conference on Data Mining (SDM). Google ScholarCross Ref
- Lingwei Chen, William Hardy, Yanfang Ye, and Tao Li. 2015. Analyzing file-to-file relation network in malware detection. In Proceedings of the International Conference on Web Information Systems Engineering (WISE). Google ScholarDigital Library
- Mihai Christodorescu and Somesh Jha. 2003. Static analysis of executables to detect malicious patterns. In Proceedings of the 12th Conference on USENIX Security Symposium. Google ScholarDigital Library
- Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. 2007. Mining specifications of malicious behavior. In Proceedings of ESEC/FSE. Google ScholarDigital Library
- Mihai Christodorescu, Somesh Jha, Sanjit A. Seshia, Dawn Song, and Randal E. Bryant. 2005. Semantics-aware malware detection. In Proceedings of IEEE Symposium on Security and Privacy. Google ScholarDigital Library
- William W. Cohen. 1995. Fast effective rule induction. In Proceedings of 12th International Conference on Machine Learning. Google ScholarDigital Library
- Peter Coogan. 2010. SpyEye Bot Versus Zeus Bot. Retrieved from http://www.symantec.com/connect/blogs/spyeye-bot-versus-zeus-bot.Google Scholar
- Thomas Cover and Peter Hart. 1967. Nearest nieghbor pattern classification. IEEE Transaction on Information Theory IT-13, 1 (1967), 21--27. Google ScholarDigital Library
- Jedidiah R. Crandall, Zhendong Su, S. Felix Wu, and Frederic T. Chong. 2005. On deriving unknown vulnerabilities from zero-day polymorphic and metamorphic worm exploits. In Proceedings of the 12th ACM Conference on Computer and Communications Security (CCS). Google ScholarDigital Library
- Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. 2004. Adversarial classification. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 99--108. Google ScholarDigital Library
- Damballa. 2008. 3% to 5% of Enterprise Assets Are Compromised by Bot-Driven Targeted Attack Malware. Retrieved from http://www.prnewswire.com/news-releases/3-to-5-of-enterprise-assets-are-compromised-by-bot-driven-targeted-attack-malware-61634867.html.Google Scholar
- Mohsen Damshenas, Ali Dehghantanha, and Ramlan Mahmoud. 2013. A survey on malware propagation, analysis, and detection. International Journal of Cyber-Security and Digital Forensics (IJCSDF) 2, 4 (2013), 10--29.Google Scholar
- Sanjeev Das, Yang Liu, Wei Zhang, and Mahintham Chandramohan. 2016. Semantics-based online malware detection: Towards efficient real-time protection against malware. IEEE Transactions on Information Forensics and Security 11, 2 (2016), 289--302. Google ScholarDigital Library
- Thomas Dietterich. 1997. Machine learning research: Four current directions. Artificial Intelligence Magzine 18, 4 (1997), 97--36.Google Scholar
- Thomas G. Dietterich. 2000. Ensemble methods in machine learning. In Proceedings of the 1st International Workshop on Multiple Classifier Systems. Google ScholarDigital Library
- Artem Dinaburg, Paul Royal, Monirul Sharif, and Wenke Lee. 2008. Ether: Malware analysis via hardware virtualization extensions. In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS). Google ScholarDigital Library
- Pedro Domingos and Michael Pazzani. 1997. On the optimality of simple Bayesian classifier under zero-one loss. Machine Learning 29, 2--3 (1997), 103--130. Google ScholarDigital Library
- Manuel Egele, Theodoor Scholte, Engin Kirda, and Christopher Kruegel. 2012. A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys (CSUR) 44, 2 (2012), 6. Google ScholarDigital Library
- Yuval Elovici, Asaf Shabtai, Robert Moskovitch, Gil Tahan, and Chanan Glezer. 2007. Applying machine learning techniques for detection of malicious code in network traffic. KI: Advances in Artificial Intelligence (2007). Google ScholarDigital Library
- EMarketer. 2014. Global B2C Ecommerce Sales to Hit $1.5 Trillion This Year Driven by Growth in Emerging Markets. Retrieved from http://www.emarketer.com/Article/Global-B2C-Ecommerce-Sales-Hit-15-Trillion-This-Year-Driven-by-Growth-Emerging-Markets/1010575.Google Scholar
- Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Gomes Amorim. 2014. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research 15, 1 (2014), 3133--3181. Google ScholarDigital Library
- Eric Filiol, Gregoire Jacob, and Mickael Le Liard. 2007. Evaluation methodology and theoretical model for antiviral behavioural detection strategies. Journal in Computer Virology 3, 1 (2007), 23--37. Google ScholarCross Ref
- Ivan Firdausi, Alva Erwin, and Anto Satriyo Nugroho. 2010. Analysis of machine learning techniques used in behavior based malware detection. In Proceedings of 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies (ACT). Google ScholarDigital Library
- Evelyn Fix and Joseph L. Hodges Jr. 1951. Discriminatory analysis-nonparametric discrimination: Consistency properties. US Air Force, School of Avaiation Medicine, Tech. Rep 4 (1951), 5--32.Google Scholar
- Matt Fredrikson, Somesh Jha, Mihai Christodorescu, Reiner Sailer, and Xifeng Yan. 2010. Synthesizing near-optimal malware specifications from suspicious behaviors. In Proceedings of IEEE Symposium on Security and Privacy. Google ScholarDigital Library
- Yoav Freund and Robert E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comp. Syst. Sci. 55, 1 (1997), 119--39. Google ScholarDigital Library
- Ekta Gandotra, Divya Bansal, and Sanjeev Sofat. 2014. Malware analysis and classification: A survey. Journal of Information Security 5, 2 (2014), 56--64. Google ScholarCross Ref
- Maria Garnaeva, Victor Chebyshev, Denis Makrushin, Roman Unuchek, and Anton Ivanov. 2014. Kaspersky Security Bulletin 2014. Retrieved from http://securelist.com/analysis/kaspersky-security-bulletin/68010/kaspersky-security-bulletin-2014-overall-statistics-for-2014/.Google Scholar
- Todd R. Golub, Donna K. Slonim, Pablo Tamayo, Christine Huard, Michelle Gaasenbeek, Jill P. Mesirov, and Hilary Coller. 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 5439 (1999), 531--537. Google ScholarCross Ref
- Isabelle Guyon and Andr Elisseeff. 2003. An introduction to variable and feature selection. Jouranl of Machine Learning Research 3 (March 2003), 1157--1182. Google ScholarDigital Library
- Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter (2009). Google ScholarDigital Library
- William Hardy, Lingwei Chen, Shifu Hou, Yanfang Ye, and Xin Li. 2016. DL4MD: A deep learning framework for intelligent malware detection. In Proceedings of the International Conference on Data Mining (DMIN).Google Scholar
- Olivier Henchiri and Nathalie Japkowicz. 2006a. A feature selection and evaluation scheme for computer virus detection. In Proceedings of the 6th International Conference on Data Mining. Google ScholarDigital Library
- Olivier Henchiri and Nathalie Japkowicz. 2006b. A feature selection and evaluation scheme for computer virus detection. In Proceedings of ICDM. Google ScholarDigital Library
- Shif Hou, Aaron Saas, Yanfang Ye, and Lifei Chen. 2016. DroidDelver: An android malware detection system using deep belief network based on API call blocks. In Proceedings of the International Conference on Web-Age Information Management. 54--66. Google ScholarCross Ref
- Xin Hu. 2011. Large-scale malware analysis, detection, and signature generation. Ph.D. Dissertation, Department of Computer Science and Engineering, University of Michigan. Google ScholarDigital Library
- Galen Hunt and Doug Brubacher. 1998. Detours: Binary interception of win32 functions. In Proceedings of the 3rd USENIX Windows NT Symposium. Google ScholarDigital Library
- IDAPro. 2016. The Interactive Disassembler. Retrieved from https://www.hex-rays.com/products/ida/support/download_freeware.shtml.Google Scholar
- Nwokedi Idika and Aditya P. Mathur. 2007. A survey of malware detection techniques. Research Report in Purdue University (2007).Google Scholar
- Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbor: Towards removing the curse of dimensionality. In Proceedings of 30th Annual ACM Symposium on Theory of Computing. Google ScholarDigital Library
- Virtualization Technology Intel. 2013. Retrieved from http://www.intel.com/technology/virtualization.Google Scholar
- Rafiqul Islam, Ronghua Tian, Lynn M. Batten, and Steve Versteeg. 2013. Classification of malware based on integrated static and dynamic features. Journal of Network and Computer Application 36, 2 (2013), 646--656. Google ScholarDigital Library
- ITU. 2014. ITU releases 2014 ICT figures. Retrieved from https://www.itu.int/net/pressoffice/press_releases/2014/23.aspx.Google Scholar
- Anil K. Jain, Robert P. W. Duin, and Jianchang Mao. 2000. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1 (2000), 4--37. Google ScholarDigital Library
- Xuxian Jiang, Dongyan Xu, Helen Wang, and Eugene Spafford. 2005. Virtual playgrounds for worm behavior investigation. In Proceedings of the 8th International Symposium on Recent Advances in Intrusion Detection. Google ScholarDigital Library
- Thorsten Joachims. 1998. Making large-scale support vector machine learning practical. Advances in Kernel Methods: Support Vector Machines (1998). Google ScholarDigital Library
- George H. John and Pat Langley. 1995. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Conference on Uncertainty in Artificial Intelligence. Google ScholarDigital Library
- Min Gyung Kang, Pongsin Poosankam, and Heng Yin. 2007. Renovo: A hidden code extractor for packed executables. In Proceedings of the 5th ACM Workshop on Recurring Malcode (WORM). Google ScholarDigital Library
- Chris Kanich, Christian Kreibich, Kirill Levchenko, Brandon Enright, Vern Paxson, Geoffrey M. Voelker, and Stefan Savage. 2008. Spamalytics: An empirical analysis of spam marketing conversion. In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS). Google ScholarDigital Library
- Nikos Karampatziakis, Jack W. Stokes, Anil Thomas, and Mady Marinescu. 2013. Using file relationships in malware classification. In Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment. Google ScholarDigital Library
- Md Enamul Karim, Andrew Walenstein, Arun Lakhotia, and Laxmi Parida. 2005. Malware phylogeny generation using permutations of code. Journal in Computer Virology 1, 1--2 (2005), 13--23.Google ScholarCross Ref
- Kaspersky. 2015. The Great Bank Robbery. Retrieved from http://www.kaspersky.com/about/news/virus/2015/Carbanak-cybergang-steals-1-bn-USDfrom-100-financial-institutions-worldwide.Google Scholar
- Kris Kendall and Chad McMillan. 2007. Practical Malware Analysis. Retrieved from https://www.blackhat.com/presentations/bh-dc-07/Kendall_McMillan/Presentation/bh-dc-07-Kendall_McMillan.pdf.Google Scholar
- Kingsoft. 2014. 2013-2014 Internet Security Report in China. Retrieved from http://www.ijinshan.com/news/2014011401.shtml.Google Scholar
- Kingsoft. 2015. 2014-2015 Internet Security Research Report in China. Retrieved from http://www.cssn.cn/xwcbx/xwcbx_gcsy/201501/P020150122566733317860.pdf.Google Scholar
- Kingsoft. 2016. 2015-2016 Internet Security Research Report in China. Retrieved from http://cn.cmcm.com/news/media/2016-01-14/60.html.Google Scholar
- Clemens Kolbitsch, Paolo Milani Comparetti, Christopher Kruegel, Engin Kirda, Xiaoyong Zhou, and XiaoFengWang. 2009. Effective and efficient malware detection at the end host. In Proceedings of the 18th Conference on USENIX Security Symposium. Google ScholarDigital Library
- Jeremy Z. Kolter and Marcus A. Maloof. 2004. Learning to detect malicious executables in the wild. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD). Google ScholarDigital Library
- J. Zico Kolter and Marcus A. Maloof. 2006. Learning to detect and classify malicious executables in the wild. Journal of Machine Learning Research 7 (Dec. 2006), 2721--2744. Google ScholarDigital Library
- Nojun Kwak and Chong-Ho Choi. 2002. Input feature selection by mutual information based on parzen window. IEEE Trans. Pattern Anal. Mach. Intell. 24, 12 (2002), 1667--1671. Google ScholarDigital Library
- Pat Langley. 1994. Selection of relevant features in machine learning. In Proceedings of AAAI Fall Symposium. Google ScholarCross Ref
- Andrea Lanzi, Monirul Sharif, and Wenke Lee. 2009. K-Tracer: A system for extracting kernel malware behavior. In Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS).Google Scholar
- Tony Lee and Jigar J. Mody. 2006. Behavioral classification. In Proceedings of the European Institute for Computer Antivirus Research Conference (EICAR).Google Scholar
- David D. Lewis and William A. Gale. 1994. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer-Verlag, New York, Inc., 3--12. Google ScholarDigital Library
- Shengqiao Li, E. James Harner, and Donald A. Adjeroh. 2011. Random KNN feature selection - A fast and stable alternative to random forests. BMC Bioinformatics 12, 1 (2011), 450.Google ScholarCross Ref
- Shengqiao Li, E. James Harner, and Donald A. Adjeroh. 2014. Random KNN. In Proceedings of the 2014 IEEE International Conference on Data Mining Workshops. 629--636. Google ScholarCross Ref
- Tao Li (Ed.). 2015. Event Mining: Algorithms and Applications. CRC Press. Google ScholarDigital Library
- LordPE. 2013. PE Tools - LordPE. Retrieved from http://www.malware-analyzer.com/pe-tools.Google Scholar
- Mike Loukides and Andy Oram. 1996. Getting to know gdb. Linux Journal (1996). Google ScholarDigital Library
- James Lyne. 2014. Security threat trends 2015. Retrieved from https://www.sophos.com/threat-center/medialibrary/PDFs/other/sophos-trends-and-predictions-2015.pdf.Google Scholar
- Mohammad M. Masud, Tahseen Al-Khateeb, Kevin W. Hamlen, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani M. Thuraisingham. 2011. Cloud-based malware detection for evolving data streams. ACM Trans. Management Inf. Syst. 2, 3 (2011), 16. Google ScholarDigital Library
- Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. 2008. Mining concept-drifting data stream to detect peer to peer botnet traffic. Tech. rep. UTDCS-05-08, The University of Texas at Dallas, Richardson (2008).Google Scholar
- Mohammad M. Masud, Latifur Khan, and Bhavani Thuraisingham. 2007. A scalable multi-level feature extraction technique to detect malicious executables. Information Systems Frontiers 10, 1 (2007), 33--45. Google ScholarDigital Library
- Kirti Mathur and Saroj Hiranwal. 2013. A survey on techniques in detection and analyzing malware executables. International Journal of Advanced Research in Computer Science and Software Engineering 3, 4 (2013), 422--428.Google Scholar
- Micropoint. 2008. Micropoint Antivirus. Retrieved from http://www.micropoint.com.cn/Channel//20080626114608.html.Google Scholar
- David Moore and Colleen Shannon. 2002. Code-red: A case study on the spread and victims of an internet worm. In Proceedings of the Internet Measurement Workshop. Google ScholarDigital Library
- Andreas Moser, Christopher Kruegel, and Engin Kirda. 2007. Limits of static analysis for malware detection. In Proceedings of the 23rd Annual Computer Security Applications Conference (ACSAC). Google ScholarCross Ref
- Robert Moskovitch, Clint Feher, and Yuval Elovici. 2009. A chronological evaluation of unknown malcode detection. LNCS: Intelligence and Security Informatics 5477 (2009), 112--117. Google ScholarDigital Library
- Robert Moskovitch, Clint Feher, Nir Tzachar, Eugene Berger, Marina Gitelman, Shlomi Dolev, and Yuval Elovici. 2008a. Unknown malcode detection using OPCODE representation. In Proceedings of the European Conference on Intelligence and Security Informatics (EuroISI). Google ScholarDigital Library
- Robert Moskovitch, Nir Nissim, and Yuval Elovici. 2008b. Acquisition of malicious code using active learning. In PinKDD.Google Scholar
- Robert Moskovitch, Dima Stopel, Clint Feher, Nir Nissim, and Yuval Elovici. 2008c. Unknown malcode detection via text categorization and the imbalance problem. In IEEE Intelligence and Security Informatics. Google ScholarDigital Library
- Kevin P. Murphy. 2012. Machine learning: A probabilistic perspective. In The MIT Press, Cambridge, Massachusetts. Google ScholarDigital Library
- Ion Muslea, Steven Minton, and Craig A. Knoblock. 2006. Active learning with multiple views. Journal of Artificial Intelligence Research 27 (2006), 203--233. Google ScholarCross Ref
- Carey Nachenberg and Vijay Seshadri. 2010. An analysis of real-world effectiveness of reputation-based security. In Proceedings of the Virus Bulletin Conference (VB).Google Scholar
- Nicholas Nethercote and Julian Seward. 2007. Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation. Google ScholarDigital Library
- Hieu T. Nguyen and Arnold Smeulders. 2004. Active learning using pre-clustering. In Proceedings of the 21st International Conference on Machine Learning. ACM, 79. Google ScholarDigital Library
- Ming Ni, Tao Li, Qianmu Li, Hong Zhang, and Yanfang Ye. 2016. FindMal: A file-to-file social network based malware detection framework. Knowledge-Based Systems 112 (2016), 142--151. Google ScholarDigital Library
- Corporation of Compuware. 1999. Debugging blue screens. Technical Paper (September 1999).Google Scholar
- Gunter Ollmann. 2010. Serial variant evasion tactics techniques used to automatically bypass antivirus technologies. Retrieved from http://www.damballa.com/downloads/rpubs/WPSerialVariantEvasionTactics.pdf.Google Scholar
- OllyDump. 2006. PE Tools - OllyDump. Retrieved from http://www.openrce.org/downloads/details/108/OllyDump.Google Scholar
- David Orenstein. 2000. Application programming interface (API). In Quick Study: Application Programming Interface (API).Google Scholar
- Nikunj C. Oza and Stuart Russell. 2001. Experimental comparisons of online and batch versions of bagging and boosting. In Proceedings of SIGKDD. Google ScholarDigital Library
- Judea Pearl. 1987. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence 32, 2 (1987), 245--258. Google ScholarDigital Library
- Hanchuan Peng, Fuhui Long, and Chris Ding. 2005. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 8 (2005), 1226--1238. Google ScholarDigital Library
- Simon Perkins, Kevin Lacker, and James Theiler. 2003. Grafting: Fast incremental feature selection by gradient descent in function space. JMLR 3 (March 2003), 1333--1356. Google ScholarDigital Library
- Qemu. 2016. (2016). http://www.qemu-project.org/index.html.Google Scholar
- Internet Security Center Qihoo. 2015. 2014 Internet Security Research Report in China. Retrieved from http://zt.360.cn/report/.Google Scholar
- J. Ross Quinlan. 1986. Induction of decision trees. Machine Learning 1, 1 (1986), 81--106. Google ScholarCross Ref
- J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. San Francisco, CA: Morgan Kaufmann Publishers, Inc. (1993). Google ScholarDigital Library
- Alain Rakotomamonjy. 2003. Variable selection using SVM-based criteria. JMLR 3 (March 2003), 1357--1370. Google ScholarDigital Library
- Zulfikar Ramzan, Vijay Seshadri, and Carey Nachenberg. 2013. Reputation-based security: An analysis of real world effectiveness. In Symantec Security Response.Google Scholar
- Rizwan Rehmani, G. C. Hazarika, and Gunadeep Chetia. 2011. Malware threats and mitigation strategies: A Survey. Journal of Theoretical and Applied Information Technology 29, 2 (2011), 69--73.Google Scholar
- John Robbins. 1999. Debugging windows based applications using windbg. Microsoft Systems Journal (1999).Google Scholar
- Lior Rokach. 2010. Ensemble-based classifiers. Artif Intell Rev 33, 1 (2010), 1--39. Google ScholarDigital Library
- Paul Royal, Mitch Halpin, David Dagon, Robert Edmonds, and Wenke Lee. 2006. PolyUnpack: Automating the hidden-code extraction of unpack-executing malware. In Proceedings of the 22nd Annual Computer Security Applications Conference. Google ScholarDigital Library
- Yvan Saeys, Inaki Inza, and Pedro Larranaga. 2007. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 19 (2007), 2507--2517. Google ScholarDigital Library
- Gerard Salton, Anita Wong, and Chung-Shu Yang. 1975. A vector space model for automatic indexing. Commun. ACM 18, 11 (1975), 613--620. Google ScholarDigital Library
- Igor Santos, Jaime Devesa, Felix Brezo, Javier Nieves, and Pablo Garcia Bringas. 2013. OPEM: A static-dynamic approach for machine learning based malware detection. In Proceedings of International Conference CISIS-ICEUTE, Special Sessions Advances in Intelligent Systems and Computing. Google ScholarCross Ref
- Igor Santos, Carlos Laorden, and Pablo G. Bringas. 2011a. Collective classification for unknown malware detection. In Proceedings of the International Conference on Security and Cryptography.Google Scholar
- Igor Santos, Javier Nieves, and Pablo G. Bringas. 2011b. Semi-supervised learning for unknown malware detection. In International Symposium on Distributed Computing and Artificial Intelligence Advances in Intelligent and Soft Computing. Google ScholarCross Ref
- Joshua Saxe and Konstantin Berlin. 2015. Deep neural network based malware detection using two dimensional binary program features. In Proceedings of the 10th International Conference on Malicious and Unwanted Software (MALWARE). Google ScholarDigital Library
- Matthew G. Schultz, Eleazar Eskin, F. Zadok, and Salvatore J. Stolfo. 2001. Data mining methods for detection of new malicious executables. In Proc. of the IEEE Symposium on Security and Privacy. Google ScholarDigital Library
- Fabrizio Sebastiani. 2002. Text categorization. Comput. Surveys 34, 1 (2002), 1--47. Google ScholarDigital Library
- H. Sebastian Seung, Manfred Opper, and Haim Sompolinsky. 1992. Query by committee. In Proceedings of the 5th Annual Workshop on Computational Learning Theory. ACM, 287--294. Google ScholarDigital Library
- Asaf Shabtai, Robert Moskovitch, Yuval Elovici, and Chanan Glezer. 2009. Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey. Information Security Technical Report 14, 1 (2009), 16--29. Google Scholar
- Muazzam Siddiqui, Morgan C. Wang, and Joohan Lee. 2008. A survey of data mining techniques for malware detection using file features. In Proceedings of ACM-SE. Google ScholarDigital Library
- Muazzam Siddiqui, Morgan C. Wang, and Joohan Lee. 2009. Detecting internet worms using data mining techniques. Journal of Systemics, Cybernetics and Informatics 6, 6 (2009), 48--53.Google Scholar
- Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. 2008. BitBlaze: A new approach to computer security via binary analysis. In Proceedings of the 4th International Conference on Information Systems Security. Google ScholarDigital Library
- Eugene H. Spafford. 1989. The internet worm incident. In Proceedings of the 2nd European Software Engineering Conference. Google ScholarDigital Library
- Elizabeth Stinson and John C. Mitchell. 2007. Characterizing bots’ remote control behavior. LNCS: Detection of Intrusions and Malware, and Vulnerability Assessment 4579 (2007), 89--108. Google ScholarDigital Library
- Jack W. Stokes, John C. Platt, Helen J. Wang, Joe Faulhaber, Jonathan Keller, Mady Marinescu, Anil Thomas, and Marius Gheorghescu. 2012. Scalable telemetry classification for automated malware detection. Computer Security - ESORICS (2012).Google Scholar
- Andrew H. Sung, Jianyun Xu, Patrick Chavez, and Srinivas Mukkamala. 2004. Static analyzer of vicious executables (SAVE). In Proceedings of the 20th Annual Computer Security Applications Conference. Google ScholarDigital Library
- Symantec. 2008. Symantec global internet security threat report. Retrieved from http://eval.symantec.com/mktginfo/enterprise/white_papers/b-whitepaper_internet_security_threat_report_xiii_04-2008.en-us.pdf.Google Scholar
- Symantec. 2014a. Internet Security Threat Report 2014. Retrieved from http://www.symantec.com/security_response/publications/threatreport.jsp.Google Scholar
- Symantec. 2014b. The Threat Landscape in 2014 and Beyond: Symantec and Norton Predictions for 2015, Asia Pacific and Japan. Retrieved from http://www.symantec.com/connect/blogs/threat-landscape-2014-and-beyond-symantec-and-norton-predictions-2015-asia-pacific-japan.Google Scholar
- Symantec. 2016. Internet Security Threat Report. Retrieved from https://www.symantec.com/content/dam/symantec/docs/reports/istr-21-2016-en.pdf.Google Scholar
- Kimberly Tam, Ali Feizollah, Nor Badrul Anuar, Rosli Salleh, and Lorenzo Cavallaro. 2017. The evolution of android malware and android analysis techniques. ACM Computing Surveys (CSUR) 49, 4 (2017), 76. Google ScholarDigital Library
- Acar Tamersoy, Kevin Roundy, and Duen Horng Chau. 2014. Guilt by association: Large scale malware detection by mining file-relation graphs. In Proccedings of ACM International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD). Google ScholarDigital Library
- Fadi Abdeljaber Thabtah. 2007. A review of associative classification mining. Knowledge Engineering Review 22, 1 (2007), 37--65. Google ScholarDigital Library
- Ronghua Tian, Rafiqul Islam, Lynn Batten, and Steve Versteeg. 2010. Differentiating malware from cleanwares using behavioral analysis. In Proceedings of 5th International Conference on Malicious and Unwanted Software (Malware).Google Scholar
- TrendLabs. 2014. The invisible becomes visible: Trend micro security predictions for 2015 and beyond. (2014). http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/reports/rpt-the-invisible-becomes-visible.pdf.Google Scholar
- Trend Threat Research Team TrendMicro. 2010. Zeus: A Persistent Criminal Enterprise. Retrieved from http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/white-papers/wp_zeuspersistent-criminal-enterprise.pdf.
- Amit Vasudevan and Ramesh Yerraballi. 2005. Stealth breakpoints. In Proceedings of the 21st Annual Computer Security Applications Conference.
- Amit Vasudevan and Ramesh Yerraballi. 2006. Cobra: Fine-grained malware analysis using stealth localized-executions. In Proceedings of 2006 IEEE Symposium on Security and Privacy.
- Shobha Venkataraman, Avrim Blum, and Dawn Song. 2008. Limits of learning-based signature generation with adversaries. In NDSS.
- Andrei Venzhega, Polina Zhinalieva, and Nikolay Suboch. 2013. Graph-based malware distributors detection. In Proceedings of the 22nd International Conference on World Wide Web Companion (WWW).
- Randall Wald, Taghi M. Khoshgoftaar, and Amri Napolitano. 2013. Comparison of stability for different families of filter-based and wrapper-based feature selection. In ICMLA.
- Tzu-Yen Wang, Shi-Jinn Horng, Ming-Yang Su, Chin-Hsiung Wu, Peng-Chu Wang, and Wei-Zen Su. 2006b. A surveillance spyware detection system based on data mining methods. Evolutionary Computation (2006), 3236--3241.
- Yi-Min Wang, Doug Beck, Xuxian Jiang, and Roussi Roussev. 2006a. Automated web patrol with strider honeymonkeys: Finding web sites that exploit browser vulnerabilities. In NDSS.
- SECURITY LABS WEBSENSE. 2014. 2015 Security Predictions. Retrieved from http://www.websense.com/assets/reports/report-2015-security-predictions-en.pdf.
- Paul Werbos. 1974. Beyond regression: New tools for prediction and analysis in the behavioral science. Ph.D. Dissertation, Harvard University.
- Wikipedia. 2016. Scareware. Retrieved from https://en.wikipedia.org/wiki/Scareware.
- Wikipedia. 2017a. Assembly Language. Retrieved from http://en.wikipedia.org/wiki/Assembly_language.
- Wikipedia. 2017b. Computer Virus. Retrieved from http://en.wikipedia.org/wiki/Computer_virus.
- Wikipedia. 2017c. Morris Worm. Retrieved from http://en.wikipedia.org/wiki/Morris_worm.
- Wikipedia. 2017d. Ransomware. Retrieved from https://en.wikipedia.org/wiki/Ransomware.
- Wikipedia. 2017e. Rootkit. Retrieved from http://en.wikipedia.org/wiki/Rootkit.
- Wikipedia. 2017f. Zero-day (computing). Retrieved from https://en.wikipedia.org/wiki/Zero-day_(computing).
- Wikipedia. 2017g. Zeus (malware). Retrieved from http://en.wikipedia.org/wiki/Zeus_(malware).
- Carsten Willems, Thorsten Holz, and Felix Freiling. 2007. Toward automated dynamic malware analysis using CWSandbox. In IEEE Security and Privacy.
- Rui Xu and Donald Wunsch. 2005. Survey of clustering algorithms. IEEE Transactions on Neural Networks 16, 3 (2005), 645--678.
- Yanfang Ye. 2010. Research on intelligent malware detection methods and their applications. Ph.D. Dissertation, Department of Computer Science, Xiamen University.
- Yanfang Ye, Lifei Chen, Dingding Wang, Tao Li, Qingshan Jiang, and Min Zhao. 2009. SBMDS: An interpretable string based malware detection system using SVM ensemble with bagging. Journal in Computer Virology 5, 4 (2009), 283--293.
- Yanfang Ye, Tao Li, Yong Chen, and Qingshan Jiang. 2010. Automatic malware categorization using cluster ensemble. In Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD).
- Yanfang Ye, Tao Li, Kai Huang, Qingshan Jiang, and Yong Chen. 2009a. Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list. Journal of Intelligent Information Systems 35, 1 (2009), 1--20.
- Yanfang Ye, Tao Li, Qingshan Jiang, Zhixue Han, and Li Wan. 2009c. Intelligent file scoring system for malware detection from the gray list. In Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD).
- Yanfang Ye, Tao Li, Qingshan Jiang, and Youyu Wang. 2009b. CIMDS: Adapting post-processing techniques of associative classification for malware detection system. IEEE Transactions on Systems, Man, and Cybernetics 40, 3 (2009), 298--307.
- Yanfang Ye, Tao Li, Shenghuo Zhu, Weiwei Zhuang, Egemen Tas, Umesh Gupta, and Melih Abdulhayoglu. 2011. Combining file content and file relations for cloud based malware detection. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD).
- Yanfang Ye, Dingding Wang, Tao Li, and Dongyi Ye. 2007. IMDS: Intelligent malware detection system. In Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD).
- Yanfang Ye, Dingding Wang, Tao Li, Dongyi Ye, and Qingshan Jiang. 2008. An intelligent PE-malware detection system based on association mining. Journal in Computer Virology 4, 4 (2008), 323--334.
- Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. 2001. Understanding belief propagation and its generalizations. In Mitsubishi Electric Research Laboratories.
- Heng Yin, Dawn Song, Manuel Egele, Christopher Kruegel, and Engin Kirda. 2007. Panorama: Capturing system-wide information flow for malware detection and analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS).
- Chunqiu Zeng, Liang Tang, Wubai Zhou, Tao Li, Larisa Shwartz, and Genady Ya. Grabarnik. 2017. An integrated framework for mining temporal logs from fluctuating events. IEEE Transactions on Services Computing (TSC) (2017). In Press.
- Boyun Zhang, Jianping Yin, Jingbo Hao, Dingxing Zhang, and Shulin Wang. 2007. Malicious codes detection based on ensemble learning. Autonomic and Trusted Computing (2007).
- Jianwei Zhuge, Thorsten Holz, Chengyu Song, Jinpeng Guo, Xinhui Han, and Wei Zou. 2008. Studying malicious websites and the underground economy on the Chinese web. In Proceedings of the 7th Workshop on Economics of Information Security.
Index Terms
- A Survey on Malware Detection Using Data Mining Techniques