

File carving is the process of recovering files from storage media without relying on file system metadata. This capability is particularly important in digital forensic investigations. Because file fragmentation is inevitable in storage systems, fragment classification is an important step in the file recovery process. With the growth in storage capacity and usage of mobile phones, large amounts of personal data tend to be stored on such devices, and this data is of great interest for forensic analysis during investigations. In this paper, we present an approach for classifying the fragment types most commonly found on mobile phones: JPG, MP3, MP4, MOV and SQLite. Departing from conventional approaches that rely on unigram statistics, we employ bigram statistics to capture the frequency of local byte order, which retains meaningful and exploitable patterns in the fragments. Although bigram statistics capture more information, they also contain a large amount of redundant data, which greatly increases the computational workload. We therefore perform dimensionality reduction through Principal Component Analysis (PCA) to extract only the dimensions most significant for classifying the targeted file types. Using the resulting features with a Support Vector Machine (SVM) classifier, we achieve an average classification accuracy of 96.19%, compared with 88.40% when using unigram statistics alone.
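
The difference between unigram and bigram statistics can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the normalization by fragment length, and the 512-byte toy fragments are assumptions made for illustration only. The sketch shows that two fragments with identical byte-frequency (unigram) histograms can still be told apart by their bigram histograms, because the latter record the order of adjacent bytes.

```python
from collections import Counter


def unigram_histogram(fragment: bytes) -> list:
    """Normalized frequency of each byte value (256 dimensions)."""
    counts = Counter(fragment)
    total = len(fragment) or 1  # guard against an empty fragment
    return [counts.get(value, 0) / total for value in range(256)]


def bigram_histogram(fragment: bytes) -> list:
    """Normalized frequency of each ordered pair of adjacent bytes
    (256 * 256 = 65,536 dimensions)."""
    counts = Counter(zip(fragment, fragment[1:]))
    total = max(len(fragment) - 1, 1)  # number of adjacent pairs
    histogram = [0.0] * (256 * 256)
    for (first, second), n in counts.items():
        histogram[first * 256 + second] = n / total
    return histogram


# Two hypothetical 512-byte fragments with the same byte counts
# but a different local byte order.
fragment_a = b"ABAB" * 128
fragment_b = b"AABB" * 128

same_unigrams = unigram_histogram(fragment_a) == unigram_histogram(fragment_b)
same_bigrams = bigram_histogram(fragment_a) == bigram_histogram(fragment_b)
print(same_unigrams, same_bigrams)
```

The bigram vector has 65,536 dimensions against 256 for the unigram vector, and most of its entries are zero for any single fragment. This is the redundancy that motivates the PCA step before SVM classification.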