4
Static Analysis
Identifying whether a suspect file is malicious typically begins with
static analysis. Static analysis does not involve running the code or
opening a file (dynamic analysis), nor reverse engineering the code via
disassembly or debugging. It largely involves identifying and querying
cryptographic hash values (such as MD5), strings, and metadata. More
importantly, static analysis is part of a larger process that is recursive
by nature: an analyst might extract class files from a hostile APK, collect
static data on the individual artifacts, review static analysis of related
APKs, and so on, seeking to establish more context and analytical
relationships that lend authority to an evaluation of the threat.
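The recursive step described above, collecting static data on each artifact inside an APK, might be sketched as follows. This is a minimal illustration rather than a definitive tool: the function name list_apk_members is hypothetical, and the sketch assumes the sample is a well-formed ZIP small enough to read each entry into memory. A hostile archive may violate both assumptions, so size limits would be prudent in practice.

```python
import hashlib
import zipfile


def list_apk_members(apk_path):
    """Return (name, uncompressed_size, sha256) for each file entry in an APK.

    An APK is a ZIP archive, so the standard zipfile module can enumerate it.
    Entries are only read as bytes; nothing is executed or written to disk.
    """
    members = []
    with zipfile.ZipFile(apk_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, which carry no content
            data = archive.read(info.filename)
            digest = hashlib.sha256(data).hexdigest()
            members.append((info.filename, info.file_size, digest))
    return members
```

Each returned hash can then be queried or compared against hashes from related APKs, which is one way to build the analytical relationships the text describes.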
Static analysis is the most flexible part of Android malware analysis
because it can be performed on a multitude of operating systems rather
than depending on the Android operating system. Many analysts prefer
to develop a set of tools and scripts within a Linux environment, such
as Ubuntu, because of the security provided by the operating system,
its native support for scripting (Python, Perl, Bash), and the wide
variety of tools that can easily be used in such an environment for
efficient static analysis of malware.
The process of static analysis of Android malware is the same as
that of traditional Windows, Linux, or other types of malware. What
does differ for Android threats is how APKs are packaged and compiled
compared to a Windows binary. Windows binaries are compiled as
executables with an MZ header. Android apps are packaged as an APK
that can be unpacked into separate files, including the compiled code,
a manifest, and other files common to an APK. Analysts familiar with
static analysis of other malware types will quickly adapt to performing
static analysis of Android malware. Of note for more experienced
readers is that static analysis can and should be automated, for
example with a Python script or tool that generates hash data for
multiple files.
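As one possible shape for the automation mentioned above, the following sketch generates MD5, SHA1, and SHA256 values for every file under a directory using only the Python standard library. The function names hash_file and hash_directory and the chunk size are illustrative assumptions, not part of any particular tool.

```python
import hashlib
from pathlib import Path

# Hash types commonly exchanged in the security industry.
ALGORITHMS = ("md5", "sha1", "sha256")


def hash_file(path, chunk_size=65536):
    """Return {algorithm: hex digest} for one file, read in chunks."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples do not exhaust memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def hash_directory(directory):
    """Return {file path: {algorithm: hex digest}} for all files, recursively."""
    results = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            results[str(path)] = hash_file(path)
    return results
```

Calling hash_directory("samples") on a hypothetical samples folder returns a dictionary that can be written to CSV or used directly in hash queries.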
Android Malware and Analysis
This chapter approaches static analysis through the following
hierarchy of topics: collections, file types, cryptographic hashes,
metadata, visualization, and automation. Readers should remember that
static analysis is an ongoing process: an analyst regularly performs it
on new artifacts and discoveries over the course of in-depth Android
malware analysis. Android malware analysis likely falls within another
process, incident response, which involves several steps and phases of
its own as one responds to an event or incident.
Collections: Where to Find Apps for Analysis
Finding code to research can be challenging for an analyst new to
Android malware analysis. Fortunately, there are several locations
where collections of such samples may be acquired. Additionally,
advanced researchers regularly script automated methods for
identifying, downloading, and triaging possible new app threats, which
may lead to new discoveries of Android malware in the wild.
Google Play Marketplace
Google Play is the official marketplace for Android apps. The app
itself is called Google Play on devices and points to the marketplace
Web site (https://play.google.com/store). Users may easily download
any app of interest from the site; some apps are free and others are
commercially developed. However, availability through Google Play
varies by feature and geolocation; TV shows, for example, are
available only in a small number of countries. All countries enable
purchasing of apps through Google Play, but only select countries are
supported for developers (merchants) to sell apps through the
marketplace
(https://support.google.com/googleplay/android-developer/table3539140?rd=1).
In the early days, rogue developer accounts were used to distribute
hostile apps through the official marketplace. The infamous
DroidDream, which spread to the marketplace in 2011, involved at least
three rogue accounts and dozens of hostile apps. Improved security
controls followed such events, and fraudsters now hijack developer
accounts or spread code through other means, such as unofficial
“cracked” sites that distribute popular apps of interest to consumers.
Marketplace Mirrors and Cache
Multiple Web sites mirror or host a large quantity of Android apps of
interest. For example, androidpolice.com and appbrain.com are two such
Web sites with a lot of Android content, including apps. In some cases,
a new threat emerges on the Google Play marketplace and is then
mitigated by Google, but remains available on mirror and third-party
Web sites hosting the original content. Searching through cached pages
via a search engine may also reveal additional metadata or a download
of interest for obtaining a specific sample or hash value.
Contagio Mobile
http://contagiominidump.blogspot.com/. Mila Parkour maintains one
of the most popular and frequently updated blogs on the Internet,
providing both samples and links to analysis for each sample. Parkour
uses a proprietary password system but offers the information needed to
decrypt downloads from her Web site to individuals who ask her for it.
Scrolling down the page, the right-hand side offers a long list of
samples organized by family name, such as opfake, Plankton, Stel, and
others.
Advanced Internet Queries
Advanced queries that add unique keywords, combinations of keywords,
and the advanced operators provided by search engines like Google can
yield an amazing amount of information for an Android malware analyst.
As an example, locate new samples on VirusTotal by searching for
Android, Android.Trojan, or similar terms combined with the
inurl:virustotal.com advanced search operator, which limits results to
URLs that contain the string virustotal.com (or whatever site you want
to search specifically). If looking for a family name, such as Moghava,
perform a similar query, such as inurl:virustotal.com moghava.
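For analysts who run many such queries, the query strings can be assembled in a script. The sketch below is only an illustration: the helper names build_site_query and build_search_url are hypothetical, and the Google search URL prefix is an assumption that may need adjusting for another search engine.

```python
from urllib.parse import quote_plus


def build_site_query(keywords, site="virustotal.com"):
    """Combine keywords with the inurl: operator described in the text."""
    return "inurl:{} {}".format(site, " ".join(keywords))


def build_search_url(query, base="https://www.google.com/search?q="):
    """URL-encode a query so it can be pasted into a browser.

    The base URL is an assumption; substitute the search engine of choice.
    """
    return base + quote_plus(query)


# Example: the family-name query from the text.
query = build_site_query(["moghava"])
url = build_search_url(query)
```

Here query holds the string inurl:virustotal.com moghava, and url holds the encoded form suitable for a browser address bar.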
Private Groups and Rampart Research Inc.
Multiple groups exist for sharing mobile data, some of which are
private. The best way to get into such groups is to become active in the
industry, analyzing new threats as they emerge and publishing
information on a blog or public mailing lists. Over time, an individual
may present at a conference, write articles, and become further
involved in the industry, leading to invitations into private mailing
groups. In the end it is all about networking to get to know and trust
other individuals within the industry. Rampart Research
(http://rampartresearch.org) is a nonprofit founded by one of the
authors (Dunham) of this book, dedicated to promoting individual growth
and networking within the global cyber-response industry. Rampart
Research maintains millions of malware samples, manages private
discussion groups, and more, with a specialty research group dedicated
to mobile malware.
Android Malware Genome Project
http://www.malgenomeproject.org/policy.html. Dr. Xuxian Jiang and
Yajin Zhou offer about 1,200 samples used in educational research
from a research project published in 2012. To obtain the samples, one
must meet the policy requirements stated at the provided link.
File Data
Looking at just an Android app, there are several common file data
points that one may immediately collect: filename; size; created,
modified, and accessed times; and file type. A filename, like bad.apk,
may be useful later when looking for similar samples with unique names,
or for variants that may exist on other devices during an incident
investigation. The more unique a filename, the more useful it may
become when performing correlation or searches for similar threats or
associated threat data. File size can also help narrow a search if one
or more APKs are identified as a specific size or within a range of
likely sizes. For example, one may search a commercial service such as
VirusTotal for samples by name and size to identify other samples that
may be, or are, directly related.
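The data points listed above can be gathered in one pass with the Python standard library. The following is a minimal sketch in which the function name collect_file_data and the dictionary keys are illustrative assumptions. Note that st_ctime means different things by platform: on Windows it is the creation time, while on Linux it is the time of the last metadata change, so a "created" value is not always available this way.

```python
import os
from datetime import datetime, timezone


def _to_utc(timestamp):
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def collect_file_data(path):
    """Return basic file data points for triage and later correlation."""
    info = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "size": info.st_size,                 # size in bytes
        "modified": _to_utc(info.st_mtime),
        "accessed": _to_utc(info.st_atime),
        # Creation time on Windows; metadata-change time on Linux.
        "created_or_changed": _to_utc(info.st_ctime),
    }
```

Recording times in UTC keeps values comparable when samples are collected from devices or systems in different time zones.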
Dates and times associated with the file may also be useful in
correlating a threat. For example, an incident may involve threats that
emerged on or around a specific date. In some situations, searching
for threats of a certain type, such as APK/apps on devices, with
matching modified, accessed, or created (MAC) times may help discover other
related threats installed in an attack. MAC times may also help paint
a picture of a campaign of codes, where variants are released over a
multimonth period showing development and deployment into the
wild over time.
File type identification is a form of content inspection, because the
original filename bad.apk may be misleading. Sometimes files are not
what they claim to be: the extension suggests one type while the
content is actually something different. For example, in the Windows
malware world a file with a BMP extension may actually be an executable
masquerading as an image file, as a method of attempting to bypass
detection by simple IDS/IPS, or by incident response and forensic
investigation looking for an EXE or similar extension of concern. Using
the file command in Linux is a fast and easy way to identify the file
type regardless of the extension used by the file. Below is an example
of how to use the file command:
$ file abc.apk
abc.apk: Zip archive data, at least v2.0 to extract
APK les should be identied as a ZIP archive. It is a common
challenge in the security industry to get a variety of mobile malware
samples that are actually a mixture of APK les, DEX source code
les, class les, and various other artifacts. Performing a triage with
basic le information, including the FILE command, greatly assists
in proper threat classication and approach before diving deeper into
the analysis of a le of interest.
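When the file command is not available, or when triage is being scripted, a similar check can be approximated by reading the first bytes of each sample. The sketch below is a simplified stand-in for the file command, not a replacement: it recognizes only the four well-known signatures listed, and the function name identify_file_type and the label strings are illustrative assumptions.

```python
# (signature bytes, label) pairs checked against the start of the file.
MAGIC_SIGNATURES = [
    (b"PK\x03\x04", "ZIP archive (APK or JAR candidate)"),
    (b"dex\n", "Dalvik DEX file"),
    # 0xCAFEBABE is also used by Mach-O universal binaries.
    (b"\xca\xfe\xba\xbe", "Java class file"),
    (b"MZ", "Windows executable (MZ header)"),
]


def identify_file_type(path):
    """Classify a file by its leading bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for signature, label in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return label
    return "unknown"
```

A sample named abc.apk that returns anything other than the ZIP label deserves a closer look before it is treated as an APK.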
Cryptographic Hash Types and Queries
A cryptographic hash function is an algorithm used to generate a
checksum or “hash.” Common types are MD5, SHA1, and SHA256. There
are many types of cryptographic hashes, but these are the ones most
commonly implemented and used in the security industry. The academic
subject of cryptographic hashes is complex, and there are real-world
challenges with every type. For example, some types produce longer
checksum strings than others, which at the scale of millions of samples
makes them more expensive to store, search, and return in search
results than shorter checksum values. Additionally,