Reverse engineering

Malware Analysis: Classifying with ClamAV and YARA

Mourad Ben Lakhoua
April 7, 2011 by
Mourad Ben Lakhoua

On a daily basis,we are encountering thousands of new types of malware with unknown content. This malware can come from honeypots, infected websites or even be submitted by users.Analyzing all these binaries will take any malware analyst a long time. That’s why it’s critical to have an automated way to classify different types of malicious code.

Open source tools like ClamAV and YARA we can tell us if an unknown file has already been classified as malicious. If we have a fresh database with the latest signatures, we will not spend time analyzing binaries other researchers have already identified. That lets us spend our time analyzing other new or unique types of malware.

Become a certified reverse engineer!

Become a certified reverse engineer!

Get live, hands-on malware analysis training from anywhere, and become a Certified Reverse Engineering Analyst.

Become a certified reverse engineer!

Become a certified reverse engineer!

Get live, hands-on malware analysis training from anywhere, and become a Certified Reverse Engineering Analyst.

Installing ClamAV:

ClamAV is an open source (GPL) anti-virus toolkit, the AV tasks are handled by three processes:

  • freshclam automatically update virus definitions by connecting to http://www.clamav.net/mirrors.html— the configuration file is located under/etc/freshclam.conf
  • clamd is a multi-threaded antivirus daemon — the configuration file is located in /etc/clamd.conf
  • clamscan a command line antivirus scanner.

We need to install the latest release of ClamAV or we will have a warning message about a reduced functionality and this mean that you may not be able to use all the available virus signatures.

The most recent version of ClamAV is available from http://www.clamav.net/download/sources/. But you can also use a package manager to install it. OnaUbuntu machine, type the following commands:


$ sudo apt-get install clamav clamav-freshclam

First you can start by updating ClamAV signatures:


$ sudo freshclam


Then you run a scan on any suspicious file to check if it is infected or not:


$ sudo Clamscan

Scanning a folder with infected files

After analyzing the folders there are already infected files such as Trojan proxies that allow malicious users to control the victimized machine and use it as a proxy for spamming other people or perform any number of other malicious activities from their remote computer.

Installing YARA:

YARA is an extremely flexible identification and classification engine written by Victor Manuel Alvarez of Hipasec Sistemas. It runs on Windows, Linux and Mac OS X, and can be used through its command-line interface or from your own Python scripts with the yara-python extension.

YARA rules are easy to write and understand. They have a syntax that resembles a C struct declaration. However creating thousands of rules takes a lot of time and effort. That’s why it makes more sense to use ClamAV signatures. Usually ClamAV signatures can be found under /usr/local/share/clamav or /usr/lib/clamav on Linux systems. This is where you will find the main.cld and daily.cld. Alternately, they may have .cvd extensions, main.cld file contains the primary base of signatures and daily.cld contains incremental daily updates.

To Install YARA on Ubuntu we need the PCRE and some libraries first:


$ sudo apt-get install libpcre3 libpcre3-dev


Then we start downloading the YARA source code:


$ wget http://yara-project.googlecode.com/files/yara-1.4.tar.gz

$ wget http://yara-project.googlecode.com/files/yara-python-1.4.tar.gz


Untar and configure YARA.


$ tar xvfz yara-1.4.tar.gz

$ cd yara-1.4

$ ./configure


If there are no errors, make the executables:


$ make

$ make check

$ sudo make install


Now we add python support :


$ cd ..

$ tar xvfz yara-python-1.4.tar.gz

$ cd yara-python-1.4.tar.gz

$ python setup.py build

$ sudo python setup.py install


If you have no problem you will be able to run YARA:


$ yara -v

Checking YARA Version

You can next see all the YARA options:

Checking YARA options

The clamav_to_yara.py script by Matthew Richard can help in converting ClamAV signatures to meet the requirements of YARA. To convert you run the following commnd:


$ python clamav_to_yara.py -f main.ndb -o clamav.yara


Converting ClamAV Signatures to YARA

To scan a folder that contains suspicious files with the new clamav.yara rules, you run the following:


$ yara -r clamav.yara /data/malcode


Next you can check the clamav.yara file and you should find the rules created according to YARA format.

YARA Rules Created

Now it is important to note that many modern malwares are using obfuscation to hide their presence on the system this include coding, encryption and packing. Using YARA with the previous signature will not identify packers, to handle packers you need to add PEiD which is a GUI tool that detect them. The YARA project’s wiki2 provides a handful of sample packer rules based on the PEiD database.

Here are some rules for detecting packers based on PEiD signatures you can add them directly to the converted YARA Rules:

http://code.google.com/p/yara-project/wiki/PackerRules


//

// This rules are based on PEiD signatures (http://www.peid.info/BobSoft/Downloads/UserDB.zip)

//

rule ASPack

{

strings:

$ = { 60 E8 ?? ?? ?? ?? 5D 81 ED ?? ?? (43 | 44) ??B8 ?? ?? (43 | 44) ?? 03 C5 }

$ = { 60 EB ?? 5D EB ?? FF ?? ?? ?? ?? ?? E9 }

$ = { 60 EB 03 5D FF E5 E8 F8 FF FFFF 81 ED 1B 6A 44 00 BB 10 6A 44 00 03 DD 2B 9D 2A }

$ = { 60 E8 00 00 00 00 5D ?? ?? ?? ?? ?? ?? BB ?? ?? ?? ?? 03 DD }

$ = { 60 E8 41 06 00 00 EB 41 }

$ = { 60 E8 7? 05 00 00 EB (33 | 4C) }

$ = { 60 E8 02 00 00 00 EB 09 5D 55 }

$ = { 60 E8 03 00 00 00 E9 EB 04 5D 45 55 C3 E8 01 }

$ = { A8 03 ?? ?? 61 75 08 B8 01 ?? ?? ?? C2 0C ??68 ?? ?? ?? ?? C3 8B 85 26 04 ?? ?? 8D 8D 3B 04 ?? ?? 51 50 FF 95 }

condition:

for any of them : ($ at entrypoint)
}

rule Armadillo

{

strings:

$ = { 83 7C 24 08 01 75 05 E8 DE 4B 00 00 FF 74 24 04 8B 4C 24 10 8B 54 24 0C E8 ED FE FF FF 59 C2 0C 00 6A 0C 68 ?? ?? ?? ?? E8 }

$ = { E8 ?? ?? 00 00 E9 16 FE FF FF 6A 0C 68 ?? ?? ?? ?? E8 ?? ?? 00 00 8B 4D 08 33 FF 3B CF 76 2E 6A E0 58 33 D2 F7 F1 3B 45 0C }

$ = { E8 ?? ?? 00 00 E9 16 FE FF FF 6A 0C 68 ?? ?? ?? ?? E8 ?? ?? 00 00 83 65 E4 00 8B 75 08 3B 35 ?? ?? ?? ?? 77 22 6A 04 E8 ?? ?? 00 00 }

$ = { 55 8B EC 53 8B 5D 08 56 8B 75 0C 57 8B 7D 10 85 F6 }

$ = { 55 8B EC 6A FF 68 ?? ?? ?? ?? (68 | E0 | B0 | 40) ?? ?? ?? ?? (64 A1 | 68) }

$ = { 6A ?? 8B B5 ?? ?? ?? ?? C1 E6 04 8B 85 ?? ?? ?? ?? 25 07 ?? ?? 80 79 05 48 83 C8 F8 40 33 C9 8A 88 ?? ?? ?? ?? 8B 95 }

$ = { 60 E8 ?? ?? ?? ?? 5D 50 51 ?? ?? ?? ?? ?? ?? ?? ?? ?? EB 0F ??EB ?? ?? EB ?? ?? EB }

$ = { 44 64 65 44 61 74 61 20 69 6E 69 74 69 61 6C 69 7A 65 64 20 28 41 4E 53 49 29 2C 20 61 70 70 20 73 74 72 69 6E 67 73 20 61 72 65 20 27 25 73 }

$ = { 31 2E 31 2E 34 00 00 00 C2 E0 94 BE 93 FC DE C6 B6 24 83 F7 D2 A4 92 77 40 27 CF EB D8 6F 50 B4 B5 29 24 FA 45 08 04 52 D5 1B D2 8C 8A 1E 6E }

condition:

any of them
}

rule FSG

{

strings:

$noep1  = { 0B D0 8B DA E8 02 00 00 00 40 A0 5A EB 01 9D B8 80 ?? ?? ?? EB 02 CD 20 03 D3 8D 35 F4 00 }

$noep2  = { 33 D2 0F BE D2 EB 01 C7 EB 01 D8 8D 05 80 ?? ?? ?? EB 02 CD 20 EB 01 F8 BE F4 00 00 00 EB }

$noep3  = { 33 C2 2C FB 8D 3D 7E 45 B4 80 E8 02 00 00 00 8A 45 58 68 02 ?? 8C 7F EB 02 CD 20 5E 80 C9 }

$noep4  = { 80 E9 A1 C1 C1 13 68 E4 16 75 46 C1 C1 05 5E EB 01 9D 68 64 86 37 46 EB 02 8C E0 5F F7 D0 }

$noep5  = { BE A4 01 40 00 AD 93 AD 97 AD 56 96 B2 80 A4 B6 80 FF 13 73 }

$noep6  = { E8 01 00 00 00 0E 59 E8 01 00 00 00 58 58 BE 80 ?? ?? 00 EB 02 61 E9 68 F4 00 00 00 C1 C8 }

$noep7  = { E8 01 00 00 00 5A 5E E8 02 00 00 00 BA DD 5E 03 F2 EB 01 64 BB 80 ?? ?? 00 8B FA EB 01 A8 }

$noep8  = { EB 01 DB E8 02 00 00 00 86 43 5E 8D 1D D0 75 CF 83 C1 EE 1D 68 50 ?? 8F 83 EB 02 3D 0F 5A }

$noep9  = { EB 01 56 E8 02 00 00 00 B2 D9 59 68 80 ?? 41 00 E8 02 00 00 00 65 32 59 5E EB 02 CD 20 BB }

$noep10 = { EB 01 4D 83 F6 4C 68 80 ?? ?? 00 EB 02 CD 20 5B EB 01 23 68 48 1C 2B 3A E8 02 00 00 00 38 }

$noep11 = { EB 02 AB 35 EB 02 B5 C6 8D 05 80 ?? ?? 00 C1 C2 11 BE F4 00 00 00 F7 DB F7 DB 0F BE 38 E8 }

$noep12 = { EB 02 CD 20 2B C8 68 80 ?? ?? 00 EB 02 1E BB 5E EB 02 CD 20 68 B1 2B 6E 37 40 5B 0F B6 C9 }

$noep13 = { EB 02 CD 20 EB 02 CD 20 EB 02 CD 20 C1 E6 18 BB 80 ?? ?? 00 EB 02 82 B8 EB 01 10 8D 05 F4 }

$noep14 = { EB 02 09 94 0F B7 FF 68 80 ?? ?? 00 81 F6 8E 00 00 00 5B EB 02 11 C2 8D 05 F4 00 00 00 47 }

$noep15 = { 23 CA EB 02 5A 0D E8 02 00 00 00 6A 35 58 C1 C9 10 BE 80 ?? ?? 00 0F B6 C9 EB 02 CD 20 BB }

$noep16 = { 2B C2 E8 02 00 00 00 95 4A 59 8D 3D 52 F1 2A E8 C1 C8 1C BE 2E ?? ?? 18 EB 02 AB A0 03 F7 }

$noep17 = { 1B DB E8 02 00 00 00 1A 0D 5B 68 80 ?? ?? 00 E8 01 00 00 00 EA 5A 58 EB 02 CD 20 68 F4 00 }

$noep18 = { 03 DE EB 01 F8 B8 80 ?? 42 00 EB 02 CD 20 68 17 A0 B3 AB EB 01 E8 59 0F B6 DB 68 0B A1 B3 }

$noep19 = { 03 F7 23 FE 33 FB EB 02 CD 20 BB 80 ?? 40 00 EB 01 86 EB 01 90 B8 F4 00 00 00 83 EE 05 2B }

$noep21 = { C1 CB 10 EB 01 0F B9 03 74 F6 EE 0F B6 D3 8D 05 83 ?? ?? EF 80 F3 F6 2B C1 EB 01 DE 68 77 }

$noep22 = { C1 C8 10 EB 01 0F BF 03 74 66 77 C1 E9 1D 68 83 ?? ?? 77 EB 02 CD 20 5E EB 02 CD 20 2B F7 }

$noep23 = { 2C 71 1B CA EB 01 2A EB 01 65 8D 35 80 ?? ?? 00 80 C9 84 80 C9 68 BB F4 00 00 00 EB 01 EB }

$noep24 = { F7 D8 40 49 EB 02 E0 0A 8D 35 80 ?? ?? ?? 0F B6 C2 EB 01 9C 8D 1D F4 00 00 00 EB 01 3C 80 }

$noep25 = { F7 D0 EB 02 CD 20 BE BB 74 1C FB EB 02 CD 20 BF 3B ?? ?? FB C1 C1 03 33 F7 EB 02 CD 20 68 }

$noep26 = { F7 DB 80 EA BF B9 2F 40 67 BA EB 01 01 68 AF ?? ?? BA 80 EA 9D 58 C1 C2 09 2B C1 8B D7 68 }

$noep27 = { F7 D8 0F BE C2 BE 80 ?? ?? 00 0F BE C9 BF 08 3B 65 07 EB 02 D8 29 BB EC C5 9A F8 EB 01 94 }

$noep28 = { 91 EB 02 CD 20 BF 50 BC 04 6F 91 BE D0 ?? ?? 6F EB 02 CD 20 2B F7 EB 02 F0 46 8D 1D F4 00 }

$noep29 = { C1 CE 10 C1 F6 0F 68 00 ?? ?? 00 2B FA 5B 23 F9 8D 15 80 ?? ?? 00 E8 01 00 00 00 B6 5E 0B }

$noep30 = { C1 F0 07 EB 02 CD 20 BE 80 ?? ?? 00 1B C6 8D 1D F4 00 00 00 0F B6 06 EB 02 CD 20 8A 16 0F }

$noep31 = { C1 E0 06 EB 02 CD 20 EB 01 27 EB 01 24 BE 80 ?? 42 00 49 EB 01 99 8D 1D F4 00 00 00 EB 01 }

$noep32 = { D1 E9 03 C0 68 80 ?? ?? 00 EB 02 CD 20 5E 40 BB F4 00 00 00 33 CA 2B C7 0F B6 16 EB 01 3E }

$noep33 = { EB 02 CD 20 EB 01 91 8D 35 80 ?? ?? 00 33 C2 68 83 93 7E 7D 0C A4 5B 23 C3 68 77 93 7E 7D }

$noep34 = { 4B 45 52 4E 45 4C 33 32 2E 64 6C 6C 00 00 4C 6F 61 64 4C 69 62 72 61 72 79 41 00 00 47 65 }

$noep35 = { 0F BE C1 EB 01 0E 8D 35 C3 BE B6 22 F7 D1 68 43 ?? ?? 22 EB 02 B5 15 5F C1 F1 15 33 F7 80 }

$noep36 = { 0F B6 D0 E8 01 00 00 00 0C 5A B8 80 ?? ?? 00 EB 02 00 DE 8D 35 F4 00 00 00 F7 D2 EB 02 0E }

$noep37 = { 87 FE E8 02 00 00 00 98 CC 5F BB 80 ?? ?? 00 EB 02 CD 20 68 F4 00 00 00 E8 01 00 00 00 E3 }

$ep1 = { BB D0 01 40 ?? BF ?? 10 40 ??BE }

$ep2 = { EB 01 ?? EB 02 ?? ?? ?? 80 ?? ?? 00 }

$ep3 = { BB D0 01 40 ?? BF ?? 10 40 ??BE }

$ep4 = { EB 01 ?? EB 02 ?? ?? ?? 80 ?? ?? 00 }

$ep5 = { BE ?? ?? ?? 00 BF ?? ?? ?? 00 BB ?? ?? ?? 00 53 BB ?? ?? ?? 00 B2 80 }

$ep6 = { EB 02 CD 20 03 ?? 8D ?? 80 ?? ?? 00 ?? ?? ?? ?? ?? ?? ?? ?? ?? EB 02 }

$ep7 = { EB 02 CD 20 ?? CF ?? ?? 80 ?? ?? 00 ?? ?? ?? ?? ?? ?? ?? ?? 00 }

$ep8 = { 87 25 ?? ?? ?? ?? 61 94 55 A4 B6 80 FF 13 }

condition:

any of ($noep*) or for any of ($ep*) : ($ at entrypoint)
}

rule UPX

{

strings:

$noep1 = { B8 ?? ?? ?? ?? B9 ?? ?? ?? ?? 33 D2 EB 01 0F 56 EB 01 0F E8 03 00 00 00 EB 01 0F EB 01 0F 5E EB 01 }

$noep2 = { 5E 89 F7 B9 ?? ?? ?? ?? 8A 07 47 2C E8 3C 01 77 F7 80 3F ?? 75 F2 8B 07 8A 5F 04 66 C1 E8 08 C1 C0 10 86 C4 29 F8 80 EB E8 01 F0 89 07 83 C7 }

$noep3 = { 01 DB [0-1] 07 8B 1E 83 EE FC 11 DB [1-4] B8 01 00 00 00 01 DB }

$noep4 = { 9C 60 E8 00 00 00 00 5D B8 B3 85 40 00 2D AC 85 40 00 2B E8 8D B5 D5 FE FF FF 8B 06 83 F8 00 74 11 8D B5 E1 FE FF FF 8B 06 83 F8 01 0F 84 F1 }

$noep5 = { 8A 06 46 88 07 47 01 DB 75 07 8B 1E 83 EE FC 11 DB }

$noep6 = { FF D5 80 A7 ?? ?? ?? ?? ?? 58 50 54 50 53 57 FF D5 58 61 8D 44 24 ?? 6A 00 39 C4 75 FA 83 EC 80 E9 }

$noep7 = { 55 FF 96 ?? ?? ?? ?? 09 C0 74 07 89 03 83 C3 04 EB ?? FF 96 ?? ?? ?? ?? 8B AE ?? ?? ?? ?? 8D BE 00 F0 FF FF BB 00 10 00 00 50 54 6A 04 53 57 }

$noep8 = { FF D5 8D 87 ?? ?? ?? ?? 80 20 ?? 80 60 ?? ?? 58 50 54 50 53 57 FF D5 58 61 8D 44 24 ?? 6A 00 39 C4 75 FA 83 EC 80 E9 }

$ep1 = { 60 E8 00 00 00 00 58 83 E8 3D }

$ep2 = { 60 E8 00 00 00 00 83 CD FF 31 DB 5E }

$ep3 = { 50 BE ?? ?? ?? ?? 8D BE ?? ?? ?? ?? 57 83 CD }

condition:

 

any of ($noep*) or for any of ($ep*) : ($ at entrypoint)

Using these tools allow you to quickly identify known malware. The ClamAV may show that the suspicious file is a known malware. At this point, you will classify the incident under the name of this malware with a detailed report and briefing about the incident.

If after using ClamAV, it is still an unknown file type and there is no clear information about the suspicious file, we will need to go to the next step in analyzing the file. This will either require with a static analysis (to examine the code) or a dynamic analysis (executing the malware in a monitored environment to observe its behaviors).

With YARA you can create descriptions of malware families based on textual or binary patterns contained in samples fromthose families. You can create rules to find malware that attempts to brute force accounts and logins or create rules with antivirus process/service or domain names to identify malware that attempts to terminate or disable A/V products.

YARA is used by VirusTotal Malware Intelligence Services (http://vt-mis.com),jsunpack-n (http://jsunpack.jeek.org/) and We Watch Your Website (http://www.wewatchyourwebsite.com/)

Reference:

Become a certified reverse engineer!

Become a certified reverse engineer!

Get live, hands-on malware analysis training from anywhere, and become a Certified Reverse Engineering Analyst.

Become a certified reverse engineer!

Become a certified reverse engineer!

Get live, hands-on malware analysis training from anywhere, and become a Certified Reverse Engineering Analyst.

Malware Analyst's Cookbook: http://www.amazon.com/Malware-Analysts-Cookbook-DVD-Techniques/dp/0470613033

Mourad Ben Lakhoua
Mourad Ben Lakhoua

Mourad Ben Lakhoua is a security researcher for InfoSec Institute and an Information Security practitioner specializing in Cybersecurity, Penetration Testing, Risk Management, Cloud Computing, Social Media and Network System Security. He works as a security researcher at Tunisian Computer Emergency and Response Team tun-CERT.

You can find more of his articles, analysis and commentary at http://www.infosecisland.com, where he is a featured contributor.

Mourad has extensive experience with multiple security issues, including cryptography, host and network hardening, resilient architecture and application security.

In 2005 he successfully completed a Masters Degree in Information Security where he handled and managed extensive research in Global IT Security.

For more of Mourad's analysis, visit his information security blog at http://www.sectechno.com. You can also follow him on Twitter @Mbenlakhoua.