Using YARA to Detect Patterns in Malware
What is YARA?
YARA is a program that can detect patterns in binary or text files. The name is an acronym: “Yet Another Ridiculous Acronym” or “YARA: Another Recursive Acronym”, depending on who you ask.
Users define which patterns to look for by writing rules. This is especially useful for identifying malware, since a rule can look for all manner of things, such as a specific bitcoin wallet address or the IP address of a C2 server.
Why use YARA?
YARA can be used proactively alongside a deployed IPS, or it can be included in an incident response toolkit to identify samples and compromised devices. In some cases, it can even detect new malware samples based on patterns found in other samples from the same malware family.
YARA rules
Rules in YARA are written in YARA’s own rule syntax, typically in a file with a .yar or .yara extension. In this file, the rule author specifies the patterns that YARA should look for when the command is executed.
There are several components that make up a YARA rule. I will provide examples of each component based on the YARA rule CISA provided for the HermeticWiper malware.
Here’s a shameless plug for the post I made about HermeticWiper earlier this year.
Rule identifier
Every YARA rule must begin with the rule keyword followed by a rule identifier and a set of curly braces to contain the rest of the rule. Optional tags (wiper and HERMETICWIPER in the example) can follow the identifier after a colon.
For instance:
rule CISA_10375867_01 : wiper HERMETICWIPER
{
...
}
Metadata
Metadata is added to a YARA rule in its meta section. Like most metadata, it doesn’t change the behavior of the rule, but it provides useful information about the rule, such as the author, the date of creation, sample hashes, and more.
In CISA’s example, the metadata includes the author and date, but also includes several MD5 and SHA256 hash digests.
meta:
Author = "CISA Code & Media Analysis"
Incident = "10375867"
Date = "2022-04-05"
Last_Modified = "20220406_1500"
Actor = "n/a"
Category = "Wiper"
Family = "n/a"
Description = "Detects Hermetic Wiper samples"
MD5_1 = "382fc1a3c5225fceb672eea13f572a38"
SHA256_1 = "2c10b2ec0b995b88c27d141d6f7b14d6b8177c52818687e4ff8e6ecf53adf5bf"
MD5_2 = "decc2726599edcae8d1d1d0ca99d83a6"
SHA256_2 = "3c557727953a8f6b4788984464fb77741b821991acbf5e746aebdd02615b1767"
MD5_3 = "84ba0197920fd3e2b7dfa719fee09d2f"
SHA256_3 = "0385eeab00e946a302b24a91dea4187c1210597b8e17cd9e2230450f5ece21da"
MD5_4 = "3f4a16b29f2f0532b7ce3e7656799125"
SHA256_4 = "1bc44eef75779e3ca1eefb8ff5a64807dbc942b1e4a2672d77b9f6928d292591"
MD5_5 = "f1a33b2be4c6215a1c39b45e391a3e85"
SHA256_5 = "06086c1da4590dcc7f1e10a6be3431e1166286a9e7761f2de9de79d7fda9c397"
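The hash digests in the metadata are reference information only; they play no part in matching. As a minimal sketch of how an analyst might compare a sample against them, the Python below computes both digests with the standard hashlib module. The name sample_digests and the file name in the usage comment are hypothetical, not part of YARA or the CISA report.

```python
import hashlib


def sample_digests(data: bytes) -> dict:
    """Return the MD5 and SHA256 hex digests of a sample's raw bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Hypothetical usage: compare a file on disk with MD5_1 from the rule.
# with open("sample.bin", "rb") as handle:
#     digests = sample_digests(handle.read())
# print(digests["md5"] == "382fc1a3c5225fceb672eea13f572a38")
```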
Comments
Comments in YARA work the same way as in C. Both single-line (//) and multi-line (/* ... */) comments are supported.
Strings
Strings are the patterns to be matched in a malware sample. They’re declared in the form of variables, and can be text strings (ASCII, or Unicode if you like), hexadecimal strings, or regular expressions.
The strings section is not a requirement, but you’ll want to include it anyway.
In the CISA example, 14 strings are initialized in variables. The first one looks like this:
$rsrc1 = { 53 5A 44 44 }
The same string in text format would look like this:
$rsrc1 = "SZDD"
Note the difference; text strings are wrapped in quotation marks, while hex strings are wrapped in curly braces.
There are some special constructions that an author can take advantage of too, such as wildcard characters in a hexadecimal string. For more information about them, see the YARA documentation.
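To make the wildcard idea concrete, here is a rough Python sketch that emulates a hex string containing ?? (which matches any single byte) by translating it into a regular expression. The helper name hex_pattern_to_regex and the pattern { 53 5A ?? 44 } are made up for illustration; YARA’s own engine works differently and supports more constructs than this handles.

```python
import re


def hex_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple YARA-style hex string into a bytes regex.

    Only plain bytes and the ?? wildcard are handled in this sketch.
    """
    parts = []
    for token in pattern.strip("{} ").split():
        if token == "??":
            parts.append(b".")  # any single byte
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL lets "." match the newline byte (0x0A) as well.
    return re.compile(b"".join(parts), re.DOTALL)


# Hypothetical pattern: "SZ", then any byte, then "D".
regex = hex_pattern_to_regex("{ 53 5A ?? 44 }")
print(regex.search(b"\x00SZDD\x00") is not None)  # True
print(regex.search(b"SZD") is not None)  # False
```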
Here is the entire strings section from the CISA rule:
strings:
$rsrc1 = { 53 5A 44 44 }
$rsrc2 = { 52 00 43 00 44 00 41 00 54 00 41 00 }
$rsrc3 = { 44 00 52 00 56 00 5F 00 58 00 36 00 34 }
$rsrc4 = { 44 00 52 00 56 00 5F 00 58 00 38 00 36 }
$rsrc5 = { 44 00 52 00 56 00 5F 00 58 00 50 00 5F 00 58 00 36 00 34 }
$rsrc6 = { 44 00 52 00 56 00 5F 00 58 00 50 00 5F 00 58 00 38 00 36 00 }
$s1 = { 45 00 50 00 4D 00 4E 00 54 00 44 00 52 00 56 00 5C 00 25 00 75 }
$s2 = { 50 00 68 00 79 00 73 00 69 00 63 00 61 00 6C 00 44 00 72 00 69 00 76 00 65 00 25 00 75 }
$s3 = { 53 00 59 00 53 00 54 00 45 00 4D 00 5C 00 43 00 75 00 72 00 72 00 65 00 6E 00 74 00 43 00 6F 00 6E 00 74 00 72 00 6F 00 6C 00 53 00 65 00 74 00 5C 00 43 00 6F 00 6E 00 74 00 72 00 6F 00 6C 00 5C 00 43 00 72 00 61 00 73 00 68 00 43 00 6F 00 6E 00 74 00 72 00 6F 00 6C }
$s4 = { 43 00 72 00 61 00 73 00 68 00 44 00 75 00 6D 00 70 00 45 00 6E 00 61 00 62 00 6C 00 65 00 64 }
$s5 = { 24 00 49 00 4E 00 44 00 45 00 58 00 5F 00 41 00 4C 00 4C 00 4F 00 43 00 41 00 54 00 49 00 4F 00 4E }
$s6 = { 53 00 65 00 4C 00 6F 00 61 00 64 00 44 00 72 00 69 00 76 00 65 00 72 00 50 00 72 00 69 00 76 00 69 00 6C 00 65 00 67 00 65 }
$s7 = { 53 00 65 00 42 00 61 00 63 00 6B 00 75 00 70 00 50 00 72 00 69 00 76 00 69 00 6C 00 65 00 67 00 65 }
$s8 = { 43 00 3A 00 5C 00 57 00 69 00 6E 00 64 00 6F 00 77 00 73 00 5C 00 53 00 59 00 53 00 56 00 4F 00 4C }
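Most of these hex strings are easier to read once decoded. Apart from $rsrc1, each one alternates a printable byte with 00, which is how UTF-16LE ("wide") text appears in Windows binaries. The sketch below decodes a few of them in Python. decode_hex_string is a hypothetical helper, and it assumes the pattern has no wildcards; it pads patterns that stop before the final 00 byte.

```python
def decode_hex_string(pattern: str) -> str:
    """Decode a YARA hex string (without wildcards) into readable text."""
    raw = bytes.fromhex(pattern.strip("{} "))
    if b"\x00" not in raw:
        # No null bytes: treat the pattern as plain ASCII.
        return raw.decode("ascii")
    if len(raw) % 2:
        # Some patterns end before the last character's 00 byte; pad it back.
        raw += b"\x00"
    return raw.decode("utf-16-le")


print(decode_hex_string("{ 53 5A 44 44 }"))  # SZDD
print(decode_hex_string("{ 52 00 43 00 44 00 41 00 54 00 41 00 }"))  # RCDATA
print(decode_hex_string("{ 44 00 52 00 56 00 5F 00 58 00 36 00 34 }"))  # DRV_X64
```

YARA text strings also accept a wide modifier for this two-bytes-per-character encoding, so a pattern like $rsrc2 could plausibly be written as "RCDATA" wide instead.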
Conditions
The condition section is the only other required part of a complete YARA rule, after the rule identifier itself.
The condition section sets the criteria for whether the rule returns a successful match or not. It’s possible to stipulate, for example, “if $abc appears exactly 3 times, OR if $xyz appears more than 4 times, return a positive match”. The number of times a string occurs is written by replacing the $ in its name with #, so such a condition would look like this:
condition:
(#abc == 3) or (#xyz > 4)
In the CISA example, here’s the condition:
condition:
uint16(0) == 0x5A4D and ((3 of ($rsrc*)) and (7 of ($s*)))
Several interesting things to note about this condition:
- uint16(0) == 0x5A4D indicates we’re looking for a Windows executable, since Windows executables always begin with the bytes 4D 5A (“MZ”) in the file header. uint16 reads the two bytes at offset 0 as a little-endian integer, which is why they appear reversed as 0x5A4D.
- The asterisk is used as a wildcard in conjunction with the “of” keyword. This allows the author to specify a series of variables, or in this case, two series of variables ($rsrc* and $s*), with a threshold of matches for each. If at least that many strings in each series match (3 and 7, respectively), the condition is satisfied.
- Across the two examples, the Boolean operators and and or are used. The not operator is also permitted in a condition, as are the relational operators >=, <=, <, >, ==, and !=. Arithmetic operators work on numeric expressions.
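As a rough illustration of what the CISA condition checks, the Python sketch below re-implements its two ideas by hand: reading the first two bytes as a little-endian 16-bit integer, and counting how many patterns from a set occur in the data. The function names and the toy patterns are assumptions for illustration only; this is not how YARA evaluates conditions internally.

```python
import struct


def uint16_at(data: bytes, offset: int) -> int:
    """Read an unsigned 16-bit little-endian integer, like YARA's uint16()."""
    return struct.unpack_from("<H", data, offset)[0]


def at_least_n_of(data: bytes, patterns: list, n: int) -> bool:
    """Return True if at least n of the byte patterns occur in data."""
    return sum(1 for pattern in patterns if pattern in data) >= n


def matches(data: bytes, rsrc: list, s: list) -> bool:
    """Mimic: uint16(0) == 0x5A4D and 3 of ($rsrc*) and 7 of ($s*)."""
    if len(data) < 2:
        return False
    return (
        uint16_at(data, 0) == 0x5A4D
        and at_least_n_of(data, rsrc, 3)
        and at_least_n_of(data, s, 7)
    )


# Toy patterns standing in for the real $rsrc* and $s* strings.
toy_rsrc = [b"r1", b"r2", b"r3", b"r4"]
toy_s = [f"s{i}".encode() for i in range(8)]
sample = b"MZ" + b"r1r2r3" + b"".join(toy_s[:7])
print(matches(sample, toy_rsrc, toy_s))  # True
```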
Rule repositories
There are many more ways to add complexity and nuance to your YARA rules, but you don’t have to write every rule yourself. It’s possible to use third-party YARA rules written by the community. A good place to start is the Awesome YARA GitHub repo, or the Elastic Security Protections Artifacts repo.
Another good place to look is the Valhalla rule feed. It’s a fantastic tool, but it requires an API key.
Creating rules with yarGen
Writing effective YARA rules is a skill in itself. It’s easy to create rules that generate lots of false positives, or that are so narrow in focus that they are no more effective than a hash value. That’s where a tool called yarGen can help.
yarGen can be used to automatically generate YARA rules. The tool comes with a database of known-good strings from goodware to prevent false positives. It can analyze a sample for which you’d like to generate a YARA rule, extract strings from the file, and output a YARA rule that is designed to avoid false positives by ignoring common goodware strings.
Typical execution of yarGen looks something like this:
python3 yarGen.py -m [path of sample file] --excludegood -o [output file path]
The rules are good enough to use out of the box, but the author recommends testing them against malware samples and a large goodware archive.
Command execution
Once you have a rule you’d like to use, pass the rule file as the first argument to the yara command, followed by the path of the file to be scanned:
yara [rule file path] [path of file to scan]
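For scripted scans, the same invocation can be wrapped in Python. This is a minimal sketch that assumes the yara binary is installed and on the PATH. build_yara_command and run_yara are hypothetical helper names, and the file names in the usage comment are placeholders.

```python
import subprocess


def build_yara_command(rule_path: str, target_path: str) -> list:
    """Build the argument list: the rule file first, then the path to scan."""
    return ["yara", rule_path, target_path]


def run_yara(rule_path: str, target_path: str) -> list:
    """Run yara and return the lines it prints to standard output."""
    result = subprocess.run(
        build_yara_command(rule_path, target_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if yara exits with an error
    )
    return result.stdout.splitlines()


# Hypothetical usage (placeholder file names):
# for line in run_yara("hermeticwiper.yar", "suspicious.exe"):
#     print(line)
```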
Sources
https://www.varonis.com/blog/yara-rules
https://tryhackme.com/room/yara
https://yara.readthedocs.io/en/stable/writingrules.html
https://www.cisa.gov/uscert/ncas/analysis-reports/ar22-115a
https://github.com/InQuest/awesome-yara
https://github.com/elastic/protections-artifacts
https://github.com/Neo23x0/yarGen
https://www.nextron-systems.com/2015/02/16/write-simple-sound-yara-rules/
https://valhalla.nextron-systems.com/