As the use of the Internet, and of other massively networked systems like it, becomes increasingly widespread, the ease with which viruses proliferate grows with it. The result is a need for technologies designed to block these viruses, generically called malware (malicious software), at all major network stops, but especially at the terminal end-user point. The throughput of most end users' network connections, and thus the amount of data they can potentially consume, is increasing greatly as well.
While network-based detection systems have reached speeds of over 1 Gb/s, the speed of actual virus-scanning and malware-prevention systems has not kept pace.
The amount of new malware released onto the public Internet is exploding. Because most anti-virus software currently filters suspect files by string matching against pseudo-unique identifiers, each new malware sample, and each variant subsample, requires its own signature. Thus the size of an anti-virus product's signature set S is proportional to γ, the number of all known malware samples: S ∝ γ. Note that 'signature' is used interchangeably with 'identifier'; both mean the unique value that results from passing the candidate malware sample through a cryptographic hash function (a function that maps an arbitrary input to a fixed-length output with a uniform distribution).
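As a minimal sketch of the signature definition above, the following Python fragment hashes each sample and tests a candidate by exact set membership. SHA-256 is an assumption here, since the text only says "a cryptographic hash function"; the sample byte strings and the names `signature`, `signature_set`, and `is_known_malware` are hypothetical and chosen for illustration only.

```python
import hashlib


def signature(sample: bytes) -> str:
    """Return the hex digest of a sample (SHA-256 is an assumed choice)."""
    return hashlib.sha256(sample).hexdigest()


# Hypothetical stand-ins for known malware samples.
known_samples = [b"malicious payload A", b"malicious payload B"]

# One signature per known sample, so the set grows linearly with
# the number of samples (S proportional to gamma).
signature_set = {signature(s) for s in known_samples}


def is_known_malware(candidate: bytes) -> bool:
    """Exact matching: flag the candidate only if its digest is in the set."""
    return signature(candidate) in signature_set
```

Because the match is exact, changing a single byte of a sample yields a different digest, which is why each variant needs its own entry in the set.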
Because virtually all anti-malware programs devote most of their resources to matching these signatures S against an arbitrary input stream, usually with an exact-matching algorithm, two main factors determine the effectiveness of a solution: the ratio of detected to undetected inputs (possibly taking into account the false-positive rate, although this is a problem only in solutions that use regular-expression-based multi-pattern algorithms) and the scalability of the signature set.
Thus, any anti-virus solution that aims to protect its users from future malware types and variants must have mechanisms in place to update the known pattern set. The apparent solution is a centralized update server, but this is sub-optimal, especially for open-source efforts, because of the cost associated with it and because it presents an obvious, publicly exposed target for malicious attackers. An ideal anti-malware system would be wholly efficient and extremely fast, but the two are generally at odds with one another, and thus trade-offs must be made in the search for an acceptable middle ground. The aim of this project is to find that middle ground. Objective evaluation shows that our solution, BlockAV, is an effective architecture that outperforms any currently available commercial or published research solution. Specifically, we show:
• Fast scanning speed with less memory usage: By layering a cache-efficient Bloom filter on top of the more costly Bloomier filter, BlockAV increases end-to-end throughput on average-case input by a factor of 14, and requires less memory to do so than traditional algorithms.
• Scalability: BlockAV can handle large numbers of signatures with ease, and further space-efficiency improvements in order-and-match construction within our data structures will further improve scalability.
• Decentralized updates and maintenance: The community of users who use and maintain BlockAV is provided with a trustworthy, timeless conduit through which to work together, one that does not depend on any centralized authority or schema other than cryptographic verification. This is accomplished through the use of a novel blockchain variant.
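The layering idea in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not BlockAV's implementation: a small Bloom filter rejects most clean inputs cheaply, and only inputs that pass it reach the more expensive exact stage. A plain Python `set` stands in for the Bloomier filter, and the class names, the bit-array size, the number of hash functions, and the double-hashing scheme are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class LayeredScanner:
    """Cheap prefilter in front of an exact lookup (assumed design)."""

    def __init__(self, signatures, num_bits: int = 1 << 16, num_hashes: int = 4):
        # A set stands in for the costlier Bloomier filter of the text.
        self.exact = set(signatures)
        self.prefilter = BloomFilter(num_bits, num_hashes)
        for sig in self.exact:
            self.prefilter.add(sig)
        self.exact_lookups = 0  # counts how often the costly stage runs

    def is_malicious(self, sig: bytes) -> bool:
        # Most clean inputs are rejected here without the second stage.
        if not self.prefilter.might_contain(sig):
            return False
        self.exact_lookups += 1
        return sig in self.exact
```

The first stage can only produce false positives, never false negatives, so the exact stage decides the final answer and the combined result is the same as an exact lookup alone. Any speedup comes from how rarely the second stage has to run on clean input.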
BlockAV should work on any architecture, provided that it has enough RAM and disk space to store the identifiers and load them into memory. BlockAV's low memory and disk-space usage helps keep this requirement modest.