Our reference data set is extracted from our large database of malware binaries maintained at the CWSandbox.
The malware binaries have been collected over a period of three years from a variety of sources. From the overall database, we select binaries
which have been assigned to a known class of malware by the majority of six independent anti-virus products. We append the overall anti-virus
label to the filename of each report. Although anti-virus labels suffer from inconsistency, we expect the selection using different
scanners to be reasonably consistent and accurate. To compensate for the skewed distribution of classes, we discard classes with fewer than 20
samples and restrict the maximum contribution of each class to 300 binaries. The selected malware binaries are then executed and monitored using
CWSandbox, resulting in a total of 3,131 behavior reports in MIST format. A listing of the contained malware classes is provided
below. You can download the data set in the original CWSandbox encoding, in the sequential CWSandbox encoding, and in the
MIST encoding.
| Malware class | Number of reports |
|---|---|
| ADULTBROWSER | 262 |
| ALLAPLE | 300 |
| BANCOS | 48 |
| CASINO | 140 |
| DORFDO | 65 |
| EJIK | 168 |
| FLYSTUDIO | 33 |
| LDPINCH | 43 |
| LOOPER | 209 |
| MAGICCASINO | 174 |
| PODNUHA | 300 |
| POSION | 26 |
| PRONDIALER | 98 |
| RBOT | 101 |
| ROTATOR | 300 |
| SALITY | 85 |
| SPYGAMES | 139 |
| SWIZZOR | 78 |
| VAPSUP | 45 |
| VIKING_DLL | 158 |
| VIKING_DZ | 68 |
| VIRUT | 202 |
| WOIKOINER | 50 |
| ZHELATIN | 41 |
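
The selection procedure for the reference data set can be sketched roughly as follows. This is a minimal illustration, not the code actually used: the function names, the input layout (a dictionary of per-scanner labels) and the reading of "majority" as a strict majority of all scanners are assumptions, and the text does not say how the 300 binaries of an oversized class are picked, so the sketch simply keeps the first 300.

```python
from collections import Counter, defaultdict

MIN_CLASS_SIZE = 20   # classes with fewer samples are discarded
MAX_CLASS_SIZE = 300  # maximum contribution of each class


def majority_label(labels):
    """Return the label given by a strict majority of scanners, else None.

    `labels` holds one entry per scanner; None means "no known class".
    (Hypothetical helper; a strict majority is an assumption.)
    """
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count > len(labels) / 2 else None


def select_reference_set(scan_results):
    """Group binaries by majority label, then filter and cap the classes.

    `scan_results` maps a binary id to its list of per-scanner labels.
    """
    by_class = defaultdict(list)
    for binary_id, labels in scan_results.items():
        label = majority_label(labels)
        if label is not None:
            by_class[label].append(binary_id)
    return {
        label: members[:MAX_CLASS_SIZE]  # assumption: keep the first 300
        for label, members in by_class.items()
        if len(members) >= MIN_CLASS_SIZE
    }
```

With six scanners, a binary is kept under this reading only if at least four of them agree on its class; a three-to-three split yields no label and the binary is dropped.
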
The application data set consists of seven chunks of malware binaries obtained from the anti-malware vendor Sunbelt
Software. The binaries correspond to malware collected during seven consecutive days in August 2009 and originate from a variety of sources.
Sunbelt Software uses these very samples to create and update signatures for their VIPRE
anti-malware product as well as for their security data feed ThreatTrack.
The complete application data set consists of 33,698 behavior reports in MIST format. We also append the results of Kaspersky Anti-Virus (thanks to
VirusTotal) to the filename of the reports. Statistics for the data set and the characteristics of the contained behavior
reports are provided below. The data set can be downloaded in several encodings.
| Data set description | |
|---|---|
| Collection period | August 1-7, 2009 |
| Collection location | Sunbelt Software |
| Data set size (kilobytes) | 21,808,644 |
| Number of reports | 33,698 |

| Data set statistics | min. | avg. | max. |
|---|---|---|---|
| Reports per day | 3,760 | 4,814 | 6,746 |
| Instructions per report | 15 | 11,921 | 103,039 |
| Size per report (kilobytes) | 1 | 647 | 5,783 |
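
As a quick sanity check, the averages in the statistics table can be reproduced from the totals in the description table. The short sketch below assumes that the separators in the published figures are thousands separators and that the collection period covers exactly seven days; the variable names are chosen for illustration only.

```python
# Totals taken from the data set description table.
TOTAL_REPORTS = 33_698
TOTAL_SIZE_KB = 21_808_644
COLLECTION_DAYS = 7  # August 1-7, 2009

avg_reports_per_day = TOTAL_REPORTS / COLLECTION_DAYS
avg_size_per_report_kb = TOTAL_SIZE_KB / TOTAL_REPORTS

print(f"reports per day (avg.): {avg_reports_per_day:.0f}")          # 4814
print(f"size per report (avg., KB): {avg_size_per_report_kb:.0f}")   # 647
```

Both values agree with the "avg." column, which suggests the two tables are mutually consistent.
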