Version: 4.1.0.13

maec.analytics.distance Module

Classes

class maec.analytics.distance.Distance(maec_entity_list)

Bases: object

Calculates the distances between two or more MAEC entities. Currently, only Packages and Malware Subjects are supported.

add_log(number, log_list)

Add the logarithm of a number to a list.
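The following is a minimal standalone sketch of the behavior `add_log` describes. It is not the library's method. It assumes the natural logarithm, and, as a hypothetical guard, it records non-positive inputs as 0 instead of raising. The real implementation may handle those cases differently.

```python
import math


def add_log(number, log_list):
    """Append the natural log of `number` to `log_list` and return the list.

    Assumption (not from the MAEC source): values <= 0, for which the
    logarithm is undefined, are recorded as 0.0.
    """
    log_list.append(math.log(number) if number > 0 else 0.0)
    return log_list
```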

bin_list(numeric_value, numeric_list, n=10)

Bin a numeric value into a bucket, based on a parent list of values. n is the number of buckets to use (default: 10).
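The next sketch shows one way to bin a value against a parent list. It assumes `n` equal-width buckets that span the minimum and maximum of `numeric_list` and returns a 0-based bucket index. Equal-width binning and the clamping of out-of-range values are assumptions, not confirmed library behavior.

```python
def bin_list(numeric_value, numeric_list, n=10):
    """Return the 0-based bucket index of `numeric_value`.

    Assumption: n equal-width buckets spanning min(numeric_list) to
    max(numeric_list); values outside that range are clamped.
    """
    lo, hi = min(numeric_list), max(numeric_list)
    if hi == lo:
        # Every reference value is identical, so only one bucket is meaningful.
        return 0
    width = (hi - lo) / n
    index = int((numeric_value - lo) / width)
    # The maximum lands exactly on the upper edge; keep it in the last bucket.
    return min(max(index, 0), n - 1)
```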

build_string_vector(string_list, superset_string_list, ignore_case=True)

Build a vector from an input list of strings and superset list of strings.
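The sketch below assumes the resulting vector is a binary presence indicator over the superset. Each position holds 1 if the input list contains that superset string and 0 otherwise, and positions follow the order of `superset_string_list`. The 0/1 encoding is an assumption about how the library builds this vector.

```python
def build_string_vector(string_list, superset_string_list, ignore_case=True):
    """Return a 0/1 vector marking which superset strings occur in string_list."""
    def normalize(s):
        # Case-fold only when ignore_case is set.
        return s.lower() if ignore_case else s

    present = {normalize(s) for s in string_list}
    return [1 if normalize(s) in present else 0 for s in superset_string_list]
```

Because every subject's vector is built against the same superset, all vectors share one length and one position-to-string mapping. That shared layout is what makes them comparable by a distance function.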

calculate()

Calculate the distances between the input Malware Subjects.

create_dynamic_result_vector(dynamic_vector)

Construct the dynamic result (matching) vector for a corresponding feature vector.

create_static_result_vector(static_vector)

Construct the static result (matching) vector for a corresponding feature vector.

create_superset_vectors()

Calculate vector supersets from the feature vectors.
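Here is an illustrative sketch of a superset computed from string features. It uses a hypothetical helper, `build_superset`, that takes per-subject lists of strings and returns their sorted, optionally case-folded union. The library method works on its own internal feature vectors, so treat this as an illustration of the idea rather than its code.

```python
def build_superset(feature_lists, ignore_case=True):
    """Return the sorted union of string features across all feature lists.

    Hypothetical helper: the MAEC method operates on its own feature vectors.
    """
    superset = set()
    for features in feature_lists:
        for feature in features:
            superset.add(feature.lower() if ignore_case else feature)
    # Sorting gives every subject the same, deterministic position per feature.
    return sorted(superset)
```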

euclidean_distance(vector_1, vector_2)

Calculate the Euclidean distance between two input vectors.
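The Euclidean distance between vectors x and y of equal length is sqrt(sum((x_i - y_i)^2)). The standalone sketch below computes it over paired entries. Rejecting mismatched lengths is a choice made here, not documented library behavior.

```python
import math


def euclidean_distance(vector_1, vector_2):
    """Return sqrt(sum((a - b) ** 2)) over paired entries of two vectors."""
    if len(vector_1) != len(vector_2):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vector_1, vector_2)))
```

On Python 3.8 and later, `math.dist(vector_1, vector_2)` computes the same quantity.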

flatten_vector(vector_entry_list)

Generate a single, flattened vector from an input list of vectors or values.
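The following is a minimal recursive sketch of flattening. It assumes entries may be scalars or arbitrarily nested lists or tuples, and it keeps values in their original left-to-right order. How deeply the library actually flattens is an assumption.

```python
def flatten_vector(vector_entry_list):
    """Flatten nested lists/tuples into a single list, preserving order."""
    flattened = []
    for entry in vector_entry_list:
        if isinstance(entry, (list, tuple)):
            # Recurse so that nested sub-vectors are expanded in place.
            flattened.extend(flatten_vector(entry))
        else:
            flattened.append(entry)
    return flattened
```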

generate_feature_vectors(merged_subjects)

Generate a feature vector for the binned Malware Subjects

normalize_numeric(numeric_value, numeric_list, normalize=True, scale_log=True)

Scale a numeric value, based on a parent list of values. Return the scaled/normalized form.
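The sketch below shows one plausible reading of the two flags. With `scale_log`, the value and the parent list are both log-transformed. With `normalize`, the value is then min-max scaled into [0, 1] against the parent list. The natural log, the zero handling, and min-max scaling are all assumptions.

```python
import math


def _log(x):
    # Assumption: non-positive values map to 0.0 instead of raising.
    return math.log(x) if x > 0 else 0.0


def normalize_numeric(numeric_value, numeric_list, normalize=True, scale_log=True):
    """Log-scale and/or min-max normalize a value against a parent list."""
    value, parent = numeric_value, list(numeric_list)
    if scale_log:
        value = _log(value)
        parent = [_log(x) for x in parent]
    if normalize:
        lo, hi = min(parent), max(parent)
        # A constant parent list gives no range to scale into.
        value = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return value
```

Log scaling first keeps a few very large values (for example, file sizes) from compressing everything else toward 0 after normalization.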

normalize_numeric_list(value_list, numeric_list, normalize=True, scale_log=True)

Scale a list of numeric values, based on a parent list of numeric value lists. Return the scaled/normalized form.
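Assume the parent "list of numeric value lists" is pooled into one reference population. Under that assumption, each value can be scaled independently against the pooled values, as sketched below. Pooling, the natural log, and min-max scaling are assumptions, not confirmed library behavior.

```python
import math


def _log(x):
    # Assumption: non-positive values map to 0.0 instead of raising.
    return math.log(x) if x > 0 else 0.0


def normalize_numeric_list(value_list, numeric_list, normalize=True, scale_log=True):
    """Scale each value in `value_list` against all values in `numeric_list`.

    Assumption: `numeric_list` is a list of lists whose values are pooled
    into one reference population before min-max normalization.
    """
    pool = [x for sublist in numeric_list for x in sublist]
    values = list(value_list)
    if scale_log:
        pool = [_log(x) for x in pool]
        values = [_log(x) for x in values]
    if not normalize:
        return values
    lo, hi = min(pool), max(pool)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```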

normalize_vectors(vector_1, vector_2)

Normalize two input vectors so that they have a similar composition.
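The next sketch illustrates what "similar composition" could mean. It aligns the two vectors entry by entry, treats a missing entry (`None`) as empty, and zero-pads the shorter side of each entry so that both flattened outputs have the same length. This interpretation is an assumption; the library's exact rules may differ.

```python
def normalize_vectors(vector_1, vector_2):
    """Return two flat, equal-length lists built from aligned entries.

    Assumption: entries are scalars, lists, or None; for each position the
    shorter side is zero-padded to match the longer one.
    """
    if len(vector_1) != len(vector_2):
        raise ValueError("vectors must have the same number of entries")

    def as_items(entry):
        if entry is None:
            return []
        return list(entry) if isinstance(entry, list) else [entry]

    out_1, out_2 = [], []
    for a, b in zip(vector_1, vector_2):
        a_items, b_items = as_items(a), as_items(b)
        width = max(len(a_items), len(b_items))
        out_1.extend(a_items + [0] * (width - len(a_items)))
        out_2.extend(b_items + [0] * (width - len(b_items)))
    return out_1, out_2
```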

perform_calculation()

Perform the actual distance calculation. Store the results in the distances dictionary.
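As an illustration of a pairwise distances dictionary, the hypothetical `pairwise_distances` helper below takes a mapping from subject IDs to equal-length numeric vectors. It returns a nested dictionary in which `distances[a][b]` is the Euclidean distance between subjects a and b. The nested layout is an assumption about how results might be stored.

```python
import math
from itertools import combinations


def pairwise_distances(feature_vectors):
    """Return {id_a: {id_b: distance}} for every pair of subject IDs.

    Hypothetical helper; `feature_vectors` maps IDs to equal-length vectors.
    """
    # Each subject is at distance 0 from itself.
    distances = {key: {key: 0.0} for key in feature_vectors}
    for a, b in combinations(feature_vectors, 2):
        d = math.dist(feature_vectors[a], feature_vectors[b])
        # Euclidean distance is symmetric, so fill both directions at once.
        distances[a][b] = d
        distances[b][a] = d
    return distances
```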

populate_hashes_mapping(malware_subject_list)

Populate and return the Malware Subject -> Hashes mapping from an input list of Malware Subjects.

preprocess_entities(dereference=True)

Pre-process the MAEC entities.

print_distances(file_object, default_label='md5', delimiter=', ')

Print the distances between the Malware Subjects to a file-like object, in delimited matrix format.

By default, tries to label each Malware Subject with its MD5 hash (default_label='md5') and separates values with a comma followed by a space (delimiter=', '), for CSV-like output.
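A delimited distance matrix can be written with plain string joins, as in the hypothetical `write_distance_matrix` sketch below. It takes a nested distances dictionary and a mapping from subject IDs to labels (for example, MD5 hashes). The first row is a header of labels. Each following row begins with a label and then lists that subject's distances to every subject in the same order.

```python
import io


def write_distance_matrix(distances, labels, file_object, delimiter=", "):
    """Write a labeled, delimited distance matrix to a file-like object.

    Hypothetical helper: `distances` is {id: {id: value}}, `labels` is {id: str}.
    """
    keys = list(distances)
    # Header row: an empty corner cell followed by every column label.
    file_object.write(delimiter.join([""] + [labels[k] for k in keys]) + "\n")
    for row in keys:
        cells = [labels[row]] + [str(distances[row][col]) for col in keys]
        file_object.write(delimiter.join(cells) + "\n")


# Usage: write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_distance_matrix(
    {"s1": {"s1": 0.0, "s2": 5.0}, "s2": {"s1": 5.0, "s2": 0.0}},
    {"s1": "aaa", "s2": "bbb"},
    buffer,
)
```

Joining strings manually, rather than using Python's `csv` module, lets the delimiter be the two-character `", "`, because `csv` only accepts single-character delimiters.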

class maec.analytics.distance.StaticFeatureVector(malware_subject, deduplicator)

Bases: object

Generate a feature vector for a Malware Subject based on its static features.

create_object_vector(object, static_feature_dict, callback_function=None)

Create a vector from a single Object.

create_static_vectors(malware_subject)

Create a vector of static features for an input Malware Subject.

extract_features(malware_subject)

Extract the static features from the Malware Subject.

get_unique_features()

Calculate the unique set of static features for the Malware Subject.

class maec.analytics.distance.DynamicFeatureVector(malware_subject, deduplicator, ignored_object_properties, ignored_actions)

Bases: object

Generate a feature vector for a Malware Subject based on its dynamic features.

create_action_vector(action)

Create a vector from a single Action.

create_dynamic_vectors(malware_subject)

Create a vector of unique action/object pairs for an input Malware Subject.
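To illustrate deduplicating action/object pairs, the hypothetical helper below uses a deliberately simplified Action format. Each action is a dictionary with a `name` and a list of object-type strings, which is much simpler than real MAEC Actions. The helper returns each distinct (action, object type) pair once, in first-seen order.

```python
def unique_action_object_pairs(actions):
    """Return the unique (action_name, object_type) pairs, in first-seen order.

    Hypothetical input: each action is a dict with "name" and "objects" keys,
    where "objects" is a list of object-type strings.
    """
    seen = set()
    pairs = []
    for action in actions:
        for object_type in action.get("objects", []):
            pair = (action["name"], object_type)
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs
```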

extract_features(malware_subject)

Extract the dynamic features from the Malware Subject.

get_unique_features()

Calculate the unique set of dynamic features for the Malware Subject.

prune_dynamic_features(min_length=2)

Prune the dynamic features based on the ignored Object properties and Actions.
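The sketch below shows one plausible pruning scheme. The ignore sets mirror the `ignored_actions` and `ignored_object_properties` constructor arguments of DynamicFeatureVector. The feature format is hypothetical: each feature is a tuple holding an action name followed by property strings. The reading of `min_length` as the minimum tuple length a feature must keep after pruning is also an assumption.

```python
def prune_features(features, ignored_actions, ignored_object_properties, min_length=2):
    """Hypothetical pruning of dynamic features.

    Each feature is a tuple: (action_name, property_1, property_2, ...).
    Assumption: min_length is the minimum tuple length a feature must keep
    after pruning, so the default of 2 requires at least one property.
    """
    pruned = []
    for feature in features:
        action_name, *properties = feature
        if action_name in ignored_actions:
            # Drop ignored Actions entirely.
            continue
        kept = tuple(p for p in properties if p not in ignored_object_properties)
        if 1 + len(kept) >= min_length:
            pruned.append((action_name,) + kept)
    return pruned
```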