web2vec.extractors.http_response_features module

class web2vec.extractors.http_response_features.HttpResponseFeatures(redirects: bool, redirect_count: int, contains_forms: bool, contains_obfuscated_scripts: bool, contains_suspicious_keywords: bool, uses_https: bool, missing_x_frame_options: bool, missing_x_xss_protection: bool, missing_content_security_policy: bool, missing_strict_transport_security: bool, missing_x_content_type_options: bool, is_live: bool, server_version: str | None = None, body_length: int = 0, num_titles: int = 0, num_images: int = 0, num_links: int = 0, script_length: int = 0, special_characters: int = 0, script_to_special_chars_ratio: float = 0.0, script_to_body_ratio: float = 0.0, body_to_special_char_ratio: float = 0.0, time_response: float | None = None)

Bases: object

body_length: int = 0
body_to_special_char_ratio: float = 0.0
contains_forms: bool
contains_obfuscated_scripts: bool
contains_suspicious_keywords: bool
is_live: bool
missing_content_security_policy: bool
missing_strict_transport_security: bool
missing_x_content_type_options: bool
missing_x_frame_options: bool
missing_x_xss_protection: bool
num_images: int = 0
num_titles: int = 0
redirect_count: int
redirects: bool
script_length: int = 0
script_to_body_ratio: float = 0.0
script_to_special_chars_ratio: float = 0.0
server_version: str | None = None
special_characters: int = 0
time_response: float | None = None
uses_https: bool

web2vec.extractors.http_response_features.body_length(response: Response) → int

Get the length of the body of the response.

web2vec.extractors.http_response_features.body_to_special_char_ratio(response: Response) → float

Get the ratio of body to special characters in the response.

web2vec.extractors.http_response_features.check_forms(response: Response) → bool

Check if the response contains any forms.
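
A minimal sketch of a form check using the standard-library HTML parser, assuming the page HTML is available as a string (for a requests.Response that would be response.text; that Response here is the requests class is an assumption). _FormDetector and has_form are hypothetical names, not part of web2vec, and this is not necessarily how check_forms is implemented.

```python
from html.parser import HTMLParser


class _FormDetector(HTMLParser):
    """Sets `found` to True once a <form> start tag has been seen."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <FORM> and <form> both match.
        if tag == "form":
            self.found = True


def has_form(html: str) -> bool:
    """Hypothetical helper: True if the HTML contains at least one <form>."""
    detector = _FormDetector()
    detector.feed(html)
    detector.close()
    return detector.found


print(has_form("<html><body><FORM action='/login'></FORM></body></html>"))
print(has_form("<p>No forms here</p>"))
```

Using a real parser rather than a substring search avoids false positives such as the word "form" in visible text.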

web2vec.extractors.http_response_features.check_header_content_security_policy(response: Response) → bool

Check if the response is missing the Content-Security-Policy header.

web2vec.extractors.http_response_features.check_header_strict_transport_security(response: Response) → bool

Check if the response is missing the Strict-Transport-Security header.

web2vec.extractors.http_response_features.check_header_x_content_type_options(response: Response) → bool

Check if the response is missing the X-Content-Type-Options header.

web2vec.extractors.http_response_features.check_header_x_frame_options(response: Response) → bool

Check if the response is missing the X-Frame-Options header.

web2vec.extractors.http_response_features.check_header_x_xss_protection(response: Response) → bool

Check if the response is missing the X-XSS-Protection header.
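
The five header checks above share one pattern: look up a header name and report whether it is absent. A minimal sketch of that pattern, assuming the headers are available as a mapping (requests exposes them as response.headers, a case-insensitive dict). missing_security_headers is a hypothetical helper; following the descriptions above, True means the header is missing.

```python
from typing import Dict, Mapping

# The headers covered by the check_header_* functions described above.
SECURITY_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "X-XSS-Protection",
)


def missing_security_headers(headers: Mapping[str, str]) -> Dict[str, bool]:
    """Hypothetical helper: map each security header name to True if it is absent."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    present = {name.lower() for name in headers}
    return {name: name.lower() not in present for name in SECURITY_HEADERS}


result = missing_security_headers(
    {"content-security-policy": "default-src 'self'", "X-Frame-Options": "DENY"}
)
print(result)
```

Lowercasing both sides makes the sketch work with a plain dict as well as with a case-insensitive mapping.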

web2vec.extractors.http_response_features.check_https(response: Response) → bool

Check if the response uses HTTPS.
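
A minimal sketch, assuming "uses HTTPS" means the scheme of the final URL (response.url in requests, which reflects any redirects) is https. uses_https_url is a hypothetical helper, not web2vec's implementation.

```python
from urllib.parse import urlparse


def uses_https_url(final_url: str) -> bool:
    """Hypothetical helper: True if the URL's scheme is https."""
    # urlparse lowercases the scheme, so "HTTPS://..." is handled too.
    return urlparse(final_url).scheme == "https"


print(uses_https_url("https://example.com/login"))
print(uses_https_url("http://example.com/"))
```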

web2vec.extractors.http_response_features.check_obfuscated_scripts(response: Response) → bool

Check if the response contains any obfuscated scripts.
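
What counts as an obfuscated script is not specified in this reference. As an illustration only, one common heuristic flags JavaScript that calls decoding or dynamic-evaluation functions, or that contains long runs of hex escapes. The pattern list and looks_obfuscated are hypothetical; web2vec's actual criteria may differ.

```python
import re

# Hypothetical heuristic patterns, not web2vec's documented criteria.
OBFUSCATION_PATTERNS = [
    re.compile(r"\beval\s*\("),                # dynamic code evaluation
    re.compile(r"\bunescape\s*\("),            # legacy percent-decoding
    re.compile(r"\batob\s*\("),                # base64 decoding
    re.compile(r"String\.fromCharCode\s*\("),  # building strings from char codes
    re.compile(r"(?:\\x[0-9a-fA-F]{2}){10,}"), # ten or more consecutive \xNN escapes
]


def looks_obfuscated(script: str) -> bool:
    """Hypothetical helper: True if any heuristic pattern matches the script text."""
    return any(pattern.search(script) for pattern in OBFUSCATION_PATTERNS)


print(looks_obfuscated("eval(atob('YWxlcnQoMSk='))"))
print(looks_obfuscated("console.log('hello');"))
```

Heuristics like these trade precision for simplicity: legitimate minified code can also call eval or atob.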

web2vec.extractors.http_response_features.check_redirects(response: Response) → bool

Check if the response has been redirected.

web2vec.extractors.http_response_features.check_server_version(response: Response) → str | None

Get the server version of the response, or None if it is not available.

web2vec.extractors.http_response_features.check_suspicious_keywords(response: Response, keywords: List[str] | None = None) → bool

Check if the response contains any suspicious keywords.
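
A minimal sketch of a case-insensitive keyword scan that mirrors the optional keywords parameter above. The default keyword list and the contains_keywords name are hypothetical; the defaults web2vec actually uses are not listed in this reference.

```python
from typing import List, Optional

# Hypothetical default list, for illustration only.
DEFAULT_KEYWORDS = ["login", "verify", "account", "password", "bank", "update"]


def contains_keywords(text: str, keywords: Optional[List[str]] = None) -> bool:
    """True if any keyword appears in the text (case-insensitive substring match)."""
    if keywords is None:
        keywords = DEFAULT_KEYWORDS
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


print(contains_keywords("Please VERIFY your account"))
print(contains_keywords("Hello world", ["bitcoin"]))
```

Using None as the default, rather than a list literal in the signature, avoids Python's shared-mutable-default pitfall.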

web2vec.extractors.http_response_features.count_redirects(response: Response) → int

Count the number of redirects in the response.
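
In requests, Response.history holds the intermediate responses of any redirects that were followed, so both a redirect count and a redirected flag can be derived from its length. A minimal sketch, assuming Response is requests.Response; FakeResponse is a hypothetical stand-in so the example runs offline, and count_redirect_hops and was_redirected are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeResponse:
    """Stand-in for requests.Response, keeping only the `history` attribute."""
    history: List["FakeResponse"] = field(default_factory=list)


def count_redirect_hops(response) -> int:
    # requests stores each intermediate redirect response in `history`, oldest first.
    return len(response.history)


def was_redirected(response) -> bool:
    return count_redirect_hops(response) > 0


final = FakeResponse(history=[FakeResponse(), FakeResponse()])
print(count_redirect_hops(final), was_redirected(final))
```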

web2vec.extractors.http_response_features.get_http_response_features(url: str | None = None, response: Response | None = None) → HttpResponseFeatures

Get the HTTP response features for a given URL or response object.

web2vec.extractors.http_response_features.is_live(response: Response) → bool

Check if the response is live.

web2vec.extractors.http_response_features.num_images(response: Response) → int

Get the number of images in the response.

web2vec.extractors.http_response_features.num_links(response: Response) → int

Get the number of links in the response.

web2vec.extractors.http_response_features.num_titles(response: Response) → int

Get the number of titles in the response.
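
A minimal sketch of tag counting with the standard-library HTML parser, assuming images, links and titles correspond to <img>, <a> and <title> elements. That mapping is an assumption; web2vec may count differently (for example, treating headings as titles). TagCounter and count_tags are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags by (lowercased) tag name."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img/> also reach this method via
        # HTMLParser's default handle_startendtag.
        self.counts[tag] += 1


def count_tags(html: str) -> Counter:
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(
    "<html><head><title>Demo</title></head><body>"
    "<img src='a.png'><a href='/x'>x</a><a href='/y'>y</a><img src='b.png'/>"
    "</body></html>"
)
print(counts["img"], counts["a"], counts["title"])
```

Counting every tag in one pass lets several features share a single parse of the page.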

web2vec.extractors.http_response_features.script_length(response: Response) → int

Get the length of the scripts in the response.
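
A minimal sketch, assuming script length means the total number of characters inside inline <script> elements; external scripts loaded via src contribute nothing in this version. ScriptTextCollector and inline_script_length are hypothetical names.

```python
from html.parser import HTMLParser


class ScriptTextCollector(HTMLParser):
    """Collects the text content of inline <script> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only keep text that appears between <script> and </script>.
        if self._in_script:
            self.chunks.append(data)


def inline_script_length(html: str) -> int:
    collector = ScriptTextCollector()
    collector.feed(html)
    collector.close()
    return sum(len(chunk) for chunk in collector.chunks)


print(inline_script_length("<p>hi</p><script>var a = 1;</script><script src='x.js'></script>"))
```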

web2vec.extractors.http_response_features.script_to_body_ratio(response: Response) → float

Get the ratio of scripts to body in the response.

web2vec.extractors.http_response_features.script_to_special_chars_ratio(response: Response) → float

Get the ratio of scripts to special characters in the response.

web2vec.extractors.http_response_features.special_characters(response: Response) → int

Get the number of special characters in the response.
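
The ratio functions above combine the body length, script length and special-character count. A minimal sketch, assuming a special character is anything that is neither alphanumeric nor whitespace, and that each ratio falls back to 0.0 when its denominator is zero. The fallback matches the 0.0 defaults on HttpResponseFeatures, but whether web2vec behaves this way is an assumption. Both helpers are hypothetical.

```python
def count_special_characters(text: str) -> int:
    """Hypothetical definition: count characters that are neither alphanumeric nor whitespace."""
    return sum(1 for ch in text if not ch.isalnum() and not ch.isspace())


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


body = "<p>Hi!</p><script>x=1;</script>"
script_len = len("x=1;")  # inline script text only
special = count_special_characters(body)

# Assumed reading of the three ratio functions described above.
script_to_body = safe_ratio(script_len, len(body))
body_to_special = safe_ratio(len(body), special)
script_to_special = safe_ratio(script_len, special)
print(special, script_to_body, body_to_special, script_to_special)
```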