web2vec.extractors.http_response_features module
- class web2vec.extractors.http_response_features.HttpResponseFeatures(redirects: bool, redirect_count: int, contains_forms: bool, contains_obfuscated_scripts: bool, contains_suspicious_keywords: bool, uses_https: bool, missing_x_frame_options: bool, missing_x_xss_protection: bool, missing_content_security_policy: bool, missing_strict_transport_security: bool, missing_x_content_type_options: bool, is_live: bool, server_version: str | None = None, body_length: int = 0, num_titles: int = 0, num_images: int = 0, num_links: int = 0, script_length: int = 0, special_characters: int = 0, script_to_special_chars_ratio: float = 0.0, script_to_body_ratio: float = 0.0, body_to_special_char_ratio: float = 0.0)
Bases: object
- body_length: int = 0
- body_to_special_char_ratio: float = 0.0
- contains_forms: bool
- contains_obfuscated_scripts: bool
- contains_suspicious_keywords: bool
- is_live: bool
- missing_content_security_policy: bool
- missing_strict_transport_security: bool
- missing_x_content_type_options: bool
- missing_x_frame_options: bool
- missing_x_xss_protection: bool
- num_images: int = 0
- num_links: int = 0
- num_titles: int = 0
- redirect_count: int
- redirects: bool
- script_length: int = 0
- script_to_body_ratio: float = 0.0
- script_to_special_chars_ratio: float = 0.0
- server_version: str | None = None
- special_characters: int = 0
- uses_https: bool
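The constructor signature mixes required fields with defaulted ones, which reads like a dataclass. A minimal sketch under that assumption (this page does not confirm how the class is built), using a hypothetical reduced class `FeaturesSketch` that carries only a few of the documented fields:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeaturesSketch:
    """Hypothetical, reduced stand-in for HttpResponseFeatures (not the library class)."""

    # Required fields: no default, so they must be passed explicitly.
    redirects: bool
    redirect_count: int
    uses_https: bool
    is_live: bool
    # Optional fields: defaults copied from the documented signature.
    server_version: Optional[str] = None
    body_length: int = 0
    script_to_body_ratio: float = 0.0


features = FeaturesSketch(redirects=False, redirect_count=0, uses_https=True, is_live=True)
row = asdict(features)  # plain dict, convenient as one row of a feature table
```

Converting the record to a dict keeps the field order of the declaration, which is handy when the features feed a tabular model.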
- web2vec.extractors.http_response_features.body_length(response: Response) -> int
Get the length of the body of the response.
- web2vec.extractors.http_response_features.body_to_special_char_ratio(response: Response) -> float
Get the ratio of body to special characters in the response.
- web2vec.extractors.http_response_features.check_forms(response: Response) -> bool
Check if the response contains any forms.
- web2vec.extractors.http_response_features.check_header_content_security_policy(response: Response) -> bool
Check if the response is missing the Content-Security-Policy header.
- web2vec.extractors.http_response_features.check_header_strict_transport_security(response: Response) -> bool
Check if the response is missing the Strict-Transport-Security header.
- web2vec.extractors.http_response_features.check_header_x_content_type_options(response: Response) -> bool
Check if the response is missing the X-Content-Type-Options header.
- web2vec.extractors.http_response_features.check_header_x_frame_options(response: Response) -> bool
Check if the response is missing the X-Frame-Options header.
- web2vec.extractors.http_response_features.check_header_x_xss_protection(response: Response) -> bool
Check if the response is missing the X-XSS-Protection header.
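The five `check_header_*` functions share one idea: report whether a security header is absent. The following is a rough sketch of that idea, not the library's implementation. It assumes the headers are available as a plain mapping, that a return value of `True` means "missing" (matching the `missing_*` fields), and it uses hypothetical helper names `is_header_missing` and `missing_security_headers`.

```python
from typing import Dict, Mapping

# Header names taken from the function descriptions on this page.
SECURITY_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "X-XSS-Protection",
)


def is_header_missing(headers: Mapping[str, str], name: str) -> bool:
    """Return True when `name` is absent; HTTP header names are case-insensitive."""
    present = {key.lower() for key in headers}
    return name.lower() not in present


def missing_security_headers(headers: Mapping[str, str]) -> Dict[str, bool]:
    """Map each security header name to True if it is missing (hypothetical helper)."""
    return {name: is_header_missing(headers, name) for name in SECURITY_HEADERS}
```

Lower-casing both sides avoids false "missing" results when a server sends, say, `content-security-policy` in lowercase.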
- web2vec.extractors.http_response_features.check_https(response: Response) -> bool
Check if the response uses HTTPS.
- web2vec.extractors.http_response_features.check_obfuscated_scripts(response: Response) -> bool
Check if the response contains any obfuscated scripts.
- web2vec.extractors.http_response_features.check_redirects(response: Response) -> bool
Check if the response has been redirected.
- web2vec.extractors.http_response_features.check_server_version(response: Response) -> str | None
Check the server version of the response.
- web2vec.extractors.http_response_features.check_suspicious_keywords(response: Response, keywords: List[str] | None = None) -> bool
Check if the response contains any suspicious keywords.
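The optional `keywords` argument suggests a built-in default list that callers can override. Below is a minimal sketch of that pattern, assuming a case-insensitive substring match over the response text. The function name `contains_suspicious_keywords` and the default list are hypothetical; the library's actual keywords and matching rule are not documented on this page.

```python
from typing import Iterable, Optional

# Hypothetical defaults, for illustration only.
DEFAULT_KEYWORDS = ("login", "password", "verify", "bank")


def contains_suspicious_keywords(
    text: str, keywords: Optional[Iterable[str]] = None
) -> bool:
    """Return True if any keyword occurs in `text`, ignoring case."""
    # Fall back to the defaults only when no list is given at all;
    # an explicitly empty list means "match nothing".
    wanted = DEFAULT_KEYWORDS if keywords is None else tuple(keywords)
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in wanted)
```

Testing `keywords is None` rather than truthiness lets a caller pass an empty list to disable the check.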
- web2vec.extractors.http_response_features.count_redirects(response: Response) -> int
Count the number of redirects in the response.
- web2vec.extractors.http_response_features.get_http_response_features(url: str | None = None, response: Response | None = None) -> HttpResponseFeatures
Get the HTTP response features for a given URL or response object.
- web2vec.extractors.http_response_features.is_live(response: Response) -> bool
Check if the response is live.
- web2vec.extractors.http_response_features.num_images(response: Response) -> int
Get the number of images in the response.
- web2vec.extractors.http_response_features.num_links(response: Response) -> int
Get the number of links in the response.
- web2vec.extractors.http_response_features.num_titles(response: Response) -> int
Get the number of titles in the response.
- web2vec.extractors.http_response_features.script_length(response: Response) -> int
Get the length of the scripts in the response.
- web2vec.extractors.http_response_features.script_to_body_ratio(response: Response) -> float
Get the ratio of scripts to body in the response.
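The length and ratio features can be illustrated with the standard library alone. This sketch makes several assumptions that this page does not state: script length is taken to be the number of characters inside `<script>` elements, a "special character" is anything that is neither alphanumeric nor whitespace, and a ratio with a zero denominator falls back to `0.0` (the documented default). All names below are hypothetical.

```python
from html.parser import HTMLParser


class _ScriptCollector(HTMLParser):
    """Accumulate the number of characters found inside <script> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.script_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_chars += len(data)


def script_length_sketch(html: str) -> int:
    """Total characters of inline script content (assumed definition)."""
    parser = _ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.script_chars


def count_special_characters(text: str) -> int:
    """Count characters that are neither alphanumeric nor whitespace (assumed definition)."""
    return sum(1 for ch in text if not ch.isalnum() and not ch.isspace())


def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


html = "<html><body><script>var a=1;</script><p>hi</p></body></html>"
script_to_body = safe_ratio(script_length_sketch(html), len(html))
```

Guarding the division matters for empty or script-free pages, where a naive ratio would raise `ZeroDivisionError`.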