Understanding the Shortcomings of Wehe on HTTPS Traffic Detection

11 Apr 2024

Authors:

(1) Vinod S. Khandkar and Manjesh K. Hanawal, Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, Mumbai, India ({vinod.khandkar, mhanawal}@iitb.ac.in).

Abstract & Introduction

Related Work and Background

Challenges in TD Detection Measurement Setup Development

Case Study : Wehe - TD Detection Tool for Mobile Environment

Shortcoming of Wehe on HTTPS Traffic

TD Detection of HTTPS Traffic

Conclusion & References

V. SHORTCOMING OF WEHE ON HTTPS TRAFFIC

Our study focuses on validating the network responses for the replayed traffic streams, TD detection, and operational feasibility in various network configurations. While operational feasibility is validated using the publicly available “Wehe” Android app on the Google Play Store, TD detection scenarios are validated using theoretical arguments. The validation of network responses requires bandwidth analysis of the received traffic stream. This analysis requires the network logs for the specific replay performed as per the validation scenario. A replay performed on a device while multiple other streaming services run in parallel is one such scenario. The Wehe app does not immediately provide such network logs for the replays after the tests complete. So, we implemented a user-client and server that mimic the behavior of the Wehe tool.

Fig. 6. Traffic classification of replayed YouTube traffic: (a) using internet browser, (b) using user-client

We use a client-server setup similar to the one shown in Fig. 3 for validating Wehe. In the current setup, we replaced the original service’s server with a replay server. The user-client and replay server are connected through a commercial traffic shaper. Moreover, our setup has a provision to perform multiple replays in parallel. Our validation setup does not need administrative channels and their overheads, e.g., side-channels, because our server only ever needs to support a single user-client. Scenarios with multiple clients are validated using the Wehe app directly, as they do not require the associated traffic analysis.

The remainder of this section describes the results of the validation.

A. Original service’s traffic emulation

The network responses depend on network middle-boxes applying network policies, which in turn depends on those middle-boxes classifying the probing traffic correctly, as mentioned in Sec. III-A. We validated the classification of Wehe’s emulated traffic using a commercial traffic shaper. The classification of the emulated traffic is observed using the user interface of the commercial traffic shaper.

For this experiment, the YouTube application data is replayed from the replay server to the user-client through the commercial traffic shaper. During the data transfer, the commercial traffic shaper’s user-interface window is observed for the presence of YouTube traffic. We also accessed YouTube traffic using an internet browser and observed the traffic shaper’s classification outcome to establish a baseline for our traffic classification observations.

Fig. 6 shows the traffic classification outcome, as displayed in the commercial traffic shaper’s user-interface window, for traffic accessed directly from a YouTube server using an internet browser. It shows the internet activity in the left window and the classification of the corresponding traffic in the right window.

Fig. 6(a) shows that traffic accessed using the internet browser is correctly classified as YouTube. This is in line with the commercial traffic shaper’s intended behaviour.

Fig. 6(b) shows the traffic classification outcome for traffic accessed using the user-client. It shows no evidence of YouTube traffic being transferred over the communication link. Instead, the shaper classifies the same traffic as HTTPS traffic. The outcome of this experiment shows that not all network middle-boxes can correctly classify Wehe’s replayed traffic.

B. Effect of data rate in replay script

The way the probing traffic is generated determines whether the network responds as the TD detection algorithm expects. Wehe uses the traffic trace from the original service to generate replay scripts that preserve the application data and its timing relationship. This replay script is used over the original network and also on networks in other geographic locations. As the traffic shaping rate varies across networks for the same service (as mentioned in [32]), the traffic rate preserved in the replay script can differ from the traffic shaping rate of the network under test. In particular, the replay traffic rate can be lower than the traffic shaping rate.

The Wehe methodology does not detect traffic differentiation if the replay script’s traffic rate is lower than the network’s shaping rate, because the shaping then does not affect the traffic stream. Such replay scripts can never detect traffic shaping on such networks. Thus the Wehe tool’s TD detection capability is limited by the replay script’s probing traffic rate.
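The argument above can be illustrated with a minimal sketch. It assumes an idealised shaper that simply caps a constant-rate stream at the shaping rate; the function names and the numeric rates are hypothetical and are not taken from Wehe or from the measurements in this paper.

```python
def delivered_rate(replay_rate_mbps: float, shaping_rate_mbps: float) -> float:
    """Rate seen at the receiver when an ideal shaper caps a constant-rate stream."""
    return min(replay_rate_mbps, shaping_rate_mbps)


def shaping_is_observable(replay_rate_mbps: float, shaping_rate_mbps: float) -> bool:
    """True only if the shaper actually reduces the replayed stream's rate."""
    return delivered_rate(replay_rate_mbps, shaping_rate_mbps) < replay_rate_mbps


# Hypothetical example: the same 2 Mbps replay script used on two networks.
print(shaping_is_observable(2.0, 1.5))  # shaping rate below the replay rate -> True
print(shaping_is_observable(2.0, 4.0))  # shaping rate above the replay rate -> False
```

In the second case the replayed stream passes through the shaper untouched, so a comparison against the control replay has nothing to detect.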

C. Usage of port number 80

The network responses are driven by the network middle-boxes’ perception of the probing traffic (refer Sec. III-A). The replay script preserves the data in the application’s original network trace. The original application servers use port 80 for plain-text data and port 443 for encrypted data transfer. The Wehe replay script directly uses the encrypted data from the application’s network trace and transmits it on port 80. In such cases, Wehe expects its original replay traffic stream to be classified correctly by network devices on the basis of the encrypted application data. This is impossible for such data on port 80, as encrypted traffic data cannot expose its identity to the network device. Thus the Wehe tool cannot generate the required traffic streams for services running on port 443, because it uses port 80 by default for the replay run.
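The following toy sketch illustrates the argument. It assumes a hypothetical middle-box that, on port 80, only inspects the plain-text HTTP `Host` header; real traffic classifiers are proprietary and may behave differently. The function name, the labels, and the payloads are made up for illustration.

```python
def classify_port_80(payload: bytes) -> str:
    """Toy classifier: label port-80 traffic from its plain-text HTTP Host header."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    # Skip the request line, then look for a Host header among the header lines.
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace")
            return "YouTube" if host.endswith("youtube.com") else "HTTP"
    return "unclassified"


# Plain-text request: the service name is visible to the middle-box.
plain = b"GET /watch HTTP/1.1\r\nHost: www.youtube.com\r\n\r\n"
# Stand-in for encrypted bytes replayed on port 80: nothing parses as HTTP.
encrypted = b"\x17\x03\x03\x00\x10" + b"\x8f" * 16

print(classify_port_80(plain))      # YouTube
print(classify_port_80(encrypted))  # unclassified
```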

D. Traffic load governed network behavior

Fig. 7. Effect of network load on Wehe’s traffic stream performances: (a) only Wehe, (b) Wehe plus one service, (c) Wehe plus two services

Note that scarcity of resources prompts networks to apply certain traffic management, especially in heavy network load scenarios, that is beneficial for all active services on the network, e.g., QoS-based traffic management (refer Sec. III-A). We validated the effect of such traffic management on the performance of both control and original replays. The validation uses the following three scenarios:

• Replaying only Wehe’s two traffic streams without any load on the network (Fig. 7(a))

• Replaying Wehe’s three traffic streams with one additional streaming service running in parallel (Fig. 7(b))

• Replaying Wehe’s three traffic streams with two additional streaming services running in parallel (Fig. 7(c))

Fig. 7(a) shows that the traffic streams generated by the Wehe tool perform the same when there is no additional network load. As network load increases, the performance of the control replay deviates from that of the original replay, towards the higher side (Fig. 7(b)). In Fig. 7(c), the performance of the control replay deviates further from the original replay, towards the lower side, while the two original replays still show similar performance. This invalidates the Wehe tool’s expectation that the control replay is not differentiated. It also invalidates the tool’s claim of detecting TD due to total bandwidth.

E. Wehe tests from multiple devices within the same sub-net

Side-channels are introduced in the Wehe design to support multiple user-clients simultaneously. Side-channels also assist in identifying the mapping between a user-client and a combination of IP addresses and ports, which is useful in the case of networks using NATs [33]. We validated Wehe’s support for multiple clients and NAT-enabled networks using two different tests. For both, we connected two user-clients from within the same subnet, i.e., clients sharing the same public IP address. In the first test, the Wehe tool tests the same service on both devices, e.g., the Wehe app on both devices tests YouTube. The result shows that the Wehe test completed on only one device, while the Wehe app closed abruptly on the other device. In the second test, we repeated the same scenario with Wehe testing different services, e.g., Wehe on one device testing YouTube while the other tests Netflix. We found that the Wehe test on one device completes properly, while the Wehe test on the other device shows an on-screen error informing the user that another client is already performing the test, as shown in Fig. 9.

These tests show that Wehe does not support multiple devices if they share the same IP address. While a side-channel is useful for identifying each replay from a user-client connected directly to the Wehe replay server, it is not useful in a network using NAT devices. Multiple users share the same IP address in the case of NAT, so the side-channel cannot uniquely map each replay run to a client. This limits the usage of Wehe to only one active client per replay server, ISP, and application. This limitation is documented by the Wehe developers as well.
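The collision can be sketched as follows. This is a simplified illustration, not Wehe’s implementation: it assumes a hypothetical server-side registry that identifies an active client by the public IP address it sees, and the class name, client labels, and addresses (taken from documentation ranges) are made up.

```python
class SideChannelRegistry:
    """Hypothetical registry that identifies an active client by its public IP."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # public IP -> client label

    def register(self, public_ip: str, client: str) -> bool:
        """Admit the client unless another one is already active on this IP."""
        if public_ip in self._active:
            return False
        self._active[public_ip] = client
        return True


def seen_by_server(private_ip: str, nat_public_ip: str) -> str:
    """Behind a NAT, the server sees the NAT's public IP for every private host."""
    return nat_public_ip


registry = SideChannelRegistry()
nat_ip = "203.0.113.7"
first = registry.register(seen_by_server("192.168.1.10", nat_ip), "device-A")
second = registry.register(seen_by_server("192.168.1.11", nat_ip), "device-B")
print(first, second)  # True False: the second device is rejected
```

Under this assumption the two devices are indistinguishable to the server, which matches the observation that only one of them completes its test.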

F. Effect of device network load on TD detection

Wehe’s replay server uses the same timings between application data transfers as the original application traffic. Such a transmission strategy is expected not to exhaust the available bandwidth. Hence source rate modulation caused by the traffic rate overshooting the available bandwidth is unlikely. As a result, the original and control replays show similar traffic performance unless it is deliberately modified by network policies, i.e., TD.

Nevertheless, this expectation is not always satisfied, as the traffic data rate is modulated by the network load at the user device while performing Wehe tests. Such perturbations create a discrepancy, because the effect of a time-varying network load on the probing traffic is itself time-varying and may not always be the same. Since Wehe replays its streams back-to-back, the two probing traffic streams (original and control replay) are affected differently by the current network load. Under such network load on the device side, the notion of services not exhausting the available bandwidth ceases to hold, along with its benefits for TD detection. It is necessary to normalise such confounding factors (refer Sec. III-B) before TD detection.
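A minimal sketch of this confounding effect is given below. It assumes a simplified model in which the delivered rate in each second is the smaller of the application’s sending rate and the bandwidth left over by other traffic on the device; the function name and all numbers are hypothetical.

```python
def replay_throughput(app_rate_mbps: float, available_bw_mbps: list[float]) -> float:
    """Mean delivered rate when each second is capped by the available bandwidth."""
    delivered = [min(app_rate_mbps, bw) for bw in available_bw_mbps]
    return sum(delivered) / len(delivered)


# Hypothetical per-second bandwidth (Mbps) left over by other apps on the device.
bw_during_original = [5.0, 5.0, 1.0, 1.0]  # another app bursts half-way through
bw_during_control = [5.0, 5.0, 5.0, 5.0]   # the burst is over by the second replay

original = replay_throughput(2.0, bw_during_original)
control = replay_throughput(2.0, bw_during_control)
print(original, control)  # 1.5 2.0
```

No differentiation policy is modelled here, yet the two back-to-back replays report different throughput, which a detector comparing them could mistake for TD.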

This paper is available on arXiv under CC 4.0 license.