
Some thoughts which should be considered

Zitat

Hello all,

 

I recently wrote my bachelor thesis about app tracking and analyzed the traffic of around 30 apps using mitmproxy. During this analysis, I ran into several problems that you might want to consider when building this automated system.

 

Technical issues:

- How do you handle certificate pinning? Some apps are very resistant and even need to be modified and recompiled. In my case, I tried using Frida/Objection. This failed because my device (or the kernel itself) didn't allow a specific modification that was needed to run Frida.

- Assuming you use rooted devices (mainly needed because, since Android 7.0, apps no longer trust user-installed CA certificates by default): How do you handle root detection? Some apps refuse to run on rooted devices, and some even detect Magisk.

- How do you handle background data flows interfering with the "real" data flow? I noticed that whenever I installed an app, several requests were sent in the background, e.g. notifying Google/Samsung which app had been installed. As far as I know, you cannot determine which process (in this case: which app) sent a request, so you either have to filter the background requests by URL or modify the system settings so that those background requests aren't made. But if you simply block them, some apps won't work (e.g. Google Play Protect doesn't allow you to install an app without being signed in to Google Play, so turning off Google Play isn't always an option).
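The URL-based filtering mentioned above could look roughly like the following Python sketch (Python because mitmproxy addons are written in it). It only compares hostnames against a suffix list; the listed hosts are illustrative assumptions, not a verified inventory of Google/Samsung background endpoints.

```python
# Minimal sketch: split captured requests into "app" and "background" traffic
# purely by hostname. BACKGROUND_HOST_SUFFIXES is a HYPOTHETICAL, illustrative
# list; build a real one from captures taken while no app under test is running.
BACKGROUND_HOST_SUFFIXES = (
    "play.googleapis.com",
    "vendor-telemetry.example",  # placeholder for a vendor (e.g. Samsung) host
)


def is_background(host, suffixes=BACKGROUND_HOST_SUFFIXES):
    """True if host equals one of the suffixes or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in suffixes)


def split_flows(hosts):
    """Partition hostnames into (app_traffic, background_traffic)."""
    app, background = [], []
    for host in hosts:
        (background if is_background(host) else app).append(host)
    return app, background
```

Inside a mitmproxy addon one would call is_background(flow.request.pretty_host) from the request hook. As described above, this cannot tell an app's own request to the same host apart from a system request, so it trades precision for simplicity.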

 

Identifiers/data:

- Which identifiers should be searched for in the log files? For example, since Android 8.0 the Android ID is no longer unique at the device level; it is scoped to each combination of app-signing key, user, and device (https://android-developers.googleblog.com/2017/04/changes-to-device-identifiers-in.html). On Android versions below 8.0, the Android ID is unique for each device.

- What about vendor specific identifiers (e.g. Samsung devices have a Samsung Device ID)?

- Some HTTP(S) request bodies are encrypted or compressed (e.g. Google Crashlytics). How do you plan to analyze them? Or will you just ignore encrypted/compressed data?
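Once the identifier values of the test device are known (Ad ID, Android ID, vendor IDs, ...), each logged request body could be searched for several common encodings of them. This is a minimal sketch; the set of encodings is an assumption, and the identifier values in the example are placeholders.

```python
import base64
import urllib.parse


def identifier_variants(value):
    """Encodings in which an identifier might show up in a request body."""
    raw = value.encode("utf-8")
    return {
        "plain": raw,
        "lower": value.lower().encode("utf-8"),
        "upper": value.upper().encode("utf-8"),
        "url": urllib.parse.quote(value, safe="").encode("ascii"),
        "hex": raw.hex().encode("ascii"),
        # Only matches the value base64-encoded on its own (padding stripped),
        # not when it sits at an arbitrary offset inside a larger base64 blob.
        "base64": base64.b64encode(raw).rstrip(b"="),
    }


def find_identifiers(body, identifiers):
    """Return (name, encoding) pairs for every identifier found in body."""
    hits = []
    for name, value in identifiers.items():
        seen = set()  # skip encodings that produce the same bytes
        for enc, needle in identifier_variants(value).items():
            if needle in seen:
                continue
            seen.add(needle)
            if needle in body:
                hits.append((name, enc))
    return hits
```

For example, find_identifiers(body, {"ad_id": "38400000-8cf0-11bd-b23e-10b96e40000d"}) reports which of the listed encodings of that placeholder Ad ID occur in body. Bodies that are compressed have to be decompressed first.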

 

Other aspects:

- How do you handle app permissions? Will you just grant the app every permission it requests? What if a permission is requested some time after the app is opened (e.g. when using a feature that needs access to the contacts)?

- Do you plan to map the requests to specific third-party trackers? E.g. it's quite obvious that a request to google-analytics.com comes from Google Analytics, but what about app-measurement.com (Google Firebase Analytics) or crashlytics.com (either Google Crashlytics or Answers, which somehow works together with Google Crashlytics)?

- How can you test apps which require a login? E.g. mobile banking apps require a bank account and if you don't have one, you are stuck on the first screen.
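Mapping requests to trackers could start as a longest-suffix lookup on the hostname. The three entries below are taken from the examples in the post; the crashlytics.com label stays deliberately ambiguous, as discussed. This is a sketch, not a maintained tracker list.

```python
# Host suffix -> tracker label (entries from the examples above).
TRACKER_SUFFIXES = {
    "google-analytics.com": "Google Analytics",
    "app-measurement.com": "Google Firebase Analytics",
    "crashlytics.com": "Google Crashlytics / Answers (ambiguous)",
}


def classify_host(host):
    """Return the tracker label for host (longest matching suffix) or None."""
    host = host.lower().rstrip(".")
    best = None
    for suffix, label in TRACKER_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, label)
    return best[1] if best else None
```

Longest-suffix matching lets a more specific entry (e.g. a subdomain with its own label) override a broader one once the table grows.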

 

Basically, everything mentioned above can be summed up in one sentence: The behavior of an app and the system itself depends on a huge variety of factors.

 

If you have any follow-up questions, I'm happy to explain my points in greater detail.

 

Regards,

Mr. Y

Zitat

I also speak German, so if you feel more comfortable speaking German, don't hesitate 🙂

Zitat

Nicely put points. Do you have any documentation of your thesis, and can you share it?

Zitat

I just handed it in a few days ago and I'm waiting for my grade. But in any case, I don't own the rights to my bachelor thesis, so I'm not allowed to share it anyway.

 

Regarding my sources, my research is mainly based on the data interception environment provided by Privacy International and on blog posts by Mike Kuketz:

https://privacyinternational.org/node/2732

https://www.kuketz-blog.de/android-tls-verifikation-und-certificate-pinning-umgehen/

https://www.kuketz-blog.de/?s=mitmproxy

 

Regards,

Y

Zitat

Hi Mr. Y, thanks for your points. We are aware of most of them (we have been working closely with Mike Kuketz for a long time), but to be honest, we do not have a good answer to most of them for the automated version.

Cert-Pinning: We were thinking of Frida. Can you tell us which device did not work with the framework? We planned on using Pixel devices and emulation.

Root / emulation detection, banking apps: There will be a number of apps that we simply cannot test with this approach: banking apps, for instance, since we can't connect a real bank account, and also Uber or PayPal, due to heavy obfuscation and other obstacles. So we might sort these out or test them by hand. The bulk of Play Store apps don't do root detection, though.

Background-traffic: Good question. How did you deal with it?

Vendor IDs: Also a very good point. Did you see apps/services using vendor IDs as identifiers or for tracking?

Encryption / Crashlytics: We're aware of the problem but don't have a good solution yet. How did you deal with it? Did you manage to get around the Crashlytics encryption? There is some unreadable data in the Facebook connection too.

Trackers: We will map the data flows of specific trackers. Good point about Crashlytics.

Log-in: Yes, we plan to automate the log-in process too. Not for banking apps though.

Email me if you are interested in joining the team!

Best Miriam

 

Zitat

Hello Miriam

 

Cert Pinning: I used a Samsung Galaxy A7 (2018). The problem is that "setenforce 0" (which switches SELinux to permissive mode) only works with a custom kernel.

 

Background traffic: I filtered it manually but I only analyzed around 30 apps so the time effort wasn't that big.

 

Vendor ID: I haven't checked it in detail as my focus was more on Google Services. My assumption is that these Vendor IDs are usually used for background/system services, e.g. when installing an app on a Samsung device, Samsung gets notified.

 

Encryption/compression: I haven't spent much effort trying to decrypt/decompress the traffic. But I noticed that the Google Ad ID is sometimes readable when you view the message body in hex mode. Not sure if a simple regex can find this identifier, though. You also need to differentiate between encrypted and compressed data: AFAIK, Google Crashlytics data is only compressed, not encrypted.
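On the regex question and the compressed-vs-encrypted distinction, a rough sketch: try gzip (recognised by its magic bytes) and zlib (which has a header check and a checksum) first, and treat anything that decodes as neither as "raw", meaning plaintext, some other format, or encrypted. The Google Advertising ID is formatted as a UUID, so the regex below matches any UUID-shaped string; every hit still has to be compared with the device's actual Ad ID. Which formats Crashlytics actually uses is not confirmed here.

```python
import gzip
import re
import zlib

# Any UUID-shaped string (8-4-4-4-12 hex digits). The Ad ID uses this format,
# but so do many other IDs, so hits are candidates only.
UUID_RE = re.compile(
    rb"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    rb"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def try_decompress(body):
    """Return (kind, data): 'gzip' or 'zlib' if body decodes as such,
    otherwise 'raw' (plaintext, another format, or encrypted)."""
    if body[:2] == b"\x1f\x8b":  # gzip magic bytes
        try:
            return "gzip", gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            pass
    try:
        # zlib streams carry a header check and an Adler-32 checksum,
        # so random or encrypted bytes are very unlikely to decode.
        return "zlib", zlib.decompress(body)
    except zlib.error:
        return "raw", body


def find_uuids(body):
    """Decompress if possible, then return (kind, sorted lowercase UUIDs)."""
    kind, data = try_decompress(body)
    found = {m.decode("ascii").lower() for m in UUID_RE.findall(data)}
    return kind, sorted(found)
```

Running the regex on raw bytes also covers the case where the Ad ID appears as readable ASCII inside an otherwise binary body, which is what the hex view shows.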

 

Best, Y

Zitat

Thanks, we definitely will consider these points. Are you interested in participating with the coding work?

 

Zitat

I'm a COBOL developer with no other coding experience (and not enough time for another project) - so unfortunately I have to decline your offer.

 

Y

Zitat

Hi Miriam,

I'm currently writing my bachelor thesis on nearly the same topic as you are planning to cover with App Check. The only difference is that I run the apps on the Android Emulator inside a Docker environment for scalability. I'm focusing on an app's startup process, to see whether apps send personal data before the user takes any action. I'm aware of the problems with apps detecting the emulator; maybe I can solve a few of them.
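For a startup-only analysis on emulators, the install-and-launch step could be scripted with adb. This sketch builds the commands: "adb install -g" grants all runtime permissions declared in the manifest up front (one possible answer to the permissions question from the first post, at the cost of never seeing the app's own permission prompts), and "monkey" with the LAUNCHER category starts the main activity once. The serial, APK path and package name in the example are placeholders.

```python
import shlex
import subprocess


def adb_cmd(serial, *args):
    """Build an adb command aimed at one device/emulator, e.g. 'emulator-5554'."""
    return ["adb", "-s", serial, *args]


def install_and_launch_cmds(serial, apk_path, package):
    return [
        # -g: grant all runtime permissions listed in the app manifest.
        adb_cmd(serial, "install", "-g", apk_path),
        # Start the app's launcher activity once via the monkey tool.
        adb_cmd(serial, "shell", "monkey", "-p", package,
                "-c", "android.intent.category.LAUNCHER", "1"),
    ]


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Usage: run(install_and_launch_cmds("emulator-5554", "app.apk", "com.example.app")) prints the commands; pass dry_run=False to actually execute them while mitmproxy is recording.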

I'm also trying to solve the certificate pinning problem through automated app modification with Frida. I'm not sure whether modifying apps in a generic way will work out.

The good thing is that I hold the rights to my thesis and can share information with you. Sadly, I've only just started writing the thesis and don't have many solutions yet. =(

I'd love to help or participate in your project in any way. Maybe we can support each other and share information about solving the main problems.

Kind regards, M

Zitat

Hi M,

that sounds great, we are really looking for people who understand the issue, so you are very welcome to participate. We have a Git repository by now, at git.app-check.org, if you want to have a look (not much there yet). Are you located anywhere near Berlin, so we could meet? Otherwise, maybe we can organize a call so I can tell you more about where we stand right now and what our schedule looks like?

Best Miriam