Intelligence Information Gathering – Collecting Twitter Followers with 25 lines of Python
Introduction
Many corporations are not aware of the types of data that attackers can find and use in the wild. The information you will be able to find varies from target to target. It typically includes items such as IP ranges, domain names, e-mail addresses, public financial data, organizational information, technologies used, job titles, phone numbers, usernames and much more. The primary goal of the passive gathering stage is to collect as much actionable data as possible while leaving few or no indications that anyone has searched for it. It takes time and patience to sort through web pages, perform Google hacking, and map systems thoroughly in order to understand a particular target's infrastructure.
In this article, let's assume that we have been asked to perform a penetration test of an online banking system to verify whether valid usernames and passwords can be guessed. If you were a hacker, what would you do?
Speaking for myself, I would first write a quick script to create a dictionary file of potential usernames. Second, I would find out the company's password policy (password length, the required number of special characters, and so on) and build my own password dictionary file based on it. Finally, I would automate the process to see whether we can guess a correct password, or perhaps cause a DoS by getting accounts blocked after X failed attempts!
Many users use the same username for their bank account, Facebook, Twitter, and other social media. So let's write a small Python script to illustrate how an attacker could use ordinary, publicly available information to build a dictionary file containing XYZ Bank's Twitter followers. At the time of writing, XYZ Bank has around 24,027 followers. Let's collect them!
**Disclaimer: all of the actions explained in this article fall under Passive Information Gathering and are considered legitimate. We simply highlight a smart way of collecting data.**
Build your own dictionary file
Twitter and many other social websites provide an API (Application Programming Interface). An API lets programmers write their own code to interact with Twitter, retrieving information from it or posting information to it. Fortunately, Python has many libraries that make this job much easier. All I need to do is register as a Twitter developer and use the developer ID/keys in my script. The registration process should look similar to these snapshots:
Tweepy is a third-party Python library that lets us access Twitter's data. Installing Tweepy is easy:
hkhrais@Hkhrais:~$ sudo apt-get install python-pip
hkhrais@Hkhrais:~$ sudo pip install tweepy
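The script in the next section hardcodes the four developer keys. A safer habit is to keep them out of the source and read them from environment variables. The sketch below uses only the standard library. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper load_twitter_keys are hypothetical choices for illustration, not part of Tweepy.

```python
import os

# Hypothetical environment-variable names; rename them to suit your setup.
KEY_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_twitter_keys(environ=os.environ):
    """Return the four OAuth values as a dict, failing loudly if any is missing or empty."""
    missing = [var for var in KEY_NAMES.values() if not environ.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(sorted(missing)))
    return {name: environ[var] for name, var in KEY_NAMES.items()}


# Possible usage with the script's OAuth handler (not executed here):
# keys = load_twitter_keys()
# auth = tweepy.auth.OAuthHandler(keys["consumer_key"], keys["consumer_secret"])
# auth.set_access_token(keys["access_token"], keys["access_secret"])
```

The environ parameter defaults to the real environment, but any dict can be passed, which keeps the function easy to check without touching real credentials.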
Source Code
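import time

import tweepy

# Insert your Twitter developer keys here
consumer_key = 'blah blah blah'
consumer_secret = 'blah blah blah'
access_token = 'blah blah blah'
access_secret = 'blah blah blah'

# Written for tweepy 3.x (the Python 2.7 era shown in the output); tweepy 4
# renamed api.followers to api.get_followers and TweepError to TweepyException.
auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# verify_credentials must be called; the bare method object is always truthy
if api.verify_credentials():
    print('We successfully logged in')
else:
    raise SystemExit('Authentication failed, check your keys')

# Iterator over every follower of the target; 200 is the maximum page size
user = tweepy.Cursor(api.followers, screen_name="XYZbankgroup", count=200).items()

with open('/home/hkhrais/Desktop/list.txt', 'w') as usernames:
    while True:
        try:
            # Raises StopIteration once every follower has been read, which
            # ends the script with the traceback shown in the output
            u = next(user)
        except tweepy.TweepError:
            # Rate limit reached: wait for the 15-minute window to reset, then retry
            print('We got a timeout ... Sleeping for 15 minutes')
            time.sleep(15 * 60)
            continue
        usernames.write(u.screen_name + '\n')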
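The code is almost self-explanatory. I passed the consumer and access-token keys to "OAuthHandler" to authenticate to Twitter. Then I created a Cursor over the followers of 'XYZbankgroup' and stored the resulting iterator in the variable "user". Each call to next(user) returns one follower, and the script writes that follower's screen name to list.txt.
According to Twitter's developer documentation, the API limits how many requests an application can make within each 15-minute window. Once the limit for the followers endpoint is used up, further requests fail with a rate-limit error until the window resets, so the script has to wait about 15 minutes before continuing: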
tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]
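To estimate how long the collection takes, the pagination and rate-limit arithmetic can be sketched as below. The figures are assumptions for illustration. Twitter's v1.1 followers/list endpoint returned 20 users per page by default and at most 200. It allowed 15 requests per 15-minute window with user authentication. Check the current documentation before relying on these numbers. The function name estimate_collection is hypothetical.

```python
import math


def estimate_collection(followers, page_size=20, requests_per_window=15, window_minutes=15):
    """Rough (requests, rate-limit waits, minutes spent waiting) for one cursor run.

    Assumes every request returns a full page and that the first window's
    requests can be made immediately; each further window costs one full wait.
    """
    requests = math.ceil(followers / page_size)
    windows = math.ceil(requests / requests_per_window)
    waits = max(windows - 1, 0)
    return requests, waits, waits * window_minutes


# 24,027 followers with the default page size vs. the 200-user maximum
print(estimate_collection(24027))                 # (1202, 80, 1200)
print(estimate_collection(24027, page_size=200))  # (121, 8, 120)
```

Under these assumptions, raising the page size from 20 to 200 cuts the waiting from about 20 hours to about 2 hours. This is why passing count=200 to the cursor is worthwhile.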
Execution Output
hkhrais@Hkhrais:~/Desktop/Tweets$ sudo python Twitter.py
[sudo] password for hkhrais:
We successfully logged in
We got a timeout ... Sleeping for 15 minutes
We got a timeout ... Sleeping for 15 minutes
We got a timeout ... Sleeping for 15 minutes
We got a timeout ... Sleeping for 15 minutes
...
We got a timeout ... Sleeping for 15 minutes
Traceback (most recent call last):
File "Twitter.py", line 31, in <module>
u = next(user)
File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 181, in next
self.current_page = self.page_iterator.next()
File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 64, in next
raise StopIteration
StopIteration
hkhrais@Hkhrais:~/Desktop/Tweets$
Note that the final StopIteration exception signals that the cursor is exhausted, which means we have collected all of the followers' usernames :)
The result is a text file, list.txt, with one follower screen name per line.
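If you prefer the script to finish without a traceback, the fetch-sleep-retry loop can be wrapped in a small generator that ends quietly when the source is exhausted. The sketch below is generic, plain Python. The name drain_with_backoff and its parameters are hypothetical. It assumes, as the original loop also does, that the underlying iterator can be called again after a failed next(). It catches StopIteration explicitly, because inside a generator a stray StopIteration would otherwise be turned into a RuntimeError (PEP 479).

```python
import time


def drain_with_backoff(iterator, is_rate_limit, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield every item from `iterator`, sleeping and retrying on rate-limit errors.

    Exhaustion ends the generator normally instead of surfacing as a traceback.
    Errors for which `is_rate_limit` returns False are re-raised unchanged.
    """
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return  # no more items: finish quietly
        except Exception as exc:
            if not is_rate_limit(exc):
                raise  # unrelated errors should not be retried forever
            sleep(wait_seconds)  # wait for the rate-limit window to reset
            continue  # retry the same next() call
        yield item


# Possible usage with the tweepy 3.x objects from the script (an assumption, not run here):
# for u in drain_with_backoff(user, lambda e: isinstance(e, tweepy.TweepError)):
#     usernames.write(u.screen_name + '\n')
```

The sleep function is a parameter so the retry logic can be exercised with a fake iterator and no real waiting.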
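Before list.txt is used as a username dictionary, it can be normalised. This is useful, for example, when merging it with usernames gathered from other sources. The sketch below strips whitespace and any leading "@", drops empty lines, removes duplicates and sorts the result. Twitter screen names are case-insensitive, so lowercasing is on by default. Whether the target's own login is case-insensitive is an assumption, so pass lowercase=False if it is not. The helper name build_username_wordlist and the output file usernames.txt are hypothetical.

```python
def build_username_wordlist(lines, lowercase=True):
    """Normalise raw screen names (one per line) into a sorted, de-duplicated list."""
    names = set()
    for line in lines:
        name = line.strip().lstrip("@")  # drop newline/spaces and a pasted "@"
        if lowercase:
            name = name.lower()  # treat "Alice" and "alice" as one entry
        if name:
            names.add(name)
    return sorted(names)


# Possible usage on the script's output file (not run here):
# with open('/home/hkhrais/Desktop/list.txt') as raw, open('usernames.txt', 'w') as out:
#     out.write('\n'.join(build_username_wordlist(raw)) + '\n')
```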
Conclusion
Intelligence gathering requires careful planning, research, and, most importantly, the ability to think like an attacker. With a small Python script (around 25 lines), we retrieved the usernames of around 24,027 followers of @XYZbankgroup, which make a good username dictionary. Keep in mind that this script is especially handy when the target's usernames are not English words, since such names are unlikely to appear in generic username wordlists!
References
- Twitter API https://dev.twitter.com/docs/twitter-libraries
- Tweepy library https://pypi.python.org/pypi/tweepy/