# Auto-generated tweets from Markov chains
*2017-12-29*

The main goal of this article is to explain how to build a Twitter botnet of autonomous bots that reply to tweets with text generated from the probabilities in a Markov chain. As this is a learning project, we have written everything from scratch.

The idea of mixing Twitter bots with Markov chains came up in a Twitter conversation with [@x0rz](https://twitter.com/x0rz).
## 1. Markov chains
A Markov chain is a sequence of stochastic events (based on probabilities) where the next state of a variable or system depends only on its current state, and is independent of all earlier states.

https://en.wikipedia.org/wiki/Markov_chain

In our case, we will use Markov chains to estimate the probability that a given word is followed by another specific word. So, we will generate a diagram like the following one, but with thousands of words.

![flock-botnet](img/posts/flock-botnet/markovchain.png "Markov chain")

As input we need a text document with thousands of words, since more text gives better estimates. In this example we have used the book "The Critique of Pure Reason", by Immanuel Kant (http://www.gutenberg.org/cache/epub/4280/pg4280.txt), simply because it is the first book that we found in .txt format.
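
As a toy illustration of the idea, the following sketch counts which word follows which in a short sentence and turns the counts into probabilities. It is a simplified stand-in, not the project's code: the `transitions` function and its map-based layout are assumptions made only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// transitions returns, for every word, the probability of each word that
// follows it in the input. Hypothetical helper, written only for this example.
func transitions(text string) map[string]map[string]float64 {
	words := strings.Fields(text)
	// counts[word][next] is how many times next came right after word.
	counts := make(map[string]map[string]int)
	for i := 0; i < len(words)-1; i++ {
		if counts[words[i]] == nil {
			counts[words[i]] = make(map[string]int)
		}
		counts[words[i]][words[i+1]]++
	}
	// Normalize each word's counts so that they add up to 1.
	probs := make(map[string]map[string]float64)
	for word, nexts := range counts {
		total := 0
		for _, c := range nexts {
			total += c
		}
		probs[word] = make(map[string]float64)
		for next, c := range nexts {
			probs[word][next] = float64(c) / float64(total)
		}
	}
	return probs
}

func main() {
	probs := transitions("the cat sat on the mat and the cat slept")
	// "the" is followed by "cat" twice and by "mat" once.
	fmt.Println(probs["the"])
}
```

With a whole book as input, the same counting produces the large diagram described above.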
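### 1.1 Calculating the Markov chains
First we need to read the text file:
```go
// readTxt returns the content of the file at path as a single line of text.
// It uses the "io/ioutil" and "strings" packages.
func readTxt(path string) (string, error) {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		// Return early so the caller never receives partial content.
		return "", err
	}
	// Replace line breaks with spaces so the text can be split into words.
	content := strings.Replace(string(data), "\n", " ", -1)
	return content, nil
}
```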
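The functions that follow work on a slice of words, so the content returned by `readTxt` has to be split first. The article does not show that step. A minimal sketch, assuming whitespace-separated words and a hypothetical `splitWords` helper, could look like this (punctuation stays attached to the words):

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords is a hypothetical helper: it splits on any run of whitespace,
// so repeated spaces, tabs and line breaks never produce empty words.
func splitWords(content string) []string {
	return strings.Fields(content)
}

func main() {
	words := splitWords("Human reason,  in one\tsphere of\nits cognition")
	fmt.Println(len(words), words)
}
```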
To calculate the probabilities of the Markov states, we have written the following function, which analyzes the full input text and stores the Markov states:
```go
func calcMarkovStates(words []string) []State {
	var states []State
	// Count every word and the words that follow it.
	for i := 0; i < len(words)-1; i++ {
		var iState int
		states, iState = addWordToStates(states, words[i])
		// iState is an index into states, so compare it with len(states).
		if iState < len(states) {
			states[iState].NextStates, _ = addWordToStates(states[iState].NextStates, words[i+1])
		}
		printLoading(i, len(words))
	}
	// Turn the counts into percentages of the total number of words.
	for i := 0; i < len(states); i++ {
		states[i].Prob = (float64(states[i].Count) / float64(len(words)) * 100)
		for j := 0; j < len(states[i].NextStates); j++ {
			states[i].NextStates[j].Prob = (float64(states[i].NextStates[j].Count) / float64(len(words)) * 100)
		}
	}
	fmt.Println("\ntotal words computed: " + strconv.Itoa(len(words)))
	return states
}
```
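`calcMarkovStates` relies on a `State` type and an `addWordToStates` helper that are not listed in this article. The sketch below shows one shape they could take, inferred only from how they are used above. The field layout and the linear search are assumptions, and the real project may differ. A linear search costs one pass over the slice per word, so on a whole book a map from word to index would likely be faster.

```go
package main

import "fmt"

// State is an assumed layout: a word, how often it was seen, its share of
// the text, and the states of the words that followed it.
type State struct {
	Word       string
	Count      int
	Prob       float64
	NextStates []State
}

// addWordToStates increments the count of word if it is already present,
// or appends a new state for it. It returns the updated slice and the index
// of the state that holds word.
func addWordToStates(states []State, word string) ([]State, int) {
	for i := range states {
		if states[i].Word == word {
			states[i].Count++
			return states, i
		}
	}
	states = append(states, State{Word: word, Count: 1})
	return states, len(states) - 1
}

func main() {
	var states []State
	for _, w := range []string{"pure", "reason", "pure"} {
		states, _ = addWordToStates(states, w)
	}
	// Two distinct words were added, and "pure" was seen twice.
	fmt.Println(len(states), states[0].Count)
}
```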
The *printLoading* function is just a simple helper that prints the percentage of the process completed in the terminal:
```go
func printLoading(n int, total int) {
	// Fraction of the work done, shown as a 40-character bar and as a percentage.
	done := float64(n) / float64(total)
	progressBar := strings.Repeat("█", int(done*40))
	// "\r" returns to the start of the line so the bar is redrawn in place.
	fmt.Printf("\r %s - %d%%", progressBar, int(done*100))
}
```
![flock-botnet](img/posts/flock-botnet/progressbarMarkov.gif "Markov chain")
### 1.2 Generating text from the Markov chains
To generate the text, we need an initial word and the length of the output text. Then we loop, picking each new word according to the Markov chain probabilities calculated in the previous step.
```go
func (markov Markov) generateText(states []State, initWord string, count int) string {
	var generatedText []string
	// The output starts with the initial word.
	word := initWord
	generatedText = append(generatedText, word)
	for i := 0; i < count; i++ {
		// Each new word depends only on the previous one.
		word = getNextMarkovState(states, word)
		if word == "word no exist on the memory" {
			// The chain has no state for this word, so report it to the caller.
			return "word no exist on the memory"
		}
		generatedText = append(generatedText, word)
	}
	text := strings.Join(generatedText, " ")
	return text
}
```
To generate the text we need a function that, given the Markov chain states and a word, returns a randomly chosen next word, weighted by the probabilities of the words that follow the given word:
```go
func getNextMarkovState(states []State, word string) string {
	// Find the state that corresponds to the given word.
	iState := -1
	for i := 0; i < len(states); i++ {
		if states[i].Word == word {
			iState = i
			break
		}
	}
	// Unknown word, or a word with no recorded successors.
	if iState < 0 || len(states[iState].NextStates) == 0 {
		return "word no exist on the memory"
	}
	// Start from the first successor with a random threshold, then let every
	// successor draw a random score scaled by its probability.
	next := states[iState].NextStates[0]
	next.Prob = rand.Float64() * states[iState].Prob
	for i := 0; i < len(states[iState].NextStates); i++ {
		candidate := states[iState].NextStates[i]
		// Only read states[iState-1] when iState > 0, to stay within bounds.
		differs := iState == 0 || states[iState-1].Word != candidate.Word
		if rand.Float64()*candidate.Prob > next.Prob && differs {
			next = candidate
		}
	}
	return next.Word
}
```
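For comparison, a common way to pick the next word in exact proportion to its weight is to draw one random number and walk the cumulative sum of the weights. This is a different technique from the function above, shown only as a sketch. The `Next` type and the `pickWeighted` function are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Next is a hypothetical candidate word with its weight (a count or a probability).
type Next struct {
	Word   string
	Weight float64
}

// pickWeighted returns a word with probability Weight / (sum of all weights).
// r must be in [0, 1); passing it in keeps the function deterministic to test.
func pickWeighted(nexts []Next, r float64) string {
	total := 0.0
	for _, n := range nexts {
		total += n.Weight
	}
	// target falls inside exactly one candidate's slice of the cumulative sum.
	target := r * total
	acc := 0.0
	for _, n := range nexts {
		acc += n.Weight
		if target < acc {
			return n.Word
		}
	}
	// Only reached for an empty slice, zero weights, or rounding at the upper edge.
	if len(nexts) > 0 {
		return nexts[len(nexts)-1].Word
	}
	return ""
}

func main() {
	nexts := []Next{{"reason", 3}, {"judgement", 1}}
	// "reason" should come out about three times as often as "judgement".
	fmt.Println(pickWeighted(nexts, rand.Float64()))
}
```

Because the weights are normalized inside the function, raw counts work as well as percentages.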
## 2. Twitter API
To interact with the Twitter API, we will use the **go-twitter** library: https://github.com/dghubble/go-twitter

We set up a streaming connection with the Twitter API and filter tweets by some words related to our input dataset:
```go
func startStreaming(states []State, flock Flock, flockUser *twitter.Client, botScreenName string, keywords []string) {
	// Convenience Demux demultiplexed stream messages
	demux := twitter.NewSwitchDemux()
	demux.Tweet = func(tweet *twitter.Tweet) {
		// Ignore retweets and tweets posted by the bots of the flock.
		if !isRT(tweet) && !isFromBot(flock, tweet) {
			processTweet(states, flockUser, botScreenName, keywords, tweet)
		}
	}
	demux.DM = func(dm *twitter.DirectMessage) {
		fmt.Println(dm.SenderID)
	}
	demux.Event = func(event *twitter.Event) {
		fmt.Printf("%#v\n", event)
	}
	fmt.Println("Starting Stream...")
	// FILTER: only receive tweets that contain one of the keywords.
	filterParams := &twitter.StreamFilterParams{
		Track:         keywords,
		StallWarnings: twitter.Bool(true),
	}
	stream, err := flockUser.Streams.Filter(filterParams)
	if err != nil {
		log.Fatal(err)
	}
	// Receive messages until stopped or stream quits
	demux.HandleChan(stream.Messages)
}
```
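`isRT` and `isFromBot` are not listed in this article. The sketch below shows what such filters might look like, using small stand-in types instead of the go-twitter ones so that it runs on its own. The `tweet`, `user` and `flock` types and the `ScreenNames` field are assumptions made for this example, not the project's definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in types for this example only.
type user struct{ ScreenName string }

type tweet struct {
	Text            string
	User            *user
	RetweetedStatus *tweet
}

type flock struct{ ScreenNames []string }

// isRT reports whether the tweet is a retweet: either it carries the
// retweeted status, or its text starts with the manual "RT @" prefix.
func isRT(t *tweet) bool {
	return t.RetweetedStatus != nil || strings.HasPrefix(t.Text, "RT @")
}

// isFromBot reports whether the author of the tweet is one of our own bots,
// so that the bots do not reply to each other in a loop.
func isFromBot(f flock, t *tweet) bool {
	if t.User == nil {
		return false
	}
	for _, name := range f.ScreenNames {
		if strings.EqualFold(name, t.User.ScreenName) {
			return true
		}
	}
	return false
}

func main() {
	f := flock{ScreenNames: []string{"bot1", "bot2"}}
	t := &tweet{Text: "What is pure reason?", User: &user{ScreenName: "Bot2"}}
	fmt.Println(isRT(t), isFromBot(f, t))
}
```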
Then, each time a new tweet containing one of our tracked words is posted, we process it, generate a reply from the Markov chains, and post that reply:
```go
func processTweet(states []State, flockUser *twitter.Client, botScreenName string, keywords []string, tweet *twitter.Tweet) {
	c.Yellow("bot @" + botScreenName + " - New tweet detected:")
	fmt.Println(tweet.Text)
	tweetWords := strings.Split(tweet.Text, " ")
	// Try each word of the tweet until one is known to the Markov chain.
	generatedText := "word no exist on the memory"
	for i := 0; i < len(tweetWords) && generatedText == "word no exist on the memory"; i++ {
		fmt.Println(strconv.Itoa(i) + " - " + tweetWords[i])
		generatedText = generateMarkovResponse(states, tweetWords[i])
	}
	// Do not post the error message when no word of the tweet is known.
	if generatedText == "word no exist on the memory" {
		return
	}
	c.Yellow("bot @" + botScreenName + " posting response")
	fmt.Println(tweet.ID)
	replyTweet(flockUser, "@"+tweet.User.ScreenName+" "+generatedText, tweet.ID)
	// Pause this bot after posting, to stay under the API limits.
	waitTime(1)
}
```
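Tweets usually contain mentions, links and punctuation that never appear in the book, so many of their words will not be found among the states. One possible refinement, which is not part of the original code, is to clean the words before looking them up. `cleanWords` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// cleanWords splits a tweet into words, drops mentions and links, and trims
// leading and trailing characters that are neither letters nor digits.
func cleanWords(text string) []string {
	var words []string
	for _, w := range strings.Fields(text) {
		if strings.HasPrefix(w, "@") || strings.HasPrefix(w, "http://") || strings.HasPrefix(w, "https://") {
			continue
		}
		w = strings.TrimFunc(w, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsNumber(r)
		})
		if w != "" {
			words = append(words, w)
		}
	}
	return words
}

func main() {
	fmt.Println(cleanWords("@kant What is pure reason? https://example.com #philosophy"))
}
```

The case of each word is kept, since a chain built from the raw book text is case-sensitive.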
```go
func replyTweet(client *twitter.Client, text string, inReplyToStatusID int64) {
	tweet, httpResp, err := client.Statuses.Update(text, &twitter.StatusUpdateParams{
		InReplyToStatusID: inReplyToStatusID,
	})
	if err != nil {
		fmt.Println(err)
	}
	// httpResp may be nil when the request itself failed, so check it first.
	if httpResp == nil {
		return
	}
	// http.StatusOK comes from the "net/http" package.
	if httpResp.StatusCode != http.StatusOK {
		c.Red("error: " + httpResp.Status)
		c.Purple("maybe twitter has blocked the account, CTRL+C, wait 15 minutes and try again")
		return
	}
	fmt.Print("tweet posted: ")
	c.Green(tweet.Text)
}
```
## 3. Flock-Botnet, or how to avoid the Twitter API limitations
If you have ever played with the Twitter API, you will have seen that there are some restrictions and limitations. If your bot posts too often, the account gets blocked for some minutes.

To work around this limitation, we deploy a botnet in which each bot replies to tweets using the Markov chain probabilities. After a bot posts a reply, it sleeps for 1 minute. In the meantime, the other bots keep processing and replying to other tweets.

![flock-botnet](img/posts/flock-botnet/flock-botnet-scheme.png "01")
### 3.1 Putting it all together
In this demo, we will use only 3 bots (Twitter accounts).

The botnet configuration goes in the config.json file:
```json
[
    {
        "title": "bot1",
        "consumer_key": "xxxxxxxxxxxxx",
        "consumer_secret": "xxxxxxxxxxxxx",
        "access_token_key": "xxxxxxxxxxxxx",
        "access_token_secret": "xxxxxxxxxxxxx"
    },
    {
        "title": "bot2",
        "consumer_key": "xxxxxxxxxxxxx",
        "consumer_secret": "xxxxxxxxxxxxx",
        "access_token_key": "xxxxxxxxxxxxx",
        "access_token_secret": "xxxxxxxxxxxxx"
    },
    {
        "title": "bot3",
        "consumer_key": "xxxxxxxxxxxxx",
        "consumer_secret": "xxxxxxxxxxxxx",
        "access_token_key": "xxxxxxxxxxxxx",
        "access_token_secret": "xxxxxxxxxxxxx"
    }
]
```
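A file with this shape can be decoded with the standard `encoding/json` package. The following is a minimal sketch: the `BotConfig` struct and the `parseConfig` function are names chosen for this example, and only the JSON keys come from the file above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BotConfig mirrors one entry of config.json. The struct name is an
// assumption; the JSON keys are the ones used in the configuration file.
type BotConfig struct {
	Title             string `json:"title"`
	ConsumerKey       string `json:"consumer_key"`
	ConsumerSecret    string `json:"consumer_secret"`
	AccessTokenKey    string `json:"access_token_key"`
	AccessTokenSecret string `json:"access_token_secret"`
}

// parseConfig decodes the JSON array of bots. In a real program the bytes
// would come from reading config.json from disk.
func parseConfig(data []byte) ([]BotConfig, error) {
	var bots []BotConfig
	if err := json.Unmarshal(data, &bots); err != nil {
		return nil, err
	}
	return bots, nil
}

func main() {
	data := []byte(`[{"title": "bot1", "consumer_key": "xxx"}]`)
	bots, err := parseConfig(data)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(len(bots), bots[0].Title)
}
```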
The complete process will be:

![flock-botnet](img/posts/flock-botnet/steps.png "01")
## 4. Demo
We have set up a small demo with 3 bots. As we said at the beginning of this post, we have used ["The Critique of Pure Reason", by Immanuel Kant](http://www.gutenberg.org/cache/epub/4280/pg4280.txt) to generate the Markov chains.

When the botnet is up and running, the bots start streaming all new tweets that contain the configured keywords.

Each bot takes a tweet, analyzes the words it contains, generates a reply using the previously calculated Markov chains, and posts it as a reply to that tweet.

Example of the terminal view during the flock-botnet execution:

![flock-botnet](img/posts/flock-botnet/terminal00.png "01")

Here is an example of the execution:

![flock-botnet](img/posts/flock-botnet/flock-botnet-demo.gif "Markov chain")

The following screenshots show some of the replies that the bots ("@andreimarkov", "@dodecahedron", "@projectNSA") have posted to other users.

![flock-botnet](img/posts/flock-botnet/01.png "01")

---

![flock-botnet](img/posts/flock-botnet/02.jpeg "02")

---

![flock-botnet](img/posts/flock-botnet/03.jpeg "03")

---

![flock-botnet](img/posts/flock-botnet/04.jpeg "04")
## Conclusion
In this article, we have seen how to build a Twitter botnet whose bots reply to tweets with text generated from Markov chains.

In this article we have used only first-order Markov chains, so the generated text does not read much like human text. For future projects, a good option would be to combine higher-order Markov chains with other text mining techniques.

The Twitter API has lots of uses, and in this post we have seen one of them. I hope to be able to write more articles about other projects around the Twitter API, for example a Twitter [network nodes analysis](https://devpost.com/software/projectnsa) or a [users & hashtags analysis](https://arnaucube.com/hashtagsUsersNetworkPage.html).

The complete code of this project is available at https://github.com/arnaucube/flock-botnet

Project page: http://arnaucube.com/flock-botnet/