<!DOCTYPE html>
<html lang="en">
<head>
<meta name="description" content="Webpage description goes here" />
<meta charset="utf-8">
<title>Auto generated tweets from Markov chains - ArnauCube - Blog</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="">
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">
<link rel="stylesheet" href="css/style.css">
<!-- highlightjs -->
<!-- <link rel="stylesheet" href="js/highlightjs/atom-one-dark.css"> -->
<link rel="stylesheet" href="js/highlightjs/gruvbox-dark.css">
<script src="js/highlightjs/highlight.pack.js"></script>
<!-- katex -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.13.11/dist/katex.min.css" integrity="sha384-Um5gpz1odJg5Z4HAmzPtgZKdTBHZdw8S29IecapCSB31ligYPhHQZMIlWLYQGVoc" crossorigin="anonymous">
</head>
<body>
<!-- o_gradient_background" -->
<nav id="mainNav" class="navbar navbar-default navbar-fixed-top"
style="height:50px;font-size:130%;">
<div class="container">
<a href="/blog" style="color:#000;">Blog index</a>
<a href="/" style="color:#000;float:right;">arnaucube.com</a>
</div>
</nav>
<div class="o_gradient_background" style="height:5px;"></div>
<div class="container" style="margin-top:80px;max-width:800px;">
<h1>Auto generated tweets from Markov chains</h1>
<p><em>2017-12-29</em></p>
<p>The main goal of this article is to explain how to develop a Twitter botnet of autonomous bots that reply to tweets with text generated from Markov chain probabilities. As this is a learning project, we have built everything from scratch.</p>
<p>The idea of mixing Twitter bots with Markov chains came up in a Twitter conversation with <a href="https://twitter.com/x0rz">@x0rz</a>.</p>
<h2>1. Markov chains</h2>
<p>A Markov chain is a sequence of stochastic events (based on probabilities) where the next state of a variable or system depends only on its current state, and is independent of all earlier states.</p>
<p><a href="https://en.wikipedia.org/wiki/Markov_chain">https://en.wikipedia.org/wiki/Markov_chain</a></p>
<p>In our case, we will use Markov chains to estimate the probability that a given word is followed by another specific word. So, we will generate a diagram like the following one, but with thousands of words.</p>
<p><img src="img/posts/flock-botnet/markovchain.png" alt="flock-botnet" title="Markov chain" /></p>
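<p>As a small illustration of the idea, the following sketch counts which words follow each word in a short sentence. It uses a Go map instead of the structures used later in this article, and the name <em>countTransitions</em> is only for this example:</p>
<pre><code class="language-go">package main

import (
	&quot;fmt&quot;
	&quot;strings&quot;
)

// countTransitions returns, for each word, how many times each other word follows it.
// Hypothetical helper, not part of the project code.
func countTransitions(text string) map[string]map[string]int {
	words := strings.Fields(text)
	transitions := make(map[string]map[string]int)
	for i := 0; i &lt; len(words)-1; i++ {
		if transitions[words[i]] == nil {
			transitions[words[i]] = make(map[string]int)
		}
		transitions[words[i]][words[i+1]]++
	}
	return transitions
}

func main() {
	t := countTransitions(&quot;the cat sat on the mat the cat ran&quot;)
	// &quot;the&quot; is followed twice by &quot;cat&quot; and once by &quot;mat&quot;
	fmt.Println(t[&quot;the&quot;]) // map[cat:2 mat:1]
}
</code></pre>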
<p>As input, we need a text document with thousands of words, since more text gives better statistics. In this example we have used the book &ldquo;The Critique of Pure Reason&rdquo;, by Immanuel Kant (<a href="http://www.gutenberg.org/cache/epub/4280/pg4280.txt">http://www.gutenberg.org/cache/epub/4280/pg4280.txt</a>), simply because it is the first book that we found in .txt format.</p>
<h3>1.1 Calculating the Markov chains</h3>
<p>First we need to read the text file:</p>
<pre><code class="language-go">func readTxt(path string) (string, error) {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		// return the error to the caller instead of continuing with empty data
		return &quot;&quot;, err
	}
	// replace line breaks with spaces, so the text can be split into words
	content := strings.Replace(string(data), &quot;\n&quot;, &quot; &quot;, -1)
	return content, nil
}
</code></pre>
<p>To calculate the probabilities of the Markov states, we have made the following function that analyzes the full input text, and stores the Markov states:</p>
<pre><code class="language-go">func calcMarkovStates(words []string) []State {
	var states []State
	// count how many times each word appears, and which words follow it
	for i := 0; i &lt; len(words)-1; i++ {
		var iState int
		states, iState = addWordToStates(states, words[i])
		// words[i+1] always exists, because the loop stops at len(words)-2
		states[iState].NextStates, _ = addWordToStates(states[iState].NextStates, words[i+1])
		printLoading(i, len(words))
	}
	// convert the counts into percentages over the total number of words
	for i := 0; i &lt; len(states); i++ {
		states[i].Prob = (float64(states[i].Count) / float64(len(words)) * 100)
		for j := 0; j &lt; len(states[i].NextStates); j++ {
			states[i].NextStates[j].Prob = (float64(states[i].NextStates[j].Count) / float64(len(words)) * 100)
		}
	}
	fmt.Println(&quot;\ntotal words computed: &quot; + strconv.Itoa(len(words)))
	return states
}
</code></pre>
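<p>The function above relies on a <em>State</em> type and an <em>addWordToStates</em> helper that are not shown in this article. A minimal sketch of how they could look, assuming only the fields and the return values that the code above uses, would be:</p>
<pre><code class="language-go">package main

import &quot;fmt&quot;

// State is an assumed shape, inferred from the fields used in calcMarkovStates.
type State struct {
	Word       string
	Count      int
	Prob       float64
	NextStates []State
}

// addWordToStates increments the counter of word if it is already in states,
// or appends a new State for it. It returns the updated slice and the index of the word.
// This is a sketch, the real helper may differ.
func addWordToStates(states []State, word string) ([]State, int) {
	for i := range states {
		if states[i].Word == word {
			states[i].Count++
			return states, i
		}
	}
	states = append(states, State{Word: word, Count: 1})
	return states, len(states) - 1
}

func main() {
	var states []State
	for _, w := range []string{&quot;a&quot;, &quot;b&quot;, &quot;a&quot;} {
		states, _ = addWordToStates(states, w)
	}
	// two different words, and &quot;a&quot; has been seen twice
	fmt.Println(len(states), states[0].Count) // 2 2
}
</code></pre>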
<p>The <em>printLoading</em> function is just a simple helper that prints the percentage of the process completed in the terminal:</p>
<pre><code class="language-go">func printLoading(n int, total int) {
	var bar []string
	// the bar has up to 40 segments, the percentage is printed next to it
	tantPerFourty := int((float64(n) / float64(total)) * 40)
	tantPerCent := int((float64(n) / float64(total)) * 100)
	for i := 0; i &lt; tantPerFourty; i++ {
		bar = append(bar, &quot;█&quot;)
	}
	progressBar := strings.Join(bar, &quot;&quot;)
	// \r moves the cursor back to the start of the line, so the bar is redrawn in place
	fmt.Print(&quot;\r &quot; + progressBar + &quot; - &quot; + strconv.Itoa(tantPerCent) + &quot;%&quot;)
}
</code></pre>
<p><img src="img/posts/flock-botnet/progressbarMarkov.gif" alt="flock-botnet" title="Markov chain" /></p>
<h3>1.2 Generating text from the Markov chains</h3>
<p>To generate the text, we need an initial word and the length of the output text to generate. Then, we loop, picking each next word based on the Markov chain probabilities calculated in the previous step.</p>
<pre><code class="language-go">func (markov Markov) generateText(states []State, initWord string, count int) string {
	var generatedText []string
	word := initWord
	generatedText = append(generatedText, word)
	for i := 0; i &lt; count; i++ {
		// pick the next word from the successors of the current word
		word = getNextMarkovState(states, word)
		if word == &quot;word no exist on the memory&quot; {
			return &quot;word no exist on the memory&quot;
		}
		generatedText = append(generatedText, word)
	}
	text := strings.Join(generatedText, &quot; &quot;)
	return text
}
</code></pre>
<p>To generate the text we need a function that, given the Markov chains and a word, returns a randomly chosen word, weighted by probability, to follow the given word:</p>
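<pre><code class="language-go">func getNextMarkovState(states []State, word string) string {
	// find the state of the given word
	iState := -1
	for i := 0; i &lt; len(states); i++ {
		if states[i].Word == word {
			iState = i
			break
		}
	}
	// unknown word, or word without any known successor
	if iState &lt; 0 || len(states[iState].NextStates) == 0 {
		return &quot;word no exist on the memory&quot;
	}
	// start with the first candidate and a random threshold
	next := states[iState].NextStates[0]
	next.Prob = rand.Float64() * states[iState].Prob
	for i := 0; i &lt; len(states[iState].NextStates); i++ {
		candidate := states[iState].NextStates[i]
		// skip the candidate if it equals the word of the preceding state
		// (only checked when iState &gt; 0, to avoid indexing states[-1])
		if iState &gt; 0 &amp;&amp; states[iState-1].Word == candidate.Word {
			continue
		}
		if rand.Float64()*candidate.Prob &gt; next.Prob {
			next = candidate
		}
	}
	return next.Word
}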
</code></pre>
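<p>The function above uses its own comparison of random values to choose the next word. As an alternative, and not what this project does, a common way to pick an item with probability proportional to its count is the cumulative (roulette wheel) selection. The following is a minimal sketch, where the <em>Next</em> type and the <em>pickWeighted</em> function are names made up for this example:</p>
<pre><code class="language-go">package main

import (
	&quot;fmt&quot;
	&quot;math/rand&quot;
)

// Next is a hypothetical, simplified successor: a word and how many times it was seen.
type Next struct {
	Word  string
	Count int
}

// pickWeighted returns one word, chosen with probability proportional to its Count.
// r must be a random value in [0, 1), for example rand.Float64().
func pickWeighted(next []Next, r float64) string {
	total := 0
	for _, n := range next {
		total += n.Count
	}
	if total == 0 {
		return &quot;&quot;
	}
	// walk the cumulative counts until the target falls inside a segment
	target := r * float64(total)
	acc := 0.0
	for _, n := range next {
		acc += float64(n.Count)
		if target &lt; acc {
			return n.Word
		}
	}
	return next[len(next)-1].Word
}

func main() {
	next := []Next{{&quot;cat&quot;, 2}, {&quot;mat&quot;, 1}}
	// &quot;cat&quot; should come out roughly twice as often as &quot;mat&quot;
	for i := 0; i &lt; 5; i++ {
		fmt.Println(pickWeighted(next, rand.Float64()))
	}
}
</code></pre>
<p>Passing the random value as a parameter keeps the function deterministic, which makes it easy to check by hand.</p>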
<h2>2. Twitter API</h2>
<p>To interact with the Twitter API, we will use the <strong>go-twitter</strong> library: <a href="https://github.com/dghubble/go-twitter">https://github.com/dghubble/go-twitter</a>.</p>
<p>We set up a streaming connection with the Twitter API, filtering tweets by some words related to our input dataset:</p>
<pre><code class="language-go">func startStreaming(states []State, flock Flock, flockUser *twitter.Client, botScreenName string, keywords []string) {
	// The demux dispatches each stream message to the handler for its type
	demux := twitter.NewSwitchDemux()
	demux.Tweet = func(tweet *twitter.Tweet) {
		// ignore retweets and tweets posted by the bots of the flock
		if !isRT(tweet) &amp;&amp; !isFromBot(flock, tweet) {
			processTweet(states, flockUser, botScreenName, keywords, tweet)
		}
	}
	demux.DM = func(dm *twitter.DirectMessage) {
		fmt.Println(dm.SenderID)
	}
	demux.Event = func(event *twitter.Event) {
		fmt.Printf(&quot;%#v\n&quot;, event)
	}
	fmt.Println(&quot;Starting Stream...&quot;)
	// FILTER
	filterParams := &amp;twitter.StreamFilterParams{
		Track:         keywords,
		StallWarnings: twitter.Bool(true),
	}
	stream, err := flockUser.Streams.Filter(filterParams)
	if err != nil {
		log.Fatal(err)
	}
	// Receive messages until stopped or stream quits
	demux.HandleChan(stream.Messages)
}
</code></pre>
<p>Then, each time a new tweet containing one of our tracked words is posted, we process that tweet, generate a reply based on the Markov chains, and post that reply:</p>
<pre><code class="language-go">func processTweet(states []State, flockUser *twitter.Client, botScreenName string, keywords []string, tweet *twitter.Tweet) {
	c.Yellow(&quot;bot @&quot; + botScreenName + &quot; - New tweet detected:&quot;)
	fmt.Println(tweet.Text)
	tweetWords := strings.Split(tweet.Text, &quot; &quot;)
	generatedText := &quot;word no exist on the memory&quot;
	// try each word of the tweet as initial word, until one is found in the Markov states
	for i := 0; i &lt; len(tweetWords) &amp;&amp; generatedText == &quot;word no exist on the memory&quot;; i++ {
		fmt.Println(strconv.Itoa(i) + &quot; - &quot; + tweetWords[i])
		generatedText = generateMarkovResponse(states, tweetWords[i])
	}
	c.Yellow(&quot;bot @&quot; + botScreenName + &quot; posting response&quot;)
	fmt.Println(tweet.ID)
	replyTweet(flockUser, &quot;@&quot;+tweet.User.ScreenName+&quot; &quot;+generatedText, tweet.ID)
	waitTime(1)
}
</code></pre>
<pre><code class="language-go">func replyTweet(client *twitter.Client, text string, inReplyToStatusID int64) {
	tweet, httpResp, err := client.Statuses.Update(text, &amp;twitter.StatusUpdateParams{
		InReplyToStatusID: inReplyToStatusID,
	})
	if err != nil {
		fmt.Println(err)
		// httpResp can be nil when the request itself failed
		if httpResp != nil {
			c.Red(&quot;error: &quot; + httpResp.Status)
			c.Purple(&quot;maybe twitter has blocked the account, CTRL+C, wait 15 minutes and try again&quot;)
		}
		return
	}
	fmt.Print(&quot;tweet posted: &quot;)
	c.Green(tweet.Text)
}
</code></pre>
<h2>3. Flock-Botnet, or how to avoid the Twitter API limitations</h2>
<p>If you have ever played with the Twitter API, you will have seen that there are some restrictions and limitations. That means that if your bot posts too often, the account will get blocked for some minutes.</p>
<p>To avoid this limitation, we will deploy a botnet, where each bot replies to tweets based on the Markov chain probabilities. This way, when a bot posts a reply, it falls asleep for 1 minute. In the meantime, the other bots keep processing and replying to the other tweets.</p>
<p><img src="img/posts/flock-botnet/flock-botnet-scheme.png" alt="flock-botnet" title="01" /></p>
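<p>The scheme can be modelled in a simplified way with goroutines and a channel: each bot takes a tweet from a shared queue, replies, and rests, while the other bots keep taking tweets. This is only a sketch of the idea, not the project code; the <em>runFlock</em> function and its parameters are assumptions made for this example, and the rest time is shortened so that it runs quickly:</p>
<pre><code class="language-go">package main

import (
	&quot;fmt&quot;
	&quot;sync&quot;
	&quot;time&quot;
)

// runFlock hands each tweet to the first idle bot. After replying, a bot rests
// for the given duration, so the next tweets go to the other bots.
// Hypothetical helper: bots and tweets are plain strings here.
func runFlock(bots []string, tweets []string, rest time.Duration, reply func(bot, tweet string)) {
	queue := make(chan string)
	var wg sync.WaitGroup
	for _, bot := range bots {
		wg.Add(1)
		go func(bot string) {
			defer wg.Done()
			for tweet := range queue {
				reply(bot, tweet)
				time.Sleep(rest) // the bot falls asleep after posting
			}
		}(bot)
	}
	for _, t := range tweets {
		queue &lt;- t
	}
	close(queue)
	wg.Wait()
}

func main() {
	var mu sync.Mutex
	bots := []string{&quot;bot1&quot;, &quot;bot2&quot;, &quot;bot3&quot;}
	tweets := []string{&quot;t1&quot;, &quot;t2&quot;, &quot;t3&quot;, &quot;t4&quot;}
	runFlock(bots, tweets, 10*time.Millisecond, func(bot, tweet string) {
		mu.Lock()
		defer mu.Unlock()
		fmt.Println(bot, &quot;replies to&quot;, tweet)
	})
}
</code></pre>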
<h2>4. Putting it all together</h2>
<p>In this demo, we will use only 3 bots (Twitter accounts).</p>
<p>The botnet configuration is stored in the config.json file:</p>
<pre><code class="language-json">[{
		&quot;title&quot;: &quot;bot1&quot;,
		&quot;consumer_key&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;consumer_secret&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;access_token_key&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;access_token_secret&quot;: &quot;xxxxxxxxxxxxx&quot;
	},
	{
		&quot;title&quot;: &quot;bot2&quot;,
		&quot;consumer_key&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;consumer_secret&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;access_token_key&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;access_token_secret&quot;: &quot;xxxxxxxxxxxxx&quot;
	},
	{
		&quot;title&quot;: &quot;bot3&quot;,
		&quot;consumer_key&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;consumer_secret&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;access_token_key&quot;: &quot;xxxxxxxxxxxxx&quot;,
		&quot;access_token_secret&quot;: &quot;xxxxxxxxxxxxx&quot;
	}
]
</code></pre>
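<p>A configuration with this shape can be loaded with the standard <em>encoding/json</em> package. The following is a minimal sketch; the <em>BotConfig</em> type and the <em>parseConfig</em> function are names assumed for this example, and only the JSON keys come from the file above:</p>
<pre><code class="language-go">package main

import (
	&quot;encoding/json&quot;
	&quot;fmt&quot;
)

// BotConfig mirrors one entry of config.json (hypothetical type name).
type BotConfig struct {
	Title             string `json:&quot;title&quot;`
	ConsumerKey       string `json:&quot;consumer_key&quot;`
	ConsumerSecret    string `json:&quot;consumer_secret&quot;`
	AccessTokenKey    string `json:&quot;access_token_key&quot;`
	AccessTokenSecret string `json:&quot;access_token_secret&quot;`
}

// parseConfig decodes the JSON array of bots.
func parseConfig(data []byte) ([]BotConfig, error) {
	var bots []BotConfig
	if err := json.Unmarshal(data, &amp;bots); err != nil {
		return nil, err
	}
	return bots, nil
}

func main() {
	// in the real program the bytes would come from reading config.json
	data := []byte(`[{&quot;title&quot;: &quot;bot1&quot;, &quot;consumer_key&quot;: &quot;xxx&quot;}, {&quot;title&quot;: &quot;bot2&quot;, &quot;consumer_key&quot;: &quot;yyy&quot;}]`)
	bots, err := parseConfig(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, b := range bots {
		fmt.Println(b.Title)
	}
}
</code></pre>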
<p>The complete process will be:</p>
<p><img src="img/posts/flock-botnet/steps.png" alt="flock-botnet" title="01" /></p>
<h2>5. Demo</h2>
<p>We have set up a small demo with 3 bots. As we said at the beginning of this post, we have used <a href="http://www.gutenberg.org/cache/epub/4280/pg4280.txt">“The Critique of Pure Reason”, by Immanuel Kant</a> to generate the Markov chains.</p>
<p>When the botnet is up and running, the bots start streaming all new tweets containing the configured keywords.</p>
<p>Each bot takes a tweet, analyzes the words it contains, generates a reply using the previously calculated Markov chains, and posts it as a reply.</p>
<p>Example of terminal view during the flock-botnet execution:</p>
<p><img src="img/posts/flock-botnet/terminal00.png" alt="flock-botnet" title="01" /></p>
<p>Here is an example of the execution:</p>
<p><img src="img/posts/flock-botnet/flock-botnet-demo.gif" alt="flock-botnet" title="Markov chain" /></p>
<p>The following screenshots show some of the replies that the bots (&ldquo;@andreimarkov&rdquo;, &ldquo;@dodecahedron&rdquo;, &ldquo;@projectNSA&rdquo;) have posted to other people.</p>
<p><img src="img/posts/flock-botnet/01.png" alt="flock-botnet" title="01" /></p>
<hr>
<p><img src="img/posts/flock-botnet/02.jpeg" alt="flock-botnet" title="02" /></p>
<hr>
<p><img src="img/posts/flock-botnet/03.jpeg" alt="flock-botnet" title="03" /></p>
<hr>
<p><img src="img/posts/flock-botnet/04.jpeg" alt="flock-botnet" title="04" /></p>
<h2>Conclusion</h2>
<p>In this article, we have seen how to build a Twitter botnet whose bots reply to tweets with text generated from Markov chains.</p>
<p>In this article we have used only first-order Markov chains, so the generated text does not really read like human text. For future projects, a good choice would be to combine higher-order Markov chains with other text mining techniques.</p>
<p>The Twitter API has lots of uses, and in this post we have seen one of them. I hope to be able to write some more articles about other projects around the Twitter API. For example, some Twitter <a href="https://devpost.com/software/projectnsa">network nodes analysis</a>, or some <a href="https://arnaucube.com/hashtagsUsersNetworkPage.html">users &amp; hashtags analysis</a>.</p>
<p>The complete code of this project is available at <a href="https://github.com/arnaucube/flock-botnet">https://github.com/arnaucube/flock-botnet</a></p>
<p>Project page: <a href="http://arnaucube.com/flock-botnet/">http://arnaucube.com/flock-botnet/</a></p>
</div>
<footer style="text-align:center; margin-top:100px;margin-bottom:50px;">
<div class="container">
<div class="row">
<ul class="list-inline">
<li><a href="https://twitter.com/arnaucube"
style="color:gray;text-decoration:none;"
target="_blank">twitter.com/arnaucube</a>
</li>
<li><a href="https://github.com/arnaucube"
style="color:gray;text-decoration:none;"
target="_blank">github.com/arnaucube</a>
</li>
</ul>
</div>
<div class="row" style="display:inline-block;">
Blog made with <a href="http://github.com/arnaucube/blogo/"
target="_blank" style="color: gray;text-decoration:none;">Blogo</a>
</div>
</div>
</footer>
<script>
</script>
<script src="js/external-links.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<script defer src="https://cdn.jsdelivr.net/npm/katex@0.13.11/dist/katex.min.js" integrity="sha384-YNHdsYkH6gMx9y3mRkmcJ2mFUjTd0qNQQvY9VYZgQd7DcN7env35GzlmFaZ23JGp" crossorigin="anonymous"></script>
<script defer src="https://cdn.jsdelivr.net/npm/katex@0.13.11/dist/contrib/auto-render.min.js" integrity="sha384-vZTG03m+2yp6N6BNi5iM4rW4oIwk5DfcNdFfxkk9ZWpDriOkXX8voJBFrAO7MpVl" crossorigin="anonymous"></script>
<script>
document.addEventListener("DOMContentLoaded", function() {
renderMathInElement(document.body, {
displayMode: false,
// customised options
// • auto-render specific keys, e.g.:
delimiters: [
{left: '$$', right: '$$', display: true},
{left: '$', right: '$', display: false},
],
// • rendering keys, e.g.:
throwOnError : true
});
});
</script>
</body>
</html>