Archive for the ‘My Terrible Coding’ category

iMacros Firefox wrapped with PHP, Linux, VirtualBox

October 30th, 2009

NOTE: Please please please feel free to leave comments criticizing anything I write here, or asking any questions you have! If it helps me learn to do something better, I’m all for it.

So, as much as I love using PHP and CURL, I’ve finally graduated to none other than iMacros, because a majority of my bigger projects now require me to read pages that need JavaScript to display their content. Reverse engineering JavaScript is fun when it works, but wasting valuable time on it doesn’t make sense.

If you haven’t heard of iMacros, it’s a super powerful macro scripting program made specifically to automate web browser sessions. Now before you start thinking to yourself that macros are for newbs who don’t know how to code, trust me, this is far from newb. iMacros on its own is very newb; mixing iMacros + PHP/MySQL + other scripting = insanely complex.

My setup is this:

Fedora Core 11 w/ VirtualBox installed (a free VMware type application), and a WinXP Pro Virtual Machine w/ iMacros scripting edition installed along with the iMacros Firefox Plug-in.

The reasoning behind running Linux and a virtual machine is this: if a WinXP virtual machine starts leaking memory, freezes or dies, I can simply close the virtual machine and reopen it from the Linux command line. I’m an avid fan of making sure stuff can run unattended for months. Although it makes things a bit more complex, it’s well worth the time investment to do it right.

With that said, I plan to slowly add my entire setup to this post, along with some more complex coding that will allow you to multi-thread iMacros sessions.

iMacros + Firefox Plugin
The reasoning behind using the iMacros Firefox Plug-in versus the iMacros Browser is that the current iMacros browser doesn’t support threaded sessions. What this means is that the same cookies, cache, proxy settings will be used across every iMacros browser session. This is no good if you want to log in to 2 different accounts at once using 2 different proxy IPs. Firefox however allows you to create different browser profiles (just run ‘firefox -ProfileManager’ with firefox already closed to view the profile manager), and iMacros allows you to choose different profiles during Firefox’s execution. As a result you get browser sessions that are entirely different from one another.
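To make the profile idea concrete, here is a minimal sketch of building the start-up switches for one Firefox profile per session. The switches (-runner, -fx, -fxProfile, -silent, -noexit) are the ones used in iimfx.class.php further down; the helper name buildInitArgs and the profile names account1/account2 are made up for illustration, and the profiles are assumed to exist already.

```php
<?php
// Hypothetical helper: builds the switch string passed to iimInit()
// for a given Firefox profile. The profile must already exist
// (create it with the Firefox profile manager first).
function buildInitArgs($profile = 'default', $silent = false, $noexit = false) {
	$args = "-runner -fx -fxProfile " . $profile;

	if ($silent === true)
		$args .= " -silent";

	if ($noexit === true)
		$args .= " -noexit";

	return $args;
}

// One profile per account, so cookies, cache and proxy settings stay separate
foreach (array('account1', 'account2') as $profile) {
	echo buildInitArgs($profile, true) . "\n";
}
```

Each string would then go to its own iimInit call, one per session.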

Setting up WAMPServer.

1) Download and install WAMP Server at http://www.wampserver.com/

2) Set up the php command to run the WAMP Server version of PHP. You won’t need Apache; I just like WampServer’s PHP config, which already has most of the extensions I need installed and saves a lot of headache. To make the php command work on Windows Vista, click the Start button, right-click Computer and go to Properties. Click Advanced system settings, go to the Advanced tab, click Environment Variables at the bottom, and under System Variables edit the PATH variable to add the folder that contains php.exe (my path is ‘C:\wamp\bin\php\php5.3.0’).

Setting up Firefox –

First off you’re going to need to download the Firefox version of iMacros here.

To make Firefox and iMacros run most efficiently I highly recommend you do several things.

1) Disable Flash – iMacros seems to have page loading issues when Flash is involved. Since you can’t play with Flash in the iMacros Firefox Plug-in anyway, let’s disable it by going to Tools/Add-ons, clicking the “Plugins” tab, going down to ‘Shockwave Flash’ and disabling it.

2) Install AdBlock Plus - why spend bandwidth and CPU time loading ads you’ll never see, and get advertisers charged for those ads on top of it? Secondly, AdBlock Plus will allow you to block specific sites or scripts from loading. This again saves valuable resources, and is sometimes necessary since iMacros still appears a bit buggy when trying to figure out whether a page is done loading or not.

3) Disable loading images automatically. 99% of the time there is no point in loading images unless your system is some sort of image scraper. Although iMacros has an image filter command (‘FILTER TYPE=IMAGES STATUS=(ON|OFF)’), I prefer disabling images in Firefox itself: go to Tools/Options, click the ‘Content’ tab and remove the checkmark on ‘Load images automatically’. If you have images that need to be an exception, say a captcha image, you can simply click the ‘Exceptions’ button and add the url of the exception. I’ve also included an iMacro in the package above that will allow you to disable and re-enable images in Firefox.

4) Make every browser session you start a private browsing session. You can do this by going to: Tools/Options/Privacy in Firefox and then choose in the drop down “Firefox will never remember history”. The reasoning for this is that I’ve found the iMacros “CLEAR” command is unreliable.

The scripts/PHP code I’ve written:

iMacros Firefox Proxy Setting Script (taken from: http://thepemberton.com/posts/archives/23 and modified a little bit)

To use it, simply replace {{IP}} and {{PORT}} with the IP and port of your proxy.

iMacros AdBlock Plus enable/disable script

To use it, simply install ABP and then modify {{status}} to ‘true’ or ‘false’ to enable or disable it.

iMacros Firefox images enable/disable script

To use it, simply replace {{images}} with 1 or 2 to enable or disable images in Firefox (1 = on, 2 = off).
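If you’d rather fill in those {{...}} placeholders from PHP than by hand, a sketch along these lines would do it. The template string below is a stand-in and not the contents of any of the real macros, and fillPlaceholders is a name made up for this example.

```php
<?php
// Hypothetical helper: swaps {{NAME}} placeholders for real values.
function fillPlaceholders($template, array $values) {
	$pairs = array();

	foreach ($values as $name => $value) {
		$pairs['{{' . $name . '}}'] = (string) $value;
	}

	// strtr() replaces every placeholder in a single pass
	return strtr($template, $pairs);
}

// Stand-in text, NOT the contents of the real proxy macro
$template = "proxy host: {{IP}}, proxy port: {{PORT}}";

echo fillPlaceholders($template, array('IP' => '127.0.0.1', 'PORT' => 8080)) . "\n";
```

Placeholders with no matching value are left untouched, which makes a forgotten variable easy to spot in the output.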

iimfx.class.php

This file allows you to run iMacros from PHP with ease, provided you’re running iimRunner.exe. I haven’t documented it very well, but if you take a look through it you should be able to figure it out easily. If you see any updates or changes that would improve it, please let me know. You’ll also need the 3 scripts above to use it as-is, since it’s customized for my setup.

The code of iimfx.class.php

<?php
 
class imacros {
	function __construct($proxyip = '', $proxyport = '', $silent = false, $noexit = false) {
 
		echo "--------------------------------------\nNew imacros session started!\nUsing Proxy: $proxyip:$proxyport\n";
		$this->proxyip = $proxyip;
		$this->proxyport = $proxyport;
 
		if (empty ( $this->proxyip ))
			echo "NO PROXY!!\n";
 
		$this->noexit = $noexit;
 
		$this->iim = new COM ( "imacros" );
 
		$toexec = "-runner -fx -fxProfile default";
 
		if ($silent === true)
			$toexec .= " -silent";
 
		if ($noexit === true)
			$toexec .= " -noexit";
 
		echo $toexec . "\n";
 
		$this->iim->iimInit ( $toexec );
 
		if (! empty ( $this->proxyip )) {
			$dvars ['IP'] = $this->proxyip;
			$dvars ['port'] = $this->proxyport;
			$this->play ( $dvars, 'proxy' );
		}
	}
 
	function __destruct() {
		if ($this->noexit === false)
			$this->iim->iimExit ();
	}
 
	// $immvars: array of macro variables to set (pass '' for none); $macro: name of the macro to play
	function play($immvars, $macro) {
 
		echo "--------------------------------------------------------\n";
 
		if (is_array ( $immvars )) {
			foreach ( $immvars as $key => $value ) {
				echo "Setting Value $key => $value\n";
				$this->iim->iimSet ( "-var_" . $key, $value );
			}
		}
 
		echo "Playing Macro $macro\n";
		$s = $this->iim->iimPlay ( $macro );
 
		if($s>0){
			echo "Macro successfully played!\n";
		}else{
			echo "--------MACRO ERROR!-------------------\n ERROR: " . $this->getLastError() . "\n";
		}
		return $s;
	}
 
	// This function retrieves extracts in your iMacros script if you have any. 
	function getLastExtract($num) {
		return $this->iim->iimGetLastExtract ( $num );
	}
 
	// Returns the last error :)
	function getLastError(){
		return $this->iim->iimGetLastError();
	}
 
	// Enables/disables images
	function setImages($images = 1) { // 1 = on 2 = off
 
		$dvars ['images'] = $images;
		$this->play ( $dvars, 'images' );
 
	}
 
	// Enables or disables adblockplus
	function enableABP($status = true){
 
		$dvars['status'] = $status;
		$this->play ( $dvars, 'abp.iim' );
 
	}
 
}
 
?>

Scraping Forms with DOM and PHP for Posting with CURL

September 1st, 2009

I posted in the past about scraping forms for use when you need to make a post using CURL along with a little bit of code. I’ve recently moved on to using DOM and this little function I wrote below. I finally got fed up with sites adding new hidden fields and having to write preg_matches for unique variables that a lot of sites are stuffing into their forms these days to prevent spammers and/or track sessions. 

If you don’t already have it, you’ll need the php-xml extension. Depending on your setup, ‘yum install php-xml’ should do the trick.

$html is the html code of the site you’d get using file_get_contents, CURL or whatever.

$form_number is the zero-based index of the form you want (its position on the page minus 1). It’s usually ok just to leave this at 0, but sometimes sites have more than 1 form on the page so you have to specify.

The postData array is returned and ready for posting in CURL; all you need to do is find the few fields you actually need to specify and update those fields within the postData array. It’s usually as easy as $postData['email'] = 'myemail@emailme.com';

Updated Sept. 1st, 2009. This is my inputs.php. Some input fields I’ve found have unique generated names to make posting data more annoying. To quickly work around this I built a few functions to find those as well from known attributes such as the input’s width, text size, etc. It’s similar to what iMacros does.

<?php
function getInputs($html, $form_number = 0) {
 
	$dom = new DOMDocument ( );
	@$dom->loadHTML ( $html ); //@ is there cuz this will throw up a bunch of errors if the html code isn't perfect
 
 
	$forms = $dom->getElementsByTagName ( "form" );
 
	$form = $forms->item ( $form_number );
 
	// Gets input areas and also checks to make sure the form exists
 
 
	if ($form) {
		$inputs = $form->getElementsByTagName ( "input" );
	} else {
		echo "Form does not exist! Line: " . __LINE__ . "\n";
		return '';
	}
 
	$postData = array (); // start empty so a form with no named fields still returns an array

	foreach ( $inputs as $input ) {
 
		$attval = $input->getAttribute ( 'name' );
 
		if (! empty ( $attval ))
			$postData [$attval] = $input->getAttribute ( 'value' );
 
	}
 
	// Gets textareas
 
 
	$inputs = $form->getElementsByTagName ( "textarea" );
 
	foreach ( $inputs as $input ) {
 
		$attval = $input->getAttribute ( 'name' );
 
		if (! empty ( $attval ))
			$postData [$attval] = $input->nodeValue;
 
	}
 
	// Gets buttons
 
 
	$inputs = $form->getElementsByTagName ( "button" );
 
	foreach ( $inputs as $input ) {
 
		$attval = $input->getAttribute ( 'name' ); // key buttons by name, like the other fields
 
		if (! empty ( $attval ))
			$postData [$attval] = $input->getAttribute ( 'value' );
 
	}
 
	// Gets selects
 
 
	$inputs = $form->getElementsByTagName ( "select" );
 
	foreach ( $inputs as $input ) {
		$attval = $input->getAttribute ( 'name' );
 
		if (! empty ( $attval ))
			$postData [$attval] = '';
 
	}
 
	return $postData;
 
}
 
function getForms($html) {
 
	$dom = new DOMDocument ( );
	@$dom->loadHTML ( $html );
	$xpath = new DOMXPath ( $dom );
 
	$forms = $xpath->evaluate ( "/html/body//form" );
 
	$returnform = array ();
 
	for($i = 0; $i < $forms->length; $i ++) {
		$form = $forms->item ( $i );
		$returnform [] = $form->getAttribute ( 'action' );
	}
 
	return $returnform;
 
}
 
// Finds unique variables by finding other variables within the form
 
 
function findUniqueInput($html, $variables, $form_number = 0) {
 
	$dom = new DOMDocument ( );
	@$dom->loadHTML ( $html ); //@ is there cuz this will throw up a bunch of errors if the html code isn't perfect
 
 
	$forms = $dom->getElementsByTagName ( "form" );
 
	$form = $forms->item ( $form_number );
 
	// Gets input areas and also checks to make sure the form exists
 
 
	if ($form) {
		$inputs = $form->getElementsByTagName ( "input" );
	} else {
		echo "Form does not exist! Line: " . __LINE__ . "\n";
		return '';
	}
 
	$keys = array_keys ( $variables );
 
	foreach ( $inputs as $input ) {
 
		$good = true;
 
		for($i = 0; $i < count ( $keys ); $i ++) {
 
			if ($input->getAttribute ( $keys [$i] ) == $variables [$keys [$i]]) {
 
			} else {
				$good = false;
			}
		}
 
		if ($good === true) {
			echo $input->getAttribute ( 'name' ) . "\n";
			echo "Found Input!" . "\n";
			return $input->getAttribute ( 'name' );
		}
 
	}
 
}
 
function findUniqueTextArea($html, $variables, $form_number = 0) {
 
	$dom = new DOMDocument ( );
	@$dom->loadHTML ( $html ); //@ is there cuz this will throw up a bunch of errors if the html code isn't perfect
 
 
	$forms = $dom->getElementsByTagName ( "form" );
 
	$form = $forms->item ( $form_number );
 
	// Gets input areas and also checks to make sure the form exists
 
 
	if ($form) {
		$inputs = $form->getElementsByTagName ( "textarea" );
	} else {
		echo "Form does not exist! Line: " . __LINE__ . "\n";
		return '';
	}
 
	$keys = array_keys ( $variables );
 
	foreach ( $inputs as $input ) {
 
		$good = true;
 
		for($i = 0; $i < count ( $keys ); $i ++) {
 
			if ($input->getAttribute ( $keys [$i] ) == $variables [$keys [$i]]) {
 
			} else {
				$good = false;
			}
		}
 
		if ($good === true) {
			echo $input->getAttribute ( 'name' ) . "\n";
			echo "Found TextArea!" . "\n";
			return $input->getAttribute ( 'name' );
		}
 
	}
 
}
 
?>

Example getting a unique input name.

	$variables = array ("maxlength" => '10', "size" => '10', "tabindex" => '1' ); // the other variables of the text input that you want to retrieve
	$uniqueName = $this->findUniqueInput ( $text, $variables );
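
For comparison, the same lookup can be sketched with a single XPath query instead of looping over every input. This is an alternative technique, not what inputs.php does; the function name findInputNameByAttributes and the sample html are made up, and it assumes the attribute values contain no double quotes.

```php
<?php
// Sketch: find an input's name from its other attributes using XPath.
function findInputNameByAttributes($html, array $attributes) {
	$dom = new DOMDocument();
	@$dom->loadHTML($html); // @ hides the warnings imperfect html produces

	// Build a query like //form//input[@size="10"][@maxlength="10"]
	$query = '//form//input';
	foreach ($attributes as $name => $value) {
		$query .= '[@' . $name . '="' . $value . '"]';
	}

	$xpath = new DOMXPath($dom);
	$nodes = $xpath->query($query);

	if ($nodes === false || $nodes->length === 0)
		return null; // nothing matched

	return $nodes->item(0)->getAttribute('name');
}

// Made-up form with one randomly named field
$html = '<html><body><form action="/login">'
	. '<input type="text" name="u_8f3a" size="10" maxlength="10" tabindex="1">'
	. '<input type="text" name="other" size="30">'
	. '</form></body></html>';

$variables = array('maxlength' => '10', 'size' => '10', 'tabindex' => '1');
echo findInputNameByAttributes($html, $variables) . "\n";
```

It returns null rather than printing anything when no input matches, so the caller can decide what to do.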

function getInputs example

$text = $c->getFile ( "http://www.wordpress.com" ); // integrates with my curl class (you can replace this with file_get_contents or something similar depending on what you use)
$postData = getInputs ( $text ); // gets the inputs from the html returned
$postData ['login'] = $login; // sets the unique variables needed for the post
$postData ['password'] = $password;

$c->getFile ( "http://www.wordpress.com", $postData ); // posts our data to wordpress, p.s. this is just an example and won't actually work with wordpress.
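
If you don’t have a curl class handy, here is a rough self-contained sketch of the same flow: pull the fields out of a form, override the couple you care about, and build the string you would hand to CURLOPT_POSTFIELDS. The html, the field names and the credentials are all made up for the example.

```php
<?php
// Made-up login form with a hidden token, like the ones sites use to track sessions
$html = '<html><body><form action="/login" method="post">'
	. '<input type="hidden" name="token" value="abc123">'
	. '<input type="text" name="login" value="">'
	. '<input type="password" name="password" value="">'
	. '</form></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides the warnings imperfect html produces

// Collect every named input of the first form
$postData = array();
$form = $dom->getElementsByTagName('form')->item(0);

foreach ($form->getElementsByTagName('input') as $input) {
	$name = $input->getAttribute('name');
	if ($name !== '')
		$postData[$name] = $input->getAttribute('value');
}

// Only the fields we care about need touching; the hidden token rides along
$postData['login'] = 'myuser';
$postData['password'] = 'secret';

// Url-encoded body, ready for CURLOPT_POSTFIELDS
$body = http_build_query($postData);
echo $body . "\n";
```

The point is that the hidden token never has to be named in the code, so the site can rename or add hidden fields without breaking the post.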

Handling Redirects with PHP/CURL

June 29th, 2009

Ever wanted to see all the different redirect urls a particular link redirects through with PHP/CURL? Now you can with this nifty little piece of code (the only thing I haven’t written yet is a way to handle redirects that don’t contain the hostname). Anyway, with that said, check this out:

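$url = "http://www.somesiteyouknowuseslocationredirect.com";

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
curl_setopt($ch, CURLOPT_HEADER, true); // include the headers of every hop in $data
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // actually follow the Location redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); // don't get stuck in a redirect loop

$data = curl_exec($ch);

$curl_info = curl_getinfo($ch);

$redirect_count = $curl_info['redirect_count'];

// header_size covers the headers of all the responses along the way
$header_size = $curl_info['header_size'];

$header = substr($data, 0, $header_size);

$redirecturls = array();

if($redirect_count>0){

	// ^ keeps this from also matching Content-Location headers
	preg_match_all("/^location:(.*?)$/im", $header, $locations);

	foreach($locations[1] as $location){
		$redirecturls[] = trim($location);
	}

}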
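
For the part I haven’t written (redirects that don’t contain the hostname), a sketch like the one below might be a starting point. It resolves a Location value against the url that returned it. The function name resolveRedirect is made up, and it only covers the common cases: it does not collapse ‘../’ segments or handle query-only redirects.

```php
<?php
// Sketch: turn a Location value into a full url, using the url that
// returned it as the base. Covers the common cases only.
function resolveRedirect($base, $location) {
	// Already absolute (starts with a scheme such as http://)
	if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location))
		return $location;

	$parts = parse_url($base);
	$scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
	$host = isset($parts['host']) ? $parts['host'] : '';
	$port = isset($parts['port']) ? ':' . $parts['port'] : '';
	$path = isset($parts['path']) ? $parts['path'] : '/';

	// Protocol-relative: //other.example/page
	if (substr($location, 0, 2) === '//')
		return $scheme . ':' . $location;

	// Root-relative: /page
	if (substr($location, 0, 1) === '/')
		return $scheme . '://' . $host . $port . $location;

	// Relative to the directory of the base path: page.php
	$dir = substr($path, 0, strrpos($path, '/') + 1);

	return $scheme . '://' . $host . $port . $dir . $location;
}

echo resolveRedirect('http://example.com/a/b.php', '/login') . "\n";
echo resolveRedirect('http://example.com/a/b.php', 'c.php') . "\n";
```

Since each redirect is relative to the hop before it, you would feed the previous resolved url back in as $base while walking through the list.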

Quickly creating post arrays for CURL

January 20th, 2009

One thing I’ve spent a lot of time over is breaking up post variables for use in CURL and posting to forms. Sure you have the option of using an array or just replacing a few variables in a string with CURL but sometimes forms can get crazy with the amount of unique variables they have. Missing a simple unique variable or spelling it wrong can result in hours of wasted time trying to figure out why something won’t work. So I created this lil snippet of code to basically help break down a post string into a nice postData array for use in CURL.

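function formInfo($text){
	$forms = explode("&", $text); // explode() instead of split(), which was removed in PHP 7

	foreach($forms as $form){

		$values = explode("=", $form, 2); // the limit of 2 keeps any "=" inside the value

		$check = isset($values[1]) ? urldecode($values[1]) : ''; // a field may have no value

		echo '$postData[\'' . urldecode($values[0]) . '\']' . " = '$check';\n";
	}

}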

So for a random example use, I did it on wordpress’s login system.

$text = "log=exosus&pwd=sexytime&rememberme=forever&testcookie=1&redirect_to=http%3A%2F%2Fwordpress.com%2F&submit=Login";
 
formInfo($text);

which produces the result:

$postData['log'] = 'exosus';
$postData['pwd'] = 'sexytime';
$postData['rememberme'] = 'forever';
$postData['testcookie'] = '1';
$postData['redirect_to'] = 'http://wordpress.com/';
$postData['submit'] = 'Login';
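
If you want the array itself rather than code to paste, PHP’s built-in parse_str() does the splitting and url-decoding in one call. A quick sketch using the same post string:

```php
<?php
$text = "log=exosus&pwd=sexytime&rememberme=forever&testcookie=1&redirect_to=http%3A%2F%2Fwordpress.com%2F&submit=Login";

// Fills $postData with name => value pairs, already url-decoded
parse_str($text, $postData);

var_export($postData);
echo "\n";
```

One thing to watch for: parse_str() follows PHP’s own rules for request variables, so dots and spaces in field names become underscores and names ending in [] turn into nested arrays. For forms with names like that, splitting by hand is the safer route.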

Multi-threading PHP

June 4th, 2008

If you know how to do this better let me know. Basically, when scraping data, I always try to have as many threads going as my bandwidth and server can handle. Scraping 1 page at a time is too damn slow. When I first tried threading I used pcntl_fork. This required me to recompile php with the plugin for it which began to become a pain in the ass. I like making my scripts work with the default php setup simply because if I ever decide to move the script I don’t want to be screwing around with php on the new server. With that said, I now use exec for everything which in my opinion works better anyway.

I have 2 files, the first is threadstart.php:

<?php
 
$totalprocesses = 25; // How many threads you want to run at once

$start = 1;

$processes = 0;
while($start > 0){ // runs until you stop it; add your own exit condition

	echo "Checking $start\n";

	// Start thread.php in the background; its output and errors are appended to the log
	exec("/usr/bin/php -q thread.php $start >> /home/logs/thread.log 2>&1 &");
	$processes++;
	$start++;

	// Once the limit is reached, wait here until at least one thread has finished
	while($processes > ($totalprocesses - 1)){
		$text = shell_exec("ps -Af | grep php");

		preg_match_all('/thread\.php/', $text, $matches);

		$processes = count($matches[0]);

		sleep(2); // You can change this to however fast you want to check if a thread is still going
	}

}
 
?>

thread.php

<?php
 
$id = $argv[1]; // this is the $start variable you passed in the threadstart.php. Simply use this ID to do what you want.
 
?>
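
If you’d rather not parse the output of ps, a different way to cap the number of children is to start them with proc_open() and poll them with proc_get_status(). This is only a sketch of that alternative, not what I run: runPool is a made-up name, and the children here are stand-ins that just exit with their ID instead of running thread.php.

```php
<?php
// Sketch: run commands with at most $maxWorkers child processes at a time.
// Returns an array of exit codes with the same keys as $commands.
function runPool(array $commands, $maxWorkers) {
	$running = array(); // key => process resource
	$exitCodes = array();

	while (count($commands) > 0 || count($running) > 0) {

		// Top up the pool while there is work left and a free slot
		while (count($commands) > 0 && count($running) < $maxWorkers) {
			reset($commands);
			$key = key($commands);
			$cmd = $commands[$key];
			unset($commands[$key]);

			// Empty descriptor list: the child shares our stdin/stdout/stderr
			$process = proc_open($cmd, array(), $pipes);

			if (is_resource($process))
				$running[$key] = $process;
			else
				$exitCodes[$key] = -1; // could not be started
		}

		// Collect any children that have finished
		foreach ($running as $key => $process) {
			$status = proc_get_status($process);
			if ($status['running'] === false) {
				$exitCodes[$key] = $status['exitcode'];
				proc_close($process);
				unset($running[$key]);
			}
		}

		usleep(100000); // check ten times a second
	}

	return $exitCodes;
}

// Stand-ins for "php -q thread.php $id": each child just exits with its ID
$php = escapeshellarg(PHP_BINARY);
$commands = array();
for ($id = 1; $id <= 5; $id++) {
	$commands[$id] = $php . ' -r ' . escapeshellarg('exit(' . $id . ');');
}

print_r(runPool($commands, 2));
```

This uses only functions from the default PHP build, and it also gives you each child’s exit code, which the ps approach can’t.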

Prosper202 Pre-Pop Data Workaround

April 28th, 2008

Since this post has actually made it up on tracking202.com as an example, plenty of people I’m sure have questions. If you need help throw me a message on AIM @ YRM (yes 3 chars)

Figures the first campaign I try to set up has prepop data I can’t pass to my offer anymore, and strangely (then again everything seems strange when it comes to marketing) my campaign bombs without the prepop data. After being frustrated for a little bit it came to me: it’s pretty easy to build a pre-pop workaround in PHP. Here’s how I did it. It’s a little ugly, so bear with me. In my example I needed to prepop a zipcode. I would try to write steps out for non-PHP coders to set this up, but frankly I don’t even know where to begin.

I have 2 files that look like so in the same directory as my landing page:

File: begin.php

<?php
 
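// Remember the zip code from the landing page form (empty if it wasn't sent)
setcookie("zipcode", isset($_GET['zip']) ? $_GET['zip'] : '');

if (isset($_COOKIE['tracking202outbound'])) {
	$tracking202outbound = $_COOKIE['tracking202outbound'];
} else {
	$tracking202outbound = 'http://yoururl.com/tracking202/redirect/lp.php?lpip=123';
}

header('location: '.$tracking202outbound);
exit; // stop here so nothing else runs after the redirect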
 
?>

File: finish.php

<?php
 
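// Default to empty strings so the redirect still works if something is missing,
// and url-encode both values so odd characters can't break the link
$zip = isset($_COOKIE['zipcode']) ? urlencode($_COOKIE['zipcode']) : '';

$subid = isset($_GET['subid']) ? urlencode($_GET['subid']) : '';

header("Location: http://myaffilliatelink/click/?s=1&c=1&subid=$subid&zip=$zip");
exit;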
?>

Now 4 steps…
1) within prosper202, if you go to your campaign link, make the campaign link: “http://www.yoururl.com/finish.php?subid=”.

2) generate the landing page code like you normally would and take the landing page link code and use that to replace “http://yoururl.com/tracking202/redirect/lp.php?lpip=123″ in begin.php.

3) on your landing page, make the submission go to “http://www.yoururl.com/begin.php”.

4) replace “http://myaffilliatelink/click/?s=1&c=1&subid=$subid&zip=$zip” with your affiliate network link url and replace the appropriate variables accordingly so it works for you.

You’re done!
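
If you want to be stricter about what gets passed to the offer, the link building in finish.php could be sketched as a small function like this. The name buildOfferUrl is made up, and the rule that a zip is exactly 5 digits is an assumption (US zip codes); adjust it for whatever you are prepopping.

```php
<?php
// Sketch: build the outbound link with encoded parameters, and only
// pass the zip along if it looks like a 5-digit US zip code (an assumption).
function buildOfferUrl($base, $subid, $zip) {
	$params = array('s' => 1, 'c' => 1, 'subid' => $subid);

	$zip = (string) $zip;
	if (strlen($zip) === 5 && ctype_digit($zip))
		$params['zip'] = $zip;

	// http_build_query() url-encodes every value for us
	return $base . '?' . http_build_query($params);
}

echo buildOfferUrl('http://myaffilliatelink/click/', 'abc 123', '90210') . "\n";
```

Anything that doesn’t look like a zip is simply left off the link instead of being sent to the network.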

Creating Relevant Campaign Group Names to Increase Your Quality Score

March 21st, 2008

One of the things I keep hearing over and over again is that your ad group names should correlate with your keywords in order to increase your quality score. Although it isn’t a large factor in the long run (imo), it can definitely play an important role in setting your initial quality score. With that said, the easiest way I’ve found to come up with relevant ad group names is to simply look at the word frequency within your keyword list. For example, on a job related keyword list, this was part of the results:

[jobs] => 1990
[job] => 863
[in] => 805
[resume] => 633
[salary] => 544
[employment] => 282
[for] => 257
[agencies] => 239
[resumes] => 190
[com] => 187
[monster] => 185
[canada] => 178
[online] => 169
[a] => 169
[uk] => 158
[london] => 154
[technician] => 145
[free] => 137
[of] => 132
[courses] => 125
[search] => 121
[salaries] => 117
[description] => 117

From this I get an instant idea of what ad group names I need to make and I’ll even take this list deeper and pull out the word frequency for only the keywords containing “jobs” to find subkeywords to build even more tight knit ad group names.

Here is a little snippet of php code that will produce the same results with your keyword list (stolen from: http://snipplr.com/view/2225/php-tag-cloud-based-on-word-frequency/ and modified slightly) :

<?php
 
// Store frequency of words in an array
 
$freqData = array();
 
$keywords = file_get_contents('yourkeywordlist.txt');
 
// Get individual words and build a frequency table
 
foreach( str_word_count( $keywords, 1 ) as $word )
 
{
 
// Lowercase the word, then add one to its count (a word seen for the first time starts at 1)
$word = strtolower($word);
if ( array_key_exists( $word, $freqData ) ) {
	$freqData[ $word ]++;
} else {
	$freqData[ $word ] = 1;
}
 
}
 
arsort($freqData);
 
print_r($freqData);
?>
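
The “take the list deeper” step, counting words only for the keywords that contain “jobs”, could be sketched like this. The function name wordFrequencyFor and the four sample keywords are made up for the example.

```php
<?php
// Sketch: word frequency for only the keywords that contain a given word.
function wordFrequencyFor(array $keywords, $mustContain) {
	$words = array();

	foreach ($keywords as $keyword) {
		$parts = str_word_count(strtolower($keyword), 1);

		if (!in_array($mustContain, $parts))
			continue; // keyword doesn't contain the word as a whole word

		foreach ($parts as $part) {
			if ($part !== $mustContain)
				$words[] = $part; // leave out the word we filtered on
		}
	}

	$freq = array_count_values($words);
	arsort($freq);

	return $freq;
}

// Made-up keyword list, one keyword per entry
$keywords = array('nursing jobs in london', 'nursing jobs', 'resume tips', 'teaching jobs in canada');

print_r(wordFrequencyFor($keywords, 'jobs'));
```

With a real list you would load the keywords with file('yourkeywordlist.txt', FILE_IGNORE_NEW_LINES) so each line becomes one entry.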