Backing up your AssemblerGames PMs/Conversations before the site goes down

FamilyGuy

2049 Donator
Original poster
Donator
Registered
May 31, 2019
344
337
63
AGName
-=FamilyGuy=-
AG Join Date
March 3, 2007
If, like me, you've been pretty active in private conversations on AG, you might have tens of pages of conversations, some of them with tens of pages of comments in them. That's potentially many thousands of replies. Some very important information and files might be in that mess, and you might feel anxious about losing them forever. XenForo doesn't let you back up your PMs, probably so someone can sell an add-on that does it, but fear not: everything that you can access can be backed up.

Instead of copying thousands of posts by hand and still missing the deadline, here's a mostly automated procedure. It could easily be adapted to back up your own favorite threads/PMs too if you want a personal backup of a few things.

********

You'll need the wget, find, and sed programs. They should be easy to obtain on Linux and macOS. Windows users might have to look into Cygwin, the Windows Subsystem for Linux, or alternatives. Just google it.™

  1. Install Firefox and the export cookies addon: https://addons.mozilla.org/firefox/addon/export-cookies-txt/
  2. Log in to AG, making sure to tick "stay logged in". Use the addon to export the cookies for AG to a text file; mine is called "cookies-assemblergames-com.txt".
  3. Create a new folder, place the cookies file inside and open a console/terminal in the same folder.
  4. Run the following command:
    wget -mkEpnp --execute robots=off --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/
  5. WAIT!
  6. Rename the newly created "assemblergames.com" folder to A.
  7. Run the following command:
    wget -mkEp --execute robots=off -I/attachments/ -I/data/ -I/conversations --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/
  8. WAIT SOME MORE!
  9. Rename the folder "assemblergames.com" to B.
  10. Create a third folder called Final, copy the content of B to it.
  11. Copy the content of A over Final, overwriting/merging everything that was already there from B.
  12. Change the _bH variable to ./ (current dir) in every HTML file. On Linux (on macOS, the stock BSD sed expects -i '' instead of a bare -i):
    find ./Final -type f -exec sed -i -e 's/_bH = "https:\/\/assemblergames\.com\/";/_bH = ".\/";/g' {} \;
  13. Fix links of attachments:
    find ./Final/conversations/ -type f -exec sed -i -e 's/"https:\/\/assemblergames\.com\/attachments\//"\.\.\/\.\.\/attachments\//g' {} \;
  14. Profits $$$
  15. Like and subscribe!
You should now have an offline backup of your conversations/PMs in the Final folder. The main HTML file to open with your browser is Final/conversations/index.html.

When you click on an attachment, it'll open a basic file-browsing page containing an index.html file. That file is actually your attachment: right-click it, choose "save as", and give it a proper filename/extension.
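
The manual "save as" step can probably be automated. Below is a minimal sketch, not part of the tested procedure. It assumes each attachment sits in a directory named "<name>-<ext>.<id>" (the pattern Nemesis's fixer tool further down the thread also relies on) and that the real file is the index.html inside it. The function names attachment_filename and rename_attachments are made up for this example.

```shell
#!/bin/bash
# Sketch only: give mirrored attachments their original file names.
# Assumption: attachment directories are named "<name>-<ext>.<id>".

attachment_filename() {
    # $1: attachment directory name, e.g. "my-photo-jpg.12345"
    local base
    base=${1%.*}                       # strip the trailing ".<id>"
    case $base in
        *-*) printf '%s.%s\n' "${base%-*}" "${base##*-}" ;;  # last "-" becomes "."
        *)   printf '%s\n' "$base" ;;                        # no extension marker
    esac
}

rename_attachments() {
    # $1: attachments directory, e.g. ./Final/attachments
    local f dir target
    find "$1" -type f -name index.html | while IFS= read -r f; do
        dir=$(dirname "$f")
        case $(basename "$dir") in
            *.*)
                target=$dir/$(attachment_filename "$(basename "$dir")")
                # Copy instead of move so existing links to index.html keep working.
                [ -e "$target" ] || cp "$f" "$target"
                ;;
        esac
    done
}
```

Calling rename_attachments ./Final/attachments would leave a properly named copy next to each index.html. Copying rather than moving doubles the disk space used by attachments, which seemed the safer trade-off for a one-off backup.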

Good luck!

Here's the script I tested on Linux; it worked fine for me. It took a few hours to scrape everything and my backup ended up being around 600 MB, 60 MB compressed.

Code:
#!/bin/bash

wget -mkEpnp --execute robots=off --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/

mv "assemblergames.com" A

wget -mkEp --execute robots=off -I/attachments/ -I/data/ -I/conversations --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/

mv "assemblergames.com" B

mkdir Final

cp -rf B/* Final/

cp -rf A/* Final/

find ./Final -type f -exec sed -i -e 's/_bH = "https:\/\/assemblergames\.com\/";/_bH = ".\/";/g' {} \;

find ./Final/conversations/ -type f -exec sed -i -e 's/"https:\/\/assemblergames\.com\/attachments\//"\.\.\/\.\.\/attachments\//g' {} \;
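
A quick way to check that the attachment-link rewrite actually took effect is to count the files that still contain the absolute attachment URL. This is only a sketch. The function name count_absolute_links is hypothetical, and it assumes a grep that supports -r (GNU and BSD grep both do).

```shell
#!/bin/bash
# Sketch only: count files that still point at the live site for attachments.

count_absolute_links() {
    # $1: mirror directory, e.g. ./Final/conversations
    # Prints how many files still contain an absolute attachment URL.
    grep -rl 'https://assemblergames\.com/attachments/' "$1" 2>/dev/null \
        | wc -l | tr -d ' '
}
```

Running count_absolute_links ./Final/conversations after the script should print 0. Anything else suggests some pages were missed by the sed pass.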
 

Nemesis

Well-known member
Registered
May 30, 2019
58
125
33
AGName
Nemesis
AG Join Date
Mar 22, 2007
Thanks for posting this. I'll also be contributing a process in the next day or two that uses a custom build of httrack for Windows, plus a quick fixer tool afterwards to do some cleanup and repairs. I've used that process to back up my own PMs successfully, and am currently in the process of finishing a full backup of assemblergames (with 0th bit) and all external image links, which I believe will finish sometime today. That'll provide an easy option for Windows users.
 

FamilyGuy

Nemesis said: "plus a quick fixer tool afterwards to do some cleanup and repairs"
I'm all ears if you want to detail those fixes. I don't know much about html and my procedure here is very crude. I also have very little free time until D-day, so I'd take pointers for sure!
 

Nemesis

FamilyGuy said: "I'm all ears if you want to detail those fixes. I don't know much about html and my procedure here is very crude. I also have very little free time until D-day, so I'd take pointers for sure!"
Well, I basically just take httrack, which is possibly some of the worst code ever written, and patch it to make it not totally suck by:
-Fixing some catastrophic performance issues that make it basically run forever normally
-Making it actually support pages with unicode characters in the URL without running off the rails
-Adding the concept of priorities so I can get it to (for example) scan all the forum index pages up front first to completion
-Fixing and addressing a few dozen other things that can cause it to spear off track and mirror half the internet
-Other changes I did last year when mirroring AG that I've forgotten about

Once that's done, I can mirror the site in a fairly clean fashion, and httrack takes care of most of the hard work in mirroring the content and fixing links. I still need to monitor it closely as the scanning rules are vague and not fine-grained enough, so it can still decide to mirror an entire remote site if a .jpg ends up being a 404 html page with a link back to the root for example, but I can deal with that as it happens. Afterwards, I fix attachments by renaming them to actually be the original file types rather than binary content in html pages and fix the links, and disable the javascript code on AG pages that breaks all the links in a local mirror. After that, it's basically done, and the result is a browsable snapshot with all local content from the site in question and a select list of remote content included (primarily images).
 

FamilyGuy

Nemesis said: "Afterwards, I fix attachments by renaming them to actually be the original file types rather than binary content in html pages and fix the links, and disable the javascript code on AG pages that breaks all the links in a local mirror. After that, it's basically done, and the result is a browsable snapshot with all local content from the site in question and a select list of remote content included (primarily images)."
That's the part I'm interested in. If you have code, I'm interested.

Why don't you use wget? Surely there's a Windows version somewhere?
 

Nemesis

I wrote a tool in C# to do most of the heavy lifting. It's throwaway code, but here you go:
Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace UpdateAssemblerGamesLinks
{
    class Program
    {
        static void Main(string[] args)
        {
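            // The only argument is the mirror root: the folder containing "assemblergames.com".
            if (args.Length < 1)
            {
                Console.WriteLine("Error: pass the mirror root directory as the first argument.");
                return;
            }
            string rootPath = args[0];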

            string assemblerGamesPath = Path.Combine(rootPath, "assemblergames.com");
            string attachmentPath = Path.Combine(assemblerGamesPath, "attachments");

            Console.WriteLine("Renaming attachments");
            string[] attachmentFiles = Directory.GetFiles(attachmentPath, "index.html", SearchOption.AllDirectories);
            Console.WriteLine("Found {0} files", attachmentFiles.Length);

            int filesPerIncrement = attachmentFiles.Length / 100;
            int fileNoPerStep = 0;
            int currentFileNo = 0;
            foreach (string filePath in attachmentFiles)
            {
                if (fileNoPerStep == filesPerIncrement)
                {
                    Console.WriteLine("{0}%", currentFileNo / (filesPerIncrement > 0 ? filesPerIncrement : 1));
                    fileNoPerStep = 0;
                }
                ++currentFileNo;
                ++fileNoPerStep;

                string attachmentDirectoryPath = Path.GetDirectoryName(filePath);
                string attachmentDirectoryName = Path.GetFileName(attachmentDirectoryPath);
                string attachmentNewFileName = Path.GetFileNameWithoutExtension(attachmentDirectoryPath);
                if (!attachmentDirectoryName.Contains('.'))
                {
                    continue;
                }
                int indexOfSeparator = attachmentNewFileName.LastIndexOf('-');
                if (indexOfSeparator < 0)
                {
                    Console.WriteLine("Warning: Failed to locate extension separator for \"{0}\" in path \"{1}\". Assuming extension only.", attachmentNewFileName, attachmentDirectoryPath);
                    attachmentNewFileName = "." + attachmentNewFileName;
                }
                else
                {
                    StringBuilder stringBuilder = new StringBuilder(attachmentNewFileName);
                    stringBuilder[indexOfSeparator] = '.';
                    attachmentNewFileName = stringBuilder.ToString();
                }
                string attachmentNewFilePath = Path.Combine(attachmentDirectoryPath, attachmentNewFileName);
                attachmentNewFilePath = @"\\?\" + attachmentNewFilePath;

                try
                {
                    File.Move(filePath, attachmentNewFilePath);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception on File.Move for file \"{0}\" to \"{1}\": {2}", filePath, attachmentNewFilePath, ex.ToString());
                    continue;
                }
            }

            Console.WriteLine("Editing html files");
            Regex regexAttachment = new Regex(@"\.\./attachments/(.+)-(.+)\.([0123456789]+)/index\.html", RegexOptions.Compiled);
            Regex regexPhotobucket = new Regex(@"\.\./\.\./\.\./(.+)\.photobucket\.com/albums/", RegexOptions.Compiled);
            Regex regexSentinelFix = new Regex(@"data-baseurl=""(.+)page-\{\{sentinel\}\}""", RegexOptions.Compiled);
            Regex regexPollViewResultsDisable = new Regex(@"<input type=""button"" value=""View Results"" class=""button OverlayTrigger JsOnly"" data-href=""(.+)/poll/results"" />", RegexOptions.Compiled);
            string matchStringAddressRebase = @"if (_b && _b.href != _bH) _b.href = _bH;";
            string matchStringAddressRebaseNew = @"<!--if (_b && _b.href != _bH) _b.href = _bH;-->";
            string matchStringHTTrackMarker = @"<!-- Added by HTTrack --><meta http-equiv=""content-type"" content=""text/html;charset=UTF-8"" /><!-- /Added by HTTrack -->";
            string[] files = Directory.GetFiles(assemblerGamesPath, "*.html", SearchOption.AllDirectories);
            Console.WriteLine("Found {0} files", files.Length);

            filesPerIncrement = files.Length / 100;
            fileNoPerStep = 0;
            currentFileNo = 0;
            foreach (string filePath in files)
            {
                if (fileNoPerStep == filesPerIncrement)
                {
                    Console.WriteLine("{0}%", currentFileNo / (filesPerIncrement > 0 ? filesPerIncrement : 1));
                    fileNoPerStep = 0;
                }
                ++currentFileNo;
                ++fileNoPerStep;

                string fileContents;
                try
                {
                    fileContents = File.ReadAllText(filePath, Encoding.UTF8);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception on File.ReadAllText for file \"{0}\": {1}", filePath, ex.ToString());
                    continue;
                }

                int directoryNestingDepth = filePath.Replace(assemblerGamesPath, "").Split(new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar }, StringSplitOptions.RemoveEmptyEntries).Length - 1;
                string replaceStringInsertFavicon = matchStringHTTrackMarker + "\n" + String.Format(@"<link rel=""shortcut icon"" href=""{0}favicon.ico"">", String.Concat(Enumerable.Repeat("../", directoryNestingDepth)));
                string fileContentsNew = fileContents;
                fileContentsNew = regexAttachment.Replace(fileContentsNew, @"../attachments/$1-$2.$3/$1.$2");
                fileContentsNew = regexPhotobucket.Replace(fileContentsNew, @"http://$1.photobucket.com/albums/");
                fileContentsNew = regexSentinelFix.Replace(fileContentsNew, @"data-baseurl=""page-{{sentinel}}.html""");
                fileContentsNew = regexPollViewResultsDisable.Replace(fileContentsNew, @"<!--<input type=""button"" value=""View Results"" class=""button OverlayTrigger JsOnly"" data-href=""$1/poll/results"" />-->");
                fileContentsNew = fileContentsNew.Replace(@"<noscript><a href=""poll/results.html"" class=""button"">View Results</a></noscript>", @"<a href=""poll/results.html"" class=""button"">View Results</a>");
                fileContentsNew = fileContentsNew.Replace(matchStringAddressRebase, matchStringAddressRebaseNew);
                fileContentsNew = fileContentsNew.Replace(matchStringHTTrackMarker, replaceStringInsertFavicon);

                try
                {
                    File.WriteAllText(filePath, fileContentsNew);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception on File.WriteAllText for file \"{0}\": {1}", filePath, ex.ToString());
                    continue;
                }
            }

            Console.WriteLine("Complete");
            Console.ReadLine();
        }
    }
}

There is some manual work apart from this (mostly on the main index page) but not much. As for why I use httrack, well it's the devil I know. Been using it for over a decade, and know how to make it do what I want it to do. As I said, it's horrible code, but it's also been battle tested plenty, and I've fixed most of the bugs/issues that caused me grief when using it in anger. I can't say how my version compares to wget, as I've never really used it.
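
For anyone following the wget route, the "address rebase" part of the tool above can be approximated in shell. This is a rough sketch, not a port of the whole program. It comments out the same JavaScript line the C# code targets, using the same <!-- --> wrapping, and it goes through a temp file to sidestep the differences between GNU and BSD sed -i. The function name disable_rebase is made up for this example.

```shell
#!/bin/bash
# Sketch only: comment out the base-URL rebase line in one HTML file.

disable_rebase() {
    # $1: path to an HTML file, edited in place via a temporary copy
    local work
    work=$(mktemp) || return 1
    # "&" in the replacement stands for the matched text.
    sed 's|if (_b && _b\.href != _bH) _b\.href = _bH;|<!--&-->|' "$1" > "$work" \
        && cat "$work" > "$1"
    rm -f "$work"
}
```

It could be applied to a whole mirror with something like: find ./Final -type f -name '*.html' | while IFS= read -r f; do disable_rebase "$f"; done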
 
  • Like
Reactions: FamilyGuy and pool7

Nemesis

Oh, and here are the args I used to do full mirrors of assemblergames:
Code:
"https://assemblergames.com" -%N0 --cache=0 -O "D:\Emulation\Websites\assemblergamesFinal11" -c32 -#L0 --disable-security-limits --max-rate=0 -%c0 --depth=2000000000 --robots=0 --keep-alive --near --retries=0 --display --quiet -F "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36" -ad.doubleclick.net/* -mime:application/foobar -*.zip -*.tar -*.tgz -*.gz -*.rar -*.z -*.exe -*.7z -*.ace -*.RAR -*.bz2 -*.lzh -*.sit -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.divx -*.mp4 -*.mp3 -*.mp2 -*.rm -*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv -*.ogg -*.flac -*.cue -*.pdf -*.bin -https://assemblergames.com/account/* -https://assemblergames.com/find-new/* -https://assemblergames.com/forums/-/* -https://assemblergames.com/login/* -https://assemblergames.com/logout/* -https://assemblergames.com/online/* -https://assemblergames.com/watched/* -https://assemblergames.com/posts/* -https://assemblergames.com/search/* -https://assemblergames.com/threads/*/reply?* -https://assemblergames.com/threads/*/#post-* -https://assemblergames.com/threads/*/add-reply -https://assemblergames.com/watched/* -https://assemblergames.com/recent-activity/* -https://assemblergames.com/lost-password/* -https://assemblergames.com/misc/location-info* -https://assemblergames.com/misc/quick-navigation-menu?* -https://assemblergames.com/misc/style?* -https://assemblergames.com/threads/*/#navigation -https://assemblergames.com/members/*/followers -https://assemblergames.com/members/*/following -https://assemblergames.com/members/*/trophies -https://assemblergames.com/members/*/#* -https://assemblergames.com/members/*/recent-activity -https://assemblergames.com/members/*/recent-content -https://assemblergames.com/forums/*/?* -https://assemblergames.com/forums/*/watch -https://assemblergames.com/threads/*/#* -https://assemblergames.com/goto/* +https://assemblergames.com/attachments/* -https://assemblergames.com/threads/*/watch-confirm 
-https://assemblergames.com/posts/*/like -https://assemblergames.com/threads/*/reply?* -https://assemblergames.com/posts/*/report -https://assemblergames.com/conversations/*/report -https://assemblergames.com/conversations/*/reply?* -https://assemblergames.com/conversations/*/message?* -https://assemblergames.com/conversations/*/leave -https://assemblergames.com/conversations/*/toggle-starred -https://assemblergames.com/conversations/*/toggle-read -https://assemblergames.com/conversations/add -https://assemblergames.com/conversations/*/invite -https://assemblergames.com/conversations/*/edit -https://assemblergames.com/conversations/*/delete -https://assemblergames.com/members/*/report -https://assemblergames.com/members/*/ignore -https://assemblergames.com/members/*/follow?* -https://assemblergames.com/account/* -https://assemblergames.com/forums/*/create-thread -https://assemblergames.com/threads/*/tags -https://twitter.com/intent/tweet?* -https://assemblergames.com/profile-posts/*/like -https://assemblergames.com/profile-posts/*/comment -https://assemblergames.com/profile-posts/*/report -https://assemblergames.com/profile-posts/*/delete -https://assemblergames.com/profile-posts/* -https://assemblergames.com/attachments/do-upload.json?* -https://assemblergames.com/attachments/upload?* -https://assemblergames.com/*/mark-read?* -*.thingiverse.com/* -abload.de/* -sparbote.de/* -channelf.se/* -github.com/* -*.excite.co.jp/* -*.freeforums.net/* -www.dropbox.com/* -cozumel.ucoz.es/* -www.sega-16.com/* -*.fbcdn.net/* -geekologie.com/* -www.emutalk.net/* -exs.cx/*
There's more going on than this internally, as I hard-coded URL priorities to do multi-pass scanning, skipping less important files until the most important ones are done (i.e., first CSS, then forums, then threads, then data, then attachments, then members, etc). I've fiddled with limits internally to ensure it doesn't hammer the servers while still being as aggressive as possible. Since I'm scanning content off-site, I also have to monitor it closely as it scans, particularly in the last pass when it's ripped the local content and is focusing on external content. If it starts spiralling off into the ether, I blacklist the bad domain mid-scan to ignore any scraped links, then drop it from the final file content.
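
To illustrate the priority idea, here is a small shell sketch that orders a list of URLs by class in the sequence described above (CSS, forums, threads, data, attachments, members, everything else). It is not Nemesis's implementation, which is hard-coded inside his patched httrack and was not posted. The URL patterns, the numeric classes, and the function names url_priority and sort_by_priority are all assumptions made for this example.

```shell
#!/bin/bash
# Sketch only: order URLs by a hypothetical priority class, lowest first.

url_priority() {
    # $1: a URL; prints its (made-up) priority class
    case $1 in
        *.css|*.css\?*)   echo 0 ;;
        */forums/*)       echo 1 ;;
        */threads/*)      echo 2 ;;
        */data/*)         echo 3 ;;
        */attachments/*)  echo 4 ;;
        */members/*)      echo 5 ;;
        *)                echo 6 ;;
    esac
}

sort_by_priority() {
    # Reads URLs on stdin and writes them sorted by class.
    # sort -s keeps the input order within a class.
    local url tab
    tab=$(printf '\t')
    while IFS= read -r url; do
        printf '%s\t%s\n' "$(url_priority "$url")" "$url"
    done | sort -s -t "$tab" -n -k1,1 | cut -f2-
}
```

A crawler built this way would finish every URL in one class before starting the next, which is roughly what "multi-pass scanning" means here.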
 
  • Like
Reactions: pool7

Nemesis

Here are my instructions for backing up your PMs on Windows:

1. Download and extract the following archive: http://nemesis.exodusemulator.com/AssemblerGames/AssemblerGamesBackupPMs.zip
2. Log in to the forum at assemblergames.com
3. Obtain your session token. You can do this using the Firefox addon mentioned by FamilyGuy. An easier way if you're using Google Chrome is to hit "F12" to open the debug console, go to the "Network" tab, refresh assemblergames.com, then select it from the top of the list. Scroll down on the panel on the right to the "cookie:" section and take the value for the "xf_session" entry. See the image below for a visual guide:
AssemblerGamesSessionToken.png

Note: be careful with your session token; it's almost as good as your password for getting access to your account (no, the token value shown in that image isn't still valid).
4. Open the "Output\cookies.txt" file in the extracted contents of the zip file you downloaded, and replace the "00000.." value with the value you got from your cookies in the step above.
5. Run the "Backup.cmd" script in the root of the downloaded file. This should mirror the PM content and related content (such as attachments, images, etc) into the "Output" directory. Once the mirroring is done, a cleanup process will run to fix some links and other issues.

And that's it. That'll give you an offline-browseable version of your PMs. Enjoy.
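
Since a missing or placeholder session token is an easy mistake to make, a small pre-flight check on the cookies file may save a wasted run. The sketch below assumes the file is in the Netscape cookies.txt format (seven tab-separated fields, with the cookie name in the sixth and the value in the seventh), which is what wget's --load-cookies reads. Treating an all-zero value as "not filled in yet" is an assumption based on the "00000.." placeholder mentioned in step 4. The function name has_session_cookie is made up for this example.

```shell
#!/bin/bash
# Sketch only: succeed if the cookies file has a non-placeholder xf_session.

has_session_cookie() {
    # $1: cookies file in Netscape cookies.txt format
    awk -F '\t' '
        $6 == "xf_session" && $7 != "" && $7 !~ /^0+$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

Used as: has_session_cookie cookies.txt || echo "xf_session is missing or still the placeholder"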
 

Tongara

Donator
Donator
Registered
Jun 1, 2019
121
98
28
AGName
Tongara
AG Join Date
Sep 23, 2009
@Nemesis I keep getting the following error in Windows 7 64:

"The procedure entry Point CreateFile2 could not be located in the dynamic link library KERNEL32.dll".

Any help would be much appreciated~
 

MockyLock

Member
Registered
May 31, 2019
14
0
1
AGName
MoockyLoock
AG Join Date
Apr 22, 2015
Thank you for the tip.
I'll give this a try later.
 

pool7

Donator
Donator
Registered
Sep 1, 2018
87
62
18
AGName
pool7
AG Join Date
2008/03/04
Tongara said: "@Nemesis I keep getting the following error in Windows 7 64: 'The procedure entry Point CreateFile2 could not be located in the dynamic link library KERNEL32.dll'. Any help would be much appreciated~"
Had the same issue; here's the workaround that *seems* to be working:
Download from:
the file labeled: httrack_x64-noinst-3.49.2.zip
Extract it to the same directory where you extracted AssemblerGamesBackupPMs.zip, replacing any files as needed.
Run the Backup.cmd script.
 

Nemesis

I can do a Win7-compatible build in a day or so; I'm travelling right now. In the interim, the regular httrack build referenced above should work reasonably well for such a small limited mirror operation, as long as you don't use unicode characters in any of your conversation titles.
 

Traace

Member
Jun 5, 2019
6
2
3
AGName
Traace
AG Join Date
Nov 22, 2016
Thanks. It worked well with cygwin.

50mb unpacked, 1,5mb packed.
 

FamilyGuy

Traace said: "Thanks. It worked well with cygwin. 50mb unpacked, 1,5mb packed."
You're welcome and I'm glad I could help.

For the sake of safety, I'd suggest also backing up on Windows using @Nemesis's script, in case either method misses something. Better safe than sorry with backups!
 
  • Like
Reactions: Traace

SONIC3D

Donator
Donator
Registered
Jun 5, 2019
34
43
18
AGName
SONIC3D
AG Join Date
Oct 31, 2008
Thanks a lot.
The httrack tool works perfectly on Win10. :D
 

Greg2600

Well-known member
Community Contributor
Registered
Jun 3, 2019
184
192
43
AGName
Greg2600
AG Join Date
Jun 23, 2010
pool7 said:
"Had the same issue; here's the workaround that *seems* to be working:
Download from:
the file labeled: httrack_x64-noinst-3.49.2.zip
Extract it to the same directory where you extracted AssemblerGamesBackupPMs.zip, replacing any files as needed.
Run the Backup.cmd script."


I tried all that, and got this on Windows 7:

Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private

14:55:47 Warning: * security warning: !!! BYPASSING SECURITY LIMITS - MONITOR THIS SESSION WITH EXTREME CARE !!!
14:55:48 Warning: file not stored in cache due to bogus state (broken size, expected 6133 got 475): https://assemblergames.com/conversations
14:55:48 Error: "Forbidden" (403) at link https://assemblergames.com/conversations (from primary/primary)
14:55:48 Warning: No data seems to have been transferred during this session! : restoring previous one!
 

Nemesis

@Greg2600 That's what happens if you haven't set your session cookie correctly. Check the instructions again and make sure you modify the "cookies.txt" file as listed. I'd keep your browser window open with your account logged in when you run the backup.
 

Greg2600

I must have fouled something up when I pasted the session ID in. Did it all over again and it worked. Fabulous tool.
 
  • Like
Reactions: FamilyGuy
