30 Days of Tech: Day 7 - Mephisto Hacking
June 7th, 2008
If you were paying close attention, you may have thought I missed a day on Thursday with the 30 days of tech. It turns out that you would be both right and wrong. I wrote the post on Wednesday night and asked Mephisto to publish it on Thursday, since I generally go straight from work to the airport on Thursdays. Only when I went to write my post for Friday did I realize that the old index page was still cached, so the post never showed up. Today I endeavored to find a solution to that problem. This solution is a total hack, and for all I know the latest version of Mephisto has a standard way to do this.
My original plan was to run a script from crontab to clear the site cache. I changed my mind about this quickly when I realized that it would also clear out the cached stylesheets and images that Mephisto serves up. Then I started digging into Mephisto’s code, hoping to find a nice way to clear just the cache entries affected by publishing an article. Since Mephisto clears those caches automatically when you publish, I figured this would be a good answer. The problem I ran into is that much of the caching is tied into controller code, making it difficult to reuse the expiry logic in an external script. I probably could have made this route work, but I wanted something I could finish quickly.
So I decided to simply emulate clicking the publish button from the crontab script. The script finds articles whose publish time fell within the last 90 minutes and acts like an admin publishing each one. It uses curl to post the appropriate values to Mephisto, which in turn invalidates the cache. I hit two hurdles as I went down this path, however:
- Admins need to be logged in
- The “Publish in these sections” checkboxes got cleared out whenever I ran the script
The first problem was easy once I started spelunking for the word “cookie” in the curl man page. Here are the entries that I found:
-c/--cookie-jar <file name>
Specify to which file you want curl to write all cookies after a
completed operation. Curl writes all cookies previously read
from a specified file as well as all cookies received from
remote server(s). If no cookies are known, no file will be writ-
ten. The file will be written using the Netscape cookie file
format. If you set the file name to a single dash, "-", the
cookies will be written to stdout.
NOTE If the cookie jar can't be created or written to, the whole
curl operation won't fail or even report an error clearly. Using
-v will get a warning displayed, but that is the only visible
feedback you get about this possibly lethal situation.
If this option is used several times, the last specified file
name will be used.
-b/--cookie <name=data>
(HTTP) Pass the data to the HTTP server as a cookie. It is sup-
posedly the data previously received from the server in a "Set-
Cookie:" line. The data should be in the format "NAME1=VALUE1;
NAME2=VALUE2".
If no '=' letter is used in the line, it is treated as a file-
name to use to read previously stored cookie lines from, which
should be used in this session if they match. Using this method
also activates the "cookie parser" which will make curl record
incoming cookies too, which may be handy if you're using this in
combination with the -L/--location option. The file format of
the file to read cookies from should be plain HTTP headers or
the Netscape/Mozilla cookie file format.
NOTE that the file specified with -b/--cookie is only used as
input. No cookies will be stored in the file. To store cookies,
use the -c/--cookie-jar option or you could even save the HTTP
headers to a file using -D/--dump-header!
If this option is set more than once, the last one will be the
one that's used.
So by specifying -b and -c with the same file on the command line, you can make curl save the cookies returned by the server to that file and read them back to send with a later command. These options allow the script to log in to Mephisto and keep the session cookie for the later post that updates the article and clears the cache.
What about the sections being cleared though? This one took me some time to figure out. I ended up diving through the Mephisto codebase to figure out why the sections were being cleared out even though I wasn’t posting any changes. Here is what I found:
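To make the cookie handling concrete, here is a minimal sketch of building two curl command lines that share one cookie file. The file path, host name, credentials, and ids are all made up for illustration, and the helper name `curl_command` is my own; it only builds and prints the commands rather than running them.

```ruby
require "shellwords"

# Hypothetical values for illustration only.
COOKIE_FILE = "/tmp/example_cookies"
BASE_URL    = "http://blog.example.com"

# Build a curl command line that both reads (-b) and writes (-c) the same
# cookie file, so a cookie saved by one request is sent on the next one.
def curl_command(form_fields, url, jar = COOKIE_FILE)
  parts = ["curl", "-s", "-b", jar, "-c", jar]
  form_fields.each { |name, value| parts << "-F" << "#{name}=#{value}" }
  parts << url
  # Shellwords.join escapes each argument so brackets and quotes survive the shell.
  Shellwords.join(parts)
end

# First request logs in; second request reuses the session cookie from the jar.
login  = curl_command({ "login" => "admin", "password" => "secret" },
                      "#{BASE_URL}/account/login")
update = curl_command({ "article[section_ids][]" => 3 },
                      "#{BASE_URL}/admin/articles/update/42")

puts login
puts update
```

Running the two printed commands in order should give the second one whatever session cookie the first one received, assuming the server sets one on login.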
c.before_filter :set_default_section_ids
...
def set_default_section_ids
  params[:article] ||= {}
  params[:article][:section_ids] ||= []
end
If you don’t post any section ids, it clears them all out implicitly. This makes sense when you think about how checkboxes work (by not posting values when they are not checked), but was quite frustrating because it meant the script must re-post the section ids that it finds on the article to get it posted in the correct sections. Not a huge deal, but definitely inconvenient.
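The effect of that filter is easy to reproduce outside Rails. The sketch below imitates it with an ordinary Hash standing in for the request parameters (an assumption for illustration; the real `params` object belongs to the controller), and compares a post with no section ids against one that re-posts them.

```ruby
# Plain-Ruby imitation of the filter. `params` is an ordinary Hash here,
# not the real Rails request object.
def set_default_section_ids(params)
  params[:article] ||= {}
  params[:article][:section_ids] ||= []
  params
end

# A post with no section checkboxes ends up carrying an explicit empty list,
# which the update then reads as "publish in no sections".
empty_post = set_default_section_ids({})
p empty_post

# Re-posting the ids already on the article leaves them as they were,
# because ||= only assigns when the value is missing.
reposted = set_default_section_ids({ :article => { :section_ids => ["1", "4"] } })
p reposted
```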
All in all, it was fun to poke around in unfamiliar code and cook up a crazy solution for my problem. Maybe I’ll have time to come up with something more elegant eventually, but for now the script that follows will be assisting in my posts (including this one).
require File.dirname(__FILE__) + "/../config/environment.rb"

# Shared cookie file so the login session carries over to later requests
COOKIE_JAR = RAILS_ROOT + "/tmp/publisher_cookies"
LOGGER = Logger.new(RAILS_ROOT + "/log/clear_cache_for_newly_published_articles.log")

# Run curl quietly, reading and writing cookies from COOKIE_JAR
def curl(options)
  command = "/usr/bin/curl -q -s -b #{COOKIE_JAR} -c #{COOKIE_JAR} " + options.join(" ") + " > /dev/null"
  LOGGER.info command
  system command
end

LOGGER.info "\n\n============== Running at #{Time.now.utc} UTC"

# Log in as an admin
curl(["-F login=dvollbracht", "-F 'password=[REDACTED]'", "http://davidvollbracht.com/account/login"])

# Re-save each article published in the last 90 minutes, re-posting its section ids
Article.find(:all, :conditions => ['published_at BETWEEN ? and ?', 90.minutes.until(Time.now.utc), Time.now.utc]).each do |article|
  LOGGER.info "==== Publishing #{article.title} (article #{article.id})"
  options = article.sections.map {|s| "-F 'article[section_ids][]=#{s.id}'"}
  options << "http://davidvollbracht.com/admin/articles/update/#{article.id}"
  curl(options)
end