I've been working on understanding the unique language of
Craigslist man for man personal ads.
Here's one measure: a list of the words that show up frequently in ads
but relatively rarely in ordinary English, in particular
television and movie scripts.
pic host cock stats suck masculine asian discreet horny oral smooth reply neg hairy im bi cum thick email hiv latin hosting std hwp seeking vers ddf gwm athletic sucking uncut lbs latino gl muscular jock porn cocksucker slim poppers nsa ur bj pnp horned bb discrete seeks etc rim versatile dd dont stocky beefy shaved cocks submissive swap emails sensual loads bod toned anon anal dom cuddle bs chubby dominant blowjob goatee shaven hookup font husky replies cant serviced stroking asians whats completion slender ft cuddling buzzed stds mod filipino moderately kink muscled oakland servicing nips unzip uc orally boyish cl yr br scruffy endowed

The word pic (or pics) shows up once every 120 words in a Craigslist ad, but only once in a million words in TV/movie scripts. Cock is about 800 times more likely to occur in a Craigslist ad than in a script. The non-sexually-explicit terms like moderately and masculine interest me most.

For completeness, here are the top words that are common in TV scripts but not in Craigslist m4m ads. A lot of proper names and feminine terms.

she's uh mother um hmm whoa mrs daughter killed wedding sonny upset theresa ooh she'll luis grace they'll sweetheart julian antonio ms charity billy ray miguel kay evidence sweetie shawn mommy mama barbara elizabeth congratulations aw mother's jennifer witch skye father's gosh ian maria powers mitch witness eddie hank grandma harmony bloody everybody's
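For anyone who wants to try this kind of comparison, here is a minimal sketch of one way to rank words by how much more often they appear in one body of text than in another. It is not necessarily the measure used for the lists above: the function names, the tokenizer, the `min_count` cutoff and the add-one smoothing are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(target_text, reference_text, min_count=2, smoothing=1.0, top=10):
    """Rank words by (rate in target) / (rate in reference).

    Assumptions for this sketch: add-one style smoothing so that words
    missing from the reference do not divide by zero, and a minimum count
    so that one-off typos do not dominate the list.
    """
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    target_total = sum(target.values())
    reference_total = sum(reference.values())
    vocab_size = len(set(target) | set(reference))

    scored = []
    for word, count in target.items():
        if count < min_count:
            continue
        target_rate = (count + smoothing) / (target_total + smoothing * vocab_size)
        reference_rate = (reference[word] + smoothing) / (reference_total + smoothing * vocab_size)
        scored.append((target_rate / reference_rate, word))

    # Highest ratio first: the words most over-represented in the target text.
    scored.sort(reverse=True)
    return [(word, ratio) for ratio, word in scored[:top]]
```

Swapping the two arguments gives the reverse list, the words common in scripts but rare in ads. As a sanity check on the scale of these ratios, a word that appears once every 120 words in ads and once per million words in scripts has a raw ratio of about 1,000,000 / 120, or roughly 8,300.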
Following on my new
fame for looking at the ages
of Craigslist posters, I did a bunch of crunching and got six months of
collected ad data into a database, which makes it easy to produce
reports like the following:
There's a 10:1 variation in ad volume depending on the hour. The quietest time is around 4am, with about 15 ads an hour; the busiest is around 8pm, with about 150 ads an hour. There's a noticeable bump when people get home from work and surprisingly little variation by day of the week. Overall the distribution looks like graphs of pretty much any Internet activity, with some bias towards more use in the evenings. What I really want to get at is the content of the ads: classifying them by desired partner, desired scene, drugs, desperation, etc. I need to chat with someone who understands text clustering.
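The hourly report above boils down to counting ads by hour of day and averaging over the days covered. Here is a rough sketch of that calculation, assuming the posting times have already been pulled out of the database as Python `datetime` objects; the function name and the input shape are hypothetical, since the actual schema isn't described here.

```python
from collections import Counter
from datetime import datetime


def ads_per_hour(timestamps):
    """Return the average number of ads posted in each hour of the day (0-23).

    `timestamps` is assumed to be an iterable of datetime objects, one per ad.
    The average is taken over the distinct calendar days seen in the data.
    """
    timestamps = list(timestamps)
    per_hour = Counter(ts.hour for ts in timestamps)
    # Guard against empty input so we never divide by zero.
    days = len({ts.date() for ts in timestamps}) or 1
    return {hour: per_hour.get(hour, 0) / days for hour in range(24)}
```

Dividing the largest value by the smallest non-zero one gives the peak-to-trough ratio, which for the figures quoted above (about 150 versus about 15 ads an hour) comes out at 10:1.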
I'm fascinated by the Craigslist man for man
personal ads (NSFW). I'm not in the market myself, but the discourse
is so efficient that I enjoy reading it in idle moments;
it's a great capsule of gay casual sex culture.
Most of the ads are quite terse: the poster's age, a self-description,
a description of the man they're looking for, preferred sexual acts, and a proposed
location.
For the past seven months I've been archiving the RSS feeds for the Bay Area boards, which has produced a collection of 485,000 unique personal ads. Here's the distribution of ages of posters. I'm hoping to do more analysis. I'm particularly curious about the relationship between the poster's age and the age of their preferred partner, but the data is a bit fuzzy.