Using Drupal with External Data Sources

Alright. It’s DrupalCON time.

At each of the last two DrupalCONs, I gave a short talk about methods for using Drupal with external data. Specifically, I focused on using external APIs, secondary databases, and “lazy instantiation” to import large data sets into Drupal. I think it’s a good talk, and an important subject for understanding the power of Drupal as a platform. (In fact, the original title was “XML, Mashups, and Drupal-as-platform” back in Sunnyvale last year.)
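
For readers who haven’t run into the term, here is one way to picture “lazy instantiation.” This is a minimal Python sketch, not Drupal code and not the code from the talk: a local record is created only the first time someone asks for it, instead of importing the whole external data set up front. The `LazyImporter` class, the `fetch_external` callable, and the sample data are all hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, Optional


class LazyImporter:
    """Creates local records from an external source only when first requested."""

    def __init__(self, fetch_external: Callable[[str], Optional[dict]]) -> None:
        # fetch_external is a hypothetical stand-in for an API call or a
        # query against a secondary database.
        self._fetch_external = fetch_external
        # Stand-in for the local content store (for example, Drupal nodes).
        self._local: Dict[str, dict] = {}
        # Counts how often the external source was actually consulted.
        self.external_calls = 0

    def get(self, external_id: str) -> Optional[dict]:
        # Already instantiated locally: serve it without touching the source.
        if external_id in self._local:
            return self._local[external_id]
        # First request: pull the record and create the local copy now.
        self.external_calls += 1
        record = self._fetch_external(external_id)
        if record is None:
            return None
        self._local[external_id] = dict(record)
        return self._local[external_id]


# A tiny fake external data set, used only for illustration.
EXTERNAL_DATA = {
    "a1": {"title": "First story"},
    "b2": {"title": "Second story"},
}

importer = LazyImporter(EXTERNAL_DATA.get)
first = importer.get("a1")    # fetched from the external source
again = importer.get("a1")    # served from the local copy
missing = importer.get("zz")  # not in the source, so nothing is created
```

The trade-off in a design like this is that the first request for any record pays the cost of the external lookup, while records nobody asks for are never imported at all.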

This year, with a nod towards the spirit of collaboration across the Drupal community, the talk has mutated into a 90-minute session, covering a wider range of topics, and featuring some people smarter than me in a panel discussion.

Neil Drumm of Advomatic, a longtime Drupal contributor, will be discussing batch import scheduling, command-line import processing, and a host of automation techniques that he’s developed for dealing with legacy data.

Matt Cheney of Chapter Three will be discussing data scraping, parsing, and auditing; plus enhancing data by using job queues to process batch data.

We’re all excited to be presenting, and expect the session to introduce attendees to new ideas for dealing with legacy and external data sets.

For more information, see the official conference page.

DrupalCON: Hockey Night


Thursday March 6th, 7:00 | Boston Bruins v. Toronto Maple Leafs

Come on out to the game with the folks from Morris DigitalWorks, and the King of Denmark!

I reached out to the Boston Bruins ticket office. They have set aside seats for us to buy individually, in the quantities listed below. (And the B’s are fighting for the playoffs, so this should be a great game.)

We have reserved the right to purchase tickets together. The ticket office has reserved the following price tiers for us. You should order by February 29th, but can probably get tickets through Tuesday.

  • 25 seats @ $80 per person
  • 90 seats @ $40 per person
  • 45 seats @ $28 per person

$28 is the lowest price for game seats. I personally bought five $40 seats to reserve the tickets.

We are not obligated to buy anything else. You may buy single or multiple tickets.

To attend, call Charlie Karoly* at the Bruins ticket office.

+1 617-624-1808

More information is available in the flyer Charlie made for us. You can see the seating map online.

* Yes, chx, he’s Hungarian-American. I asked.

DrupalCON: News Industry Meetup

As previously announced, there will be a News Industry meetup on the first night of DrupalCON. I’m pretty excited, as it is a great chance to meet some folks working on projects similar to ours.

Well, Jeff Anderson of HamptonRoads and PilotOnline reminded me that I never announced any of the details of the meetup. So, without delay, here’s what you need to know.

LOCATION

The News Industry event will be held at Lucky’s Lounge at 7:45 pm, Monday, March 3rd. Working with the folks from Development Seed, we’ve reserved the lounge for the night.*

355 Congress Street
(between A St & Pittsburgh St)
Boston, MA 02210
(617) 357-5825
Google Map



Looking for people who use Drupal the way that you do? If you’re in the news business, come meet up with industry professionals, thought leaders, and Drupal experts. We’ll have representatives from some of the leading sites on the Drupal platform, including:

* The Knight Foundation
* NowPublic.com
* SavannahNOW.com
* Page6.com
* BlufftonToday.com
* Newspapers on Drupal
* Hampton Roads / Pilot Online
* Vineyard Voice
* The Island Packet (Hilton Head, SC)

And many, many more.

* Note that other industry networking events will occur at Lucky’s before 7:45, so plan to arrive between 7:45 and 8:00.

Special thanks to Bonnie at Development Seed for doing the heavy lifting on the venue!

DrupalCON Boston: Site Building track

Well, it’s now official. We are ready to announce the Site Building track of DrupalCON Boston.

First, a big hat tip to Addison Berry (add1sun) and Victor Kane for helping get all this organized. And a note to the submitters: We reviewed 54 strong proposals for only 16 conference slots. Decisions were made carefully, according to the following criteria:

  • Some panelists were invited by the committee to address specific topics. These account for half of our sessions.
  • Sessions featuring panelists discussing broad issues were generally favored over sessions featuring single modules or ideas.
  • We wanted to have some true hands-on sessions to help people solve specific problems.
  • With one exception, presenters could only get one session selected.

So, with that out of the way, here’s the list. We are very excited, and hope you will be as well.

+ Indicates a multi-presenter session.
* Indicates an invited presenter.

We tried very hard to hit the themes we laid out during session planning. Our thinking was that we wanted to cover some topics that affect the day-to-day operations of Drupal-powered web sites. We also wanted to emphasize some of the cooler aspects of Drupal, particularly how it can be used by non-technical people to accomplish amazing things. Our target topics were to help people:

  • Create a foundation for development and deployment of Drupal sites
  • Rapidly create a site with a Drupal-specific workflow
  • Build sites without coding
  • Use Drupal as a platform

I think we came pretty close to hitting these goals. We also had to combine several topics into larger sessions. At least three of the panel discussions started life as solo presentations, a tribute to the spirit of collaboration in the Drupal community.

Lastly, I want to apologize to those of you who are learning through this post that we did not select your panel. We simply did not have the resources to send individual notes to all of the people who proposed sessions.

And if you’re coming to Boston, give Charlie a call for some hockey tickets….

MySite and Drupal 6?

So, just after I answered a question about the future of the MySite module, I got more deeply involved in the discussion here: “What are the future plans for integration of Panels and MySite modules?”

To quote myself:

For Drupal 6, it looks like Panels is the next step towards some unified data API tools that we need. Trying to compete with that momentum doesn’t really advance what I’m trying to accomplish.

So, I’m thinking that the best route for MySite in Drupal 6 is as follows:

— Replace the MySite API (the backend data structure) with Panels.
— Work on porting Panels — particularly in making richer data available.
— Work on Panels Profiles (if needed) — perhaps by integrating the MySite Icons module.
— If needed, create a MySite module that fills the gaps that other modules leave behind.
— Create an update process that lets MySite 5.x users seamlessly upgrade to the new version of whatever the tool is.

Now’s the time, MySite users. What do you think we should do for Drupal 6?

Note that discontinuing the module is not the goal. The goal is replacing it with a Panels-based alternative. An upgrade path would be provided.

Open source, journalism and freedom

Attached is a copy of the presentation that I gave to the New York Press Association back in September. The summary is “Using open-source software is a moral imperative for news organizations, because doing so helps spread freedom of speech and encourages the open exchange of ideas.”

A slide showing the number of registered Drupal users from notable countries identified as repressive of free speech by Reporters Without Borders.

The presentation is a PDF. I generally use images and then talk, so this is a little short on words and may be hard to follow. Some notes:

– I gave this three days after DrupalCON Barcelona, so the first slide is about what we did in Barcelona.
– That is a picture of Correfoc, part of Barcelona’s annual festival La Mercè.
– I removed a picture of a specific person because I have not asked his permission.

The talk was well-received, and I appreciate the NYPA allowing me to give it.

Get a copy of the presentation. [PDF file, 19.1 MB].

Presstime interview on newspapers and open source

An email-based interview that I did with Mark Toner appears in this month’s issue of Presstime, the magazine of the Newspaper Association of America.

Photo illustration by Presstime. Photo by Diana Porter

My involvement focuses on Drupal development, of course. But there are also interviews with Adrian Holovaty about Django and Robert Cauthorn of CityTools.

Since this is my blog, here’s the full text of the interview portion. Mark’s questions are set off with —–. [Note: I wrote this between Dec. 3 and 6th, and have made minor corrections, indicated by brackets.]

————
Can you give me a bit of background about Morris DigitalWorks’ work
with Drupal? Was this its first foray into open-source? What were the
rationales for doing so? What’s the broad timeline we’re talking
about?
————

As a web hosting and services company, we have been using open source tools such as Apache since we started in 1995. Open source IT projects are nothing new. Many server-level applications run on open source. It is only in the last 5 or 6 years that the “LAMP” stack (Linux, Apache, MySQL, and PHP) has become so prevalent on the web.

We started looking at open-source content applications back in late 2004. Bob Gilbert, who ran our Strategy division, asked me to look at open-source content management applications. We were specifically looking for a way to get products to market faster.

In 2004, I looked at 10 or 12 open-source applications that would run on Apache, MySQL and PHP: Drupal, Mambo, Geeklog, Absolut, Typo3, and Xoops, to name a few. The two standouts were Mambo (since relaunched as Joomla) and Drupal.

I was especially drawn to Drupal because it provides an open, extensible framework for application development. In some ways, it is more similar to development tools like Symfony, Cake, or Django than it is to a piece of content-management software. (Note: We also use Joomla for instances where community interaction is not a requirement.)

The first Drupal application we launched was in late 2004. The site was an intranet for our advertising sales force. That site is still running today, in fact. It gave us a good low-risk test of the platform and let us install our first LAMP server.

That experience turned out to be lucky, because at the beginning of 2005, we learned of the plan to launch the Bluffton Today newspaper to replace the old Carolina Morning News. We had about six weeks to build the entire site. Drupal allowed us to do two very important things:

– Get a robust community site online very quickly.
– Discard some legacy thinking about what “newspaper web sites” should be.

Ironically, I was working on other projects during most of the build out. Steve Yelvington and Ed Coyle did most of the work without any prior Drupal experience.

After the success of Bluffton Today, we were looking to see if the platform could be used to power a more traditional (and larger) newspaper web site. When the idea came around to redevelop SavannahNow, we pushed hard on the Drupal platform to see if it could support the vision that Heather Nagel-Doughtie and Darryl Kotz had for the site.

SavannahNow has had a mixed result. Drupal 4.6 (the current version is 5.3) was not quite ready for what we wanted to accomplish. We made numerous mistakes in trying to force functionality into the software. In some cases, that functionality has been added in later releases. In other cases, we backed ourselves into a tech corner.

————
My understanding (which may be wrong — and *please* correct me if it
is) is that Morris Digital developed its MySite framework for Drupal
and then contributed much of that work to the open-source project. Can
you give me a sense of both the process involved in doing so, and any
discussions that happened around the idea of essentially giving back
to the broader development community?
————

I developed MySite with Morris DigitalWorks’ blessing. All of the MySite work has been contributed back. The project was intended to do two things:

– Give a project back to the Drupal community.
– Push our ability to provide custom content to our site users.

The discussions, really, are about participation. What people sometimes fail to grasp about open-source is that it tends to be a meritocracy. Decisions get made by small groups of trusted individuals. If you participate in the process of decision-making, it is a lot easier if you have some “code equity” in play.

To get the most out of Drupal, or any similar project, really, you need to be an active participant in the community. MySite was the most obvious example, but we have other MDW employees who help in Drupal support forums, file bug reports, write documentation and submit occasional code patches.

The strength of open-source is that everyone contributes a little bit, and those contributions add up to a greater whole than an individual or small team could hope to develop. So the basic idea is that any contribution that you make will come back to you, exponentially, and sometimes in ways you don’t expect.

For example, Drupal users are obsessive about web standards and about keeping up with the latest developments in web development. Open Social [Open ID] support in Drupal 6 is a good example, as is some Facebook integration work.

I should also note that, as a company interested in journalism, we are also helping to support access to free software that helps people write, publish, and distribute content despite government objections. I did a talk for the New York Press Association where I noted the significant number of Drupal users in countries such as Egypt, Myanmar, and Iraq, where democratic ideas of freedom of the press are not supported. The international support of platforms like Drupal, however, has the effect of spreading access to technology that enables free speech.

————
I’m trying to get a general sense of what it’s like for a newspaper
company to develop code and interact with the open-source community –
the pros and cons. Is the process one traditional applications folks
who work for newspaper companies would be familiar with?
————

I’m not sure that I’m qualified to answer that, since I am not a developer by training. I think news industry people will recognize the sort of democratic, town-hall forum process that leads to decisions being made.

But, in terms of individual project development, there is a framework — a rule set if you like — and you are free to play within those boundaries as you see fit. The only time that development gets contentious is when debates arise over how to accomplish specific tasks within the core Drupal framework.

For example, I just submitted a patch that takes content access controls — the Drupal “node access” system — in a new direction. I think it’s the right direction, but I don’t have final say. That is a community decision. Some developers might find that sort of constraint frustrating.

————
Moving away from the development of open-source projects to the
implementation, what are the pros and cons of a newspaper adopting
Drupal or another open-source CMS? Can you give me examples of how
Morris has worked with its papers on the rollout/refinement of these
products? How does this differ than a traditional vendor installation?
(Beyond the obvious, I mean). How much do the folks at individual
newspapers contribute to the refinement/development of open-source
projects, either in terms of coding or feedback?
————

The obvious pro is that your development team goes from a few to hundreds overnight. Instead of relying on an internal — and isolated — team of developers, you can tap into a network — often a worldwide network — of developers. That means you have developers and small teams working on problems that you haven’t even envisioned yet. And you have far greater resources for horizon watching, keeping up with new technologies, and security fixes.

It may seem contrary to some, but open-source projects have a [much] higher security level than most closed projects. That’s simply a fact of the number of eyes looking at the code and, frankly, the ease of sending attacks at openly published code. More openness means more focus on security holes.

The con side to the argument is a little more complicated. For one, you lose absolute control over the project. You can influence the direction of open-source, but only if you participate as an equal partner. You really can’t come in to an existing community and impose your will — not even by throwing money around, though that can help.

So you need to be patient and realize that you won’t have all of your needs met by the open-source community. Personally, I find that forces us to reconsider the requirements that we draft for a project. Our experience in launching Drupal projects almost always skews towards simplicity. The platform has enough features and flexibility that it meets most use-cases. I generally say that Drupal solves 70% of our problem out-of-the-box. The question that remains is: how important is that last 30%?

As opposed to traditional vendor rollouts, there are two factors that are crucial. First, you have total control over the source code. That means you can make modifications, additions, and enhancements to the system as you see fit. And the barrier to entry (some PHP coding) is fairly low, so you gain some agility and speed-to-market over waiting for a vendor to produce a feature for you.

But for organizations without PHP developers, it can be a real problem, because there is also no inherent support with open source other than the community.

And for publishers or CTOs who are used to dealing with SLAs and contractual support 24/7, the jump into a self-supporting user community can be a real challenge. There are some companies that you can hire to provide support, and I suspect that we’ll see even more of them arise to meet enterprise-level support needs.

After the rollout — or often during testing — we get valuable feedback from the properties as they are testing the new systems. Sometimes these are workflow issues; some are UI issues. In the best cases, we are able to make the changes and feed them back into the overall Drupal development cycle. When we built SavannahNow on Drupal 4.6, we uncovered a handful of scalability issues that have been addressed quite nicely in the 5.x and 6.x release series.

————
Much of the open-source activity seems to center around CMSes. Are
there other types of open-source software you’re familiar with
newspapers using or experimenting with? If not, are there applications
where you see potential opportunities for open-source projects?
————

Well, you should ask some IT guys: the folks who run the web servers and the pre-press software and the presses. Open source is all over the newspaper industry; CMS is just more public, so it gets more attention.

Honestly, I don’t have time to catalog the list of open-source projects that might be of benefit.

I do see a lot of opportunity in the intranet and business logic space. There are some interesting open-source CRM systems, and I’ve been using Drupal to solve some contract management hassles. I also know of some Drupal projects designed to run server management, which have some interesting potential.

The other big thing that needs to come out of the open-source CMS space is a workable model for web-to-print workflow. The application that nails that will make a giant leap forward.

For those of you who read all the way to the end, come to DrupalCON Boston and the News Industry meetup, March 3rd - 6th. I’ll be happy to talk Drupal, open source, and newspapers with you.