mentions a sample XML file being attached, but it seems the attachment
has been lost. Does anyone still have a copy of a suitable XML file for
testing purposes? If it came from WordPress, all the better :-)
To that blueprint. I also attached what seemed to be a WordPress sample
export.
All the best,
Sava
On Tuesday, February 08, 2011 13:53:52 Daniel James wrote:
> Hi devs,
>
> I have been asked to document XML import from WordPress into Newscoop.
> The wiki page:
>
> http://wiki.sourcefabric.org/display/CS/Import+XML+from+Adobe+InDesign
>
> mentions a sample XML file being attached, but it seems the attachment
> has been lost. Does anyone still have a copy of a suitable XML file for
> testing purposes? If it came from WordPress, all the better :-)
>
> Thanks!
>
> Daniel
>
>
> To participate in the discussion, go here:
> http://forum.sourcefabric.org/index.php?t=rview&frm_id=11
Thanks for those. From there, I managed to figure out the tags to use
with a little trial and error. Interestingly, Newscoop complains if you
use upper case in tag names, but the error message shows the offending
tag in upper case, which is a little confusing.
For example, I ran into <Lead_and_SMS> import errors, which went away
when I changed the tag in the XML file to <lead_and_sms>.
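Rather than editing tags by hand, the renaming could be scripted. The
following is only a rough sketch in Python, not anything Newscoop ships:
the function name lowercase_tags and the <article> wrapper element are
made up for illustration, and it assumes the importer only cares about
the element names, not the case of attribute names or values.

```python
import xml.etree.ElementTree as ET


def _lower_local_name(tag):
    # ElementTree stores namespaced tags as "{namespace-uri}local".
    # Only the local part is lower-cased; the namespace URI is kept as-is.
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return "{%s}%s" % (namespace, local.lower())
    return tag.lower()


def lowercase_tags(xml_text):
    """Return xml_text with every element name converted to lower case."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = _lower_local_name(element.tag)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input: the <article> wrapper is an assumption for this demo.
sample = "<article><Lead_and_SMS>Short teaser</Lead_and_SMS></article>"
print(lowercase_tags(sample))
# prints: <article><lead_and_sms>Short teaser</lead_and_sms></article>
```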
> To that blueprint. I also attached what seemed to be a Wordpress sample
> export.
For that to work, we could create a standard Article Type in Newscoop
called something like wordpress_import, with matching fields. The
publication editor would then have the choice of retaining all those
fields as they are, or merging them into an existing Article Type.
I noticed that in Newscoop, image links aren't explicitly part of an
Article Type. However, I was able to import the following tags:
<name>Gigantoraptor Discovered in Mongolia</name>
<keywords>Gigantoraptor, Dinosaur, Mongolia</keywords>
<author>Sarah Staffwriter</author>
even though they aren't part of the Article Type news_article that I was
importing into. So maybe we can figure out which tag names to use for
image import too.
There's a further complexity: because images and other linked files
aren't part of the XML itself, we have to decide how those files are
imported into the database. Leaving the files on the original WordPress
site (or whatever) is likely to lead to broken links later, so we need
some way of sucking in all that content automatically.
I suppose there could be another radio button in the Import XML dialog,
something like:
Import linked content? Yes No
defaulting to Yes, which would trigger curl or wget to follow all the
web links in the XML, and drop the files into the Media Archive.
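To make the idea concrete, here is a rough Python sketch of what the
"Yes" option might do, using the standard library in place of curl or
wget. Everything in it is an assumption for illustration: the function
names, the target directory standing in for the Media Archive, and the
example.org URLs. It follows every http(s) link it finds, so a real
version would probably want to filter by file type and cope with
duplicate file names.

```python
import os
import re
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Matches http(s) URLs up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_linked_urls(xml_text):
    """Return the distinct http(s) URLs found in element text and attributes."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        chunks = [element.text or "", element.tail or ""]
        chunks.extend(element.attrib.values())
        for chunk in chunks:
            for url in URL_PATTERN.findall(chunk):
                if url not in urls:
                    urls.append(url)
    return urls


def fetch_linked_files(xml_text, target_dir):
    """Download each linked file into target_dir and return the saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in find_linked_urls(xml_text):
        filename = os.path.basename(urlparse(url).path)
        if not filename:
            continue  # skip links that point at a bare host or a directory
        destination = os.path.join(target_dir, filename)
        urllib.request.urlretrieve(url, destination)
        saved.append(destination)
    return saved


# Hypothetical export fragment; the element names are assumptions.
sample = (
    "<item><link>http://example.org/?p=1</link>"
    '<content>&lt;img src="http://example.org/uploads/raptor.jpg"/&gt;</content>'
    "</item>"
)
print(find_linked_urls(sample))
```

Calling fetch_linked_files(sample, "media_archive") would then attempt
the downloads; that step is left out of the demo because it needs
network access.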
In the case of an InDesign XML export, the images aren't likely to have
web links, so we would need to have a 'bundle' format, such as a .zip
containing the exported XML file and the images etc.
It's the same issue as for Airtime when importing a playlist from
another system - we can't trust that any linked media on another system
will be there when we need it.
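
As for the 'bundle' idea for InDesign exports, unpacking such a file
could look roughly like the Python sketch below. The layout (XML files
anywhere in the .zip, everything else treated as media) and the names
read_bundle, export.xml and images/photo.jpg are assumptions for the
sake of the example, not an agreed format.

```python
import io
import zipfile


def read_bundle(bundle_file):
    """Split a .zip bundle into its XML documents and its other files.

    Returns two dicts mapping archive paths to raw bytes.
    """
    xml_documents = {}
    media_files = {}
    with zipfile.ZipFile(bundle_file) as bundle:
        for info in bundle.infolist():
            if info.is_dir():
                continue
            data = bundle.read(info)
            if info.filename.lower().endswith(".xml"):
                xml_documents[info.filename] = data
            else:
                media_files[info.filename] = data
    return xml_documents, media_files


# Build a tiny bundle in memory so the demo needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("export.xml", "<article><name>Example</name></article>")
    bundle.writestr("images/photo.jpg", b"not a real image")
buffer.seek(0)

xml_documents, media_files = read_bundle(buffer)
print(sorted(xml_documents))  # prints: ['export.xml']
print(sorted(media_files))    # prints: ['images/photo.jpg']
```

The importer would then feed the XML to the existing import code and
store the remaining files alongside it, so nothing depends on the
original system staying online.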