XML import sample files
  • Daniel James
    Posts: 844 | Member, Sourcefabric Team

    Hi devs,

    I have been asked to document XML import from WordPress into Newscoop.
    The wiki page:

    http://wiki.sourcefabric.org/display/CS/Import+XML+from+Adobe+InDesign

    mentions a sample XML file being attached, but it seems the attachment
    has been lost. Does anyone still have a copy of a suitable XML file for
    testing purposes? If it came from WordPress, all the better :-)

    Thanks!

    Daniel
  • 4 Comments
  • Andrey Podshivalov
    Posts: 1,526 | Member, Administrator, Sourcefabric Team
  • Sava Tatić
    Posts: 113 | Member, Administrator, Sourcefabric Team

    There is also this:

    http://wiki.sourcefabric.org/display/CS/Importing+Legacy+Archives

    I attached the sample file to that blueprint; it comes from

    http://wiki.sourcefabric.org/display/CS/Attachments+%28look+here+for+missing+files%29

    I also attached what seemed to be a WordPress sample export.

    All the best,

    Sava

    On Tuesday, February 08, 2011 13:53:52 Daniel James wrote:
    > Hi devs,
    >
    > I have been asked to document XML import from WordPress into Newscoop.
    > The wiki page:
    >
    > http://wiki.sourcefabric.org/display/CS/Import+XML+from+Adobe+InDesign
    >
    > mentions a sample XML file being attached, but it seems the attachment
    > has been lost. Does anyone still have a copy of a suitable XML file for
    > testing purposes? If it came from WordPress, all the better :-)
    >
    > Thanks!
    >
    > Daniel

  • Daniel James
    Posts: 844 | Member, Sourcefabric Team

    Hi Andrey,

    > you can find missed attached files there:
    > http://wiki.sourcefabric.org/display/CS/Attachments+%28look+here+for+missing+files%29

    Thanks for those. From there, I managed to figure out the tags to use
    with a little trial and error. Interestingly, Newscoop complains if you
    use upper case in tag names, but the error message shows the offending
    tag in upper case, which is a little confusing.

    For example, I ran into <Lead_and_SMS> import errors, which went away
    when I changed the tag in the XML file to <lead_and_sms>.
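
    Rather than editing each tag by hand, the rename could be scripted. This
    is only a minimal sketch, assuming the file is plain XML without
    namespaces; the function name lowercase_tags is hypothetical and not
    part of Newscoop.

```python
import xml.etree.ElementTree as ET


def lowercase_tags(xml_text: str) -> str:
    """Return xml_text with every element tag name converted to lower case."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Assumption: no XML namespaces. A namespaced tag is stored as
        # "{uri}Name", so lower() would also change the namespace URI.
        element.tag = element.tag.lower()
    return ET.tostring(root, encoding="unicode")


sample = "<article><Lead_and_SMS>Short summary</Lead_and_SMS></article>"
print(lowercase_tags(sample))
```

    Attribute names and text content are left untouched; only element names
    change.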

    Cheers!

    Daniel
  • Daniel James
    Posts: 844 | Member, Sourcefabric Team

    Hi Sava,

    > There is also this:
    >
    > http://wiki.sourcefabric.org/display/CS/Importing+Legacy+Archives

    Thanks, I have attached an improved example from the Newscoop manual to
    that page, which imports without errors into 3.5 GA. It's simpler too.

    http://wiki.sourcefabric.org/download/attachments/491935/content2.xml

    > I also attached what seemed to be a WordPress sample
    > export.

    For that to work, we could create a standard Article Type in Newscoop
    called something like wordpress_import, with matching fields. The
    publication editor would then have the choice of retaining all those
    fields as they are, or merging them into an existing Article Type.

    I noticed that in Newscoop, image links aren't explicitly part of an
    Article Type. However, I was able to import the following tags:

    <name>Gigantoraptor Discovered in Mongolia</name>
    <keywords>Gigantoraptor, Dinosaur, Mongolia</keywords>
    <author>Sarah Staffwriter</author>

    even though they aren't part of the Article Type news_article that I was
    importing into. So maybe we can figure out which tag names to use for
    image import too.

    There's a further complexity in that as images and other linked files
    aren't part of the XML itself, we have to decide how those files are
    imported into the database. Leaving the files on the original WordPress
    site (or whatever) is likely to lead to broken links later, so we need
    some way of sucking in all that content automatically.

    I suppose there could be another radio button in the Import XML dialog,
    something like:

    Import linked content?   (•) Yes   ( ) No

    defaulting to Yes, which would trigger curl or wget to follow all the
    web links in the XML, and drop the files into the Media Archive.
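
    To make the idea concrete, here is a rough sketch of the "follow the
    links" step, written in Python instead of curl or wget. It is not how
    Newscoop works today: find_linked_urls and download_all are hypothetical
    helper names, the URL pattern is a simplification, and target_dir only
    stands in for the Media Archive.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

# Simplified pattern: an http(s) URL runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_linked_urls(xml_text: str) -> list[str]:
    """Collect unique http(s) URLs from element text, tails and attribute values."""
    root = ET.fromstring(xml_text)
    seen: dict[str, None] = {}  # dict keeps first-seen order and removes duplicates
    for element in root.iter():
        chunks = [element.text or "", element.tail or "", *element.attrib.values()]
        for chunk in chunks:
            for url in URL_PATTERN.findall(chunk):
                seen.setdefault(url, None)
    return list(seen)


def download_all(urls: list[str], target_dir: Path) -> list[Path]:
    """Fetch each URL into target_dir (a stand-in for the Media Archive)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Use the last path segment as the file name; files with the same
        # name overwrite each other in this sketch.
        name = Path(urlparse(url).path).name or f"file_{index}"
        destination = target_dir / name
        urllib.request.urlretrieve(url, destination)
        saved.append(destination)
    return saved


sample = (
    '<article><body>See <a href="http://example.org/img/raptor.jpg">photo</a>'
    "</body></article>"
)
print(find_linked_urls(sample))
```

    A real importer would also have to rewrite the links in the article body
    to point at the local copies, which this sketch does not attempt.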

    In the case of an InDesign XML export, the images aren't likely to have
    web links, so we would need to have a 'bundle' format, such as a .zip
    containing the exported XML file and the images etc.
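
    As a sketch of what such a bundle might look like, assuming one XML file
    at the top level and the linked files under a media/ folder (that layout,
    and the names build_bundle and read_bundle, are my assumptions and not an
    existing Newscoop or InDesign format):

```python
import io
import zipfile


def build_bundle(xml_name: str, xml_text: str, media: dict[str, bytes]) -> bytes:
    """Pack one XML file plus its media files into an in-memory .zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(xml_name, xml_text)
        for name, data in media.items():
            # Assumed layout: all linked files live under media/
            archive.writestr(f"media/{name}", data)
    return buffer.getvalue()


def read_bundle(bundle: bytes) -> tuple[str, dict[str, bytes]]:
    """Return the XML text and a mapping of media file names to their bytes."""
    xml_text = None
    media: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        for name in archive.namelist():
            if xml_text is None and name.lower().endswith(".xml"):
                xml_text = archive.read(name).decode("utf-8")
            elif name.startswith("media/") and not name.endswith("/"):
                media[name[len("media/"):]] = archive.read(name)
    if xml_text is None:
        raise ValueError("bundle contains no XML file")
    return xml_text, media


bundle = build_bundle(
    "content.xml",
    "<article><name>Gigantoraptor Discovered in Mongolia</name></article>",
    {"raptor.jpg": b"\xff\xd8 placeholder image bytes"},
)
xml_text, media = read_bundle(bundle)
print(sorted(media))
```

    The importer could then load the XML as usual and copy the media files
    into the archive in one pass, without touching the network.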

    It's the same issue as for Airtime when importing a playlist from
    another system - we can't trust that any linked media on another system
    will be there when we need it.

    Cheers!

    Daniel