Lets Learn

Opinion Matters

AltChunk – An Interesting Scenario

Posted by Ankush on October 22, 2010


Merging multiple Word documents into a single document is a very common requirement. It becomes a bit complex when you are trying to do this in a web application, which should not automate Office (http://support.microsoft.com/kb/257757). As an alternative, Microsoft has introduced the Open XML SDK, which you can use to read and write Office documents directly. A useful tool here is altChunk, a special feature of Open XML word processing markup that enables you to embed an entire Open XML document or an HTML page at a specific location in a document.

So basically there are 3 ways to merge the documents in a non-interactive environment:

1. Use AltChunk
2. Manually merge the document
3. Use Power Tools

By far, the first option, using altChunk, is the easiest method for merging multiple documents. altChunk can import not only other WordprocessingML documents but also HTML, XML, RTF, or plain text. Manually merging multiple documents is feasible, but it requires you to handle a number of issues yourself. For example, you will need to merge and resolve conflicts related to styles, bullets and numbering, comments, headers and footers, etc.
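Since altChunk is not limited to WordprocessingML, here is a minimal sketch of importing an HTML fragment with it. The class name, method name, and the id "htmlChunk1" are hypothetical names chosen for illustration, and the sketch assumes the document already has a body. Treat it as a starting point under those assumptions, not a definitive implementation.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class HtmlChunkSketch
{
    // Adds an HTML document as an altChunk at the end of the body of a .docx held in a stream.
    // "htmlChunk1" is a hypothetical relationship id; it must be unique within the main part.
    public static void AppendHtml(Stream docxStream, string html)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxStream, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;
            const string chunkId = "htmlChunk1";

            // Create the part that will hold the raw HTML
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.Html, chunkId);

            // A freshly created MemoryStream over a byte array starts at position 0
            using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(htmlStream);
            }

            AltChunk altChunk = new AltChunk { Id = chunkId };

            // Keep the final section properties (if any) as the last child of the body
            Body body = mainPart.Document.Body;
            SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
            if (sectPr != null)
            {
                body.InsertBefore(altChunk, sectPr);
            }
            else
            {
                body.AppendChild(altChunk);
            }

            mainPart.Document.Save();
        }
    }
}
```

A call might look like `HtmlChunkSketch.AppendHtml(stream, "<html><body><p>Hello</p></body></html>")`. If the HTML contains non-ASCII text, it may need an explicit charset declaration so that Word decodes it as intended.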

Now let's consider a scenario and its solution:

Scenario:

Consider a scenario where I have to merge multiple documents into one. These documents come from different sources and, after merging, the result should be sent to the management. The application should be a web application and no temp file should be generated.

Solution:

My solution contains:

1. Test.docx, the source document. There can be several source documents to merge; in my example I am assuming there is only one.
2. Test1.docx, the template that becomes the merged document. It contains a content control [a kind of bookmark] so that the source documents can be inserted easily. I will write a separate post on how to work with content controls.

Here is the complete code. Remarks are added to explain the code and the flow.

using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Function which opens the source document in memory and modifies it

private static void OpenAndModifyDocument()
{
    // Read the document into an expandable in-memory stream
    byte[] byteArray = File.ReadAllBytes("Test.docx");

    using (MemoryStream mem = new MemoryStream())
    {
        mem.Write(byteArray, 0, byteArray.Length);
        // Modify the document. For example, I am inserting a paragraph
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
        {
            wordDoc.MainDocumentPart.Document.Body.InsertAt(
                new Paragraph(
                    new Run(
                        new Text("Newly inserted paragraph."))), 0);
        }
        SaveDocument(mem);
    }
}
// Function which merges the stream into the template

private static void SaveDocument(MemoryStream ms)
{
    // Test1.docx is the template. It contains placeholders (basically content controls) where I am going to insert the different document streams
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open("Test1.docx", true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        // Make sure to have a unique AltChunk Id
        string altChunkId = "AltChunkId" + 30;
        // Create the part that holds the imported document
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        // FeedData also accepts a stream, but the stream has to be rewound to the start first.
        // MemoryStream.WriteTo copies the whole stream regardless of position, hence I am using it here
        using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
        {
            ms.WriteTo(chunkStream);
        }
        // Create the AltChunk element that points at the part
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        // Find the content control. I am assuming that there is only one content control; if you have several, just modify this query
        SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>().First();
        OpenXmlElement parent = sdt.Parent;
        // Insert the AltChunk element and remove the content control
        parent.InsertAfter(altChunk, sdt);
        sdt.Remove();
        mainPart.Document.Save();
    }
}
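The scenario asks for several source documents and no files written on the server, so the same approach can be sketched entirely in memory. This is only a sketch: the class and method names and the byte-array parameters are hypothetical, the GUID-based ids are my substitution for a fixed id, and it assumes the template contains at least one block-level content control.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class InMemoryMergeSketch
{
    // Inserts every source document at the first content control of the template
    // and returns the merged package as bytes. Nothing is written to disk.
    public static byte[] MergeInMemory(byte[] templateBytes, IEnumerable<byte[]> sources)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Write into an expandable stream; a stream wrapped around a byte array cannot grow
            output.Write(templateBytes, 0, templateBytes.Length);
            output.Position = 0;

            using (WordprocessingDocument doc = WordprocessingDocument.Open(output, true))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;

                // Throws if the template has no block-level content control
                SdtBlock placeholder = mainPart.Document.Descendants<SdtBlock>().First();
                OpenXmlElement insertAfter = placeholder;

                foreach (byte[] source in sources)
                {
                    // One unique relationship id per imported document
                    string chunkId = "AltChunkId" + Guid.NewGuid().ToString("N");

                    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                        AlternativeFormatImportPartType.WordprocessingML, chunkId);

                    using (MemoryStream sourceStream = new MemoryStream(source))
                    {
                        chunk.FeedData(sourceStream);
                    }

                    // Chain the chunks so they keep the order of the input sequence
                    AltChunk altChunk = new AltChunk { Id = chunkId };
                    insertAfter.InsertAfterSelf(altChunk);
                    insertAfter = altChunk;
                }

                placeholder.Remove();
                mainPart.Document.Save();
            }

            // The package has been closed at this point, so the buffer is complete
            return output.ToArray();
        }
    }
}
```

In a web application the returned bytes could be written to the HTTP response or stored in a document library, depending on where the merged file needs to go.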

Please note that until a document that contains altChunk elements is opened and saved in Office, it still contains the altChunk parts, and not normal WordprocessingML markup of paragraphs, runs, and text elements.  The solution with SharePoint 2010 is that you can use Word Automation Services to update the documents that contain altChunk elements.  After Word Automation Services processes it, the document will contain paragraphs, runs, and text elements.
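If later code needs to read the merged content as ordinary paragraphs, it can help to check first whether Word has processed the document yet. A minimal sketch, with hypothetical class and method names:

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkCheckSketch
{
    // Returns true while the main document part still contains altChunk elements,
    // i.e. the imported content has not been converted to paragraphs and runs yet.
    public static bool HasPendingAltChunks(Stream docxStream)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxStream, false))
        {
            return doc.MainDocumentPart.Document.Descendants<AltChunk>().Any();
        }
    }
}
```

This only inspects the main document part; headers, footers, and other parts would need the same check if altChunk elements can appear there.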

Let me know about your thoughts!!!


14 Responses to “AltChunk – An Interesting Scenario”

  1. Surya said

    Hi,

    I am trying to merge word documents in sharepoint document library. Some pages in the docs are in portrait and some in landscape. after merging documents all the pages in the documents r displayed in portrait mode. how can i retain page orientation programmatically ?

    fyi…i think we can do it by inserting section properties after each page or each document.

    here is my code

    Appreciate your help..

    foreach (SPFile item in listitem.Folder.Files)
    {
        // SPFile inputFile = item.File;
        SPFile inputFile = item;
        string altChunkId = "AltChunkId" + id;
        id++;
        byte[] byteArray = inputFile.OpenBinary();
        AlternativeFormatImportPart chunk = outputDoc.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML,
            altChunkId);
        using (MemoryStream mem = new MemoryStream())
        {
            mem.Write(byteArray, 0, (int)byteArray.Length);
            mem.Seek(0, SeekOrigin.Begin);
            chunk.FeedData(mem);
        }
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        outputDoc.MainDocumentPart.Document.Body.InsertAfter(altChunk,
            outputDoc.MainDocumentPart.Document.Body.Elements().Last());
        outputDoc.MainDocumentPart.Document.Save();
    }
    outputDoc.Close();
    memOut.Seek(0, SeekOrigin.Begin);
    ClientContext clientContext = new ClientContext(SPContext.Current.Site.Url);
    ClientOM.File.SaveBinaryDirect(clientContext, outputPath, memOut, true);
    // Conversion

  2. Martin said

    Hi Ankush
    Thanks for this sample. If you wanna insert another document at the end of a document or in a paragraph, it works perfect. But have you ever tried to insert another document in a bookmark? Word 2010 can repair the file afterwards but the user has to grant it (and that’s why it’s no option for me..)

    My code:

    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    XNamespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    using (var originalDocument = WordprocessingDocument.Open(this.originalDocumentFileName, true))
    {
    var originalMainPart = originalDocument.MainDocumentPart;

    var chunk = originalMainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);

    using (var fileStream = File.Open(this.dataDocumentFileName, FileMode.Open))
    {
    chunk.FeedData(fileStream);
    }

    var altChunk = new XElement(w + "altChunk", new XAttribute(r + "id", originalMainPart.GetIdOfPart(chunk)));

    var mainDocument = GetXDocument(originalDocument);

    // this works
    mainDocument.Root.Element(w + "body").Elements(w + "p").ToList()[2].Add(altChunk); // can also be first or last or whatever

    var paragraphElements = mainDocument.Root.Element(w + "body").Elements(w + "p");

    foreach (var paragraphElement in paragraphElements)
    {
    var bookmarkStarts = paragraphElement.Elements(w + "bookmarkStart");
    if (bookmarkStarts.Count() > 0)
    {
    foreach (var bookmarkStart in bookmarkStarts.Where(startBookmark => startBookmark.Attribute(w + "name").Value == BookmarkName))
    {
    // does not work properly
    bookmarkStart.AddAfterSelf(altChunk);
    }
    }
    }

    SaveDocument(originalDocument, mainDocument);
    }

    Any suggestions?

    Thanks in advance

    Martin

    • Ankush said

      Hi Martin,

      Thanks for posting the comment.

      I just tried the below code and it works for me…please try this and see if this helps

      static void Main(string[] args)
      {
          string docxSourceFile = @"C:\Users\abhatia\Desktop\doc\Source.docx";
          string docxOutputFile = @"C:\Users\abhatia\Desktop\doc\des.docx";
          string docxOutputFile1 = @"C:\Users\abhatia\Desktop\doc\main.docx";
          string sBmkId;
          int iChunkId = 1;

          File.Copy(docxOutputFile, docxOutputFile1, true);

          using (WordprocessingDocument myDoc = WordprocessingDocument.Open(docxOutputFile1, true))
          {
              MainDocumentPart mainPart = myDoc.MainDocumentPart;

              IDictionary<string, BookmarkStart> bookMarkMap = new Dictionary<string, BookmarkStart>();
              foreach (BookmarkStart bookMarkStart in mainPart.RootElement.Descendants<BookmarkStart>())
              {
                  bookMarkMap[bookMarkStart.Name] = bookMarkStart;
              }
              foreach (BookmarkStart bookMarkStart in bookMarkMap.Values)
              {
                  if (bookMarkStart.Name == "test")
                  {
                      //do insert here
                      var parent = bookMarkStart.Parent;
                      //create paragraph to insert

                      string sChunkId = "AltChunkId" + iChunkId.ToString();
                      iChunkId++;

                      AlternativeFormatImportPart chunk = null;
                      chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, sChunkId);
                      AltChunk altChunk = null;
                      using (FileStream fileStream = File.Open(docxSourceFile, FileMode.Open))
                      {
                          chunk.FeedData(fileStream);
                          altChunk = new AltChunk();
                          altChunk.Id = sChunkId;
                          // parent.InsertAfterSelf(altChunk);
                      }

                      parent.InsertAfterSelf(altChunk);
                  }
              }

              mainPart.Document.Save();
              myDoc.Close();

          }
      }

      • Martin said

        Hi
        Thanks for another sample. This adds the document after the bookmark and that works. Have you tried to insert the altChunk into the bookmark? In Word documents you can have bookmarks with content.. or bookmarks in bookmarks.

        My current try is to loop through all Body.ChildElements of the data Document and insert them into the bookmark. But then I have to copy all the styles and images etc manually…

        Don’t know if it’s worth to do it manually…

        Cheers, Martin

      • Ankush said

        Hi,

        No I haven’t tried it but I would like to know why you want to insert a doc within the bookmark? Could you please elaborate the requirements?

        Thanks
        Ankush

      • Martin said

        Hi,

        I want to merge some files at specific bookmarks. The user can choose whether the bookmark is replaced by the content or the content is added after/before/between the bookmark.
        It’s a requirement we need.

        Cheers, Martin

  3. Ankush said

    Thanks Martin.

    So we have the solution if you want to replace the bookmark or add before/after the bookmark. I haven’t tried adding a document in between a bookmark. I will try this and will let you know. My first guess is that it may need some structure around it (maybe a paragraph or something), but I need to try it.

    Ankush

  4. greenhorn said

    Hi Ankush,
    Is there a method in Open XML SDK using which I can print the data in the word doc using a printer.

    Thanks in advance
    GreenHorn

    • Ankush said

      Hi GreenHorn,

      Open XML just works on the underlying data; it knows how to work with the data, not with the rendered document. Having said that, you can configure a virtual printer and use the Win32 API to create XPS/PDF or send it to a printer.

      Thanks,
      Ankush.

  5. Hi Ankush,

    We are using openofficexml for generating Word reports. We need rich HTML in our Word reports. We saw your article and we need altChunk code to place HTML content in a particular area (like a table, control, etc.).

    We are posting our code to understand our requirement.

    using (MemoryStream mem = new MemoryStream())
    {
    mem.Write(byteArray, 0, (int)byteArray.Length);
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(mem, true))
    {
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    foreach (Word.Table RiskTable in mainPart.Document.Descendants<Word.Table>().ToList())
    {
    foreach (Word.TableRow trone in RiskTable.Descendants<Word.TableRow>().ToList())
    {
    foreach (Word.TableCell tdone in trone.Descendants<Word.TableCell>().ToList())
    {
    String sone = String.Empty;
    Text tone = new Text();
    Paragraph pone = tdone.Elements<Paragraph>().First();
    //Run rone = pone.Elements<Run>().First();
    try
    {
    //Paragraph pone = tdone.Elements<Paragraph>().First();
    Run rone = pone.Elements<Run>().First();
    tone = rone.Elements<Text>().First();
    sone = tone.Text;
    }
    catch (Exception)
    {
    }

    for (int i = 0; i < dtModCommon.Columns.Count; i++)
    {
    if (sone.ToLower() == dtModCommon.Columns[i].ColumnName.ToLower())
    {
    tone.Text=Convert.ToString(dtModCommon.Rows[0][sone]);
    break;
    }
    }
    }
    }
    }
    try
    {
    String FileName = "myfile.docx";
    Response.ClearContent();
    Response.ClearHeaders();
    Response.AddHeader("content-disposition", "attachment; filename=" + FileName + "");
    Response.ContentEncoding = System.Text.Encoding.UTF8;
    Response.AddHeader("Content-Length", mem.Length.ToString());
    Response.ContentType = "application/octet-stream";
    Response.OutputStream.Write(mem.ToArray(), 0, (int)mem.Length);
    Response.Flush();
    Response.Close();
    }
    catch (Exception ex)
    {
    }
    }
    }

    ///////////////////////////////////

    We need to replace text with html. How we can do that please help us as we are new in openofficexml.

  6. Andy Burns said

    Thank you sir, that’s brilliant! I’d become totally stuck trying to use FeedData() – but I need to do this all in memory, and it wasn’t obvious to me that that should fail. Using the chunk’s stream works perfectly though!

  7. Rakesh said

    Hi Ankush,

    I want to replace the text with the HTML formatted text.
    Could you please help me with it and provide the source code for it.

  8. Hi,

    Is it necessary to delete the sdt after deleting the altChunk. As per my requirement, I’m inserting an altChunk after an sdt in the same way that you did. But I want to keep the sdt after the insertion. The altChunk is created using the contents of a file. Till this, it all works fine.
    But when I try editing the same docx file by deleting the old chunk with a given id and inserting a chunk of the same id back in the document, the contents are getting repeated. .i.e. I guess the chunk is not getting deleted. So when I edit the document first time, the altChunk appears twice, second time it appears thrice, and so on.

    Can you please attach a code snippet showing how to delete an altChunk with a given id successfully?

    Thanks in advance,
    Neha
