Lets Learn

Opinion Matters

  • Subscribe

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 8 other followers

  • Blog Stats

    • 74,660 hits
  • Ankush

    Error: Twitter did not respond. Please wait a few minutes and refresh this page.

Basics of Spreadsheet – Part 1

Posted by Ankush on March 4, 2010


I have been working with Open XML closely and thought of to write a blog which provides an Introduction to Open XML format. Office Open XML is a ZIP-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. By default, documents created using 2007 Microsoft Office products is based on new, XML-based file formats. Before getting into the discussion, let’s understand the basic terminology.

Package: The ZIP container that holds the components (parts) that comprise the document, as defined by the Open Packaging Conventions specification.

Part: Corresponds to one file in the package. For example, if a user right-clicks an Excel 2007 file and chooses to extract it, you will see files like workbook.xml file and several sheetn.xml files. Each of those files is a part in the package.

Relationships: The method used to specify how the collection of parts are related. This method specifies the connection between a source part and a target resource. Relationships are stored within XML parts (for example, /_rels/.rels) in the document package.

XML: Extensible Markup Language (XML) is a simple, flexible text format designed for electronic publishing and the exchange of a wide variety of data on the Internet and elsewhere.

ZIP: An industry-standard compression archives format used to store and transport files between computers, through e-mail, or over the Internet.

Now we understand this, lets discuss about SpreadsheetML. Let’s say you have created an Excel workbook using excel 2007 which contains some values in cell. You rename the Excel file to .zip and extract the content of it, this is the structure you will see:

_rels folder: This folder contains a .rels file that defines the root relationships within the package.

.rels file: Contains relationships based on the start part (the virtual start part). Relationships are defined with the following format:

 <Relationship Id=”someID” Type=”relationshipType” Target=”targetPart”/>

where Id is a unique string in the .rels file.

Type: The type of the relationship that distinguishes relationships from one another and provides a hint as to the relationship’s purpose. It points to the schema that defines Office XML Formats types.

Target: Points to the folder and file that contain the target of the relationship (another part).

For example: A sheet part is defined as

 <Relationship Id=”rId3″ Type=”http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet” Target=”worksheets/sheet3.xml”/>

A style part is defined as

<Relationship Id=”rId5″ Type=”http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles” Target=”styles.xml”/>

DocProps folder: This folder contains 2 files, app.xml , core.xml which contains the information about workbook, owner, last modified etc.

Xl folder (Main Document Part): The target of the http://schemas.microsoft.com/office/2006/relationships/officeDocument relationship is the main part defining the document; the workbook part for Excel. All other relationships are based on the main document part.

This folder contains subfolders:

_rels folder: This folder contains a .rels file that defines the relationships at workbook level. Basically it contains the relationship for worksheet,SharedString etc.

.rels file: Contains relationships based on the workbook part level.

theme Folder: This folder defines a file which defines themes, theme1.xml. This file stores all of the elements of a theme that are applicable across Microsoft Office and the Spreadsheet settings. This file contains sections for theme color, font, and effect settings.

worksheets Folder: This folder contains the individual Sheet (part).The XML file for individual sheet contains a table like structure for defining data. Inside a worksheet the data can be split up into three distinct sections. The first section contains sheet properties. The second contains the data, using the required sheetData element. Right after sheetData various supporting features can be found such as sheet protection and filter information. I will explained later how the sheetData is defined.

<worksheet xmlns=”http://schemas.openxmlformats.org/spreadsheetml/2006/main” >
<sheetData>
<row>
<c>
<v>1234</v>
</c>
</row>
</sheetData>
</worksheet>

sharedStrings.xml: Shared strings optimize space requirements when the spreadsheet contains multiple instances of the same string. More detail is provided in the second part of this blog.

Styles.xml: This file defines the styles defined within the workbook. It can include cellStyles, Font Style etc.

workbook.xml: The first and foremost task of the workbook part is keeping track of the worksheets, global settings and other shared components of the workbook. The workbook part maintains a list of all the worksheets in the spreadsheet. This list is created to be able to name all the worksheets as well as providing sequencing information for the Open XML application. There are three pieces of data stored inside the workbook part about all the worksheets. Each sheet has a name attached for display in the consumer UI. There is an ID value used for sorting the sheets and finally a relationship ID to point the workbook to the part inside the package where the sheet is being stored. Other information in the workbook part concerns the views, calculations and versioning information as well as other options.

<workbook xmlns=http://schemas.openxmlformats.org/spreadsheetml/2006/main xmlns:r=”http:// schemas.openxmlformats.org /officeDocument/2006/relationships”> <sheets>
<sheet name=”Sheet1″ sheetId=”1″ r:id=”rId1″ /> </sheets> </workbook>

[Content_Types].xml: Provides a listing of the content types for the other parts that are contained in the package. Content types are defined as the types of parts that can be stored in a package:

For ex: This is how it is defined for workbook.xml and Shee1.xml

<Override PartName=”/xl/workbook.xml” ContentType=”application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml” />
  <Override PartName=”/xl/worksheets/sheet1.xml” ContentType=”application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml” />

Other Information: There are many more parts that you can put into a SpreadsheetML package. For example: insert a chart in the Excel file and then extract the content of it. Go to xl folder and you will see Chart folder, drawing folder has been created. In this way you can add objects using Excel UI and then analyze the xml structure to learn about this.

My next blog is going to talk about how data/formulas are defined within XML structure. To be specific worksheet1.xml.

Let me know what you think about this blog. In case of any questions on Open XML, leave a comment and I’ll respond to it quickly.

Happy reading!!!!!!!

Advertisements

One Response to “Basics of Spreadsheet – Part 1”

  1. Asad Naeem said

    http://msdn.microsoft.com/en-us/library/office/cc861607.aspx
    there is a method InsertCellInWorksheet()
    In InsertCellInWorksheet(), is it necessary to use worksheet.save();? Is there any way to save worksheet at the end of task?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: