Unable to read exported Excel workbook because of illegal character

After exporting from reporting services, Excel opens the workbook with one of the following errors:

"Replaced Part: /xl/worksheets/sheet1.xml part with XML error. Illegal xml character. Line 1, column 36242."

"Excel found unreadable content in..."

SOLUTION

  1. Rename the Excel workbook from *.xlsx to *.zip
  2. Within the zip file, open the folder \xl\worksheets and open file sheet1.xml in a text editor, i.e. Notepad++
  3. Use Notepad++'s GOTO feature to go to position 36242 and determine the offending character
  4. Replace this character in the report's source, i.e. by using the SQL REPLACE function if the source is a plain SQL query
    Note: this is a reactive measure. Each time the error occurs, the character needs to be determined and replaced

For example, these Unicode characters cannot be part of the value in any field in the report:

code: visual representation
26: [SUB]
191: ¿

OCR a (region of a) PDF using C# and "freeware"

Perform the following steps:

1) Install Ghostscript to make conversion of PDF possible:
https://code.google.com/p/ghostscript/downloads/list

2) Install Imagemagick to convert more easily from PDF to JPG:
http://www.imagemagick.org/script/binary-releases.php#windows

3) Install MODI using Sharepoint Designer 2007, if the Office version on the system is more recent then 2007:
Download Sharepoint Designer 2007
Start setup, custom, disable all
Then select all options under Microsoft Office Document Imaging

By default (on English Windows) only four English-alike languages can be recognized, so install other language packs, e.g. for Japanese or Chinese if needed. In Windows 7 Enterprise or Ultimate this can be done using the "optional updates" from the Windows Update tool In other versions additional languages are not available.

4) From C#, add COM reference to Microsoft Office Document Imaging

5a) Convert a PDF page to JPG from code by executing the following command line syntax:

Whole page:
convert -type grayscale -density 300 jp.pdf[0] jp.jpg

Region of the page:
convert -type grayscale -density 300 jp.pdf[0] -crop 600x600+50+50 jp_crop.jpg

5b) Convert the JPG image to text

Example C# code:
MODI.Document d = new MODI.Document();
d.Create(@"c:\tmp\image\jp_crop.jpg");
d.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);
MODI.Image i = d.Images[0];
Debug.WriteLine(i.Layout.Text.ToString());
Note:
  • If the OCR statement returns a "bad language" error, then install the requested language pack
  • If the OCR statement returns a "document not ready" error, then uninstall and reinstall MODI by using remove programs and rerunning the Sharepoint Designer setup
  • The language of the text that is parsed needs to be known and set in the OCR function

Analysis Services fails with "attribute key cannot be found" on Unicode field

ERROR
The attribute key cannot be found when processing: Table: 'dbo_viu_isb', Column: 'city', Value: ' '. The attribute is 'City'.

OR

The attribute key cannot be found when processing: Table: 'dbo_viu_isb', Column: 'customername', Value: 'パナソニック(株)'. The attribute is 'Customername'.

CAUSE
The problem is caused by the Unicode nature of the SPACE character in the text field in the database. In the field are for example Kanji characters and unicode has about 18 characters for SPACE:

http://www.fileformat.info/info/unicode/category/Zs/list.htm

There is no way to tell Analysis Services or SQL Server the difference between these spaces. In short: collation, case sensitivity, accent sensitivity, width sensitivity or Kana sensitivity have no influence.

WORKAROUND
Convert all the kind of SPACE characters to one SPACE character, e.g:
replace(ad_.ad_name,' ',' ') as ad_name

This appears to do nothing, but the first empty space finds ALL space characters, e.g. ASCII 32 and Unicode 12288 and the second parameter converts it to ONE space (Unicode 32)

Openrowset fails with "network access was interruped"

A SQL Server openrowset query using ODBC to retrieve data from an Excel spreadsheet fails with the error:

"Your network access was interrupted. To continue close the database and then open it again."

Example Openrowset query (Excel 2010+):

SELECT *
FROM OPENROWSET('MSDASQL', 'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};Dbq=C:\temp\test.xlsx;Uid=;pwd=;','SELECT * FROM [memberlist$]')

Solution:
  • Disable all network connections and try again. Only works if the frontend (e.g. Management Studio) and the backend (SQL Server database) are on the same machine
  • Change SQL Server service user to an NT account
Remarks:
  • In some cases, activating the network connections again will prevent the error from occurring
  • In some cases, changing the NT account to the "local service" account will prevent the error from occurring
  • The MSDASQL keyword indicates that openrowset will use the ODBC driver in stead of an OLE DB driver