Writing BOM to csv from KiCad 6 creates garbled text with µ character

flum · January 24, 2022, 8:03pm

I use the character ‘µ’ as part of component values, such as “10µ”. When creating the .csv file using “bom_csv_grouped_by_value_with_fp”, it returns “10Ã‚Âµ” as output. I know this has to do with the encoding in the definition of “fromNetlistText”, but I don’t know what to use to make this work.

When I change the following code line:

return aText.encode('utf-8').decode('cp1252')

to

return aText.encode('latin-1').decode('cp1252')

I get “10Âµ”. It got me a bit closer to the solution but obviously it’s still incorrect.

I am on Win 10. Has anyone dealt with this in the past who can help me get the proper encoding?
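
For reference, both garbled strings can be reproduced in a few lines of Python. This is only a sketch under two assumptions that the thread does not confirm: that the output file ends up being written as UTF-8, and that the viewer decodes it as cp1252. The helper name `shown_in_viewer` is made up for the illustration.

```python
def shown_in_viewer(text, file_encoding="utf-8", viewer_encoding="cp1252"):
    """Return what a viewer would display for `text` written to a file.

    Both encodings are assumptions for illustration, not facts about KiCad.
    """
    return text.encode(file_encoding).decode(viewer_encoding)


value = "10\u00b5"  # "10µ"

# Original helper body on Windows: UTF-8 bytes reinterpreted as cp1252.
original = value.encode("utf-8").decode("cp1252")
# Modified helper body: latin-1 and cp1252 agree on µ, so nothing changes.
modified = value.encode("latin-1").decode("cp1252")

print(ascii(original))                   # '10\xc2\xb5'
print(ascii(shown_in_viewer(original)))  # '10\xc3\u201a\xc2\xb5', i.e. 10Ã‚Âµ
print(ascii(shown_in_viewer(modified)))  # '10\xc2\xb5', i.e. 10Âµ
```

Under these assumptions the helper itself adds one layer of garbling and the viewer adds a second one.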

retiredfeline · January 24, 2022, 11:43pm

What happens if you change it to just:

return aText.encode('utf-8')

Not sure why output should even be in cp1252.

flum · January 25, 2022, 9:49am

That was one of my first thoughts but when I tried that, it barfed:

refs += fromNetlistText(component.getRef())
TypeError: can only concatenate str (not "bytes") to str

flum · January 25, 2022, 9:54am

BTW, here’s the complete code that I am using:

# Import the KiCad python helper module and the csv formatter
import kicad_netlist_reader
import kicad_utils
import csv
import sys
import re

# A helper function to convert a UTF8/Unicode/locale string read in netlist
# for python2 or python3
def fromNetlistText( aText ):
    if sys.platform.startswith('win32'):
        try:
            return aText.encode('utf-8').decode('cp1252')
        except UnicodeDecodeError:
            return aText
    else:
        return aText

# Generate an instance of a generic netlist, and load the netlist tree from
# the command line option. If the file doesn't exist, execution will stop
net = kicad_netlist_reader.netlist(sys.argv[1])

# Open a file to write to, if the file cannot be opened output to stdout instead
try:
    # f = kicad_utils.open_file_write(sys.argv[2], 'w')
    f = open(sys.argv[2], 'w', newline='')
except IOError:
    e = "Can't open output file for writing: " + sys.argv[2]
    print(__file__, ":", e, file=sys.stderr)
    f = sys.stdout

# Create a new csv writer object to use as the output formatter
out = csv.writer(f, delimiter=',', quotechar='\"', quoting=csv.QUOTE_ALL)

# Output a set of rows for a header providing general information
out.writerow(['Reference', 'Quantity', 'Value', 'Package', 'MPN', 'Vendor'])

# Get all of the components in groups of matching parts + values
# (see ky_generic_netlist_reader.py)
grouped = net.groupComponents()

# Output all of the component information
for group in grouped:
    refs = ""

    # Add the reference of every component in the group and keep a reference
    # to the component so that the other data can be filled in once per group
    length = len(group)
    ctr = 0
    for component in group:
        refs += fromNetlistText(component.getRef())
        ctr = ctr + 1
        if ctr < length:
            refs += ", "
        c = component

    if(c.getField("Populate") != "0"):
        # Fill in the component groups common data
        footprint = c.getFootprint()
        footprint = footprint.split(':', 1)[-1]
        
        out.writerow(
            [refs, 
            len(group),
            fromNetlistText( c.getValue() ),
            fromNetlistText( footprint ),
            fromNetlistText( c.getField("MPN") ),
            fromNetlistText( c.getField("Vendor") )])

John_Pateman · January 25, 2022, 9:55am

Try putting

# coding=utf-8

at the top of the file, after the shebang.

And, in general, don’t complicate your BOM with unnecessary fanciness that breaks things

flum · January 25, 2022, 11:16am

Tried that. No dice. The conversion still gives me “10Ã‚Âµ”.

Note: there’s no shebang in the first line of the original code.

# coding=utf-8

# Import the KiCad python helper module and the csv formatter
import kicad_netlist_reader
import kicad_utils
import csv
import sys
import re

# A helper function to convert a UTF8/Unicode/locale string read in netlist
# for python2 or python3
def fromNetlistText( aText ):
    if sys.platform.startswith('win32'):
        try:
            return aText.encode('utf-8').decode('cp1252')
            # return aText.encode('utf-8')
        except UnicodeDecodeError:
            return aText
    else:
        return aText

retiredfeline · January 25, 2022, 11:48am

Ah ok, the encode method returns bytes, not a string, so the result cannot be concatenated with a string. And the function fromNetlistText() seems to recode the string to cp1252 only when you are running on win32. cp1252 is practically the same as latin1 except for a particular range of characters (bytes 0x80 to 0x9F), so encode('latin1').decode('cp1252') is a no-op, at least for the µ character, which means that Âµ is the actual contents of that string. If you look here: UTF-8 Character Debug Tool, you will see that C2 B5 is the UTF-8 encoding of U+00B5, which is µ.
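
The three points in the paragraph above can be checked directly in a Python 3 interpreter. This is a minimal sketch using only built-in string methods; the variable names are arbitrary.

```python
micro = "\u00b5"  # the µ character (MICRO SIGN)

# str.encode() returns bytes; for µ the UTF-8 bytes are C2 B5.
encoded = micro.encode("utf-8")
print(type(encoded).__name__, encoded.hex())  # bytes c2b5

# bytes cannot be concatenated to a str, hence the TypeError in the script.
message = None
try:
    "10" + encoded
except TypeError as exc:
    message = str(exc)
print("TypeError:", message)

# latin-1 and cp1252 map byte B5 to the same character, so this round trip
# leaves µ unchanged.
round_trip = micro.encode("latin-1").decode("cp1252")
print(round_trip == micro)  # True

# The two code pages only differ in the 0x80-0x9F range, for example 0x80.
print(b"\x80".decode("cp1252") == b"\x80".decode("latin-1"))  # False
```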

A better method is probably to: 1. Skip the conversion; it may not be needed any more on W10. Change the function to always return aText (just for testing), in other words a no-op passthrough function. If that works and you can read the file containing UTF-8 on W10, then you are done. If not, then also 2. Change the output file open call to specify utf-8 as the encoding, as shown here: How to read and write unicode (UTF-8) files in Python?

I notice that you commented out kicad_utils.open_file_write and replaced it with a Python open. It could be that kicad_utils.open_file_write specified this utf-8 encoding but not your modification.
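
The two suggestions above could look roughly like this. It is a sketch, not the KiCad script itself: `from_netlist_text` is a hypothetical pass-through stand-in for fromNetlistText, and the output path is a temporary file rather than sys.argv[2].

```python
import csv
import os
import tempfile


def from_netlist_text(text):
    """Pass-through stand-in for fromNetlistText: no re-encoding at all."""
    return text


# Hypothetical output path; the real script takes it from sys.argv[2].
path = os.path.join(tempfile.mkdtemp(), "bom.csv")

# newline='' lets the csv module control line endings;
# encoding='utf-8' avoids depending on the platform default encoding.
with open(path, "w", newline="", encoding="utf-8") as f:
    out = csv.writer(f, quoting=csv.QUOTE_ALL)
    out.writerow(["C1, C2", 2, from_netlist_text("10\u00b5")])

# Read the file back as raw bytes to see what was really written.
with open(path, "rb") as f:
    raw = f.read()

print(raw)  # b'"C1, C2","2","10\xc2\xb5"\r\n'
```

The µ lands in the file as the two bytes C2 B5, which is what a UTF-8 aware reader expects.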

flum · January 25, 2022, 6:10pm

Thanks a lot for dissecting the code. I followed your suggestions, read the links and tried a number of permutations but still couldn’t get it to work. The reason I skipped kicad_utils.open_file_write was to get rid of the extra newlines (every other line was empty in Excel). Maybe I need to go back and convert all the µ’s to u’s, which would be a shame.

# Import the KiCad python helper module and the csv formatter
import kicad_netlist_reader
import kicad_utils
import csv
import sys
import re

# A helper function to convert a UTF8/Unicode/locale string read in netlist
# for python2 or python3
def fromNetlistText( aText ):
    if sys.platform.startswith('win32'):
        try:
            # return aText.encode('utf-8').decode('cp1252')
            return aText
        except UnicodeDecodeError:
            return aText
    else:
        return aText

# Generate an instance of a generic netlist, and load the netlist tree from
# the command line option. If the file doesn't exist, execution will stop
net = kicad_netlist_reader.netlist(sys.argv[1])

# Open a file to write to, if the file cannot be opened output to stdout
# instead
try:
    # f = kicad_utils.open_file_write(sys.argv[2], 'w')
    f = open(sys.argv[2], 'w', newline='', encoding='utf-8')
except IOError:
    e = "Can't open output file for writing: " + sys.argv[2]
    print(__file__, ":", e, file=sys.stderr)
    f = sys.stdout

flum · January 25, 2022, 6:23pm

BTW, this is all that kicad_utils.open_file_write does. There’s literally nothing else in there:

#
# KiCad python module for some helper functions
#

import os

def open_file_write(path, mode):
    ''' Open "path" for writing, creating any parent directories as needed.
    '''
    dir_path = os.path.dirname(path)

    if not os.path.isdir(dir_path):
        os.makedirs(dir_path)

    return open(path, mode)

retiredfeline · January 25, 2022, 10:08pm

You didn’t say what you get in the output file. Ideally you should examine it with a binary file viewer. Also CSV is just a text format so it should be possible to look at it with a text editor rather than Excel. Then too it depends on how the application treats the file, whether it’s expecting UTF-8 or Latin1. To make csv.writer generate \n instead of \r\n, use the Dialect.lineterminator parameter of csv.writer.

Ultimately it’s your fabricator who has to read the CSV file so they have to be able to accept µ in the charset if they specify one.

Edit: Just want to add that there isn’t an attribute on a file that says it contains UTF-8 or Latin1 or some other charset. (The closest thing is the BOM (byte order mark) but that’s normally for UTF-16 on Windows.) It’s all by agreement (or not) between the generating application and the reading application. So check what charset your Excel is expecting and also what your text editor is expecting. For example when I import a CSV file into LibreOffice Calc this is the opening dialogue.

That’s also why I mentioned inspecting the file with a binary viewer, and also checking with your fabricator if they will handle UTF-8 and extended characters.
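
The line terminator suggestion above, and a likely cause of the empty rows in Excel, can be illustrated as follows. This is a sketch under one assumption: that the empty rows come from Windows text mode translating the "\n" in the csv module's default "\r\n" terminator. Opening the file with newline="\r\n" imitates that translation on any platform; the helper name `write_rows` is made up.

```python
import csv
import os
import tempfile


def write_rows(path, **writer_kwargs):
    """Write two rows and return the raw bytes that landed on disk."""
    # newline='\r\n' imitates Windows text mode on any platform:
    # every '\n' written is translated to '\r\n'.
    with open(path, "w", newline="\r\n", encoding="utf-8") as f:
        out = csv.writer(f, quoting=csv.QUOTE_ALL, **writer_kwargs)
        out.writerow(["R1"])
        out.writerow(["R2"])
    with open(path, "rb") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "bom.csv")  # hypothetical path

default_bytes = write_rows(path)
fixed_bytes = write_rows(path, lineterminator="\n")

print(default_bytes)  # b'"R1"\r\r\n"R2"\r\r\n'
print(fixed_bytes)    # b'"R1"\r\n"R2"\r\n'
```

Passing newline='' to open() and leaving the csv default terminator alone produces the same clean "\r\n" endings, which is the other workaround used earlier in the thread.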

HSPalm · January 26, 2022, 12:11pm

Did you encounter this bug?

At least a fix has been committed.

flum · January 28, 2022, 2:35pm

Thanks for the hint with the line terminator. After adding it to the code, I was able to go back to the original call:

f = kicad_utils.open_file_write(sys.argv[2], 'w')

A hex editor gives me the following result (seems like I cannot show pictures here):

22 31 30 C3 82 C2 B5 22 "10Ã‚Âµ"

My PCBA manufacturer can read my files. It has worked just fine with Eagle where I have also used µ.

So the code currently looks like this:

# Import the KiCad python helper module and the csv formatter
import kicad_netlist_reader
import kicad_utils
import csv
import sys
import re

# A helper function to convert a UTF8/Unicode/locale string read in netlist
# for python2 or python3
def fromNetlistText( aText ):
    if sys.platform.startswith('win32'):
        try:
            return aText.encode('utf-8').decode('cp1252')
            # return aText
        except UnicodeDecodeError:
            return aText
    else:
        return aText

# Generate an instance of a generic netlist, and load the netlist tree from
# the command line option. If the file doesn't exist, execution will stop
net = kicad_netlist_reader.netlist(sys.argv[1])

# Open a file to write to, if the file cannot be opened output to stdout
# instead
try:
    f = kicad_utils.open_file_write(sys.argv[2], 'w')
except IOError:
    e = "Can't open output file for writing: " + sys.argv[2]
    print(__file__, ":", e, file=sys.stderr)
    f = sys.stdout

# Create a new csv writer object to use as the output formatter
out = csv.writer(f, delimiter=',', quotechar='\"', quoting=csv.QUOTE_ALL, lineterminator="\n")

# Output a set of rows for a header providing general information
# out.writerow(['Source:', net.getSource()])
# out.writerow(['Date:', net.getDate()])
# out.writerow(['Tool:', net.getTool()])
# out.writerow( ['Generator:', sys.argv[0]] )
# out.writerow(['Component Count:', len(net.components)])
out.writerow(['Reference', 'Quantity', 'Value', 'Package', 'MPN', 'Vendor'])


# Get all of the components in groups of matching parts + values
# (see ky_generic_netlist_reader.py)
grouped = net.groupComponents()

# Output all of the component information
for group in grouped:
    refs = ""

    # Add the reference of every component in the group and keep a reference
    # to the component so that the other data can be filled in once per group
    length = len(group)
    ctr = 0
    for component in group:
        refs += fromNetlistText(component.getRef())
        ctr = ctr + 1
        if ctr < length:
            refs += ", "
        c = component

    if(c.getField("Populate") != "0"):
        # Fill in the component groups common data
        footprint = c.getFootprint()
        footprint = footprint.split(':', 1)[-1]
        
        out.writerow(
            [refs, 
            len(group),
            fromNetlistText( c.getValue() ),
            fromNetlistText( footprint ),
            fromNetlistText( c.getField("MPN") ),
            fromNetlistText( c.getField("Vendor") )])

retiredfeline · January 28, 2022, 8:10pm

Your string has been double encoded. Using the debug tool page again, the hex sequence C3 82 C2 B5 displays as Ã‚Âµ. Reversing the last step, the .decode('cp1252'), shows that the bytes before it were C2 B5, which display as Âµ when read as cp1252 and are in fact the UTF-8 encoding of the µ character.

So I would again make fromNetlistText the identity transformation (always return aText) and the output should be correct UTF-8.
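
The reversal described above can be replayed on the bytes from the hex editor. A minimal sketch using only built-in methods:

```python
# Bytes taken from the hex editor output quoted earlier in the thread.
raw = bytes.fromhex("22 31 30 C3 82 C2 B5 22")

# Step 1: read the file as UTF-8; the stray "Â" is still there.
once = raw.decode("utf-8")

# Step 2: undo the cp1252 decode, then read the result as UTF-8.
fixed = once.encode("cp1252").decode("utf-8")

print(ascii(once))   # '"10\xc2\xb5"'
print(ascii(fixed))  # '"10\xb5"'
```

After the second step only U+00B5 (µ) remains, which confirms that the extra layer came from the encode/decode pair in fromNetlistText.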

Now you have to make sure that any application you use to look at the CSV is expecting UTF-8. Your Excel might be expecting CP1252 or Latin1 (almost the same thing) which would make the result look invalid. To make Excel expect UTF-8, here’s a page explaining it. But I wouldn’t follow that, it seems to require poking in the registry or writing a byte order mark at the beginning of the file. Instead, don’t use Excel to look at the CSV to confirm that the file contents are correct, use some other tool, like Libreoffice Calc. Or the hex editor again.
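
If the byte order mark route is tried anyway, Python can write the mark itself through the standard 'utf-8-sig' codec, so no manual byte handling is needed. This is a sketch with a temporary, hypothetical output path; whether a given Excel version then picks UTF-8 automatically is an assumption that would need checking.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bom_with_mark.csv")  # hypothetical

# The 'utf-8-sig' codec writes the three-byte mark EF BB BF first,
# followed by ordinary UTF-8.
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerow(["10\u00b5"])

with open(path, "rb") as f:
    raw = f.read()

print(raw[:3].hex())  # efbbbf
print(raw[3:])        # b'"10\xc2\xb5"\r\n'
```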

To summarise, you do not need the fromNetlistText function in an all UTF-8 world, and you should use tools that display UTF-8 correctly, i.e. not Excel.

flum · January 28, 2022, 8:29pm

OK, got it.

I found out that Excel can recognize UTF-8 if told so. After entering “Get Data From Text” in Excel’s search field, a wizard pops up that allows customizing the CSV reader.

Trying to make the BOM writer write CP1252 instead of UTF-8 didn’t work. In any case, I am not going to pursue this any further. Using the wizard is good enough for me.
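
For completeness, here is a sketch of what writing CP1252 directly could look like, assuming the strings are correct Unicode in memory (that is, fromNetlistText is a pass-through). It is not a diagnosis of why the attempt above failed, and the path and sample values are hypothetical.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bom_cp1252.csv")  # hypothetical

# errors='replace' substitutes '?' for characters cp1252 cannot represent
# instead of raising UnicodeEncodeError.
with open(path, "w", newline="", encoding="cp1252", errors="replace") as f:
    out = csv.writer(f, quoting=csv.QUOTE_ALL)
    out.writerow(["10\u00b5", "4.7k\u03a9"])  # µ fits cp1252, Ω does not

with open(path, "rb") as f:
    raw = f.read()

print(raw)  # b'"10\xb5","4.7k?"\r\n'
```

In CP1252 the µ is the single byte B5, so a file written this way should not need the import wizard, at the cost of losing any character outside that code page.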

@retiredfeline, thanks for your help.
