CSV problem with strings in quotation marks

I have a problem with CSV tiddlers. My CSV has strings properly enclosed in quotation marks, but the CSV parser handles them incorrectly: it treats commas inside the quoted strings as delimiters too. Any ideas for a workaround or a fix?

Could someone please file this as a regular bug report with the TW developers?

Pardon the question, but what prevents you from reporting the bug yourself?

It appears there is insufficient standardisation of the CSV format, and it implies that commas are not in the data and only delimit the fields. You may be able to hack your way through by relying on the fact that ," or ", always appear together at field boundaries. Otherwise, the format you use should have appropriate ways to escape special characters in the data.

However, if you have access to Google Sheets or Excel, try first importing the CSV there and saving it back out to resolve this (just a guess).

I am trying to find a Windows app that was perfect for viewing, preparing and exporting CSV files.

[Edited] I am sure this is not it, but perhaps try the free version of https://www.moderncsv.com/, which says it "Handles poorly formatted files". There are plenty of others, and many should handle the RFC that @Maurycy suggested.

You can write parsers in TiddlyWiki without JavaScript, starting with splitregexp[\n] to get each line and then breaking each line into columns. It requires a little gymnastics, but it can be done. I would first try importing it into tiddlers with the JSON Mangler plugin.

Actually, it is standardised. RFC 4180 states:

  6. Fields containing line breaks (CRLF), double quotes, and commas
     should be enclosed in double-quotes. For example:

         "aaa","b CRLF
         bb","ccc" CRLF
         zzz,yyy,xxx

  7. If double-quotes are used to enclose fields, then a double-quote
     appearing inside a field must be escaped by preceding it with
     another double quote. For example:

         "aaa","b""bb","ccc"
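To see what a tool that follows these two rules produces for the RFC's own examples, here is a small sketch using Python's standard csv module. It is only an illustration of standards-compliant behaviour, not what TiddlyWiki uses. The RFC's "CRLF" is written as \r\n.

```python
import csv
import io

# Example for the first rule quoted above: a quoted field may contain a line break.
sample_line_break = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx\r\n'
# Example for the second rule: "" inside a quoted field is one literal quote.
sample_escaped_quote = '"aaa","b""bb","ccc"\r\n'


def parse(text: str) -> list[list[str]]:
    # newline="" keeps line breaks untranslated so embedded ones survive parsing.
    return list(csv.reader(io.StringIO(text, newline="")))


print(parse(sample_line_break))     # [['aaa', 'b\r\nbb', 'ccc'], ['zzz', 'yyy', 'xxx']]
print(parse(sample_escaped_quote))  # [['aaa', 'b"bb', 'ccc']]
```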

If my memory serves me well, these two parts of the standard are supported by every major tool you could use.

I am guessing the CSV tiddlers are parsed incorrectly because the parser uses a simple split or a regular expression. Handling quotes correctly is a feat I don't think can be achieved without actually parsing the data character by character (or using some very advanced regexp features).
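To illustrate what parsing character by character means, here is a minimal sketch of a quote-aware parser that follows the two RFC 4180 rules quoted above. It is written in Python purely for illustration; the function name parse_csv and its separator parameter are hypothetical and are not TiddlyWiki's API.

```python
def parse_csv(text: str, separator: str = ",") -> list[list[str]]:
    """Split text into rows of fields, honouring RFC 4180-style quoting."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    # "" inside a quoted field is an escaped literal quote
                    field.append('"')
                    i += 1
                else:
                    # a lone quote closes the quoted section
                    in_quotes = False
            else:
                # separators and line breaks are ordinary data while quoted
                field.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == separator:
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            # treat \r\n, \n or \r as a single record terminator
            if ch == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    # flush the last record if the text does not end with a line break
    if field or row:
        row.append("".join(field))
        rows.append(row)
    return rows


print(parse_csv('"Smith, John",42\r\n'))  # [['Smith, John', '42']]
```

The state that matters is a single in_quotes flag: a comma or line break only ends a field when it is seen outside quotes. This sketch is lenient and does not reject malformed input such as a stray quote in the middle of an unquoted field.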

TiddlyWiki currently uses an extremely simple CSV parser that doesn’t observe the rules around double quotes. It is definitely sometimes a frustrating shortcoming that would be worth fixing.
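For comparison, this is the kind of naive splitting that breaks on quoted commas. It is a Python illustration of the general approach, not TiddlyWiki's actual parser code; the function name naive_parse is hypothetical.

```python
def naive_parse(text: str) -> list[list[str]]:
    # Split records on newlines and fields on every comma, ignoring quotes entirely.
    return [line.split(",") for line in text.splitlines() if line]


rows = naive_parse('name,city\n"Smith, John","Prague"\n')
print(rows)
# [['name', 'city'], ['"Smith', ' John"', '"Prague"']]
```

The header has two columns but the data row comes out with three, and the quote characters are left in the values, which is the symptom described at the start of this thread.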

Okay, another point of view: could you add a TSV (tab-separated values) parser for TSV tiddlers?

In my situation, we often import tabular data into TW as CSV (though TSV may be better) and want to display it as a table.

For anyone interested, I've made a PR with an improved CSV parser: CSV parser improvements by EvidentlyCube (Pull Request #7042, Jermolene/TiddlyWiki5 on GitHub).


Can you share a small CSV with test data that reproduces the issue you face, so we can suggest a workaround?

  • It is a good idea to make the core follow the standard, but workarounds are easy in TiddlyWiki.

With @Maurycy’s suggested changes to CsvParser, you could set options.separator = "\t" to use tab as the separator instead. I don’t know how to pass that into CsvParser, though.

Great idea, an example would be nice.

So, here is my test.csv tiddler:

https://architektovani.tiddlyhost.com/#test.csv

It is self-explanatory. The MIME type text/csv is right, because if I use application/csv, the JSON Mangler plugin parses it instead. Is this example useful?


Okay, can I test TW with this PR directly?

That wiki has the JSON Mangler plugin installed, which I believe adds support for application/csv.

Thanks. I will have a look tomorrow.

Sometimes such exceptions are uncommon, so a quick search and replace (e.g. commas to semicolons) can save the day.

Similarly, you can look at the way you obtain the CSV and see if there is another export option, such as the tab-delimited form.

If every column is wrapped in quotes, then the delimiting commas will always appear as "," (as in "value","value"), and all other commas will be inside quotes.

Notwithstanding the above, we need a robust, standards-compliant CSV parser.
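That observation can be turned into a quick workaround, sketched here in Python. It assumes every field on the line is quoted and that no field contains a line break or the three characters "," (quote, comma, quote); the function name parse_all_quoted_line is hypothetical. It is a hack for well-behaved exports, not a general CSV parser.

```python
def parse_all_quoted_line(line: str) -> list[str]:
    """Workaround for lines where EVERY field is wrapped in double quotes.

    Assumes no field contains a line break or the sequence ","
    (quote, comma, quote).
    """
    line = line.strip()
    if not (line.startswith('"') and line.endswith('"')):
        raise ValueError("expected every field to be quoted")
    inner = line[1:-1]  # drop the opening quote of the first field and closing quote of the last
    # Delimiting commas always sit between a closing and an opening quote,
    # so split on "," and then turn each escaped "" back into a single quote.
    return [field.replace('""', '"') for field in inner.split('","')]


print(parse_all_quoted_line('"Smith, John","Prague","said ""hi"""'))
# ['Smith, John', 'Prague', 'said "hi"']
```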

@michalradacz I have put the changes from the PR on Tiddlyhost - you can test it here https://new-csv-parser-pr.tiddlyhost.com/

That CSV should be handled easily by the new parser!

It seems the MIME type for tab-delimited data is text/tab-separated-values (see the Stack Overflow question "excel - What is the best mime-type and extension to use when exporting tab delimited?").

I think what would make the most sense is to register that MIME type to run the same code with a different separator. I'll make these changes to the PR later today!
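Running the same parsing code for both MIME types with only the separator changing can be sketched as a lookup from MIME type to delimiter. This uses Python's csv module for the parsing itself; the names SEPARATORS and parse_tabular are hypothetical and do not reflect how the PR actually registers parsers in TiddlyWiki.

```python
import csv
import io

# Hypothetical registry: one parsing routine, one separator per MIME type.
SEPARATORS = {
    "text/csv": ",",
    "text/tab-separated-values": "\t",
}


def parse_tabular(text: str, mime_type: str) -> list[list[str]]:
    try:
        separator = SEPARATORS[mime_type]
    except KeyError:
        raise ValueError(f"no tabular parser registered for {mime_type}") from None
    # Same reader and quoting rules for every type; only the delimiter differs.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=separator)
    return list(reader)


print(parse_tabular('a,"b,c"\n', "text/csv"))                   # [['a', 'b,c']]
print(parse_tabular("a\tb,c\n", "text/tab-separated-values"))  # [['a', 'b,c']]
```

In the TSV case the comma in "b,c" needs no quoting at all, which is one reason TSV can be more convenient for data that contains commas.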