I have a problem with CSV tiddlers. In my CSV the strings are properly enclosed in quotation marks, but the CSV parser handles them incorrectly: it treats commas inside quoted strings as delimiters too. Any ideas for a workaround or a fix?
Could someone please report this as a regular bug issue to the TW developers?
It appears the CSV format is insufficiently standardised, but the basic form assumes that commas do not occur in the data and only delimit the fields. You may be able to hack your way through by relying on the fact that ," and ", will probably always appear together at field boundaries. Otherwise, the format you use should have an appropriate way to escape special characters in the data.
However, if you have access to Google Sheets or Excel, first try importing the CSV there and saving it back out to resolve this (I'm guessing).
I am trying to find a Windows app that was perfect for viewing, preparing and exporting CSV files.
[Edited] I am sure this is not it, but perhaps try its free version; it says, “Handles poorly formatted files”: https://www.moderncsv.com/ There are plenty of others, though, and many should handle @Maurycy’s suggested RFC.
You can write parsers in TiddlyWiki without JavaScript, starting with splitregexp[\n] to get each line and then breaking each line into columns. It requires a little gymnastics, but it can be done. I would first try importing it to tiddlers with the JSON Mangle plugin.
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
If my memory serves me well, these two pieces of the standard are supported by any major tool you could use.
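Seen from the writing side, the two quoting rules above come down to very little code. This is only a minimal sketch: quoteField and toCsvLine are hypothetical helper names made up for illustration, not anything from TiddlyWiki.

```javascript
// Hypothetical helpers for illustration; not part of TiddlyWiki.
function quoteField(value, separator = ",") {
  const text = String(value);
  // Quote only when the field contains the separator, a double quote, or a line break.
  const needsQuotes = text.includes(separator) || /["\r\n]/.test(text);
  if (!needsQuotes) {
    return text;
  }
  // Escape embedded double quotes by doubling them, then wrap the field in quotes.
  return '"' + text.replace(/"/g, '""') + '"';
}

function toCsvLine(fields, separator = ",") {
  return fields.map((field) => quoteField(field, separator)).join(separator);
}

console.log(toCsvLine(["aaa", 'b"bb', "c,cc"])); // prints aaa,"b""bb","c,cc"
```

A file written this way is what the reading side then has to undo, which is where a plain split on commas goes wrong.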
I am guessing the CSV tiddlers are parsed incorrectly because the parser uses a simple split or a regular expression. I don’t think correct parsing can be achieved without actually going through the data character by character (or using some very advanced regexp features).
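To show what I mean by character-by-character, here is a rough sketch of such a parser. It is not TiddlyWiki's CsvParser and not the code from any PR; parseCsv and its separator parameter are names I am assuming for the example. It is also lenient rather than strict: it does not reject malformed input such as a stray quote in the middle of an unquoted field.

```javascript
// Illustrative sketch only; this is not TiddlyWiki's CsvParser.
function parseCsv(text, separator = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // a doubled quote inside a quoted field is a literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // separators and line breaks are plain data inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === separator) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") {
        i++; // treat CRLF as a single line break
      }
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last record unless the input ended with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

console.log(parseCsv('"aaa","b""bb","ccc"\r\nzzz,yyy,xxx'));
```

The only state it needs is whether it is currently inside a quoted field, which is exactly the piece of information a plain split throws away.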
TiddlyWiki currently uses an extremely simple CSV parser that doesn’t observe the rules around double quotes. It is definitely a shortcoming that is sometimes frustrating and would be worth fixing.
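The symptom is easy to reproduce outside TiddlyWiki. The snippet below is just a plain JavaScript split on commas, used to illustrate the effect; it is not a copy of the actual TiddlyWiki code, and the sample line is made up.

```javascript
// A made-up record with a comma inside a quoted field.
const line = '"Smith, John",42';

// Naive approach: split on every comma, ignoring the quotes.
const naive = line.split(",");

console.log(naive.length); // 3, although the record only has two fields
console.log(naive);        // the quoted name is cut in half and keeps its quote characters
```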
With @Maurycy’s suggested changes to CsvParser, you could set options.separator = "\t" to use tab as the separator instead. I don’t know how to pass that into CsvParser, though.
I think what would make most sense would be to just register that mimetype to run with the same code, just different separator. I’ll make these changes to the PR later today!
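Roughly, the idea is a lookup from content type to separator, with everything else shared. This is only a sketch of the idea, not the PR code: SEPARATOR_BY_TYPE and parserOptionsFor are hypothetical names, I am assuming the type in question is text/tab-separated-values, and how the parser actually gets registered in TiddlyWiki is not shown.

```javascript
// Hypothetical lookup table, for illustration only.
// Assumption: the tab-separated type is "text/tab-separated-values".
const SEPARATOR_BY_TYPE = {
  "text/csv": ",",
  "text/tab-separated-values": "\t",
};

// Build the options object for a given content type, falling back to a comma.
function parserOptionsFor(type) {
  return { separator: SEPARATOR_BY_TYPE[type] || "," };
}

console.log(JSON.stringify(parserOptionsFor("text/tab-separated-values"))); // prints {"separator":"\t"}
```

The resulting object has the same shape as the options.separator mentioned above, so the same parsing code could serve both types.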