CSV problem with strings in quotation marks

michalradacz · November 15, 2022, 7:02pm

I have a problem with CSV tiddlers. In my CSV, strings are properly enclosed in quotation marks, but the CSV parser handles them incorrectly: it also treats commas inside the quoted strings as delimiters. Any ideas for a workaround or a fix?

Please, can someone report this as a regular bug issue to the TW developers?

jerojasro · November 15, 2022, 10:48pm

Pardon the question, but what prevents you from reporting the bug?

TW_Tones · November 16, 2022, 7:41am

It appears there is insufficient standardisation of the CSV format, but it implies that commas are not in the data and only delimit the fields. You may be able to hack your way through by relying on the fact that ," or ", may always appear together. Otherwise the format you use should have appropriate ways to escape different characters in the data.

However, if you have access to Google Sheets or Excel, try first importing the CSV there and saving it back out to resolve this (guessing).

I am trying to find a Windows app that was perfect for CSV file viewing, preparation and export.

[Edited] I am sure this is not it, but perhaps try the free version of https://www.moderncsv.com/ (it says, “Handles poorly formatted files”). There are plenty of others, and many should handle @Maurycy’s suggested RFC.

You can write parsers in TiddlyWiki without JavaScript, starting with splitregexp[\n] to get each line and then breaking each line into columns. This requires a little gymnastics, but it can be done. I would first try importing it to tiddlers with the JSON Mangler plugin.

Maurycy · November 16, 2022, 7:51am

Actually there is a specification for this; RFC 4180 states:

Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:
  "aaa","b CRLF
  bb","ccc" CRLF
  zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"

If my memory serves me well these two pieces of the standard are supported by any major tool you could use.

I am guessing the CSV tiddlers are parsed incorrectly because the parser uses a simple split or a regular expression. I don’t think correct parsing can be achieved without actually reading the data character by character (or using some very advanced regexp features).
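To make the character-by-character idea concrete, here is a minimal sketch of a reader that follows the two RFC 4180 rules quoted above. The name parseCsv and its separator parameter are hypothetical; this is not the code used by TiddlyWiki or by any PR, just one way the technique could look.

```javascript
// Hypothetical helper, not TiddlyWiki's CsvParser: reads the text one
// character at a time and tracks whether it is inside a quoted field.
function parseCsv(text, separator = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let pending = false; // true while a record has been started but not stored
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    pending = true;
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // doubled quote inside a quoted field
          i += 2;
          continue;
        }
        inQuotes = false; // closing quote
      } else {
        field += ch; // separators and line breaks are literal here
      }
    } else if (ch === '"' && field === "") {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === separator) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      pending = false;
    } else {
      field += ch;
    }
    i++;
  }
  // Store the last record when the input does not end with a line break.
  if (pending) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

console.log(parseCsv('"aaa","b""bb","ccc"'));
console.log(parseCsv('"Smith, John",Prague\nzzz,yyy'));
```

The sketch is lenient about malformed input (for example an unclosed quote simply runs to the end of the text), which a production parser would probably want to report.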

jeremyruston · November 16, 2022, 6:19pm

TiddlyWiki currently uses an extremely simple CSV parser that doesn’t observe the rules around double quotes. It is definitely sometimes a frustrating shortcoming that would be worth fixing.
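As an illustration of why ignoring the double-quote rules goes wrong, the sketch below splits a line on every comma. It is a made-up example (naiveSplit is a hypothetical name) and is not TiddlyWiki's actual parser code.

```javascript
// Illustration only: a split-on-comma reader, not TiddlyWiki's actual code.
function naiveSplit(line) {
  return line.split(",");
}

const line = '"Smith, John","Prague"';
const fields = naiveSplit(line);

// The quoted name is cut in two, so the line yields three fields
// instead of two, and the quote characters stay in the data.
console.log(fields.length);
console.log(fields);
```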

michalradacz · November 16, 2022, 7:02pm

Okay, another point of view: can you add a TSV parser for TSV tiddlers (tab-separated values)?

My situation is that we often import tabular data as CSV (though TSV may be better) into TW and want to display it as a table.

Maurycy · November 16, 2022, 11:02pm

For anyone interested, I’ve made a PR with an improved CSV parser: CSV parser improvements by EvidentlyCube · Pull Request #7042 · Jermolene/TiddlyWiki5 · GitHub.

TW_Tones · November 17, 2022, 12:16am

Can you share a small CSV with test data that reproduces the issues you currently face, so we can suggest a workaround?

Good idea to improve the core standard but workarounds are easy in tiddlywiki.

cdaven · November 17, 2022, 6:15am

With @Maurycy’s suggested changes to CsvParser, you could set options.separator = "\t" to use tab as the separator instead. I don’t know how to pass that into CsvParser, though.

michalradacz · November 17, 2022, 9:28am

Great idea, an example would be nice.

So, here is a test.csv tiddler:

https://architektovani.tiddlyhost.com/#test.csv

It is self-explanatory. The MIME type text/csv is right, because if I use application/csv the tiddler gets parsed by the JSON Mangler plugin instead. Is this example useful?

michalradacz · November 17, 2022, 9:30am

Okay, can I test TW with this PR directly?

jeremyruston · November 17, 2022, 9:52am

That wiki has the JSON Mangler plugin installed, which I believe is adding support for application/csv.

TW_Tones · November 17, 2022, 9:55am

Thanks. I will have a look tomorrow.

Sometimes such exceptions are not common, so a quick search and replace (e.g. commas to semicolons) can save the day.

Similarly, you can look at the way you obtain the CSV and see if there is another export option, such as the tab-delimited form.

If every column is wrapped in quotes, then the delimiter commas will always appear between quotes, as in "value","value", and all other commas will be inside the quotes.

Notwithstanding the above, we need a robust, standards-compliant CSV parser.
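A rough sketch of that workaround, under the stated assumption that every field on the line is quoted: strip the outer quotes and split on the three-character sequence ",". The function name splitFullyQuoted is hypothetical, and the approach breaks if a field itself contains that sequence or an escaped quote.

```javascript
// Hypothetical workaround: only valid when EVERY field on the line is quoted
// and no field contains the sequence "," or an escaped quote ("").
function splitFullyQuoted(line) {
  if (line.length < 2 || !line.startsWith('"') || !line.endsWith('"')) {
    throw new Error("line is not fully quoted");
  }
  // Drop the first and last quote, then split on the quote-comma-quote run.
  return line.slice(1, -1).split('","');
}

console.log(splitFullyQuoted('"Smith, John","Prague"'));
```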

Maurycy · November 17, 2022, 10:24am

@michalradacz I have put the changes from the PR on Tiddlyhost - you can test it here https://new-csv-parser-pr.tiddlyhost.com/

That CSV should be handled easily by the new parser!

Maurycy · November 17, 2022, 10:26am

It seems the mimetype for tab delimited CSV is text/tab-separated-values: excel - What is the best mime-type and extension to use when exporting tab delimited? - Stack Overflow

I think what would make the most sense would be to just register that mimetype to run with the same code, just with a different separator. I’ll make these changes to the PR later today!
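Roughly, the idea of "same code, different separator" could look like the lookup below. All names here (SEPARATOR_BY_TYPE, separatorFor) are hypothetical and are not taken from the PR; the returned separator would be handed to whatever parser is in use.

```javascript
// Hypothetical lookup table; the names are illustrative and are not
// taken from the PR or from TiddlyWiki's core.
const SEPARATOR_BY_TYPE = new Map([
  ["text/csv", ","],
  ["text/tab-separated-values", "\t"],
]);

// Returns the separator registered for a MIME type, or throws if none is.
function separatorFor(mimeType) {
  const separator = SEPARATOR_BY_TYPE.get(mimeType);
  if (separator === undefined) {
    throw new Error("no separator registered for " + mimeType);
  }
  return separator;
}

console.log(JSON.stringify(separatorFor("text/tab-separated-values")));
```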