Gajim - 2016-04-09


  1. tmolitor asterix: I added a comment to bug #8090 :)
  2. bot RSS: Feeds for Gajim • Ticket #8090 (MAM message duplication) updated Apparently this bug doesn't happen anymore. Maybe the message deduplication I added solved this issue :) @Asterix: could this be true? Are messages received via MAM also deduplicated when the history is written? https://trac.gajim.org/ticket/8090#comment:2
  3. Marzanna Hello. Are there any trac.gajim.org admins?
  4. Marzanna I want to file a bug but get this message: Submission rejected as potential spam IP 46.166.138.158 blacklisted by all.s5h.net [2] SpamBayes determined spam probability of 78.16%
  5. Link Mauve Marzanna, Asterix isn’t here right now, but he’ll likely read your message when he’ll be back.
  6. Marzanna Link Mauve, thanks. I'll wait.
  7. Link Mauve He’s usually here during CEST evenings.
  8. mpan Marzanna, maybe try using your own address instead of VPN? Anonymizing VPN’s addresses always have a much higher chance of ending up on blocklists, for obvious reasons.
  9. Marzanna mpan, ok. I added trac.gajim.org to temporary exception. But blocking VPNs is no good because VPN (and proxies) is the only way to browse uncensored Internet
  10. mpan Marzanna, no one is blocking VPNs intentionally. It’s just that people use VPNs for attacks, so they’re blocked. The decision had nothing to do with the fact that it’s VPN.
  11. bot RSS: Feeds for Gajim • Ticket #8321 (Url Image preview displays gray rectangles) created Bug description I see gray rectangles in many cases instead of preview images. I tried to load an image from a disk into gtk.gdk.Pixbuf and TextViewImage?. Same result. Gray rectangles. Software versions OS version: Xubuntu 15.10 GTK version: 2.24.28 PyGTK version:2.24.0 https://trac.gajim.org/ticket/8321
  12. mpan usenix.org.uk has just noticed that lots of spam is coming from this IP address, so it has been blocked. If you decide to share your IP address with people, who cause problems, then — obviously — you will get into problems too :)
  13. Marzanna mpan, I understand, but sometimes I face unexpected IP blocking. It irritates me.
  14. mpan It’s totally expected if you use a VPN.
  15. mpan (until it’s your own, private VPN)
  16. Marzanna I'd used my private VPN server, but hosting provider can't fix my domain name :(
  17. mpan Let me put it like that: there is a cafe. Each evening a folk clothed in pink goes out of it and throws stones at the building opposite, breaking windows. One day you put pink clothes on yourself and get out of the cafe at the evening… what do you expect? :>
  18. mpan It’s not that no one likes cafes. It’s just that everyone knows that a pink guy leaving this particular place means problems.
  19. Marzanna Well, it's one of the reasons I want to use private VPN server.
  20. Marzanna I believe ipv6 should solve the problem with all these ip bans.
  21. mpan Not really. IPv6 is assigned in groups too. So one would just rangeban.
  22. mpan Instead of banning a subgroup of 256 addresses, for example, one will have to ban 65536 of them
  23. vorner Which already happened to me. My VPS provider got its IPv6 range on some serious spam blacklist (one that bounces mails directly on many places), because some other customer sent spam from his server.
  24. vorner People seem to ban IPs on IPv4, but are more than happy to block /48 ranges on IPv6
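To illustrate the range bans described above, here is a minimal sketch using Python's standard `ipaddress` module. The two prefixes are reserved documentation ranges chosen for illustration, and `is_banned` is a hypothetical helper, not something any blocklist in the conversation is known to use.

```python
import ipaddress

# Documentation-only example prefixes (assumptions, not taken from the chat).
v4_block = ipaddress.ip_network("192.0.2.0/24")
v6_block = ipaddress.ip_network("2001:db8::/48")

def is_banned(address, banned_networks):
    """Return True if the address falls inside any banned network."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in banned_networks)

# A /24 covers 256 IPv4 addresses; a /48 covers 2**80 IPv6 addresses.
print(v4_block.num_addresses)
print(v6_block.num_addresses)
print(is_banned("2001:db8:0:1::25", [v4_block, v6_block]))
```

A single entry for the /48 therefore affects every customer whose address was carved out of that prefix, which matches what vorner describes.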
  25. Marzanna Oh. how I hate spammers. They broke my pink dream about ipv6 :C
  26. euan Hi guys, I need some help with Gajim 0.16.5
  27. euan I still have an issue with unicode emoticons
  28. euan I can see them sent from Conversations (Android client)
  29. euan but I can only send a few from Twemoji
  30. euan only the ones defined in unicode version 5.0 and back
  31. euan whilst the vast majority are unicode v6 and above
  32. euan I'm on Slackware
  33. euan 14.1
  34. euan I'm using the SlackBuild from SlackBuilds.org and everything else seems to work well
  35. euan tmolitor helped me with this issue a few weeks ago, his is working on Debian (Jessie I believe)
  36. mpan 6:💱👭, 7:👁🗯, 8:🏺🏑
  37. mpan How the problem appears to you, euan?
  38. euan which is roughly same version of python2 and unidecode was exactly same version of unicodedata (python builtin support)
  39. euan well, i can see emoticons just fine
  40. euan but I can't post most of them
  41. mpan Yes, but what do you mean by “can’t post”? What exactly happens?
  42. euan it just puts a blank where the emoticon should be, it doesn't send the unicode characters
  43. euan first let check I', using Twemoji resized, brb
  44. euan let me check^
  45. mpan Can you echo now what I’ve sent?
  46. euan 0⃣
  47. euan can you see that blue background white Zero foreground?
  48. euan Actually I can't see your emoticons, just unicode jargon
  49. euan but it usually works, I can usually see others fine
  50. euan what emoticons pack are you using?
  51. mpan No. I can see latin 0 enclosed in U+20E3.
  52. euan hmm, strange, problem is worse now
  53. mpan Can you echo what I’ve sent you?
  54. euan I get an error trying to copy the unicode
  55. euan :
  56. euan see, it's blank
  57. mpan What error? -.-
  58. euan 6:, 7:, 8:
  59. mpan Well… you do realize that it’s Unicode 6 too?
  60. euan i copied and pasted the whole line
  61. euan it misses out the unicode part
  62. euan let me restart, this is behaving differently than before
  63. euan ok, i'm back
  64. euan I'll try to send a Twelmoji Resized emoticon
  65. euan 0⃣
  66. euan that one I can see in my chat window
  67. euan it's a zero
  68. euan can you see it?
  69. mpan There is no such glyph in Unicode as “Twelmoji Resized”.
  70. euan What emoticons pack are you using?
  71. mpan And I see, again, latin 0 enclosed in U+20E3
  72. mpan What is “emoticons pack”? -.-
  73. euan there are 2 types of emoticons
  74. euan text based, and unicode based
  75. mpan Unicode is text.
  76. euan AFAIK Twemoji is the only emoticons pack for Gajim that uses unicode
  77. mpan What is “emoticons pack”?
  78. euan OK, so when I say text based I mean it uses ASCII that means something
  79. euan e.g colon+hyphen+right bracket is a smiley face
  80. mpan Ok. Fine.
  81. mpan Now, there are also emojis in Unicode.
  82. mpan But I don’t see what you mean by “emoticons pack”.
  83. mpan You mean “font”?
  84. euan but with unicode a single Unicode character represents an emoticon
  85. mpan (or “typeface”)
  86. euan Install the emoticons plugin?
  87. euan then there are several 'packs' available
  88. mpan What is this plugin doing? The description says nothing.
  89. euan you can change them in Edit->Preferences->Chat Appearance->Emoticons
  90. euan it installs additional emoticon packs
  91. mpan … what are “emoticon packs”? -.-
  92. euan the default emoticon pack is called "static" and there are very few emoticons
  93. mpan Anyway, nvm. I was thinking you have a problem with Gajim. I don’t know how the third-party plugin works and what it does, so I can’t help in this case anyway.
  94. euan well, I don't think its a problem with the plugin
  95. euan I think it's Gajim problem, nut onlky shows itself when using unicode emoticons
  96. mpan Well… first of all it adds some strange concept of “emoticon packs” that has nothing to do with Unicode. So, obviously, it is.
  97. euan but only^
  98. mpan But, as I said, I can’t help with it — ­I don’t know the plugin. Sorry.
  99. euan Look, I'm telling you, there are 2 types of emoticons
  100. euan ASCII ones and Unicode ones
  101. mpan Yes.
  102. euan I also learnt this recently
  103. euan I had no idea there were Unicode emoticons until recently
  104. euan OK, thanks anyway
  105. mpan Well… I’m using them for few years, since Unicode 6.0. Never heard of anything named “emoticon pack”.
  106. mpan Hence it’s something related to this plugin only.
  107. mpan I have no such plugin, and have exactly zero problems with using emojis: be it Unicode 6, 7 or 8.
  108. mpan Neither in Gajim, nor in any other software with which I’ve ever tried to use the emojis.
  109. euan it's just a term to mean a bunch of emoticons (set of icon images)
  110. mpan Hence, I can’t help you, because I don’t know the plugin. Never used it, and don’t know why it breaks the things.
  111. euan NO, many chat programs let you change the emoticons
  112. euan emoticons look different on different chat programs
  113. euan because they have their own image representation
  114. mpan Well… Since I’m using only Gajim and irssi for chatting, I can’t tell much about other chatting software — indeed. But they’ll look exactly the same everywhere for one simple reason: I have only one typeface that supports Unicode ranges related to emoji (DejaVu).
  115. euan Please go to Edit->Preferences->Chat Appearance->Emoticons
  116. euan Sorry: Edit->Preferences->General->Chat Appearance->Emoticons
  117. mpan “Disabled”, as enabling this replaces emoticons with some crappy images
  118. euan eh?
  119. mpan I use Unicode for that, don’t need images that distract me when reading or break the text flow.
  120. euan emoticons are crappy images, what planet have you been living on?
  121. euan OK, so I see what you mean now
  122. mpan But, finally, it seems we got some point: you don’t use unicode emoticons, but graphical ones.
  123. euan you use the emoticon rendered by your font
  124. mpan Of course I do. And since you said “unicode” I was sure you do too…
  125. euan I want the emoticons rendered as an image
  126. euan as in every other modern chat client usually does
  127. euan the default for Conversations uses the Android 'emoticons pack' (set of images)
  128. mpan Well… my modern irssi IRC client doesn’t even have such feature. But put it aside. I’ve set it temporarily to “static”
  129. euan well, Twemoji takes those unicode and converts them to an image
  130. euan instead of letting the font render it
  131. mpan 6:💱👭, 7:👁🗯, 8:🏺🏑
  132. mpan Just sent this with graphical emoticons enabled.
  133. euan I can't see those, not even in my font
  134. mpan (also: they were rendered by the font engine anyway)
  135. euan I get the unicode code in a long rectangle
  136. euan can you post simple smiley face?
  137. mpan
  138. euan aha, got that one
  139. Link Mauve euan, install ttf-symbola.
  140. euan as an image
  141. Link Mauve Or however your distribution calls that font.
  142. mpan Well… rendered as text for me
  143. mpan ☹☺
  144. mpan “tango” scheme — still text
  145. euan @Link Mauve: OK, I'll try that
  146. mpan or DejaVu
  147. euan OK, so the first image "unhappy face" was rendered by font
  148. Link Mauve mpan, DejaVu doesn’t contain most of the smileys of the astral plane, AFAIK.
  149. euan OK, so the first one (unhappy face) was rendered by font, the second was an icon by Twemoji
  150. euan I have DejaVu installed
  151. mpan well… what is twemoji?
  152. euan I tried that last time, made no difference
  153. mpan Maybe it’s the problem?
  154. euan Twemoji is a set of icon images that represent the unicode
  155. euan they are bigger, in colour and much more detailed than the font rendered ones
  156. euan well, the problem is that whilst I can see the ones other people post
  157. euan I can't post them myself
  158. Link Mauve euan, just to be sure, your python2 is compiled on wide Unicode, right?
  159. euan for example, this is a 0,1,2,3,4,5: 0⃣ 1⃣ 2⃣ 3⃣ 4⃣ 5⃣
  160. euan those I can post
  161. euan but when I post others, it is blank
  162. Link Mauve UCS-2 isn’t enough to represent the astral plane.
  163. Link Mauve Check how your python2 binary was compiled.
  164. euan I'll post one between the pointed brackets > <
  165. euan see, it's blank
  166. euan that was a face
  167. euan Aha, so I was going to ask, are there compile time options for Gajim
  168. Link Mauve Nope.
  169. mpan Link Maeda: just a side note: for me DejaVu misses only few symbols in “Miscellanous symbols” (which indeed are patched by Symbola or other font on my system).
  170. euan this is the following from SlackBuild: ./configure \ --prefix=/usr \ --libdir=/usr/lib${LIBDIRSUFFIX} \ --sysconfdir=/etc \ --localstatedir=/var \ --mandir=/usr/man \ --docdir=/usr/doc/$PRGNAM-$VERSION \ --build=$ARCH-slackware-linux make make install DESTDIR=$PKG
  171. mpan Link Mauve,*, pardon
  172. Link Mauve euan, for python2 ?
  173. Link Mauve euan, for python2?
  174. mpan euan, is this python?
  175. euan @mpan: how do you enter your emoticons
  176. euan I use the button bottom left
  177. euan but with "static" it does not post unicode
  178. mpan euan, I sparsely use them in the first place. But if I do, then either by keyboard, or just copy them from “Characters map” application.
  179. euan OK, so python2 is built into Slackware
  180. euan it's not in a repo
  181. mpan Yes, but what are python2’s compile-time options?
  182. euan I can find the SlackBuild on the source packages though
  183. mpan euan: `python2 -m sysconfig`
  184. mpan (the output may reveal some private information, so first review it before posting)
  185. euan for python2: ./configure \ --prefix=/usr \ --libdir=/usr/lib${LIBDIRSUFFIX} \ --mandir=/usr/man \ --docdir=/usr/doc/python-$VERSION \ --with-threads \ --enable-ipv6 \ --enable-shared \ --build=$ARCH-slackware-linux make $NUMJOBS || make || exit 1 make install DESTDIR=$PKG
  186. mpan CONFIG_ARGS is what we’re interested in
  187. euan Wow, a lot of information
  188. euan OK
  189. mpan Does it say --enable-unicode=ucs2 or something else?
  190. euan OK, here it is: python2 -m sysconfig | grep CONFIG_ARGS CONFIG_ARGS = "'--prefix=/usr' '--libdir=/usr/lib64' '--mandir=/usr/man' '--docdir=/usr/doc/python-2.7.5' '--with-threads' '--enable-ipv6' '--enable-shared' '--build=x86_64-slackware-linux' 'build_alias=x86_64-slackware-linux'"
  191. Link Mauve euan, Py_UNICODE_SIZE is the one we want to know.
  192. Link Mauve In -m sysconfig.
  193. euan OK, coming up
  194. euan Py_UNICODE_SIZE = "2"
  195. Link Mauve Alternatively, you could use Gajim’s unreleased default branch, which uses python3 (so no Unicode issue) and gtk3.
  196. mpan Will be 2, considering that CONFIG_ARGS doesn’t overwrite it
  197. mpan Yep
  198. Link Mauve euan, ok, it should be 4 if you want to use the astral plane.
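The narrow versus wide build question can also be checked from inside the interpreter. This is a minimal sketch, assuming only the standard `sys` and `sysconfig` modules; `is_wide_unicode_build` is a hypothetical helper name. On a narrow (UCS-2) python2 build `sys.maxunicode` is 0xFFFF, while on a wide build, and on any Python 3.3 or later, it is 0x10FFFF.

```python
import sys
import sysconfig

def is_wide_unicode_build():
    """True if this interpreter can hold codepoints above U+FFFF directly."""
    return sys.maxunicode > 0xFFFF

print(hex(sys.maxunicode))
# On python2 this is 2 (narrow) or 4 (wide); newer Pythons may not define it.
print(sysconfig.get_config_var("Py_UNICODE_SIZE"))
print(is_wide_unicode_build())
```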
  199. Link Mauve Recompile python2, or switch to Gajim’s default branch.
  200. euan Aha, OK
  201. mpan So the culprit is found.
  202. euan but it's the python version that's too low (old)?
  203. euan tmolitor was OK in Jessie
  204. mpan No, the compilation options of Python2 are wrong.
  205. Link Mauve Misconfigured, yes.
  206. euan his was similar version of python2
  207. mpan Well… “wrong” as in “no one should use UCS-2 anymore”.
  208. mpan euan: not about version, but how it was compiled.
  209. euan OK, so Slackware 14.1 came out in Nov 2013 I think
  210. euan let me check
  211. euan yes, Nov 2013
  212. mpan euan: “UCS-2 in 2016” is pretty much “no Unicode support”. I’m exaggerating a bit here, but you should get the image.
  213. euan so was it "wrong' in Nov 2013?
  214. euan yes, i sort of follow you
  215. euan i'll have to research UCS-2 vs UCS-4
  216. euan but this is great news to me
  217. euan I'm getting somewhere now
  218. euan thanks very much so far
  219. mpan euan: it was wrong in 1995… ;)
  220. euan So slackware is about to release 14.2 very soon
  221. euan I think I should let them know about this issue before they release
  222. euan unless they already fixed it
  223. mpan Link Mauve, does gajim work on Python3?
  224. Link Mauve mpan, only the default branch.
  225. Link Mauve Which isn’t fully finished yet.
  226. euan So how can I remedy this? Can I rebuild it after modifying the SlackBuild?
  227. Link Mauve But will be finished sooner if more people use it and report bugs. :)
  228. mpan euan: unfortunately you can’t without rebuilding python2 and using the custom version with Gajim.
  229. euan I want to use this in production on a Linux based rollout, I need stable tested version only
  230. euan I've tested 0.16.5 for weeks and only emoticons is an issue
  231. euan I really want Twemoji to work
  232. euan it's much better between Conversations and Gajim if I use Twemoji
  233. mpan euan: and the difference between UCS-2 and any other unicode-related encoding is that UCS-2 can support only 2¹⁶ glyphs, while Unicode has much, much more.
  234. arune euan: it doesn't work in windows either if that makes you feel any better
  235. euan I'm not big on emotes myself ,but my Wife uses them a lot, as do plenty of my users
  236. euan Yes, i'm willing to rebuild python2
  237. euan I'm a slacker, I'm used to compiling from source ;-)
  238. mpan euan, surprisingly Microsoft dropped UCS-2 for UTF-16 after 2000. I’ve learned this just yesterday. So it’s veeeery sad a Linux distro uses UCS-2 for anything.
  239. mpan arune, surprisingly Microsoft dropped UCS-2 for UTF-16 after 2000. I’ve learned this just yesterday. So it’s veeeery sad a Linux distro uses UCS-2 for anything.
  240. mpan euan, the above was for arune, not u :)
  241. euan I got that
  242. euan Well, I'm surprised, Slackware is the highest quality distro I've used so far
  243. euan tried a lot
  244. euan and made a distro myself
  245. euan so i wonder why it hasn't been an issue for many others
  246. euan I'll have to ask the Slackware devs why they use UCS-2
  247. mpan euan: the problem is that UCS-2 can support only characters with codes U+0000 to U+FFFF, while the emoticons start at U+1F300. This is all you need to know (you wanted to investigate the difference. I’ve explained everything now, so no need to dive into the subject more :P).
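The range limit mpan describes can be made concrete with a small sketch. It assumes Python 3, and the helper names `fits_in_ucs2` and `utf16_code_units` are hypothetical. A codepoint above U+FFFF has no single 16-bit representation; UTF-16 encodes it as a pair of surrogate code units, which plain UCS-2 does not provide for.

```python
def fits_in_ucs2(ch):
    """UCS-2 can only represent codepoints U+0000 to U+FFFF."""
    return ord(ch) <= 0xFFFF

def utf16_code_units(ch):
    """Return the 16-bit code units UTF-16 uses for one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

smiley = "\u263a"          # WHITE SMILING FACE, inside the 16-bit range
cyclone = "\U0001F300"     # first codepoint of the emoji block named above

for ch in (smiley, cyclone):
    units = [hex(u) for u in utf16_code_units(ch)]
    print("U+%04X" % ord(ch), fits_in_ucs2(ch), units)
```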
  248. euan OK, so what do I need for full Unicode?
  249. mpan euan, rebuilding python2 with --enable-unicode=ucs4
  250. euan OK, I'll try that
  251. euan what package provides unicode itself?
  252. euan python aside
  253. euan linux package I mean
  254. euan or project rather
  255. mpan There is no such thing as “unicode itself”. It’s just about how much memory is allocated per character.
  256. euan OK, so it's purely done in the python runtime
  257. euan no external lib
  258. euan OK, cool, I'll try to rebuild python
  259. euan brb in 5-10 mins
  260. mpan The reason for using UCS-2 may be a belief that 32 bits per character consumes “too much” memory. Nonsense in 2016, but well… some beliefs die hard. UTF-16, which is 16-bit, consumes the same as UCS-2 for most characters, but in some scenarios may cause problems if handled by an inexperienced programmer, as it’s really a variable-length encoding, which is not obvious for beginners :).
  261. mpan For me UTF-8 should be the standard for general-purpose uses, but it’s just my opinion.
  262. euan well Slackware does do some things old school, LILO for boot and no PAM
  263. euan but otherwise, it's modern
  264. Link Mauve mpan, actually it’s true.
  265. mpan Link Mauve, which one?
  266. Link Mauve That’s why since 3.3, Python started using ASCII if the string didn’t contain any higher character.
  267. Link Mauve mpan, that UTF-32 is large.
  268. mpan If one wants to have fixed-width encoding and unicode I see no other option.
  269. Link Mauve Python 3.3 scans the string before encoding it to find if it can fit in latin1 or not, and if so it divides the memory footprint by four.
  270. mpan Personally I’m believing that fixed-width encodings are archaic idea, hence my support for UTF-8, but some people want them… so UTF-32.
  271. Link Mauve That happens without the user needing to be aware of it.
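The adaptive storage Link Mauve mentions can be observed indirectly. The following is a rough sketch that assumes CPython 3.3 or later, since `sys.getsizeof` reports implementation-specific sizes; `bytes_per_codepoint` is a hypothetical helper that measures how much a string grows per added character.

```python
import sys

def bytes_per_codepoint(ch, n=1000):
    """Estimate storage per character by comparing two string sizes.

    The fixed object header cancels out in the subtraction.
    CPython-specific: other implementations may report different sizes.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

for label, ch in (("ASCII", "a"), ("Latin-1", "\u00e9"),
                  ("BMP", "\u263a"), ("astral", "\U0001F300")):
    print(label, bytes_per_codepoint(ch))
```

On CPython this typically reports 1, 1, 2 and 4 bytes per character, so a string only pays the full UTF-32 cost when it actually contains a codepoint above U+FFFF.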
  272. mpan However, they have chosen using multiple encodings. It’s fine too. Same as Shift-JIS for Japan is fine.
  273. Link Mauve Ugh, no, it isn’t fine. :x
  274. Link Mauve Especially since you can never know whether it’s Shift_JIS or CP932.
  275. mpan But IF we want a single encoding AND Unicode AND fixed-width, then UTF-32 is the way and there is really no discussion for me about memory usage. Texts rarely are the main content of the program.
  276. Link Mauve Just look at the implementation of str in Python starting 3.3.
  277. mpan One knows if its Shift-JIS or cp932 simply because one implements it :P
  278. Link Mauve In a chat program, text is definitely the main content.
  279. mpan Yes, but not much of it.
  280. Link Mauve No, you have to interact with the rest of the world.
  281. Link Mauve This is the main pain point with not using Unicode.
  282. mpan No, the concrete implementation never leaves a single computer. Network protocols are another issue.
  283. Link Mauve A program that doesn’t communicate with the outside, be it the user’s filesystem or the network, is usually quite useless.
  284. mpan And as for chat. Where does Gajim hold chat history on Linux?
  285. Link Mauve In a sqlite database, in $XDG_DATA_HOME/gajim/logs.db
  286. euan Just out of curiosity, when I asked earlier what package installed Unicode, I guess I meant what C library provides support for it, surely not every C application implements it itself?
  287. Link Mauve euan, traditionally it’s the libc.
  288. euan is it in the standard lib glibc?
  289. euan OK, thanks
  290. Link Mauve Yes, or icu, or one of the various implementations.
  291. euan OK, but UTF-32 must be well baked in a long time ago right?
  292. euan $ ls -1 /var/log/packages/ | grep icu icu4c-51.2-x86_64-1 icu4c-compat32-51.2-x86_64-1compat32
  293. mpan euan: but note that libc itself has no direct support for Unicode. More like general concept of fixed-width vs multicharacter strings, and then again there are locale descriptions that may be used for translation, character type classification (ctype.h) and so on.
  294. euan OK, thanks, when I develop I don't usually deal with UTF/Unicode, just let the runtime handle it
  295. mpan Link Mauve: my logs.db is 41M. Don’t know if it’s UTF-32 or UTF-8. Even if it would be 10× bigger than still loading whole history into a memory of modern computer is nothing. And no one does this, as the operation would make no sense.
  296. euan python does it all for you, I never even wondered which UTF it uses
  297. mpan Link Mauve: daily backlogs are probably in range of few hundred K with UTF-32. Literally nothing to care about.
  298. Link Mauve mpan, it doesn’t matter which amount of logs you have, UTF-32 has various downsides like thrashing caches, requiring compression, etc.
  299. Link Mauve But I don’t know either what sqlite uses to store its database.
  300. Link Mauve Looks like UTF-8.
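The text encoding of an SQLite database can be queried directly rather than guessed. This is a minimal sketch using Python's standard `sqlite3` module on a fresh in-memory database; `database_encoding` is a hypothetical helper, and to inspect a real file one would pass its path (for example the logs.db location mentioned above) instead of `:memory:`.

```python
import sqlite3

def database_encoding(path):
    """Return the text encoding SQLite reports for the given database."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA encoding").fetchone()[0]
    finally:
        conn.close()

# A new database defaults to UTF-8 unless the encoding is changed before
# any tables are created.
print(database_encoding(":memory:"))
```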
  301. mpan good for them ^_^
  302. mpan For me the main downside of UTF-32 is that it’s a fixed-width encoding. But given one wants such with Unicode, I believe other costs are acceptable. Not that **I** believe one should do it, but if **someone else** wants…
  303. Link Mauve You mean you are fine with len('☺') == 3? :/
  304. mpan 4
  305. Link Mauve Or 4, I don’t remember the details.
  306. mpan for UTF-32 it’s always 4
  307. Link Mauve 2 actually.
  308. mpan UTF-32
  309. mpan UTF-32 is always 4
  310. euan UTF-32: 2 to power of 32 = 4,294,967,296. Is that really necessary!!
  311. Link Mauve Except Python chose to expose codepoints.
  312. Link Mauve So you will always get the actual number of codepoints in your string.
  313. mpan Link Mauve, I like the idea of adaptive encoding Python 3.3 has chosen and you have just mentioned.
  314. Link Mauve It’s a good choice imo, and it prevents you from splitting a string at a wrong position like so many languages.
  315. mpan euan: if you want unicode support AND fixed-width encoding? I see no other way :D
  316. Link Mauve euan, actually you can’t go that high, due to UTF-16 stupidness you are limited to U+10FFFF as the maximum codepoint.
  317. mpan euan: but, as I’ve repeated already multiple times, I don’t think fixed-width encodings make sense.
  318. euan I mean I know there are a lot of languages with large character maps, but are we really needing more than 16bits?
  319. euan so do you agree with building python2 with ucs-4?
  320. Link Mauve euan, there is no other way, if you want e.g. emoji support.
  321. mpan Link Mauve, well… in Unicode you can’t go over U+10FFFF anyway?
  322. Link Mauve mpan, yes, due to UTF-16.
  323. euan yes, I guess so, I understand, just didn't realize it was a compile option
  324. mpan Well… UTF-16 is a problem as it looks like fixed-width, but in reality it’s variable-width and beginners often got caught by this.
  325. euan trying to figure out why Slackware has gone with ucs-2
  326. Link Mauve euan, likely because it’s the default.
  327. Link Mauve (Stupid default.)
  328. euan what i don't understand is what you mean by fixed-width, aren't UTF-8 and UTF-16 also fixed-width?
  329. mpan No
  330. mpan They’re both variable-width
  331. mpan A single character may take various number of bits
  332. euan isn't it either 8,116 or 32 bits
  333. Link Mauve euan, the number of bytes which makes a codepoint is different from one codepoint to another.
  334. Link Mauve That’s what it means.
  335. euan correction: 8,16.32
  336. euan codepoint? I'm lost, guess I don't know much about unicode
  337. Link Mauve euan, basically a “character”.
  338. euan do you mean with UTF-32, you have to pad with zeros, but not with others
  339. mpan UTF-7: 1–8 octets UTF-8: 1–6 octets UTF-16: 2–4 octets
  340. euan so with variable-width, you just stop after the meaningful bits are given?
  341. mpan and UTF-32, which is a fixed-width encoding, always takes 4 octets.
  342. Link Mauve mpan, actually no, a codepoint can’t be bigger than four bytes in UTF-8.
  343. Link Mauve euan, it’s explained here how UTF-8 works: https://en.wikipedia.org/wiki/UTF-8
  344. mpan asks Wikipedia
  345. mpan Oh, indeed. Sorry
  346. Link Mauve And here for UTF-16: https://en.wikipedia.org/wiki/UTF-16
  347. mpan So: UTF-7: 1–8 octets UTF-8: 1–4 octets UTF-16: 2–4 octets
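The corrected octet counts can be checked per codepoint. This is a small sketch assuming Python 3; the sample characters and the helper name `encoded_sizes` are chosen for illustration, and UTF-7 is left out for brevity.

```python
# One sample from each UTF-8 length class: 1, 2, 3 and 4 octets.
SAMPLES = ["A", "\u00e9", "\u263a", "\U0001F300"]

def encoded_sizes(ch):
    """Octets needed for one character in each encoding (no byte order mark)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for ch in SAMPLES:
    # len(ch) is 1 in every case: Python 3 counts codepoints, not octets.
    print("U+%04X" % ord(ch), len(ch), encoded_sizes(ch))
```

UTF-8 grows from 1 to 4 octets across the samples, UTF-16 uses 2 octets until the codepoint leaves the 16-bit range and then 4, and UTF-32 always uses 4.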
  348. euan OK, i'll read up on it
  349. euan thanks
  350. Link Mauve mpan, never seen UTF-7 used in the wild.
  351. mpan Link Mauve, e-mail systems?
  352. mpan quoted printable or base64 isn’t standard everywhere
  353. Link Mauve Sad. :/
  354. mathieui "legacy"
  355. euan never heard of UTF-7
  356. mpan euan: a 7-bit, ASCII-compatible encoding for Unicode
  357. euan so with UTF-8 the binary can be different lengths
  358. euan but how is it terminated
  359. euan how does the interpreter know when one character ends and the other starts?
  360. euan in ASCII, it just one byte, so not an issue
  361. mpan Despite ASCII is practically dead for 20 years and nearly no one uses it out of technical reasons, some people believe they should stay compatible with machines that no longer exist anywhere, so they stick to 7-bit encodings. Even there is probably no single 7-bit machine running anywhere in the world in 2016.
  362. mpan … and hence UTF-7 was created.
  363. Link Mauve Haha.
  364. Link Mauve You wish.
  365. mpan Note that limiting oneself to ASCII character set (not encoding itself!) makes a bit of sense, as it’s the character set that will most probably be supported by any machine. But it does apply to cases when data has to be interpreted, not just blindly sent through a medium.
  366. euan I ask again, how does the parser know when a character ends, if it is variable length, are there termination bits
  367. Link Mauve Hello EBCDIC.
  368. mpan euan: based on the previous sequence of bits.
  369. Link Mauve euan, no, there is a beginning bit though.
  370. euan aha the U part?
  371. Link Mauve So if you start the stream en route, you can ignore all until the next character.
  372. mpan euan: also, even if it doesn’t know the previous bits, UTF-8 has a property called “self-synchronization”. it can always detect the first octet.
  373. euan how is that done in binary?
  374. Link Mauve euan, look at https://en.wikipedia.org/wiki/UTF-8#Description
  375. euan OK, sorry, being lazy
  376. mpan Link Mauve: and also encodings used for 8-segment displays in microcontrollers :P. Probably more popular than EBCDIC in modern systems ;)
  377. euan I'll read it fully when I get more time
  378. mpan euan: in short: the first octet contains a sequence that can’t appear anywhere else.
  379. euan OK, that makes sense
  380. mpan The first octet never starts with 0b10…, while others always do.
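The self-synchronization property follows directly from that bit pattern. Below is a minimal sketch, assuming Python 3 and well-formed UTF-8 input; the helper names are hypothetical. Continuation octets always look like 0b10xxxxxx, so a reader dropped into the middle of a stream can skip them until it reaches the start of the next character.

```python
def is_continuation(byte):
    """Continuation octets have the top two bits set to 10."""
    return (byte & 0xC0) == 0x80

def resync(data, pos):
    """Advance pos to the first octet of the next complete character."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos

def count_codepoints(data):
    """Every character contributes exactly one non-continuation octet."""
    return sum(1 for b in data if not is_continuation(b))

data = "h\u20acllo".encode("utf-8")   # 68 e2 82 ac 6c 6c 6f
start = resync(data, 2)               # position 2 is inside the euro sign
print(start, data[start:].decode("utf-8"), count_codepoints(data))
```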
  381. euan but now I don't understand why we need fixed width, shouldn't there just be one format that goes up to 32bits, thus covering all possible characters / emoticons but not needing padding bits?
  382. mpan euan: because accessing random characters in a string is fast in fixed-width encodings
  383. mpan … and some people believe they need random accesses in strings.
  384. euan OK, so no need to parse, I get it
  385. euan yes, makes sense
  386. euan but having so many types of unicode kinda sucks
  387. mpan No, Unicode is one. Just encodings are different.
  388. mpan And it’s nothing compared to the number of encodings other character sets have.
  389. euan that how it goes in software I guess
  390. euan OK, so I meant so many types of UTF (encodings)
  391. euan other character sets?
  392. euan I though it was just ASCII or Unicode
  393. euan ehat else is there?
  394. euan what^
  395. euan oh yes, the ISO's
  396. euan so is the idea that Unicode replaces those? they should be dead by now
  397. Link Mauve euan, it’s trivial to convert from one Unicode encoding to another, just decode it to a sequence of codepoints first and encode it to another sequence of bytes.
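The decode-then-encode step can be sketched as follows, assuming Python 3 and its built-in codecs; `transcode` is a hypothetical helper name and the sample word is chosen for illustration.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert bytes between encodings via a sequence of codepoints."""
    text = data.decode(source_encoding)   # bytes -> codepoints
    return text.encode(target_encoding)   # codepoints -> bytes

word = "za\u017c\u00f3\u0142\u0107"       # a Polish sample word
as_utf8 = word.encode("utf-8")
as_utf16 = transcode(as_utf8, "utf-8", "utf-16-le")
print(len(as_utf8), len(as_utf16))

# Legacy 8-bit encodings disagree on some letters, so the source encoding
# has to be known; it cannot be guessed reliably from the bytes alone.
print("\u015b".encode("iso-8859-2"), "\u015b".encode("cp1250"))
```

Between Unicode encodings the round trip is lossless. Going from Unicode to a legacy encoding can fail for characters the target set does not contain.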
  398. euan but I don't mind ASCII living on, it's always going to be needed somewhere
  399. Link Mauve UTF means Unicode Transformation Formats.
  400. Link Mauve It’s just one possible encoding, which is standard.
  401. mpan euan: Let’s just point out, that for character sets that contain polish characters, there are: iso-8859-2, cp1250, cp852, cp10029, Mazovia, MazoviaII.
  402. euan but that's my ignorant latin-language-speaking (English) self talking
  403. mpan (and these are only that were used quite recently for PCs)
  404. Link Mauve euan, there are tons of legacy encoding, each incompatible with each other, which is why you should always use Unicode.
  405. mpan (if we include all, from all platforms, there will be probably like 20–30 encodings just for polish)
  406. euan agreed, Unicode FTW, others die
  407. euan now, which Unicode should we use again, LOL
  408. mpan There is only one Unicode, just different encodings.
  409. euan apologies, whilst I catch up with you guys LOL
  410. euan that's what I meant, unicode encoding
  411. euan anyways, it's been a real lesson, and a pleasure
  412. Link Mauve euan, whichever you want, to solve your problem at hand.
  413. mpan euan: “Unicode” — a set of characters. “encoding” — a particular way of encoding a codepoint (“character”) in a computer system to store/transfer it.
  414. Link Mauve There is no good answer, and it’s trivial to switch if you need some other representation for some specific problem.
  415. mpan For me (and this is subjective), if there is no other factors, the default should be UTF-8.
  416. mpan Works well enough in nearly all cases.
  417. euan previously, I just thought they were different ages of maturity, as in somebody went, "oops, there's still not enough room here, let's make it bigger"
  418. mpan (given the programmer isn’t an idiot)
  419. Link Mauve euan, not really, they are good at different problems.
  420. euan well, I've done plenty programming, but I never need to worry about UTF encodings
  421. euan Java and Python do that for me
  422. Link Mauve For example, if you have enough memory and want some operations to be fast, like counting the number of characters or cutting a string after N characters, you use UTF-32.
  423. Link Mauve If you are stuck with Java, .NET or JavaScript, you have no choice but to use UTF-16.
  424. mpan euan: actually Java uses UTF-16, which has that ugly trap that it looks like a fixed-width one, but isn’t. Nearly all code that works on Java strings gets it wrong.
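The trap mpan describes can be sketched in Python 3: code points outside the Basic Multilingual Plane take two 16-bit units (a surrogate pair) in UTF-16, so a count of code units is not a count of code points. The helper name and the sample characters are assumptions made for illustration.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units `text` occupies in UTF-16.

    This is the quantity Java's String.length() reports.
    """
    return len(text.encode("utf-16-le")) // 2


bmp = "\u00e9"         # é, inside the Basic Multilingual Plane
astral = "\U0001F600"  # an emoji, outside the BMP

print(len(bmp), utf16_code_units(bmp))        # one code point, one unit
print(len(astral), utf16_code_units(astral))  # one code point, two units

# The two units are a surrogate pair: a high and a low surrogate.
surrogates = astral.encode("utf-16-be")
print(surrogates.hex())
```

Code that indexes or slices by code unit can therefore split the pair in half, which is the mistake being warned about.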
  425. euan and my embedded C projects use ASCII, no need to talk to people, at least only English speaking via LCD
  426. Link Mauve If you want what is generally the most space-efficient representation, at the expense of computation (so great for storage), use UTF-8.
  427. mpan I would not agree with Link Mauve. UTF-32 doesn’t make calculating length faster. And UTF-8 isn’t causing real slowdown.
  428. euan well, it must require parsing though
  429. mpan Most text operations are related to parsing…
  430. mpan It’s just one more level of parser.
  431. euan which has some cycles over iterating an array
  433. mpan And if you’re talking about calculating length, you probably mean a C string. Since UTF-8 is ASCII-compatible, one may use fast methods for finding octet 0x00, which are even faster than comparing to U+0000 in UTF-32.
  434. euan depends on the hardware I guess, or how much you care about cycles
  435. Link Mauve mpan, well, tell me how many codepoints there are in “68 65 cc 81” (warning, there is a trap).
  436. mpan And 0x00 will not occur in UTF-8 anywhere except U+0000.
  437. euan OK, back to my SlackBuild
  438. Link Mauve mpan, you can’t do that if you want to count the length of the string, here you are counting how many bytes it contains.
  439. euan I have it set up now
  440. Link Mauve While this is independent of the representation.
  441. euan but the configure args don't mention ucs-2
  442. mpan Link Mauve: ok, my fault. Thought you’re talking about byte-length, not character-length. You have a point here.
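Link Mauve's puzzle can be worked through in Python 3. The four bytes are the ones quoted in the chat; reading them as UTF-8 and applying NFC normalization at the end are assumptions added here to show where the trap lies.

```python
import unicodedata

raw = bytes.fromhex("6865cc81")  # the bytes from the question: 68 65 cc 81
text = raw.decode("utf-8")       # assumption: the bytes are UTF-8

byte_count = len(raw)        # what a search for 0x00 would measure
codepoint_count = len(text)  # 'h', 'e', U+0301 COMBINING ACUTE ACCENT

# The trap: 'e' followed by the combining accent is displayed as a single
# accented letter, so a reader sees two characters, not three.
composed = unicodedata.normalize("NFC", text)
composed_count = len(composed)

print(byte_count, codepoint_count, composed_count)
print([unicodedata.name(c) for c in text])
```

So the same string has three different "lengths" depending on whether bytes, code points, or composed characters are counted, independently of which encoding stores it.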
  443. Link Mauve mpan, this is especially important in protocols which have a limit.
  444. Link Mauve The most well-known example being Twitter and its 140 characters limit.
  445. Link Mauve You really don’t want to cut in the middle of a codepoint.
  446. Link Mauve You also don’t want to disadvantage people speaking Japanese over English ones.
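Cutting at a byte limit without splitting a code point can be sketched as follows for UTF-8. This is a minimal illustration, not how Twitter or any particular protocol does it; the function name is hypothetical and the input is assumed to be valid UTF-8.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 `data` to at most `max_bytes` bytes on a code point boundary."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes look like 10xxxxxx. If the first dropped byte is one,
    # the cut would land inside a sequence, so back up to its lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


sample = "a\u00e9\u65e5".encode("utf-8")  # 'a' (1 byte), 'é' (2), '日' (3)
for limit in range(len(sample) + 1):
    print(limit, truncate_utf8(sample, limit).decode("utf-8"))
```

A byte limit still gives fewer characters to scripts whose code points need more bytes, which is why a limit expressed in characters is fairer than one expressed in bytes.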
  447. mpan But, again, with both you have to do it at O(n). It’s just a matter of constant factor, which IMHO isn’t that big between UTF-8 and UTF-32.
  448. Link Mauve Nope, with UTF-32 you can count the number of bytes and divide that by four.
  449. mpan First you have to count the number of bytes.
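The two positions can be compared in a short Python 3 sketch. Both helper names are hypothetical, and both assume the input is already valid; neither validates the data.

```python
def count_codepoints_utf32(data: bytes) -> int:
    """Code point count for UTF-32 without a BOM: byte length divided by four."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of 4 bytes long")
    return len(data) // 4


def count_codepoints_utf8(data: bytes) -> int:
    """Code point count for valid UTF-8: count every byte that is not a
    continuation byte (10xxxxxx). This has to look at each byte once."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "h\u00e9\U0001F600"  # 1-byte, 2-byte and 4-byte UTF-8 sequences
as_utf8 = sample.encode("utf-8")
as_utf32 = sample.encode("utf-32-le")  # explicit byte order, so no BOM

print(len(as_utf8), count_codepoints_utf8(as_utf8))
print(len(as_utf32), count_codepoints_utf32(as_utf32))
```

When the byte length is already known, the UTF-32 count is a single division, while the UTF-8 count is a linear scan; when the byte length is not known, both need one pass over the data, which is mpan's point about O(n).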
  451. Link Mauve Heck, you even usually know the number of bytes, e.g. in the Content-Length.
  452. euan there's not configure flag for --enable-unicode
  453. euan so basically the python2 default is ucs-2
  454. mpan I see a great security hole already, if we trust the user that they’ve sent a valid length and blindly cut data at this point :>
  455. euan let me find the Debian source package, and see what it does
  456. Link Mauve mpan, doesn’t work like that, again look at Python’s str implementation.
  457. mpan $ /usr/bin/python2 -m sysconfig | grep -F CONFIG_ARGS CONFIG_ARGS = "'--prefix=/usr' '--enable-shared' '--with-threads' '--enable-ipv6' '--enable-unicode=ucs4' '--with-system-expat' '--with-system-ffi' '--with-dbmliborder=gdbm:ndbm' '--without-ensurepip' 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro' 'CPPFLAGS=-D_FORTIFY_SOURCE=2'"
  458. mpan Obviously there is >_>
  459. euan you on Debian?
  460. mpan No, ArchLinux. But we’re talking about Python, not Debian. Distro has nothing to do with it
  461. mpan Link Mauve, I believe you were talking about Twitter now, not Python.
  462. euan well the distro packager chooses the configure options
  463. mpan euan: but now you’re the one who builds the package, so you choose the options
  464. Link Mauve mpan, both have the exact same problem, which I believe they are solving the exact same way.
  465. euan what I'm saying is, unless you specify --enable-unicode, it's ucs-2
  466. Link Mauve Which is: make len(str) fast.
  467. mpan euan: well… you have said **there is no --enable-unicode**.
  468. mpan [2016-04-09 21:16] euan there's not configure flag for --enable-unicode
  469. euan I know, but I'm just curious what other distros use, and why it's the default
  470. euan shouldn't ucs-4 be the default for python
  471. Link Mauve euan, it is, starting 3.0.
  472. Link Mauve Starting 3.3 it isn’t even possible to build on UCS-2 anymore.
  473. euan and of course, I shouldn't have to build my own python, it's part of Slackware base system and it's great that I can, but sucks that I have to
  474. Link Mauve euan, complain to your distribution, otherwise it won’t change.
  475. euan OK, so basically python2 is old and decrepit, I can't wait until everything is python3 myself
  476. mpan I wonder… why in the night there is so little activity over IRC? It’s a worldwide service, so except the times when day is sweeping over the Atlantic (and this is a short period), there should be no difference.
  477. euan but I still need to port my own apps over ;-)
  478. euan I guess most folks are in one part of the world, I'm in Singapore, it's 3:24AM, I'm a vampire
  479. mpan But there is a day somewhere, always.
  480. euan most folks on the IRC I meant
  481. euan which IRC channel are you talking about?
  482. euan this is an XMPP conference, isn't it?
  483. euan unless it's a gateway for IRC??
  484. mpan euan: this one is XMPP, but I’m also talking over IRC (on freenode right now). And rooms are getting inactive during night.
  485. vorner mpan: Well, it's starting to be night here in Europe. And America may be shy to use IRC, with all their NSA spying on them, or something.
  486. euan weird, I guess they are concentrated in certain timezones, but you're right, it should be evenly spread
  487. euan even between Europe and Americas, it's a decent spread
  488. vorner Most of the people would probably come from the USA or Europe. Other parts of the world have more problems with the English language.
  489. euan true
  490. euan and not so much open source in Asia, everyone uses Whatsapp or similar here in Asia
  491. euan of course that's a gross generalisation
  492. mpan vorner: 1) America doesn’t care about NSA, really. It’s just some freedom folks that do, the rest are completely oblivious to the issue. I believe one state in Germany has more people concerned about NSA than the whole USA. 2) SSL encryption works well with freenode. :>
  493. mpan Indians speak English pretty well, better than most Europeans do.
  494. mpan Actually in India and most of Africa, English is the official language (unlike in most European countries)
  495. vorner Hmm. Yes, I forgot that. All right, my explanation isn't the best guess.
  496. euan mpan: not sure I agree, educated ones, that travel, yes, but not majority living in India
  497. mpan euan: the uneducated don’t use IRC, and not only in India :>
  498. euan true, point taken
  499. mpan euan: but you would be surprised: English is taught in even the shabbiest school in India.
  500. euan I guess they have their own channels
  501. euan are your IRC rooms open source / linux focussed?
  502. mpan Oh… you may have hit the point!
  503. mpan Yes, this might be the reason.
  504. euan Open source is not big anywhere, but I live in Asia, and if you mention Linux in an IT mall you get strange looks; even the IT guys think it's magic that falls from the sky
  505. mpan Yes, I know that. This is why I’ve said this may be the point.
  506. euan It's Microsoft and Apple and Android only here
  507. mpan I’m also on two Java channels, but I’ve realized that India is so heavily flooded by Microsoft, they may simply have less interest in other solutions too.
  508. euan but I haven't lived in Europe for a while, so my opinion is probably skewed
  509. mpan No, it isn’t.
  510. mpan euan, btw, Android is Linux ;)
  511. euan India is actually one where open source and Linux are getting big fast
  512. euan but not here in South East Asia
  513. euan Korea and Japan are also more like Europe/America trend wise
  514. euan But actually South America is probably where Linux is biggest
  515. euan Well, it is the Kernel, but not what I call Linux
  516. euan And I do tell them it's based on Linux, but do they care? The Asian mentality is to want the expensive, luxurious brands, but pay as little as possible for them.
  517. euan Sorry, don't mean to dis on them, it's the same everywhere to a point
  518. mpan It’s because Asians, at least in Japan and Korea, seem to be very concerned about showing off the status of their family, and the easiest way to do it is by exposing material wealth (even if it’s half-fake). Isn’t it like that?
  519. mpan Linux appears like “operating system for beggars” as it’s gratis.
  520. euan yes, true, but that's changing in Korea/Japan; you are right about other Asian countries
  521. euan exactly
  522. mpan Unfortunately I know little about people in other South-East Asian countries, so I can base my opinion only on Japan and Korea :)
  523. mpan So thanks for confirming, as a local.
  524. mpan Well… I’m also interested in South Chinese, primarily offshore Chinese from the Pacific Rim. But that’s a completely different topic (and extremely hard to explore).
  525. euan Well, Korea and Japan are much more akin to the west than the rest of Asia, certainly South East Asia
  526. mpan And information on them is much more easily accessible to Westerners, like me.
  527. euan India is very class based and the upper classes are becoming quite Western, or shall we say, globally minded
  528. euan I suppose it actually correlates to wealth, now that I think about it
  529. euan so the richer countries use more gratis software culturally, LOL
  530. euan the great Mad Dog travels to Vietnam spreading free software, and they tell him "but Mad Dog, all our software is free"
  531. euan and Microsoft likes piracy in those countries, keeps the communist "cancer spreading" open source community away ;-)
  532. mpan It’s actually not that surprising. The more money a person has, the more time they have to think about more abstract subjects, like, for example, freedom.
  533. euan true, true
  534. Kergma euan, sorry for offtopic -- are there some chinese jabber conferences with people?
  535. mpan Since we’re heavily off-topic already, I doubt you have to feel sorry about it ;)
  536. euan I suppose so, but jabber not as popular here, QQ is huge in China
  537. mpan And I hope the ops have nothing against it, until people who seek help with Gajim come :D
  538. euan they have their own proprietary service providers
  539. mpan In China also GG is common, surprisingly.
  540. euan Facebook is huge everywhere
  541. mpan (Gadu-Gadu)
  542. euan Google banned in China, not sure about Facebook
  543. mpan It’s surprising, considering that Gadu-Gadu was intended for Poland.
  544. vorner I thought GG was dead like 10 years ago already.
  545. euan never even heard of GG
  546. euan it's quite ridiculous though isn't it, all these different chat protocols
  547. euan but trying to get folk on XMPP is almost impossible
  548. euan people want brands!!
  549. euan @Kergma,: sorry, I'm not really sure but here in SEA I've not met one person that even knows what Jabber/XMPP is
  550. euan Whatsapp is huge here ATM
  551. Kergma ok. I see
  552. Kergma thanks
  553. euan but to be fair, we are late in shipping features like message carbons and message archiving, proprietary protocol clients had those nailed long ago
  554. mpan I wonder: do you know any XMPP MUCs where one may freely talk about cultures?
  555. mpan Because now, here, we’re hijacking the room a bit :)
  556. euan and decent file transfer support, http_upload is great, I could have done with it years ago.
  557. euan yes
  558. euan apologies
  559. euan I was done, then Kergma asked a question
  560. euan >Link Mauve: "euan, complain to your distribution, otherwise it won’t change." I shall do, but if ucs-2 is as bad as you make it out to be, it's hard to believe this has gone unnoticed in Slackware. There must be some reason for it.
  561. Link Mauve euan, I doubt so.
  562. euan and are there any side effects to changing to enable-unicode=ucs4, will it break my Slackware installation?
  563. mpan euan: just use it only with Gajim
  564. euan what about all the other python software, I've never run into any Unicode issues with other software
  565. euan eh? you mean install a separate python just for gajim?
  566. Link Mauve Have you tried to use emoji anywhere in other python2 software?
  567. mpan Yes
  568. mpan You have no choice actually. The global one isn’t managed by you, so you can’t override it without breaking things.
  569. euan no, I've not used emoji anywhere else
  570. mpan The one in /usr/bin is managed by your distro
  571. mpan Yours will be in /usr/local/bin
  572. euan yes, but i can replace it
  573. mpan You shouldn’t.
  574. euan only I don't know if it will break anything
  575. mpan Yours should go to /usr/local
  576. mpan Yes. Most probably the next system upgrade… :P
  577. mpan Either the upgrade will break, or the upgrade will break your own, customized version.
  578. euan OK, so with Python, there's no linking, so why can't I replace it
  579. euan is the other software expecting ucs-2?
  580. euan if so, how does it know?
  581. mpan Dunno. But the rule of thumb, as Linux filesystem hierarchy specifies, is that customized packages always go to /usr/local
  582. mpan /usr is managed by the distro, /usr/local — by you
  583. euan or this is an issue caused by the plugin expecting python3?
  584. mpan Doesn’t matter. You just don’t put customized things in /usr
  585. vorner euan: There's linking. Bunch of apps use python embedded into C/C++ and there are many python modules written in C/C++. These are likely to break badly.
  586. euan I don't upgrade really, Slackware only pushes security updates
  587. euan and I can manage that side of things myself
  588. euan there has never been an update for python2 AFAIK
  589. euan I will not upgrade to 14.2, fresh install only
  590. euan thanks vorner, so ucs-2/ucs-4 is the kind of thing that a C module may link against?
  591. euan I have a custom hacked version of kdelibs, my own patch
  592. vorner I'm not 100% sure, but I think the string gets passed through the API and when the parts don't agree on the format, it could go bad.
  593. euan to fix an issue with KIO slaves and davfs2 and MIME magic inspection of large files causing very slow browsing in Dolphin
  594. mpan euan: doesn’t matter. You don’t put custom things into /usr. Period. Of course you can, but then don’t ask if it will or won’t break things: you will intentionally do something that may break things.
  595. euan so what I do is name the package version (revision) -10, so that a -2 or -3 or -4 pushed by the distro repo doesn't clobber it
  596. euan welcome to the world of making a corporate rollout, very different from a home user machine(s)
  597. euan I need to patch things myself, can't just switch distro or track "testing"
  598. euan but installing a separate versiojn in /usr/local is a good idea
  599. euan that wouldn't work for kdelibs though, way too much work rebuilding the rest of the KDE stack
  600. mpan euan, “versiojn”? Ĉu?
  601. mpan Well… /usr/local is the location they should go. This is why it exists in the first place. Even ArchLinux, which dropped support for /bin and /sbin, still doesn’t even think about abandoning /usr/local.
  602. euan I've posted on the Slackware forum, asking why it's not ucs-4
  603. euan oops, I should have checked existing posts first: http://www.linuxquestions.org/questions/slackware-14/%5Brequest%5D-python-with-unicode4-support-4175473344/
  604. mpan ‘k, let me brew some tea.
  605. euan how complete / stable is the latest (python3 apable) version of Gajim?
  606. euan ^capable
  607. euan OK, so I've replaced my python with my own built with ucs-4
  608. euan I'll reboot and see what breaks
  609. euan I wonder what apps I use that use python
  610. euan I know the KDE CUPS config uses python
  611. euan any recommendations what I should test for breakage
  612. euan I can easily switch back to the distro version if anything breaks, it's worth a try
  613. euan OK, I'm back after a Gajim restart
  614. euan everything OK so far
  615. euan strange, I still get: python2 -m sysconfig | grep Py_UNICODE_SIZE Py_UNICODE_SIZE = "2"
  616. euan silly me, I forgot a backslash on my configure line
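Besides grepping the sysconfig output, the build width can also be read from inside the interpreter through `sys.maxunicode`, which exists in both Python 2 and Python 3. The sketch below is a minimal check; the function name and the returned labels are made up for illustration.

```python
import sys


def describe_unicode_build(maxunicode: int = sys.maxunicode) -> str:
    """Classify an interpreter by the largest code point a single
    string element can hold."""
    if maxunicode == 0xFFFF:
        return "narrow (UCS-2)"   # Python 2 built without ucs4
    if maxunicode == 0x10FFFF:
        return "wide (UCS-4)"     # --enable-unicode=ucs4, or any Python 3.3+
    return "unknown"


print(describe_unicode_build())
```

Run with the interpreter that actually launches Gajim, this shows whether the rebuilt Python is the one being picked up.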
  617. mpan is back
  618. Link Mauve euan, it’s not stable, stuff like voice/video or serverless XMPP don’t work yet.