sorting - Does Python bytearray use signed integers in the C representation?


I have written a small Cython tool for in-place sorting of structures exposing the buffer protocol in Python. It's a work in progress; please forgive any mistakes. This is just for me to learn.

In my set of unit tests, I am working on testing the in-place sort across many different kinds of buffer-exposing data structures, each with many types of underlying data contained in them. I can verify it is working as expected for most cases, but the case of bytearray is peculiar.

If you take it for granted that the imported module b in the code below is performing a straightforward heap sort in Cython, in-place on the bytearray, then the following code sample shows the issue:

    In [42]: a  # NumPy array
    Out[42]: array([  9, 148, 115, 208, 243, 197], dtype=uint8)

    In [43]: byt = bytearray(a)

    In [44]: byt
    Out[44]: bytearray(b'\t\x94s\xd0\xf3\xc5')

    In [45]: list(byt)
    Out[45]: [9, 148, 115, 208, 243, 197]

    In [46]: byt1 = copy.deepcopy(byt)

    In [47]: b.heap_sort(byt1)

    In [48]: list(byt1)
    Out[48]: [148, 197, 208, 243, 9, 115]

    In [49]: list(bytearray(sorted(byt)))
    Out[49]: [9, 115, 148, 197, 208, 243]

What you can see is that when using sorted, the values are iterated over, treated as Python integers for the purpose of sorting, and then placed into a new bytearray.

But the in-place sort, in lines 47-48, shows that the bytes are being interpreted as signed integers and sorted by their 2's complement value, putting any number >= 128, since it is negative, towards the left.

I can confirm this by running on the whole range 0-255:
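To check that reading, here is a minimal pure-Python sketch that sorts the same six values by their signed 8-bit interpretation. The helper name `as_signed_byte` is made up for illustration and is not part of the Cython module; the point is only that a signed comparison reproduces the ordering seen in Out[48].

```python
def as_signed_byte(value):
    """Reinterpret an integer in 0..255 as a two's complement signed byte."""
    return value - 256 if value >= 128 else value


data = [9, 148, 115, 208, 243, 197]

# Sorting by the signed interpretation puts every value >= 128 first,
# because those values map to negative numbers.
signed_order = sorted(data, key=as_signed_byte)
print(signed_order)  # [148, 197, 208, 243, 9, 115]

# The same key applied to the whole range gives 128..255 followed by 0..127.
full_signed_order = sorted(range(256), key=as_signed_byte)
```
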

    In [50]: byt = bytearray(range(0,256))

    In [51]: b.heap_sort(byt)

    In [52]: list(byt)
    Out[52]:
    [128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
     144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
     160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
     176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
     192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
     208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
     224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
     240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255,
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
     16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
     32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
     48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
     64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
     80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
     96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
     112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]

I know this is difficult to reproduce. You can build the linked package with Cython if you want, and then import src.buffersort as b to get the same sort functions I am using.

I've tried reading through the source code for bytearray in Objects/bytearrayobject.c, but I only see some references to long and a few calls to PyInt_FromLong ...

This makes me suspect that the underlying C-level data of a bytearray is represented as a signed integer in C, but the conversion to a Python int from raw bytes means it is unsigned, between 0 and 255, in Python. I can only assume this is true ... though I don't see why Python should interpret the C long as unsigned, unless that is merely a convention for bytearray that I didn't see in the code. But if so, why wouldn't an unsigned integer be used on the C side as well, if the bytes are always treated by Python as unsigned?

If this is true, what should be considered the "right" result of the in-place sort? Since they are "just bytes", either interpretation is valid, I guess, but in the Python spirit I think there should be one way that is considered the standard.

To match the output of sorted, will it be sufficient on the C side to cast the values to unsigned long when dealing with a bytearray?

Does Python bytearray use signed integers in the C representation?

It uses chars. Whether those are signed depends on the compiler. You can see this in Include/bytearrayobject.h. Here's the 2.7 version:

    /* Object layout */
    typedef struct {
        PyObject_VAR_HEAD
        /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
        int ob_exports; /* how many buffer exports */
        Py_ssize_t ob_alloc; /* how many bytes allocated */
        char *ob_bytes;
    } PyByteArrayObject;

And here's the 3.5 version:

    typedef struct {
        PyObject_VAR_HEAD
        Py_ssize_t ob_alloc; /* How many bytes allocated in ob_bytes */
        char *ob_bytes;      /* Physical backing buffer */
        char *ob_start;      /* Logical start inside ob_bytes */
        /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
        int ob_exports;      /* How many buffer exports */
    } PyByteArrayObject;

If this is true, what should be considered the "right" result of the in-place sort?

A Python bytearray represents a sequence of integers in the range 0 <= elem < 256, regardless of whether your compiler considers chars to be signed. You should probably sort it as a sequence of integers in the range 0 <= elem < 256, rather than as a sequence of signed chars.
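One way to see this from the Python side, without reading the C headers, is to inspect the buffer a bytearray exports. The short sketch below uses only the built-in `memoryview`; the variable names are illustrative. The exported format code is `'B'` (unsigned char in `struct` notation), and casting the view to `'b'` shows what a signed reading of the same bytes looks like.

```python
byt = bytearray([9, 148, 115, 208, 243, 197])

# The buffer protocol reports the element format of the exported memory.
view = memoryview(byt)
exported_format = view.format
print(exported_format)  # B

# Reinterpreting the same memory as signed chars turns values >= 128 negative.
signed_values = view.cast('b').tolist()
print(signed_values)  # [9, -108, 115, -48, -13, -59]

# The default, unsigned reading matches what list(byt) shows.
unsigned_values = view.tolist()
```

So a consumer that honors the exported format sees 0..255, and the negative values only appear when the bytes are deliberately read as signed.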

To match the output of sorted, will it be sufficient on the C side to cast the values to unsigned long when dealing with a bytearray?

I don't know enough about Cython to say what the correct code change would be.
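As a rough illustration of the idea, here is a pure-Python stand-in for an in-place heap sort that reads the buffer through an unsigned (`'B'`) view. It is not the asker's Cython code, and the function name `heap_sort_unsigned` is hypothetical. In Cython the analogous step would presumably be reading the elements as unsigned char rather than char, but that is an assumption about code not shown here.

```python
def heap_sort_unsigned(buf):
    """Sort a writable byte buffer in place, comparing elements as 0..255."""
    view = memoryview(buf).cast('B')  # force the unsigned interpretation
    n = len(view)

    def sift_down(root, end):
        # Restore the max-heap property for the subtree rooted at `root`,
        # considering only indices below `end`.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and view[child] < view[child + 1]:
                child += 1  # pick the larger of the two children
            if view[root] >= view[child]:
                return
            view[root], view[child] = view[child], view[root]
            root = child

    # Build a max-heap, then repeatedly move the maximum to the end.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    for end in range(n - 1, 0, -1):
        view[0], view[end] = view[end], view[0]
        sift_down(0, end)


byt1 = bytearray([9, 148, 115, 208, 243, 197])
heap_sort_unsigned(byt1)
print(list(byt1))  # [9, 115, 148, 197, 208, 243]
```

With the unsigned view, the result agrees with `list(bytearray(sorted(byt)))` from the question.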

