Friday, January 8, 2016

Python For Loop Fun

I had a problem recently where I needed to loop through a list to find some lines and then combine only some of them, so I thought for i in range(len(list)): ought to do it. It did not, and I'm here to tell you the sad story of what happened. (Well, actually this is a reply from a conversation with my buddy Phil, who should start a blog, because he's awesome and could probably teach you many more things than I could.)

Actually there is duplication but I think you forgot to include 'shut your dirty mouth' in the l array anyway... ;)
I added a "blah" to the 3rd element (0 based) so you can see when it's being printed vs the 2nd element.
Phil's Function:


l = ['blah', 'blah blah', 'blah blah blah', 'blah blah blah blah']
for i in range(len(l)-1):
   print str(i) + l[i], l[i+1]
   i += 1    # try to skip ahead; the for loop resets i on the next pass
   print i

Phil's Output
0blah blah blah
1
1blah blah blah blah blah
2
2blah blah blah blah blah blah blah
3
Tyson's Function:


q = range(len(l)).__iter__()
for i in q:
   print str(i) + l[i], l[i+1]
   q.next()    # throw away the next index so the loop skips it

Tyson's Output
0blah blah blah
2blah blah blah blah blah blah blah

Note the difference in the number of blahs and the number of lines.
Phil's approach prints the current line plus the next line through the entire list (and skips the last element due to the -1). Also note that even though we did a += 1, the next time through the loop the number picks up right where range says it should, as if the += 1 never happened.
Mine skips lines: it prints the current line and the next one, then jumps past the one it just printed.
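If you want to see the "+= 1 doesn't stick" business on its own, here's a tiny sketch. The names seen and top are made up for the demo, and I've written it so it should run the same on Python 2 and 3. As far as I can tell, the for loop simply assigns the next value from range to i at the top of every pass, so whatever you did to i in the body gets overwritten.

```python
# Hypothetical demo: record i at the top of each pass and after our += 1.
seen = []
for i in range(3):
    top = i        # the value the for loop handed us
    i += 1         # our attempt to skip ahead
    seen.append((top, i))

print(seen)  # [(0, 1), (1, 2), (2, 3)]
```

The first number in each pair keeps counting 0, 1, 2 no matter what we did to i on the previous pass.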
Now this is obviously a contrived example, as you can do range(0,len(l),2) and I think get the same result. But in my particular case I only wanted to combine certain lines: only if the next line (l[i+1]) contained X would I want to print l[i] and l[i+1] together, after which I wouldn't want to print the "new" l[i] (which was the same as the old l[i+1]).
I hope that makes some more sense.
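For what it's worth, here's a quick sketch of the range(0,len(l),2) version, using the same l as above. The name pairs is just something I made up for the example. It only works this neatly because the list has an even number of elements; with an odd count the last l[i + 1] would blow up with an IndexError.

```python
l = ['blah', 'blah blah', 'blah blah blah', 'blah blah blah blah']

# Step through the indexes two at a time (0, then 2) and pair each
# element with the one right after it.
pairs = [str(i) + l[i] + ' ' + l[i + 1] for i in range(0, len(l), 2)]
for line in pairs:
    print(line)
```

That prints the same two lines as my iterator version, the one starting with 0 and the one starting with 2.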
Here is an (only slightly) less contrived example.

My new function

l = ['one', 'my', 'buckle', 'my', 'shoe']

q = range(len(l)).__iter__()
for i in q:
    if 'my' in l[i]:
        if 'buckle' in l[i+1]:
            print str(i) + l[i], l[i+1]
            q.next()
        else:
            print l[i]
    else:
        print l[i]

My New Output
one
1my buckle
my
shoe

See, now I only combine my buckle on the same line but not my shoe.
So long story short, I can't just do i += 1 in my loop. I have to call next(), but I can't do that on an int (i.next() won't work in the above context). I can call next() on the range business once it has been turned into an iterator (that's what the __iter__() call is for), but you need a variable holding that iterator to fiddle with.
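Here's roughly the same trick written so it should also run on Python 3, where q.next() is gone and the built-in next(q) takes its place. Treat it as a sketch: combine_lines is a name I made up, it collects the lines in a list instead of printing them, it drops the index I was printing in front, and I added a bounds check so a trailing 'my' doesn't fall off the end of the list.

```python
def combine_lines(lines):
    """Join 'my' with a following 'buckle'; leave every other line alone."""
    out = []
    q = iter(range(len(lines)))   # iter() gives us something next() works on
    for i in q:
        has_next = i + 1 < len(lines)
        if 'my' in lines[i] and has_next and 'buckle' in lines[i + 1]:
            out.append(lines[i] + ' ' + lines[i + 1])
            next(q, None)         # skip the line we just swallowed
        else:
            out.append(lines[i])
    return out

result = combine_lines(['one', 'my', 'buckle', 'my', 'shoe'])
print(result)  # ['one', 'my buckle', 'my', 'shoe']
```

Calling next(q, None) with a default means running out of indexes quietly does nothing instead of raising StopIteration.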


Anywho, next time you want a simple C-style for loop with numbers and try to increment the counter yourself, remember.... it won't work like you expect it to.


Wednesday, January 6, 2016

It's not You, it's Me


Clearly that has to be the problem. I mean millions of people use computers every day and have little to no problems...right?  I must be doing something wrong.

I'll tell you my pathetic story.

So I wanted to combine some content from an Excel spreadsheet into a Word document as comments. My first approach was to just extract the text and add the comments in a fancy "html" page. Did this, but tables and the like didn't export to plain text well. So I thought I could come up with something better.
[Image: raccoon accidentally dissolves cotton candy in water]

Next shot, let's try VBA and just extract some comments from a CSV to insert in a Word doc. (Seems like it should be straightforward: open CSV, grab data in columns 1 and 2, search in the Word doc for data in column 1, add comment from column 2.) Nope. Couldn't figure out how on earth to extract column data from an Excel application within a Word macro (using VBA).

Next, well, a Word file is a zip file, lemme try unzipping it and seeing how that works. Tried adding a single comment, saved. Clearly they're not going to make this easy for me. Comments are in a comments.xml file with some unique identifiers, and then in the document.xml file is the.... you guessed it, document, with the comment reference. This is do-able, but painful, so I thought I'd avoid this for the time being. (Oh, by the way, if you haven't looked at the Office Open XML spec, yea, it's roughly 7k pages... AINT NO BODY GOT TIME FOR THAT)
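To give a feel for what's in that comments.xml, here's a small sketch that pulls the id and text out of each comment. The XML string is a trimmed-down stand-in I typed up for illustration (one made-up comment), not something exported from a real document, and read_comments is a hypothetical helper name. The namespace URL is the WordprocessingML one that the w: prefix points at.

```python
import xml.etree.ElementTree as ET

W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

# Stand-in for word/comments.xml, cut down to one made-up comment.
comments_xml = (
    '<w:comments xmlns:w="%s">'
    '<w:comment w:id="0" w:author="Tyson">'
    '<w:p><w:r><w:t>Check this number</w:t></w:r></w:p>'
    '</w:comment>'
    '</w:comments>' % W
)

def read_comments(xml_text):
    """Map each comment id to the text inside it."""
    root = ET.fromstring(xml_text)
    found = {}
    for c in root.iter('{%s}comment' % W):
        cid = c.get('{%s}id' % W)
        # A comment's text is spread over one or more w:t elements.
        found[cid] = ''.join(t.text or '' for t in c.iter('{%s}t' % W))
    return found

comments = read_comments(comments_xml)
print(comments)  # {'0': 'Check this number'}
```

The id is the piece that ties a comment back to its reference over in document.xml, which is exactly the painful part.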

Oh wait, I nearly forgot. I tried adding a comment, very blindly, and that failed. So I tried just unzipping the contents with Python, tossing the output into a folder, rezipping, and renaming to .docx. Yea, Word thinks the file is corrupted. No clue what the appropriate "re-zipping" is, but good luck figuring that out too.
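I never did work out what went wrong with my rezip, so take this as a guess rather than the answer. One common way it goes sideways is zipping the folder itself, so every entry picks up an extra leading folder name and things like [Content_Types].xml are no longer at the root of the archive. A sketch that sidesteps the folder entirely is to copy entries from the old zip to a new one in memory, keeping each entry name as-is. rewrite_part is a made-up helper name and the archive below is a stand-in, not a valid Word file.

```python
import io
import zipfile

def rewrite_part(docx_bytes, part_name, new_data):
    """Copy every entry into a fresh zip, swapping the contents of one part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src:
        with zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                data = new_data if name == part_name else src.read(name)
                # Same entry name as before, so nothing moves inside the archive.
                dst.writestr(name, data)
    return out.getvalue()

# Stand-in archive (NOT a valid Word file) so the sketch runs on its own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('[Content_Types].xml', '<Types/>')
    z.writestr('word/document.xml', '<old/>')

new_bytes = rewrite_part(buf.getvalue(), 'word/document.xml', '<new/>')
with zipfile.ZipFile(io.BytesIO(new_bytes)) as z:
    names = z.namelist()
    body = z.read('word/document.xml')
print(names)
```

Whether Word accepts the result still depends on the XML you put in being valid, which this does nothing to check.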

Next I tried out Mammoth, something that attempts to convert a docx to a passable HTML file using Python.... yea, tried printing the resulting "HTML" and got unicode errors. Tried a few Google searches and approaches, no luck. Finally decided on a loop through all the characters, doing a try/catch when printing each one to the command line (stderr), storing the "successful" prints in an array for later printing, and otherwise just passing in the "exception" catch. Only problem: with a large doc, it takes FOREVER to print it all to the command line (due to delays from printing to the command line, in this case not file IO). Oh wait..... mammoth died, about 2 hours later..... yay for me.
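In hindsight, the character-by-character loop probably wasn't needed. A sketch of a cheaper approach, assuming the errors came from the console not being able to encode some characters (I never confirmed that), is to encode the whole string once and let Python swap out anything it can't handle. safe_text is a made-up helper and the sample string is invented, not real Mammoth output.

```python
def safe_text(text, encoding='ascii'):
    """Replace characters the target encoding can't represent with '?'."""
    return text.encode(encoding, errors='replace').decode(encoding)

# Made-up sample with an accented letter and an en dash in it.
html = u'<p>caf\u00e9 \u2013 done</p>'
cleaned = safe_text(html)
print(cleaned)  # <p>caf? ? done</p>
```

That's one pass over the string instead of one print per character. If you'd rather keep the characters than turn them into question marks, writing the text to a file opened with a UTF-8 encoding avoids the console altogether.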

It seems like anytime I need to fight with a Word doc to extract data, I can't seem to find the right incantation in Google to solve my problem. Case in point again: find the document line number of a comment. Not the section offset but the ABSOLUTE LINE NUMBER OF THE DOCUMENT. Good luck. I dare you. I flipping dare you to try. If it takes you less than 8 hours before you give up, you haven't tried hard enough. If you've been going for more than 8 hours..... good luck, but you'll likely miss out on the rest of your life if you keep going.

You see, you can't just go through a Word document line by line. You have to go section by section, but then lines aren't really lines in there, they're paragraphs, so how can you ever figure out what line in a paragraph something is? You can't. You have to somehow figure out how many characters are on a line and possibly do some kind of mod operation ON EVERY PARAGRAPH to get EACH PARAGRAPH's line count, and then FINALLY you can sum them up.... but oh wait, you have to do that for all preceding sections. I won't get into the pain too much more. Suffice to say, I finally finished with something that just gives you a section number and the line of the paragraph on that PAGE. No flipping clue how you can even figure that out, but it was the first solution I found and I finally had to give up and use that.
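Just to show what that per-paragraph arithmetic would look like, here's a back-of-the-envelope sketch. It assumes every line holds a fixed number of characters (chars_per_line is a number I picked out of thin air), which Word absolutely does not promise once fonts, word wrapping, tables and margins get involved, so this is an estimate at best. It uses ceiling division rather than a literal mod, and estimate_lines is a made-up name.

```python
def estimate_lines(paragraphs, chars_per_line=80):
    """Rough line count per paragraph, assuming fixed-width lines."""
    counts = []
    for p in paragraphs:
        # Ceiling division; an empty paragraph still takes up one line.
        counts.append(max(1, -(-len(p) // chars_per_line)))
    return counts

paras = ['x' * 200, '', 'x' * 80]
counts = estimate_lines(paras)
print(counts)  # [3, 1, 1]

# Estimated absolute line where the third paragraph starts:
# everything before it, plus one.
third_starts_at = sum(counts[:2]) + 1
print(third_starts_at)  # 5
```

Summing the counts of everything that comes before is the "do that for all preceding sections" step, and it's only as good as the guess for chars_per_line.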

I could probably come up with a dozen horror stories about Word if I was really pressed to it, but I'm fairly certain my PTSD would attack and I'd just blog.

Did I mention that 2015 just wasn't my year for computers? I gotta be honest 2016 isn't looking any better.