Friday, December 14, 2012

Flex hell

A lot of time has passed since our previous post.
It doesn't mean that we haven't found any serious problems. Rather, we haven't found enough time to describe them here.

But this week we met something special.

The beginning was traditional - a new arKItect release was delivered. This time - with incredible Flex LM licensing protection. At first all was normal - all test accounts worked, only allowed users could log on, and so on. On the second day we found that a couple of internal KI accounts had been disabled by Flex LM.
The first answer was: let's make an individual license for them.
Wrong answer.
Any complicated problem has a lot of simple wrong solutions.
They were simply missing from the general list.
After rereading the Skype discussions for the last year we found that some accounts had been missing from the initial allowed list from the very beginning. Not too complicated. Just update the initial allowed list and move on to other problems.
After a couple of days we met this problem again!
Assumptions:
1. Looks like somebody replaces our allowed list by mistake - NO
2. Maybe the internal program logic for supporting Flex LM is wrong - NO
3. Maybe the external interfaces of this program are used wrongly - NO
Dead end.
Read the short logs - nothing
Read the long logs - nothing except the exact place where the allowed list was truncated

Some developers look at each other as black boxes - it doesn't simplify understanding.
4. Let's add more logs! At least it can help in the future - OK. At least it is a next step
Reread the logs - looks like the problem is inside the Flex LM supporting program itself - no bad requests from outside in the log.
Thinking... thinking... imagining data flows in the program... reading code... thinking...
Finally found (logically):
Due to the lack of protection from concurrent processes, the Flex LM supporting program was breaking the main license file when it was read and written simultaneously.
Reproduced in real life on the main server!
Reproduced on the test server!!!!
Not reproduced after the fix!
It was impressively hard to find because of the logical levels involved, and it took 3 minutes to fix.
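
For the record, here is a minimal sketch of one common protection against this kind of damage. It is not the actual fix (that lives in the Flex LM supporting program and is not shown in this post); the function names and the one-user-per-line file layout are assumptions made for illustration. The idea: never rewrite the file in place. Write a temporary copy next to it and swap it in with a single rename, so a concurrent reader sees either the old list or the new one, never a truncated one.

```python
import os
import tempfile


def write_allowed_list(path, users):
    """Replace the file at `path` in one step (hypothetical helper).

    The new content goes to a temporary file in the same directory and is
    then renamed over the target, so readers never see a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".allowed-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for user in users:
                tmp.write(user + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before the swap
        os.replace(tmp_path, path)  # single rename over the old file
    except BaseException:
        # Do not leave the temporary file behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_allowed_list(path):
    """Return the list of allowed users, one per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Note that this only protects readers from half-written files. Two writers that read, modify and write at the same time can still lose each other's changes, so they would also need a lock, and on Windows `os.replace` can fail while another process keeps the target file open.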

Wednesday, September 12, 2012

Local arKItect hell


Let's continue this blog with ... the still crashing Local arKItect version!
This blog is not supposed to be about local arKItect only. We hope to store other issues here ... later.
So, the beta version was almost finished ... but ... we found ... on the next Tuesday that the arKItect local version ... doesn't crash ... but simply shows '??' signs instead of non-Latin letters. That bug looked similar to the one reported by Sylvain B last Friday. Well, it was easily fixed and we started to test arKItect:

1. Create an architecture with a Russian name.
2. Create a rule with a Russian name.
3. Create a treeview with a Russian name.
4. Create an object with a Russian name.
-- so far so good. Maybe stop testing?
5. Create another object with a Russian name ... crash!
6. Trying to reproduce ... crash on adding an object.
7. Good, it is reproducible. Reopen the architecture ... crash on opening!
And then we can't even open it.

We still have a little hope - hey "guru", maybe it is your crash? ... looks like not.

Let's start the investigation. All developers involved ... brainstorming still gives nothing.
Late in the evening we decided that there are some ghosts in the arKItect sources and we can't catch them right now. Let's continue in the morning...

Morning results:
The server side sometimes (especially for non-English letters) sends less information than needed (messages are truncated), but ...
the client side usually reads more information than the server gives (picking up some memory garbage).
Looks like those two bugs together give an arKItect version which can work for some time without crashes.
Fixed.
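
A classic way to get messages truncated specifically for non-English letters is to count the message length in characters while sending bytes: in UTF-8 a Russian letter takes two bytes. Whether that was the exact cause here is not recorded in this post, so take the following only as a minimal sketch of the general idea, with hypothetical function names and a made-up 4-byte length header: the sender announces the length in bytes, and the receiver reads exactly that many and refuses anything shorter.

```python
import struct


def encode_message(text):
    """Frame `text` as a 4-byte big-endian byte count followed by UTF-8 data."""
    payload = text.encode("utf-8")
    # The length must be taken AFTER encoding: len(text) counts characters,
    # len(payload) counts the bytes that really travel over the wire.
    return struct.pack(">I", len(payload)) + payload


def decode_message(data):
    """Return the text from one framed message, or raise if it is truncated."""
    (size,) = struct.unpack(">I", data[:4])
    payload = data[4:4 + size]
    if len(payload) != size:
        raise ValueError("truncated message: expected %d bytes, got %d"
                         % (size, len(payload)))
    return payload.decode("utf-8")


if __name__ == "__main__":
    name = "Архитектура"
    framed = encode_message(name)
    print(len(name), len(framed) - 4)  # characters vs. payload bytes
    print(decode_message(framed) == name)
```

With a check like this a short message fails loudly on the receiving side instead of being padded with whatever happens to be in memory.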

Friday, September 7, 2012

Local arKItect ... crashes

We continue publishing our development stories in this blog.

Let's continue with ... the Local arKItect version. The beta version was almost finished ... but ... we found ... still on Tuesday! that the arKItect local version crashes on start too often.

There was no obvious reason for this.
I remember that during testing I ran arKItect through all the pyark tests ... so there were many launches ... and no crashes.

I started to torture our server-dev-guru - what did you add to arKItect? The answer - nothing special.

We found that in some situations it crashes more often ... but the behavior is still not stable.

Well, I constructed a big weapon: arKItect.exe + debugging information (pdb file) + the windbg tool switched to 'step by step' execution mode for the given application (I had never tried it before).

And in the detailed execution information, just before the crash, I found something about unloading the flexlm management library... the ball passed to our great master, Jerome Mascunan.

Together we started trying different versions of arKItect + flexlm management libraries compiled by Jerome on my computer. We started with 3 binaries and finished with 0 (in the end the libraries are delivered in lib mode, to be linked in during compilation). arKItect stopped crashing somewhere between late Wednesday and Thursday morning.

It gave the same headache to both of us and was finally fixed. Just one more success story.

Andrew

Local arKItect NetworkStream pitfall

Here is the blog of KI RUS development news.
We will describe here our complicated problems, with solutions (or without).

Let's begin with the Local arKItect version.
The beta version was almost finished by our "server-dev-guru", and already delivered for trying out, but ...
we found on Tuesday that it didn't work on Konstantin's computer. It works on Paul's, Dmitry's, Andrew's and the virtual ones, but not on Konstantin's...

We built detailed logs for both the server and the client side ... but they didn't show the exact problem, only gave the WinInet error code 12031 (ERROR_INTERNET_CONNECTION_RESET).

After a deep investigation we found that the 'server side' of the local version was not good enough, and updated it to match the MSDN example... but the problem still remained, in a new form.

Then we set up a stream sniffer (SocketSniff; the standard one, Microsoft Network Monitor, didn't help!) and looked at the problem from another side ... but it still didn't help.

After surfing the Internet, in an MSDN article (http://msdn.microsoft.com/en-us/library/system.net.sockets.networkstream.read(v=vs.90).aspx), in a small community comment (existing in only one version of this article) we found a small key... on Thursday. Not everyone agreed with the proposed solution, it was a kind of internal battle. But finally it helped.
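
The comment itself is not quoted here, so the following is only an illustration of the best-known pitfall of `NetworkStream.Read`: it may return fewer bytes than were requested, and a return value of 0 means the other side has closed the connection. The same holds for `socket.recv` in Python, so a minimal sketch of the usual remedy (with a hypothetical helper name, `recv_exact`) is to keep reading in a loop until the whole message has arrived:

```python
import socket


def recv_exact(sock, size):
    """Read exactly `size` bytes from `sock` or raise ConnectionError."""
    chunks = []
    remaining = size
    while remaining > 0:
        # recv may return anything from 1 to `remaining` bytes.
        chunk = sock.recv(remaining)
        if not chunk:
            # Empty result: the peer closed the connection too early.
            raise ConnectionError(
                "connection closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Two connected sockets in one process, just to try the helper out.
    sender, receiver = socket.socketpair()
    sender.sendall(b"hello ")
    sender.sendall(b"world")
    print(recv_exact(receiver, 11))
    sender.close()
    receiver.close()
```

Code that calls a read once and assumes the buffer is full tends to work on most machines and fail on one, depending on how the data happens to be split on the way.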

All four of us were involved in the investigation during those terrible days (about two full days for each one).
Fortunately the solution was found before we got completely exhausted, and we got new practical knowledge (as usual).