Sunday, 23 December 2007

High automatic test coverage of web applications

Tests in web applications

Features in a web application reside in one of two locations: On the client or on the server.

A unit test of a feature in the server-side codebase can catch numerous errors. If the application is written in PHP, Python or any other language that is compiled or interpreted at runtime, simply running the unit test catches the syntax errors a compiler would otherwise report. It can also catch logical errors and failures to meet the requirements.

A functional test on the client side can also serve as an integration test against browsers if the test framework can execute the tests in multiple browsers; otherwise, testing against the lowest common denominator might be adequate (think IE 6).

frameworks to use

Almost every decent language has a unit testing framework that provides a standard way of defining, executing and reporting tests. Some, like Django's, also handle fixtures, such as creating and deleting a test database for each test run. This is very helpful.

For functional tests in the browser engine, there are numerous frameworks. Some record a macro and replay it; these tools are to be avoided, since they make maintenance of the tests very cumbersome, which may lead to the tests being neglected. Instead of discussing the pros and cons of various frameworks, like LoadRunner, Selenium etc., I'm simply going for Canoo's WebTest, which combines HtmlUnit and several other third-party frameworks into one very good framework.

It has the advantage that tests are defined in XML, which ought to be readable by most people with at least some technical interest. Think testers, new developers and maybe even project managers. It also has a very good reporting tool that generates static HTML files. I'll give an example of a webtest further down.

implementing and maintaining the tests

When developing test-driven, it is a given that at least unit tests are being written. In most IDEs for Java developers, JUnit tests are easily written and run (though it can sometimes be tricky to set up a full test suite that runs all tests). In Django you simply write a tests.py module at the root of your application*1. The management tool in Django can then invoke all tests by simply running 'manage.py test' at the root of the project. In Java you have to marshal all the tests in an ANT script or reference all test classes from one test suite, which is tedious work that puts the tests in jeopardy of being neglected.

A Canoo webtest is run by an ANT script. The one script I always use is simply called 'allTests.xml'. Running tests separately should not be the norm, as it is with JUnit; instead, you should have to put some effort into excluding certain tests. The exclusions might be performance tests or other tests that are not candidates for regression testing.

In webtest you can define macros and reuse them wherever you like, and it has all the functionality of ANT. In the next section I give an example of a test implementation.

testing upload and file conversion

This feature of my project involves multiple units on the server side and only one on the client side. Because most of the server-side features rely almost exclusively on the Django framework, there is no need to cover them. I can rely on one simple unit test that takes a file and converts it, which is the only part I have implemented myself:


    def test_convert_video(self):
        entity = Entity(name="test video")
        entity.save()
        physical = Physical(content_object=entity)
        
        convert_to_flv(self.avifilename, self.flvfilename, physical)
        result = os.path.exists(self.flvfilename)
        self.assertTrue(result)
        self.assertEqual(self.flvfilename, physical.converted_filename)
        os.remove(self.flvfilename)

There are several more features surrounding this one, like test_grab_thumbnail_from_flv, test_invalid_cleanup_convert_video, test_get_default_thumbnail_filename and many more. This part has to have high coverage because it is a core function of the web application, but I will not go into priority vs. coverage in this entry.

The unit under test in the example is convert_to_flv. When the test executes, a test database is set up and some test doubles are created. The files used in the test are cleaned up afterwards to ensure that the state of the system is not changed as a result of the test.
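The clean-up idea can be sketched with nothing but the standard unittest module. This is only an illustration under assumptions: fake_convert is a hypothetical stand-in for convert_to_flv (it just copies bytes), and the class and file names are made up for the example.

```python
import os
import shutil
import tempfile
import unittest


def fake_convert(src, dst):
    """Hypothetical stand-in for convert_to_flv: copies the bytes of src to dst."""
    shutil.copyfile(src, dst)


class ConvertCleanupTest(unittest.TestCase):
    def setUp(self):
        # Work in a throwaway directory so the test never touches real media.
        self.workdir = tempfile.mkdtemp()
        self.src = os.path.join(self.workdir, "input.avi")
        self.dst = os.path.join(self.workdir, "output.flv")
        with open(self.src, "wb") as handle:
            handle.write(b"not a real video")

    def tearDown(self):
        # Runs even when the test fails, so no files are left behind.
        shutil.rmtree(self.workdir, ignore_errors=True)

    def test_convert_creates_output(self):
        fake_convert(self.src, self.dst)
        self.assertTrue(os.path.exists(self.dst))


def run_suite():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConvertCleanupTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Putting the removal in tearDown rather than at the end of the test method means the files also disappear when an assertion fails halfway through.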

On the client side I have this webtest:


<target name="test">
  <webtest name="add different kinds of content">
   <login username="user" password="pass"/>
   <addContent name="picture 1" file="../../JOOLLU_webtest/tests/picture1.png"/>
   <addContent name="picture 2" file="../../JOOLLU_webtest/tests/babe.jpg"/>
   <addContent name="movie 1" file="../../JOOLLU_webtest/tests/plane_lands.avi"/>
   <addContent name="movie 2" file="../../JOOLLU_webtest/tests/preview.mov"/>
   <addContent name="" file="../../JOOLLU_webtest/tests/preview.mov"/>
   <addContent name="plain flv" file="../../JOOLLU_webtest/tests/3.flv"/>
  </webtest>
</target>

It covers the feature of uploading files. Other tests rely on this one having been executed, which leads to test dependencies, and that is totally fine. If it were a requirement not to have test dependencies, I would have tests that were hard to maintain, since every test relying on this one would have to upload content itself, putting the tests at risk of being neglected.

Inspect the test. The tag 'addContent' is one macro I have defined as a part of my test project, and it looks like this:


<macrodef name="addContent" description="add some content to the current user profile">
 <attribute name="name"/>
 <attribute name="file"/>
 
 <sequential>
   <clickLink label="add content"/>
   <setInputField name="name" value="@{name}"/>
   <setFileField name="file" fileName="@{file}"/>
   <clickButton htmlId="add_button"/>
 </sequential>
</macrodef>

The power of this is that I can encapsulate the code that implements the tests separately and thus make the actual test easier to read and maintain. A tester with no programming background would be able to read the test and change the name of the content and the file itself.
Among other tests that rely on this I have contentValidation.xml, invalidContentValidation.xml, verifyCorrectContent.xml and many others. The nice thing is that each subsequent test can be a direct verification of the feature or requirement. This way I can ensure that no test effort is lost because the implementation on the server has changed.

closing remarks

Testing this way allows for full coverage of a web application. The webtest framework can handle almost all browser functionality, even Ajax, which is due to the HtmlUnit project. It has a couple of bugs; for example, it can only drag and drop one item per test. But these are minor issues. If my application did not contain Flash as part of the implementation, I reckon I could achieve 100% coverage in automatic tests. As it is now, I don't have many tests of performance requirements, but I think I'm at 80% coverage in total with unit tests and webtest.

To get high automatic coverage you need to figure out on which side you have to test your feature, client or server. Then implement the unit tests for the server side and the client-side tests in a framework like webtest. Then you can implement the feature itself, knowing that when the bar in the reporting tool turns green, you are finished.

links

canoo webtest

Django testing

footnotes
*1 In Django there are multiple applications per project. An application in Django is the equivalent of a plugin in Eclipse RCP or a web application archive (WAR) in J2EE.

Tuesday, 4 December 2007

Development with Django and NADB

Python has always intrigued me. I have kept revisiting the language from time to time to marvel at its simplicity and ease of use. Until recently I did not have a reason to go further with it.

I started implementing the project Joollu (coming soon: joollu).

I decided to use the Django framework because a) it's done in Python and b) I heard it was good for content-driven websites.

And boy, it's intuitive. Being a novice Python programmer, I found it astonishing that I could implement my own video blog in less than approx. 10 hours. The documentation is equally astounding.

The Django framework has an object-relational mapper that handles all your db management for you; it's a very simple way of making a model layer. You only need to configure the things that are specific to you, making it very rapid for simple features. I thought it might be a bigger task to implement the NADB, which is based only on interfaces, but then I discovered the notion of generics (generic relations) implemented in Django.

Generics are used to associate a model object with any other object, without knowing anything about its implementation. This is exactly what the NADB is all about. Instead of having a Video model class, I implemented Physical (for storing files), Viewable (for attributes like thumbnail, width/height etc.) and, first of all, an Entity class to represent the actual video; its only attributes are name and id. None of these models know anything about the others at the DB layer. Instead I have a MediaManager that can marshal a video from all the interfaces that refer to it. Retrieving a video can be done like so:



>>> from joollu.core.models import *
>>> entity = Entity.objects.get(name="myvideo")
>>> physical = Physical.objects.get(content_object=entity)
>>> viewable = Viewable.objects.get(content_object=entity)

or more simply with a little wrapping



>>> from joollu.core.managers import MediaManager
>>> entity = Entity.objects.get(name="myvideo")
>>> video = MediaManager().filter_video(name="myvideo")



>>> from joollu.core.managers import MediaManager
>>> entity = Entity.objects.get(name="myvideo")
>>> video = MediaManager().get_media(entity.id)

Sadly, it kind of impairs the filtering and retrieving that is otherwise done very easily with Django. Here is an example for Physical, if I want particular files:



>>> from joollu.core.models import *
>>> physical_list = Physical.objects.filter(file__endswith=".png")
>>> physical_list
[Physical: /Users/kennethnielsen/pythonspace/joollu/media/content/picture1.png]

Keeping notions like video and image in the business layer increases flexibility. If I were to change attributes, I can slap them on one of the existing interfaces or make a new one. For example, if I want to save who published something and when, I can make a new model class Publishable and attach it to all content. This would be regardless of it being a video, an image or something else. It is up to the business layer to decide if published attributes are needed in specific cases. Comments are another example, and so it goes.

There is a ton of other stuff worth mentioning about Django; it all just kind of fits together in this very intuitive framework. You should read up on it, or find the tech talk about it on Google Tech Talks.

Outsourcing

With all the J2EE projects being outsourced to India, I guess this zeitgeist graph makes sense:

Google searches for 'What is j2ee'

Look at the regions...

Wednesday, 31 October 2007

No Artifacts DataBase (NADB)

prologue

Fueled by the Tech Talk 'Everything is Miscellaneous' and the book about abstract data modelling I read lately, I think it's time I put it into action...

Databases today inherently have trouble adapting to changes in the business. Nowadays business logic changes rapidly because of the increasing demand for adaptive business and the use of IT in general.

Our databases are not up to speed, though. Designing a database often takes too much time and is too risky to get wrong, so you start off by designing it and then base your software on it.

With rapid development, it ought to be the other way around. The business dictates what needs to be in the database, and the business often changes. Databases should be just as adaptable.

Avoid artifacts in the database

Data is not static! The database of a typical company contains product, order and order line tables.

Looking at the product table, what constitutes a product? That you say it's a product? No: it has a price and can be bought, and therefore it is a product.

If you are a paint store, your product table could contain the columns: color, viscosity, manufacturer, price. If the paint store at some point wishes to extend its catalogue to, perhaps, lamps, the table would be extended and we would have a table with the columns: color, viscosity, manufacturer, bulbtype and price.

This would leave null values in some of the rows, since a lamp doesn't have a viscosity (okay, bad example, but you get the idea). Maybe you would fill out viscosity anyway or do something awful like using the column for something else.

In a big company I worked for, I actually saw a table called 'Ship' which also contained 'Automobiles', because when automobiles became accepted as a business object some time in the past, the table happened to involve some of the same systems and had a property like 'Motor registration number'. Of course, along the way some columns in the table were misused for something completely different than intended.
Actually, it wasn't really clear anymore what entities were saved there, but I think 'Caravans' were in there too.

Taking the example of mixing 'Paint' and 'Lamps', let's look at their individual properties:

Paint: viscosity, color

Lamp: bulbtype

Product: price

Notice that I don't go all out here and create a 'Fluid' table for viscosity and a 'Coloured' table for the color. Also, I have left out the property manufacturer, because it becomes a table of its own, 'Manufactured'. A product could be a service, which is not manufactured.

So you're probably asking what handles the relations between all these properties. Well, the product should not relate to either 'Paint' or 'Lamp', since it should not define what a product is. Instead the business handles this by instantiating entities in the database through an abstract table 'Instance'. The abstract model I have come up with is shown below. The relations are read as "Instance implements Type" and "Type describes Instance". It is like the modelling we know from object-oriented languages, with the key difference that it only describes state, because we are dealing with a persistence mechanism that should only handle state. Behaviour is described by the business (read: the application on top of the DB).

You might be thinking: what about metadata, like logging changes of state and other valuable business info? These are very relevant issues, and such a log entry should be an instance in its own right. It would have properties like date and user, where user is a property that is a reference to an instance. So the property user would have a domain allowing it to be an instance, and the value would be a reference to an actual instance of a user. This is the way to do metadata in the model. If you were to refer to a user by a UserId, you would lock the table into being a user-logged event; by allowing it to be a reference to an arbitrary instance, you can define it to be a user-logged event in your business by checking the type of the referred instance.

Note that this is not a classical inheritance hierarchy where types can 'be of' other types; the 'Type' here is equal to an interface (as implemented in Java, at least). The creators of Java have reportedly since regretted class inheritance as we know it today, because of the classical problem of the fragile superclass. The superclass is fragile because its children are inherently (pun intended) dependent on the implementation of the superclass. When several children exist, the purpose of the superclass becomes more obscure as the business continues to grow and different requirements are put on the children, as is the case with a table expanding horizontally. Instead, the children should implement different interfaces, dividing up the different requirements into several descriptors. The implementation of the interfaces can then be delegated. The big difference here is that state is not inherited.

In the case of paint being sold, it would be an instance of paint AND price, collectively becoming what the business would describe as paint we are selling.

The 'Instance' does not contain any properties itself because the 'Instance' is the sum of its relations.
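To make "Instance implements Type" concrete, here is a minimal in-memory sketch. It is not the actual NADB implementation; the class names, method names and sample values are all assumptions chosen to mirror the paint and lamp example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Type:
    """A descriptor: a named set of properties, with no state of its own."""
    name: str
    properties: frozenset


@dataclass
class Instance:
    """Holds no properties itself; it is only the sum of the types it implements."""
    values_by_type: dict = field(default_factory=dict)

    def implement(self, type_, **values):
        # Reject values for properties the type does not describe.
        unknown = set(values) - type_.properties
        if unknown:
            raise ValueError(f"{type_.name} has no properties {sorted(unknown)}")
        self.values_by_type[type_] = dict(values)

    def implements(self, type_):
        return type_ in self.values_by_type

    def value(self, type_, prop):
        return self.values_by_type[type_][prop]


PAINT = Type("Paint", frozenset({"viscosity", "color"}))
LAMP = Type("Lamp", frozenset({"bulbtype"}))
PRODUCT = Type("Product", frozenset({"price"}))

# "Paint we are selling" is an instance of Paint AND Product.
paint_for_sale = Instance()
paint_for_sale.implement(PAINT, viscosity="high", color="red")
paint_for_sale.implement(PRODUCT, price=12.5)
```

Selling lamps later only means creating instances that implement LAMP and PRODUCT; nothing about the existing paint instances has to change.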

Conclusion

Relational databases today are too statically implemented. By abstracting the data, we can much more easily build a clean database without ripping out tables. To change the properties of an instance you only need to relate it to a different property, and not do heavy, frightening, irreversible 'alter table' operations etc.

The model of course requires some robust business logic on top, but from my recent work with Hibernate and LINQ it's apparent that the traditional business logic never really worked. Constraints and validation of data have moved more towards the UI, and the business layer has become some dumb DAO. Hopefully this approach will revitalize the layer and bring more flexibility into our backend.

From here I have to prove my theory by actually implementing a system for it and describe how intuitive and flexible my backend became :)

Stating that something is within a particular domain by putting the instance in a table with a certain name does not make it so. It is as absurd as arguing whether Pluto is a moon or a planet, because it doesn't fit into either. Pluto is just an instance with a different set of properties.. "Everything is miscellaneous".

Wednesday, 17 October 2007

BEA Weblogic tips

J2EE shared libraries are very useful: because they are merged with your web application, you can reuse web resources like JSPs, framework-specific artifacts like BEA controllers, and so on.

Here's how you might do it:

creating a shared J2EE library

A library can be any J2EE module. Mind you, I've mostly used Portal Web projects as the base for my libraries, as I found that I often have web resources in my libraries.

defining a shared J2EE library

1. go to the META-INF folder
2. open the MANIFEST.MF file
3. add the following three lines, filling in the values (example given further down):
   Extension-Name:
   Specification-Version:
   Implementation-Version:

Manifest-Version: 1.0
Extension-Name: common-portlet-template-web
Specification-Version: 0.1
Implementation-Version: 0.1

NOTE: the version MUST be given in order to import the library into BEA Workshop 10.0

exporting a shared J2EE library from a Web project

1. update the versions as commented in 'defining a shared J2EE library'
2. right-click the project, choose 'export->WAR file'

exporting a shared J2EE library from a J2EE Utility project

1. update the versions as commented in 'defining a shared J2EE library'
2. right-click the project, choose 'export->Export', select JAR file, click 'Next'
3. choose your source folders and the META-INF folder for export, click finish

importing a shared J2EE library into the BEA Workshop 10.0

1. in window->preferences->weblogic->J2EE libraries click 'add'
2. click 'browse', locate your exported WAR or JAR or EAR
3. click 'open', click 'ok'
4. adding it to your project is done by expanding the 'WebLogic Deployment Descriptor' in your project
5. right-click -> add, 'browse', locate it and click 'ok'

redeploying a shared J2EE library from the Workshop

1. in window->preferences->weblogic->J2EE libraries select the library and click 'remove'
2. open the server overview by double-clicking the server in the 'Servers' view
3. in the published module list, select the ear project and click 'Undeploy'
4. after the ear has finished undeploying, click the shared Library module, and click 'Undeploy'
5. now you are ready to deploy the library again, do this by going to window->preferences->weblogic->J2EE libraries
6. click 'add', then 'browse' and navigate to your shared library, select the library, click 'ok'
7. redeploy the library by running your project on the server.

Thursday, 11 October 2007

Slow development cycles? Try something old-school.

Does this scenario sound familiar?

"Quality Assurance just discovered a logical error in your presentation of the customer's product: instead of the 'Two-month no credit limit' product, the 'Three-month no credit limit' product shows up.
Thinking you have the answer, you remember the pesky ProductNameProvider you wrote, and you guess it's an off-by-one error. You seem to remember that the first two OR three records in its output are just static titles. You can't remember if it was the first two OR three, so you try incrementing your reference index by 1. You then restart the server, deploy the application and wait patiently for a result...

After replicating the error in 42 easy steps, you conclude this wasn't the right fix; now it's showing a completely wrong 'One-month no credit limit' product.

You suddenly realize that this could be an error in the actual product request, or maybe the text formatting, or maybe it's the caching?
So you try every possible path for the error and try out 3-4 different scenarios for 3-4 different paths, each time repeating the 42 easy steps... again... and again..."

Such slow bug-fixing processes are caused by ONE thing: a lack of unit tests and proper structure. The cycle of fixing said bug should have an obvious solution. A unit test somewhere should verify that the output of the ProductNameProvider is what you expect. Having an automatic test framework that supports stubbing (maybe mocking) would allow you to test this functionality in isolation and not the entire application at once, which is too complex for any human to handle. Well, some applications are, especially if you, like me, only work on an application for shorter periods (read: 3-6 months).
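As a rough sketch of what such an isolated test could look like: everything here is hypothetical, including the stub, the method names and the assumption that exactly two static titles precede the products. The point is only that the off-by-one question gets answered in well under a second, with no server involved.

```python
class StubProductSource:
    """Hypothetical stub standing in for the real backend call."""

    def fetch_titles(self):
        return [
            "Products",                     # static title
            "Credit",                       # static title
            "One-month no credit limit",
            "Two-month no credit limit",
            "Three-month no credit limit",
        ]


class ProductNameProvider:
    """Hypothetical provider: skips the static titles, then indexes the products."""

    STATIC_TITLE_COUNT = 2  # assumption for this sketch: two static records come first

    def __init__(self, source):
        self._source = source

    def name_for(self, product_index):
        titles = self._source.fetch_titles()
        return titles[self.STATIC_TITLE_COUNT + product_index]


provider = ProductNameProvider(StubProductSource())
```

If STATIC_TITLE_COUNT were wrong by one, an assertion on provider.name_for(1) would fail immediately and point straight at the index.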

I have just begun working on an assignment at a large Danish company. Their applications are mainly J2EE with some heavy junk in the trunk.
My task there involves developing portlets in WebLogic 10. This is a nice platform for portals because of its intuitive CMS and flexibility in layout. BUT since it's a J2EE platform, it's inherently slow, because of lots of configuration and stuff under the hood that does lots of... well... stuff.

To test an application, developers tend to restart the server, because we are neurotically inclined to think that an incorrect state of the configuration involved is the main cause of our bugs, and that's understandable. Although 99% of the time we just fucked up, we still restart the server about 5 times before admitting it.

NONE of my tasks will ever involve extending the CMS or directly interfacing with the CMS in ANY manner, so why would I EVER need to start it up when doing a simple two-page portlet, or a complex seventy-page one for that matter? Case is, I don't, and I certainly won't. So my advice is to make sure that running your code does not require a running server. A common mistake is to put the code right there in your entry-point methods, for example in a Struts action, a Beehive controller or whatever. Delegate, delegate, delegate all responsibility to classes with limited responsibility: at least one class that doesn't require you to load a shitload of code or annotated compiles.

Even a simple ProductNameProvider will eventually be executed at least 10 times during development and at least 10 times (maybe hundreds) in its lifetime for debugging purposes, so let's do the math.

total runs = 20
Writing the ProductNameProvider in a unit-test-friendly environment = 15 min
Restarting a shitload of applications on a J2EE server + replicating the bug = 2 min (at least)
Running a single unit test = 1 sec (at most)

60 x 15 (unittest creation) + 20 x 1 (unittest running) = 920 seconds
60 x 2 x 20 = 2400 seconds
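The same back-of-the-envelope calculation as a few lines of Python, using only the figures above (the variable names are mine):

```python
RUNS = 20                        # total runs over the provider's lifetime
WRITE_TEST_SECONDS = 15 * 60     # one-off cost of writing it unit-test friendly
UNIT_TEST_RUN_SECONDS = 1        # per run, at most
SERVER_CYCLE_SECONDS = 2 * 60    # restart + replicate the bug, per run, at least

with_unit_test = WRITE_TEST_SECONDS + RUNS * UNIT_TEST_RUN_SECONDS
with_server = RUNS * SERVER_CYCLE_SECONDS

print(with_unit_test, with_server)  # 920 2400
```

Changing RUNS shows how quickly the gap grows: the unit test cost is almost entirely the one-off 15 minutes, while the server cost scales with every run.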

I am not counting the maintenance of unit tests; this happens in parallel with development and should not require extra time, unless you implement a completely new feature.

I'm not even getting into how unit testing reduces bugs, facilitates refactoring, reduces the footprint of your code etc. There are tons of documented experiences of that out there. Read it. Read it all... now!

Running/testing your code, should be as easy as ALT + SHIFT + x, t

Tuesday, 14 August 2007

Hilbert curve

Is it nerdy when, playing with a folding rule, you implement a Hilbert curve?

Hilbert curve

Friday, 27 July 2007

Singing of wrath

The Iliad is an epic tale about the Greeks' invasion of Troy. It describes everything, from the political game to the common soldier's fight against superior force and the meddling of the gods.

The translation I read is written in modern language, which at times leaves me unsure whether I am reading a modern action adventure or a tale that really was written more than 2500 years ago. Either way, I can say that nothing is lacking on any front that would cast doubt on this being the original tale, the original adventure.

Achilles is born of gods, and his knowledge of his own invulnerability and origin gives him a rather aggressive nature that does not take no for an answer. Agamemnon, the high king, has a score to settle with King Priam of Troy, and before you are very far into the book there are so many reasons to attack Troy that it soon becomes inevitable that the Greeks gather their forces.

It surprises me how easily the book jumps from drama to action to erotica and all the way back again, and quickly too. It feels complete, and you get the sense that everything can turn in an instant. The gods' meddling in human affairs fully justifies these shifts in the story, and by the end you are hardly in doubt that the real drama took place among the gods, even though one of the humans' cities is in flames (Spoiler! :D).

Odysseus and the other brave warriors constantly perform heroic deeds, but that is just a warm-up for the great battle that comes when Achilles decides to go to war. This happens after his best friend (and possibly lover?) is killed by the Trojans. In the end he dies, though, as the gods had decided beforehand, and Troy falls. No gods have died, however, and a few are barely even aggrieved, but the battle was decided on Olympus.

Dante's Hell

The Divine Comedy opens with the verse: "Midway on our journey through life, I find myself in a dark forest"

It refers to the path we all follow through life; here Dante wakes up and looks around. Before long his journey has begun, and it leads him through Hell, Purgatory and Paradise. In this post I will only briefly mention a few things I noticed.

The translation I found includes the translators' notes, which turn out to be invaluable, since Dante often refers to events around his home town of Florence, Italy.

His trip through Hell starts outside the nine circles, where the dead who did not believe in God in life spend eternity. Here you find all kinds of heretics, such as Vikings. The place is called Limbo.

Dante also uses references to Homer's works, which in context can quickly supply a whole story for even the otherwise most insignificant object. This gives the narrative a lot of history, but if the background is missing, as in my case, it becomes hard to take it all in.

His trip through Hell tells of all kinds of torments inflicted mostly on his countrymen, but also on other great figures of history. Outermost you find the loudest, noisiest punishments, such as the dead who are hacked at by devils while being boiled in lava. It is as if the one who is punished mostly by his own conscience is regarded as harder hit. There is one exception, though: Judas, who for eternity is gnawed on by one of the devil's three heads in the innermost circle.

I look forward to reading about his journey through the last two realms of the dead, but first I want the background in place. I am starting with the Iliad.

The epic works

I have always had a certain interest in antiquity, which I first explored in primary school, where I could read larger history books about Greece and ancient Persia.

The interest faded, though, since the books I read lacked immersion and surprises; it was as if the authors only described, but never put an opinion or interpretation of their own into their account of how things were.

One cannot blame them for that. I should have realized that it was not the actual history that interested me, but rather what people at that time dreamed about, what they marveled at, and how they put their less concrete thoughts into words.

Recently I have taken up my fascination with antiquity again and chose to throw myself at the epic works, from which the world's stories and unreality spring.

I am referring to works like the Iliad, the Odyssey, the Aeneid and the Divine Comedy. This is the list of books I intend to read. The list came about after I had read the first part of the Divine Comedy. I barely understood half of the references; I then discovered that they were taken from the books above, with heavy reference to Virgil's Aeneid, since Virgil is Dante's 'guide' through the three realms of the dead.

Sunday, 1 July 2007

Lol@my cat

I just had to attach captions to my cat Maximus. His formal name is Baron von Maximus.. the whiskered beard adds credibility to his noble origin.

So I found some snapshots from when he was a kitten. I lol'ed 'em up, or rather.. tried to.

Lottery numbers

Hey! The last payout from the Danish lottery was a staggering 10 million-some euros. I'd like to get a piece of that. So what can we do to improve our chances?

Well, we assume it's impossible to just pick the right numbers, so we'd have to calculate a bit on the chances of a random hit.

In the Danish lottery there are 36 different numbers. A row consists of seven of them, and a ticket holds 10 rows, each row being one shot at the big bucks. Since the order of the numbers does not matter and no number repeats, that gives C(36,7) = 8,347,680 possible rows (not 36^7), or a tenth of that in full tickets. So we'd have to buy about 8 million rows' worth of tickets and coordinate them to make sure there are no duplicates. That would cost 40 million euros, not counting the hours of work filling in the tickets.

One way of increasing our chances would be to know all the numbers drawn so far and work out some kind of probability. So what I did was find a way of getting all 17 years' worth of numbers out of the national lottery's web page.
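As a quick check on the number of possible rows, assuming a row is seven distinct numbers out of 36 and the order does not matter, the count is the binomial coefficient "36 choose 7". A small sketch using Python's standard library (math.comb needs Python 3.8 or later; the variable names are mine):

```python
import math

NUMBERS = 36
PICKS_PER_ROW = 7
ROWS_PER_TICKET = 10  # as described above

possible_rows = math.comb(NUMBERS, PICKS_PER_ROW)
# Round up: the last ticket may be only partly filled.
tickets_needed = math.ceil(possible_rows / ROWS_PER_TICKET)

print(possible_rows)   # 8347680
print(tickets_needed)  # 834768
```

For comparison, 36 ** 7 is roughly 78 billion, because it also counts repeated numbers and every ordering of the same seven numbers as separate rows.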

I went to the list of draws here: http://www.danskespil.dk/spil/lotto/indhold/resultater.php
But I found it only shows the last 2 years' worth. I examined the HTML and found that an attribute, draw, was set when retrieving your selection. So I put it on the request like this: ?draw=1. I now have access to the last 17 years' worth of numbers. So far so good.

I don't want to sit and retrieve all the numbers manually, so I had to conjure up a script for it. I would've used Python, but I don't know it well enough yet, so I ended up using Java. And of course I started off by writing a test:



 @Test
 public void TestSingleExtract() throws Exception{
  LottoDataExtractor lde = new LottoDataExtractor();
  String numbers888 = lde.extract(888);

  assertEquals("01 - 03 - 12 - 15 - 18 - 22 - 31:08", 
    numbers888);

  String numbers1 = lde.extract(1);

  assertEquals("04 - 06 - 10 - 14 - 17 - 22 - 33:02 - 21",
    numbers1);

  String numbers127 = lde.extract(127);

  assertEquals("02 - 04 - 10 - 13 - 15 - 16 - 26:11 - 17 - 35",
    numbers127);
 }

In the above code I test the extraction of draws that are far apart, because I found that the HTML shifted by a few characters between draws and the first version of my LottoDataExtractor simply counted characters. So I had to smarten up the program a bit by looking at tags instead.

I now have the lottery numbers from 17 years back. I will have to analyze these a bit, to hopefully find a trend or something, but I don't expect it to give me an immense advantage.

Get the neatly formatted numbers here. Note that every line is a new draw, and the last line is draw 889, from 30/06/07.
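The difference between counting characters and looking at tags can be sketched in Python with the standard html.parser module. This is not the LottoDataExtractor, and the markup is an assumption: I pretend each drawn number sits in a td element with class "number", which is made up for the example.

```python
from html.parser import HTMLParser


class NumberCellParser(HTMLParser):
    """Collects the text of every <td class="number"> cell (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_number_cell = False
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "td" and ("class", "number") in attrs:
            self._in_number_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_number_cell = False

    def handle_data(self, data):
        if self._in_number_cell and data.strip():
            self.numbers.append(data.strip())


def extract_numbers(html):
    """Return the numbers found in the page, joined in the ' - ' style used above."""
    parser = NumberCellParser()
    parser.feed(html)
    parser.close()
    return " - ".join(parser.numbers)


# Two pages with the same numbers but shifted whitespace and an extra cell.
page_a = '<tr><td class="number">01</td><td class="number">03</td></tr>'
page_b = '<tr>\n   <td>draw</td>\n <td class="number">01</td>  <td class="number">03</td></tr>'
```

Both pages yield the same string, even though every number sits at a different character offset in the second one.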

Saturday, 23 June 2007

Python - first impression

I've tried out some Python now and it is really intriguing.

One of the first things you meet is the interactive interpreter. Now, I've come across similar things before, like a JavaScript shell or even, well, an OS shell.
But in Python this is really how everything works.

On top of the very interactive programming possibilities, the language is very intuitive.
Take this little snippet, which is how I'd like to write boolean expressions, but can't in Java, for example.


>>> x = 10
>>> if 0 < x <= 10:
...     print "true"
...
true
>>>

In Python, list is a built-in type. If I want to perform a common action like pairing up the elements of two lists, I do not have to bother with much complexity, as shown.


>>> listKey = range(5,15)
>>> listValue = range(10)
>>> zip(listKey,listValue)
[(5, 0), (6, 1), (7, 2), (8, 3), (9, 4), (10, 5), (11, 6), (12, 7), (13, 8), (14, 9)]
>>>

It is not quite clear from the code, but the printout is a list of tuples, and both list and tuple are built-in types.

I will try to implement the 'stamp' program, to see how little complexity I will have to deal with.

Coming up: "The Stamp/Python tutorial"

Python

I've heard good things about Python, so I decided to watch a Tech Talk about it.

Basically the ideas are as follows:
1. trust the programmer
2. don't prevent the programmer from doing what needs to be done
3. keep the language small and simple
4. provide only 1 way of doing a specific task

So I asked myself how I would expect to use Python.
Well, instead of using my currently favoured languages Java & C#, I would use the language when working with people I totally trust as good programmers with the right skills and the same perspective as me. People who speak the same language.. aye.

Some of the differences lie in 'Trust the programmer': if the language has to trust the programmer, then programmers must trust each other. For example, in Python you do not use a 'private' modifier or other access modifiers. This means the programmer of a class cannot fully ensure that another programmer uses the class correctly.

If I were to write a program for use in a big corporation with programmers of varying skill, I would probably want to encapsulate some properties, so that they aren't bastardized out of scope and I don't completely lose track of their usage. The debugging nightmare. To some extent the speaker in the Tech Talk argues against this. He states 'You don't want your code to be misused by fools, but as we all know fools can be ingenious'. What he means is that they will find some other workaround, possibly obscuring the usage even further.

Several other benefits of Python are passing functions around, usage of templates, multiple inheritance (careful though!) and more.

I will try it out and update with a little example here.

Sunday, 17 June 2007

Sunday java code guidelines

Avoid using static references.

You are making the code dependent on one implementation of a function. This does not scale well. It puts your program at risk, because a reference somewhere else might require the implementation to change, and a static reference leaves you no way to abstract it through polymorphism.

You can use dependency injection to avoid this.. see Guice, for example.
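
A minimal sketch of the idea, using plain constructor injection rather than Guice. All names are hypothetical. Instead of calling the static System.currentTimeMillis() directly, the class receives a time source, so a test can swap in a fake one.

```java
// Hypothetical names throughout; plain constructor injection, no Guice involved.
interface TimeSource {
    long now();
}

final class Stopwatch {
    private final TimeSource timeSource;
    private final long start;

    // The dependency is handed in instead of being referenced statically.
    Stopwatch(TimeSource timeSource) {
        this.timeSource = timeSource;
        this.start = timeSource.now();
    }

    long elapsedMillis() {
        return timeSource.now() - start;
    }
}

final class InjectionDemo {
    public static void main(String[] args) {
        // Production wiring: the real system clock.
        Stopwatch real = new Stopwatch(System::currentTimeMillis);
        System.out.println("real stopwatch started, elapsed so far: " + real.elapsedMillis());

        // Test wiring: a fake clock we control completely.
        final long[] fakeTime = {1000L};
        Stopwatch fake = new Stopwatch(() -> fakeTime[0]);
        fakeTime[0] = 1250L;
        System.out.println(fake.elapsedMillis()); // prints 250
    }
}
```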

Use the final keyword as much as possible.

This reduces the chance that you will inadvertently reassign a field or variable and thereby change the state of an object, greatly stabilizing your code as the program evolves and becomes larger.
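
A small sketch of what that looks like in practice. The Money class is made up for this example. Note that final only stops reassignment; it is the combination with returning new objects that keeps the state from changing.

```java
// Hypothetical value class, used only to illustrate the guideline.
final class Money {
    // final field: assigned exactly once, in the constructor.
    private final long cents;

    Money(final long cents) {
        this.cents = cents;
    }

    long cents() {
        return cents;
    }

    // Instead of mutating this object, return a new one.
    Money plus(final Money other) {
        // other = null;  // would not compile: other is a final parameter
        return new Money(this.cents + other.cents);
    }
}
```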

Enso

Wow this is a great tool.

A few first impressions:
It's sleek and the interface is really nice. The opacity lets me focus on the content while navigating. This has been seen on many a Flash webpage. Now I can use it anywhere.

Here are the tasks it's already helping me with:
1. navigating to the correct window
2. opening programs

Because I usually have a lot of windows open, I tend to want to close some when I have too many programs open, which is actually a burden.

When I have to launch a program I don't usually use, I have to go through the bookmarks in start-up, which can be somewhat frustrating.

Bookmarks in the browser can also be avoided. Really nice. I'll be updating about this.

Human Computer Interface

I watched this really cool 'talk' video about a way to improve accessing functionality on your computer.
http://video.google.com/videoplay?docid=-6856727143023456694&q=tech+talk&total=878&start=10&num=10&so=0&type=search&plindex=6

Among other things it advocates the use of services, and I now understand the power of services. Applications restrict the use of their functionality by not providing it as a service. In this regard I think the OSGi framework, used for RCP among other things, is a good approximation.

Having functionality segregated into reusable parts is an idea that has existed for a long time. Think toolkits. But these are so closed-in and well-defined that they don't leverage creativity.

I'm going to try out the Enso humanized interface, it looks really good!

Second Life

Just entered Second Life and watched a video about it.

I must admit it sounds intriguing.

I don't want to describe it here, you can go to the website and read about it. What is interesting is that soon it will be possible to feed XML and HTML into Second Life.

That will be amazing.

Everything is Miscellaneous

This is a really good philosophical 'Talk', and I'll think it through briefly here.
http://video.google.com/videoplay?docid=2159021324062223592&q=tech+talk&total=878&start=0&num=10&so=0&type=search&plindex=1

In constructing computer systems I have discovered that categorization itself can be a constriction that ultimately can lead to severe problems.

For example, in Java you have inheritance. As a rule in newer development processes, everything should be refactorable with the least amount of effort. How can you then defend defining parents for some objects, from which all attributes are set in stone? This really emphasizes the problem of the fragile base class. I have seen this sometimes, though I haven't really programmed on any system of a size where changing the base class would cost more than a few days of work.

So defining anything by its mere name or concept is fundamentally flawed in programming. Instead it should be defined by its behaviour. This is generally supported in the Java language through interfaces, so I can't help wondering why the class inheritance construct is still in use.

In short, never think you have completely defined anything in your domain. Use interfaces always. Don't extend anything! Always assume that your definition of something will change.

Currently I'm tackling problems with an old system, which really was not designed for the features it now supports. One problem is that in the system a 'Product' is very well defined. So well defined that changing its structure is basically a rewrite. The product should instead have been defined through interfaces. So a function changing a product's price would use its interface for exactly that. A 'Priceable' interface perhaps. Define it by behaviour!
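
Here is a rough sketch of what I mean, under the assumption that changing a price is the only behaviour the function needs. None of these names come from the old system; Priceable, Listing and Discounter are made up for illustration.

```java
import java.math.BigDecimal;

// Hypothetical names; the old system's real Product class is not shown here.
interface Priceable {
    BigDecimal price();

    void changePrice(BigDecimal newPrice);
}

// One possible implementation. Anything else with a price could implement it too.
final class Listing implements Priceable {
    private BigDecimal price;

    Listing(BigDecimal price) {
        this.price = price;
    }

    @Override
    public BigDecimal price() {
        return price;
    }

    @Override
    public void changePrice(BigDecimal newPrice) {
        this.price = newPrice;
    }
}

final class Discounter {
    // Depends only on the behaviour, never on a concrete product structure.
    static void applyDiscount(Priceable item, BigDecimal percent) {
        BigDecimal factor = BigDecimal.ONE.subtract(percent.movePointLeft(2));
        item.changePrice(item.price().multiply(factor));
    }
}
```

If the structure of a product changes later, applyDiscount does not care as long as the new thing is still Priceable.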

Human Computation

In this 'Talk' Luis von Ahn shows the viewer the tools he has actually implemented for what I might call 'Making work fun'. It is something I recently experimented with myself, combining elements of a game with an actual work process.

In my current job I have the task of optimizing the workflow for editing images for real estate sales.

How can I make this process feel like a game? Well, first you'd have to watch the 'talk': http://video.google.com/videoplay?docid=-8246463980976635143&q=tech+talk&total=878&start=0&num=10&so=0&type=search&plindex=0

I think I would first try to apply one of the game structures.. maybe the 'synchronous' approach. Have 1 worker edit the actual image and another verify it. This is how the process is actually implemented already. A worker edits the image to fulfill some specifications. Say, make the sky appear blue, remove anything unwanted in the picture and so on. Then the QA guy looks at the image and comes back with corrections of sorts.

So, the worker gets paid for any image that can be used. He gets rewarded for some effort, which is a common gaming element. But how do we make this achievement more interactive and, most importantly, FASTER?

Well, one thing we could do is have the QA and the worker, player 1 & 2 if you will, work synchronously. But having 2 employees for a task that can be solved by just 1 is very expensive, therefore we would rather eliminate the QA as a position and give that task to each of the workers.

By eliminating the QA per se, we can have double the output, but this puts strain on the actual control of the work, because the workers are on the same level and authority is gone. Instead we HAVE to rely on the workers not to 'cheat'. Well, this is not a big problem in this case, since in the end the customers can really easily fulfill the QA task; we just have to have a means of tracing back to the workers involved.

I have to find a concrete implementation of something similar. The ESP game is too simple to use as a model, not that it's not ingenious; it HAS to be simple, because anyone should be able to do it.

Now we have requirements for the players, that are:
1. has to have skills with photo editing
2. can read/write English and communicate with her peer

In a scenario, the two workers would each work on 1 input in parallel, but the tasks would not take an equal amount of time to complete, so there's our first problem to be solved. How do we align the process of each worker and make sure that QA becomes an interactive/dynamic task done by the workers in one effort?

I'll have to research this some more.

Tech Talks

I've just started discovering the rich world of online seminars, after I was introduced to Google Tech Talks.

I must admit, the guys doing these seminars are a great inspiration to me.

One thing I started out with was trying to watch seminars that I could use for something. So I started watching some seminars on test-first development, for example. Quickly though, I ran out of interesting seminars on that topic, but still wanted more.

My curiosity was piqued by the great speakers, who know how to motivate any lecture. Seeing a possibility for gaining knowledge on topics outside my usual interests, I tried out some new ones.

In this blog I will probably comment on most of them.

Sunday, 10 June 2007

Getters and Setters are evil

In my current job I am part of a project that is building a new system from the ground up.

What I keep seeing myself doing is adding getters and setters all over my code, but only inside well-defined components. I ask you though, how well-defined must a component be before you consider encapsulating its properties?

I have come to the conclusion that each and every class is a component in its own right.. so I should drop getters and setters entirely. Ouch, some refactoring to be done.. hmm, how to accomplish it, I wonder.

Consider the case of logging in. The usual way to go about this is to have a user object compared to some stored data, and then do some behind-the-scenes bullshit of connecting to this and referring to that. What you essentially end up with is a function like this


boolean login(String username, String password);

and it is placed somewhere on your mother of an application runnable.


class Application {
    void run() {
        splashScreen();
        // Hack info from the User and crack away
        LoginInfo loginInfo = gatherLoginInfo();
        login(loginInfo.name, loginInfo.pass);
    }
}

That doesn't seem at all like OO programming though. One would expect this method to be a method of a User. Then the application would ask the user to log themselves in. So instead the implementation would be


class User {
    void login() {
        // the user gathers and checks its own credentials here
    }
}

class Application {
    void run() {
        splashScreen();
        // we have a new User here
        User user = new User();
        // user, could you please log yourself in?
        user.login();
    }
}

Seems more accurately OO-like... right?
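
To make that a bit more concrete, here is a self-contained sketch. Everything in it is an assumption for illustration: the CredentialStore interface, the in-memory store and the way the user is constructed are not from my actual project. The point is that name and password have no getters; the user hands them over itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every name here is made up for illustration.
interface CredentialStore {
    boolean matches(String username, String password);
}

final class InMemoryStore implements CredentialStore {
    private final Map<String, String> passwords = new HashMap<>();

    void register(String username, String password) {
        passwords.put(username, password);
    }

    @Override
    public boolean matches(String username, String password) {
        // get() returns null for unknown users, so equals() is then false.
        return password.equals(passwords.get(username));
    }
}

final class User {
    private final String name;      // deliberately no getter
    private final String password;  // deliberately no getter
    private boolean loggedIn;

    User(String name, String password) {
        this.name = name;
        this.password = password;
    }

    // The user presents its own credentials; callers never see them.
    void login(CredentialStore store) {
        loggedIn = store.matches(name, password);
    }

    boolean isLoggedIn() {
        return loggedIn;
    }
}
```

The application only asks the user to log in and then asks whether it worked, which is a question about behaviour rather than about a property.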

Some good reading:
http://www.javaworld.com/javaworld/jw-09-2003/jw-0905-toolbox.html?page=5

Comments can be bad and clutter code

Oh my heart bleeds, for I have been deceived by my foolishness.
What I usually tell people when I can't understand their code is:
"Write lots of comments"

Which would solve the problem at hand, but maybe the problem is not my lack of insight. I should be able to comprehend most code, shouldn't I?

So instead let's take an example of some code I might misunderstand, and rewrite it so I can understand it. I don't understand this (code segment randomly picked from http://www.google.com/codesearch):
You can look through it, or just press 'Page Down' to skip to the point.. it's hilarious, trust me ^^


    public static void main(String[] args) {
        final int NUM_MSGS;
        Connection connection = null;

        if ((args.length < 1) || (args.length > 2)) {
            System.err.println(
                    "Program takes one or two arguments: "
                    + "<dest_type> [<number-of-messages>]");
            System.exit(1);
        }

        String destType = args[0];
        System.out.println("Destination type is " + destType);

        if (!(destType.equals("queue") || destType.equals("topic"))) {
            System.err.println("Argument must be \"queue\" or " + "\"topic\"");
            System.exit(1);
        }

        if (args.length == 2) {
            NUM_MSGS = (new Integer(args[1])).intValue();
        } else {
            NUM_MSGS = 1;
        }

        Destination dest = null;

        try {
            if (destType.equals("queue")) {
                dest = (Destination) queue;
            } else {
                dest = (Destination) topic;
            }
        } catch (Exception e) {
            System.err.println("Error setting destination: " + e.toString());
            e.printStackTrace();
            System.exit(1);
        }

        try {
            connection = connectionFactory.createConnection();

            Session session = connection.createSession(
                        false,
                        Session.AUTO_ACKNOWLEDGE);

            MessageProducer producer = session.createProducer(dest);
            TextMessage message = session.createTextMessage();

            for (int i = 0; i < NUM_MSGS; i++) {
                message.setText("This is message " + (i + 1));
                System.out.println("Sending message: " + message.getText());
                producer.send(message);
            }

            producer.send(session.createMessage());
        } catch (JMSException e) {
            System.err.println("Exception occurred: " + e.toString());
        } finally {
            if (connection != null) {
                try {
                    connection.close();
                } catch (JMSException e) {
                }
            }
        }
    }

- okay I'm dumb, I know, I know. In this case the code was actually substantially commented; I just left out the comments, because then I would have a case. I found that several lines of comments could be left out just by refining the code a bit. So I put the code away in neat little sections, and then embraced a form of procedural glamour. I came up with this:


    public static void main(String[] args) {
        handleInputArguments(args);
        connect();
    }

Now I understand what it does.. hooray for me! And no time wasted for comments.
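
For the curious, here is roughly what the extracted argument handling could look like. This is a sketch, not the code I actually wrote: the ProducerArgs name is made up, the JMS part is left out so the example stands alone, and it throws an exception where the original called System.exit(1).

```java
// Hypothetical sketch of the extracted argument handling; the JMS part is left out.
final class ProducerArgs {
    final String destType;
    final int numMsgs;

    private ProducerArgs(String destType, int numMsgs) {
        this.destType = destType;
        this.numMsgs = numMsgs;
    }

    // Same checks as the long main method, but named and in one place.
    static ProducerArgs parse(String[] args) {
        if (args.length < 1 || args.length > 2) {
            throw new IllegalArgumentException("Program takes one or two arguments");
        }
        String destType = args[0];
        if (!destType.equals("queue") && !destType.equals("topic")) {
            throw new IllegalArgumentException("Argument must be \"queue\" or \"topic\"");
        }
        // One message by default, as in the original code.
        int numMsgs = (args.length == 2) ? Integer.parseInt(args[1]) : 1;
        return new ProducerArgs(destType, numMsgs);
    }
}
```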

Spin-off'ed
Trawling through the Internet with the great Firefox add-on StumbleUpon.. I actually stumbled upon something extremely interesting:

http://www.chrylers.com/top-ten-of-programming-advice-to-not-follow

It is nice to hear what seasoned programmers think of advice given to me by various colleagues.

Titles in software development

Why are so many titles popping up everywhere?
Programmer is just not good enough for us anymore.
How about 'Software Developer'?
'Software Architect'
'Software Engineer'

or real bogus titles that sound more like a job description:
'Software Quality Assurance Team Leader'

I'll settle for software developer for now, although my current job entails a great list of different tasks. Let's try it out:
'Quality Assuring Software Architect Designer' - boo!

Friday, 25 May 2007

COBOL is not useless

I have a kind of friendship with COBOL.
It was the first language I used to program professionally.

Here's an article about COBOL's future.
http://www.ddj.com/dept/architect/199602005

So now a Java IDE is bringing COBOL into the new Millennium.. laugh, point.

Thank God I went the Java route.

COBOL arrowed!

Thursday, 24 May 2007

Sailing mishap!

This really got me thinking about how much time some people put into really weird stuff.

I love homestar++ though.
Weird link

My Blog is up

I'm trying out blogging.

Now..

Many people blog because of pure exhibitionism.

Not me...

No..

I blog for the pure sake of getting all those tainted thoughts out of my head.

Now a list of wicked sites I sometimes give a look:

http://xkcd.com/
http://www.dilbert.com/
http://slashdot.org/
http://newz.dk/
http://politikken.dk/

..ah... that feels better.