Perl Weekly Challenge #32, Number of occurrences and bar chart

Perl Weekly Challenge #32

This week's PWC asks us to count the number of occurrences of the terms in a list. Task #1 prints the result to STDOUT; Task #2 prints a bar graph of the counts.

For both tasks I first wrote a simple solution in a single function. For Task #1 I also show a one-liner.

Then I treated both tasks as one solution with different input, sorting and output methods:

Download file: Solution PWC #32, pwc32.pl

SYNOPSIS

 # perldoc pwc32.pl             - POD
 # ./pwc32.pl html              - HTML/CSS in pwc32.html/pwc32.css
 # ./pwc32.pl help              - Usage information

 ./pwc32.pl <command> [<options>] [<files>]

  command, help|html|task1|task2|simple
     help, prints out some usage information.
     html, writes HTML and CSS from POD to pwc32.html and pwc32.css.
     task1, solution for task #1, reads from only one file.
     task2, solution for task #2, reads from only one file.
     simple, actual solution for task #1 and #2.
  options, control over simple solution.
     --sort=name|value, sorting of name, value or unsorted (default).
     --list, output in list form.
     --csv, output in CSV format.
     --bar, output in bar form.
     --pct, output in bar form with percentage.
  files, list of filenames with items.
     if no files are given, command line allows typing items.
     command line can be used several times to add items.
     command line can be aborted with <ctrl>-c.

  Examples:
     # ./pwc32.pl help
     # ./pwc32.pl html
     # ./pwc32.pl task1 example.txt
     # ./pwc32.pl task2 example.txt
     # ./pwc32.pl simple -b -l -c -p example.txt
     # ./pwc32.pl simple -bar -list -csv -pct example.txt
     # ./pwc32.pl simple --bar --list --csv --pct example.txt
     # ./pwc32.pl simple -p example.txt example.txt
     # ./pwc32.pl simple -p

Definition Task #1: Count number of occurrences of items

Create a script that either reads standard input or one or more files specified on the command-line. Count the number of times each entry occurs and then print a summary, sorted by the count of each entry.

So with the following input in file example.txt

 apple
 banana
 apple
 cherry
 cherry
 apple

the script would display something like:

 apple     3
 cherry    2
 banana    1

For extra credit, add a -csv option to your script, which would generate:

 apple,3
 cherry,2
 banana,1

Definition Task #2: ASCII bar chart

Write a function that takes a hashref where the keys are labels and the values are integer or floating point values. Generate a bar graph of the data and display it to stdout.

The input could be something like:

 $data = { apple => 3, cherry => 2, banana => 1 };
 generate_bar_graph($data);

And would then generate something like this:

 apple  | ############
 cherry | ########
 banana | ####

If you fancy then please try this as well: (a) the function could let you specify whether the chart should be ordered by (1) the labels, or (2) the values.

Solution Task #1:

Simple solution

In the first simple solution the content of the file is read into an array. Each string becomes a hash key whose value is incremented for every occurrence. Then the hash keys are sorted by their values with the sort() function, and each element is printed in the order of the sorted keys.

Read Filename from Arg, open file and read content to array:

 my $file = shift @ARGV or die "No file given!\n";
 open(my $fh, '<', $file) or die "Can't open $file: $!\n";
 my @items = <$fh>;
 close $fh;

Loop over all items read and build a hash with the item as key and the number of occurrences as value:

 my %sum;
 foreach my $i (@items) { chomp $i; $sum{$i}++; }

Sort keys of hash according to values of hash with sort() function:

 my @sorted = sort { $sum{$b} <=> $sum{$a} } keys %sum;

Print each key/value of hash according to sorted keys array:

 foreach my $i (@sorted) { print "$i\t$sum{$i}\n"; }

The first simple solution for Task #1 is executed with the following command.

 # ./pwc32.pl task1 example.txt 
 apple  3
 cherry 2
 banana 1

As one-liner

Everything explained above can be condensed into a one-liner for the command line:

 perl -lne '$sum{$_}++; END { foreach( sort { $sum{$b} <=> $sum{$a} } keys %sum ) { print "$_\t$sum{$_}"; } }' example.txt

The options -lne are documented in perldoc perlrun.
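As a rough illustration of what these switches do: -n wraps the code in a loop over the input lines, -l strips the line ending on input and appends a newline to every print, and -e supplies the code itself. The following sketch spells this out by hand. The helper name count_lines and the in-memory filehandle are assumptions made for illustration; the one-liner itself reads from the files named on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: does by hand what -n and -l do implicitly.
sub count_lines {
    my ($fh) = @_;
    my %sum;
    while (my $line = <$fh>) {   # -n: implicit loop over the input lines
        chomp $line;             # -l: strip the line ending on input
        $sum{$line}++;           # the code given with -e
    }
    return \%sum;
}

# In-memory filehandle standing in for example.txt.
my $text = "apple\nbanana\napple\ncherry\ncherry\napple\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";
my $sum = count_lines($fh);
close $fh;

# The END block of the one-liner: runs once after all input is read.
# -l also appends "\n" to every print; here it is written explicitly.
foreach my $item (sort { $sum->{$b} <=> $sum->{$a} } keys %$sum) {
    print "$item\t$sum->{$item}\n";
}
```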

Solution Task #2: Simple solution

The solution for Task #2 in one function is similar to the solution for Task #1. The only difference is that the output prints a bar instead of a number.

 printf("%10s: %s\n", $i, "#" x $sum{$i});

The "x" operator is used to repeat the "#" hash sign a given number of times. The printf() function is used for the output because it allows the width of each printed string to be specified.
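A small sketch of both building blocks in isolation may help. The variable names are chosen for illustration only. Note that the repeat count of "x" is truncated to an integer, which matters when the values are floating point numbers, as Task #2 allows.

```perl
use strict;
use warnings;

# "x" repeats the string on its left as often as the number on its right.
my $bar = "#" x 3;

# %10s right-aligns the label in a field of 10 characters,
# %-10s left-aligns it instead.
my $right = sprintf("%10s: %s", "apple", $bar);
my $left  = sprintf("%-10s: %s", "apple", $bar);

# The repeat count is truncated to an integer, so 2.7 yields
# two characters, not three.
my $float = "#" x 2.7;

print "$right\n$left\n$float\n";
```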

The whole function with reading file, calculating the sum, sorting the values and printing the sorted list is as follows:

 sub task2_simple {
        my ($file) = @_;
        my %sum;
        open(my $fh, '<', $file) or die "Can't open $file: $!\n";
        my @items = <$fh>;
        close $fh;
        foreach my $i (@items) { chomp $i; $sum{$i}++; }
        my @sorted = sort { $sum{$b} <=> $sum{$a} } keys %sum;
        foreach my $i (@sorted) { printf("%10s: %s\n", $i, "#" x $sum{$i}); }
 }

Task #2 can be executed with the following command:

 # ./pwc32.pl task2 example.txt 
     apple: ###
    cherry: ##
    banana: #
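The task definition asks for a function that takes a hashref, while the function above reads a file. A minimal sketch that stays closer to the signature from the task could look as follows. The function name generate_bar_graph comes from the task definition; the parameters $order and $scale, the rounding and the tie-breaking by label are assumptions of this sketch, and the function returns the lines instead of printing them so that they are easier to check.

```perl
use strict;
use warnings;

# Sketch of the function named in the task definition.
# $order ("label" or "value") and $scale are assumed parameters.
sub generate_bar_graph {
    my ($data, $order, $scale) = @_;
    $order //= "value";
    $scale //= 4;       # "#" characters per unit, as in the task example

    my @labels;
    if ($order eq "label") {
        @labels = sort keys %$data;
    }
    else {
        # Descending by value; equal values fall back to the label.
        @labels = sort { $data->{$b} <=> $data->{$a} || $a cmp $b } keys %$data;
    }

    # Width of the longest label, used for left-aligned padding.
    my $width = 0;
    foreach my $label (@labels) {
        $width = length($label) if length($label) > $width;
    }

    my @lines;
    foreach my $label (@labels) {
        # Round to the nearest whole character for floating point values.
        my $bar = "#" x int($data->{$label} * $scale + 0.5);
        push @lines, sprintf("%-*s | %s", $width, $label, $bar);
    }
    return @lines;
}

my $data = { apple => 3, cherry => 2, banana => 1 };
print "$_\n" for generate_bar_graph($data);
```

Calling generate_bar_graph($data, "label") orders the chart by label instead of by value.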

Solution Task #1/#2: Different Input, Sorting, Output

The comprehensive solution combines the different ways of input (STDIN and files), sorting (unsorted, by name or by value) and output (list, CSV, bar, percentage).

All this needs a more comprehensive main program that evaluates the options and calls the necessary functions accordingly.
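How the options are evaluated is not shown in this article. A minimal sketch of one way to do it, assuming the core module Getopt::Long is used (the actual pwc32.pl may do this differently, and parse_options is a hypothetical name):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse the options of the "simple" command.
# Returns the option hash ref and the remaining arguments (file names).
sub parse_options {
    my @args = @_;
    my %opts = (sort => "");            # empty string stands for "unsorted"
    GetOptionsFromArray(\@args, \%opts,
        "sort=s",                       # --sort=name|value
        "list", "csv", "bar", "pct",    # output formats
    ) or die "Invalid options!\n";
    return (\%opts, @args);
}

my ($opts, @files) = parse_options(qw(-b --csv --sort=value example.txt));
print "sort=$opts->{sort} files=@files\n";
```

With the default configuration of Getopt::Long, unique abbreviations such as -b or -p and single-dash forms such as -bar are accepted, which matches the examples in the synopsis.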

Some extracts from the main program:

Input of Item List

Two different ways of input are required: first, input from one or more files, and second, input from STDIN. I created three functions for the input.

readstdin()

In the readstdin() function a list is typed on the command line. The line is split into an array with the split() function. A reference to that array is returned.

 sub readstdin {
        print "Type list (Ctrl-c)> ";   # Print prompt
        my $read = <STDIN>;             # Read from stdin
        chomp($read);                   # Eliminate newline
        my @list = split(" ",$read);    # Create array
        return \@list;                  # Return items
 }
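The string " " as the first argument of split() is a special case, which is why readstdin() copes with extra blanks in the typed line. The following sketch contrasts it with a regex that matches a single space; the sample input is made up for illustration.

```perl
use strict;
use warnings;

my $read = "  apple banana   apple ";

# split(" ", ...) is a special case: leading whitespace is skipped and
# any run of whitespace counts as a single separator.
my @special = split(" ", $read);

# A regex of one literal space splits on every single space instead,
# which produces empty items for leading and repeated spaces.
my @literal = split(/ /, $read);

print scalar(@special), " items vs. ", scalar(@literal), " items\n";
```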

read_files()

The rest of the command line arguments array @ARGV is iterated in a while loop. Each file is read with the read_file() function. Then the items are pushed to the @items array. A reference to the @items array is returned.

 sub read_files {
        my @items;
        while( my $file = shift @ARGV ) {       # Iterate @ARGV
                my $i = read_file($file);       # Read each file
                push(@items,@$i);               # Push to items
        }
        return \@items;                         # Return items
 }

read_file()

The input argument is one filename. The file is opened, read into an array in one go, and closed. Then a reference to the items is returned.

 sub read_file {
        my ($file) = @_;                                   # IN: Filename
        open(my $fh, '<', $file) or die "Can't open $file: $!\n";  # Open file
        my @items = <$fh>;                                 # Read whole file in array
        close $fh;                                         # Close file
        return \@items;                                    # Return Items
 }
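As a sketch of a possible variation, the file names can be passed as arguments instead of being shifted off @ARGV. This leaves @ARGV untouched and also handles a file that happens to be named "0", which would end the while loop early. The temporary file created with File::Temp only stands in for example.txt so that the sketch runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Variant of read_file(): three-argument open, $! in the error message.
sub read_file {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    my @items = <$fh>;
    close $fh;
    return \@items;
}

# Variant of read_files(): file names are passed in instead of
# being shifted off @ARGV.
sub read_files {
    my @files = @_;
    my @items;
    foreach my $file (@files) {
        push @items, @{ read_file($file) };
    }
    return \@items;
}

# Usage with a temporary file standing in for example.txt.
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} "apple\nbanana\napple\n";
close $tmp;

my $items = read_files($name, $name);   # same file twice, as in the synopsis
print scalar(@$items), " items read\n";
```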

Calculate the Sum

The number of occurrences of each item is calculated by iterating over all items and incrementing a counter for each item.

sum_up($items)

The iteration is done with a foreach loop. For each item the corresponding element of the hash %sum is incremented.

 sub sum_up {
        my ($items) = @_;
        foreach my $i (@$items) { chomp $i; $sum{$i}++; }
 }
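sum_up() counts into a hash %sum that is declared outside the function, and chomp inside the foreach loop also modifies the caller's array, because the loop variable is an alias for each element. A hedged sketch of a variant that avoids both by working on a copy and returning a hash ref (this is an alternative for illustration, not the code from pwc32.pl):

```perl
use strict;
use warnings;

# Variant of sum_up(): counts into a lexical hash and returns a
# reference to it; the items passed in are not modified.
sub sum_up {
    my ($items) = @_;
    my %sum;
    foreach my $item (@$items) {
        chomp(my $key = $item);    # strip the newline from a copy
        $sum{$key}++;
    }
    return \%sum;
}

my @items = ("apple\n", "banana\n", "apple\n");
my $sum = sum_up(\@items);
print "$_: $sum->{$_}\n" for sort keys %$sum;
```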

Sorting of Item List

The item list can be sorted with the --sort=name|value option. When the option is not given, an unsorted list is printed.

sorting($sort)

The $sort option can be "name" or "value"; with any other value the list stays unsorted. The sorting is done with the sort() function.

 @sorted = keys %sum;

Only the keys of the %sum hash are assigned to the @sorted array.

 @sorted = sort keys %sum;

The keys of the %sum hash are alphabetically sorted with the sort function.

 @sorted = sort { $sum{$b} <=> $sum{$a} } keys %sum;

The keys of the %sum hash are sorted by their values in descending order with the sort function. For details on the sort function see perldoc -f sort.

The sorting() function performs the sort according to the given option.

 sub sorting {
        my ($sort) = @_; # "name|value" default "unsorted"

        if($sort eq "name") {
                @sorted = sort keys %sum;
        }
        elsif($sort eq "value") {
                @sorted = sort { $sum{$b} <=> $sum{$a} } keys %sum;
        }
        else {
                @sorted = keys %sum;
        }
 }
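When several items have the same count, sorting by value alone leaves their relative order to the hash, which can differ from run to run. A sketch of a variant that breaks ties by name and returns the keys instead of filling @sorted; the function name sorted_keys and the hash ref parameter are assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical variant of sorting(): takes the counts as a hash ref and
# returns the ordered keys.
sub sorted_keys {
    my ($sum, $sort) = @_;
    $sort //= "";                       # undefined means unsorted
    if ($sort eq "name") {
        return sort keys %$sum;
    }
    if ($sort eq "value") {
        # Descending by count; equal counts fall back to the name.
        return sort { $sum->{$b} <=> $sum->{$a} || $a cmp $b } keys %$sum;
    }
    return keys %$sum;                  # hash order, not predictable
}

my %sum = (cherry => 2, apple => 3, banana => 2);
my @by_value = sorted_keys(\%sum, "value");
my @by_name  = sorted_keys(\%sum, "name");
print "@by_value\n@by_name\n";
```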

Output of Item List

Four different output formats are possible: list, CSV, bar, and bar with percentage.

output($options)

The lists are printed according to the options --list, --csv, --bar and --pct. All lists are printed when all options are set. If none of the options is set, the default list (tab-separated) is printed.

 sub output {
        my ($o) = @_;   # IN: Option Hash Ref: \%opts

        if($o->{bar}) { print_bar(); }
        if($o->{pct}) { calculate_pct(); print_pct(); }
        if($o->{list}) { $separator = "\t"; print_list(); }
        if($o->{csv})  { $separator = ",";  print_list(); }
        if(!$o->{list} and !$o->{bar} and !$o->{csv} and !$o->{pct}) { 
                $separator = "\t"; print_list(); 
        }
 }

Prints an ASCII bar for each item in the order of the sorted list. The x operator is used to print one "#" character per counted occurrence of the item.

 sub print_bar {
        foreach my $i (@sorted) { printf("%10s: %s\n", $i, "#" x $sum{$i}); }
 }

Prints a list of items according to a sorted list.

 sub print_list {
        foreach my $i (@sorted) { print "$i$separator$sum{$i}\n"; }
 }

Prints a bar with percentage values according to a sorted list. @sorted contains a sorted list of items, %sum contains the sum of each item, %pct contains each percentage value, %blk contains each length of the bar.

 sub print_pct {
        foreach my $i (@sorted) { 
                printf("%-10s (%2d): %-${max}s| %3.2f %%\n", $i, 
                        $sum{$i}, ('#' x $blk{$i}), $pct{$i}); 
        }
 }

calculate_pct()

Calculates the percentage value of each item. The sum of all items is 100%; the share of each item is stored in the hash %pct. The width of the terminal is determined with the chars() function of the module Term::Size::Perl. The width is used to scale the bars to the available space; the bar lengths are stored in the hash %blk.

 sub calculate_pct {
        my $s;                        # Sum of all items
        foreach my $i (keys %sum) { $s += $sum{$i}; } 
        my $width = chars();          # Width of terminal
        $max = int($width/2);         # Max is half of terminal width
        foreach my $i (keys %sum) {   # Calculate each percentage value
                $pct{$i} = $sum{$i} * 100 / $s; 
                $blk{$i} = int( $sum{$i} * $max / $s );
        }
 }
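To make the arithmetic concrete, here is a sketch with the counts from example.txt and an assumed terminal width of 80 columns. The function name percentages, the width parameter and the guard against an empty list are assumptions of this sketch; passing the width in means that neither a terminal nor Term::Size::Perl is needed to run it.

```perl
use strict;
use warnings;

# Hypothetical pure variant of calculate_pct(): the terminal width is a
# parameter and the results are returned instead of stored globally.
sub percentages {
    my ($sum, $width) = @_;
    my $total = 0;
    $total += $_ for values %$sum;
    my $max = int($width / 2);                  # longest possible bar
    my (%pct, %blk);
    return ($max, \%pct, \%blk) unless $total;  # nothing counted: avoid division by zero
    foreach my $i (keys %$sum) {
        $pct{$i} = $sum->{$i} * 100 / $total;
        $blk{$i} = int($sum->{$i} * $max / $total);
    }
    return ($max, \%pct, \%blk);
}

my %sum = (apple => 3, cherry => 2, banana => 1);
my ($max, $pct, $blk) = percentages(\%sum, 80);  # assumed width of 80 columns

# apple: 3 of 6 items is 50 %, and 3 * 40 / 6 = 20 blocks.
foreach my $i (sort { $sum{$b} <=> $sum{$a} } keys %sum) {
    printf("%-10s (%2d): %-*s| %3.2f %%\n",
        $i, $sum{$i}, $max, "#" x $blk->{$i}, $pct->{$i});
}
```

The "*" in %-*s takes the field width from the argument list, which is an alternative to interpolating $max into the format string.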

AUTHOR

Chuck
