Category: development

How to Indent (Pretty-Print) an XML String in Java

Hello Guys,

This is a simple way to pretty-print (indent) an XML string in Java, using the Apache Xerces serializer classes OutputFormat and XMLSerializer:

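import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.xml.serialize.OutputFormat;   // Apache Xerces-J (deprecated API)
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static String format(String xml, boolean omitXmlDeclaration)
        throws IOException, SAXException, ParserConfigurationException {
    // Parse the XML string into a DOM document
    DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = db.parse(new InputSource(new StringReader(xml)));

    // Indent nested elements by 2 spaces and never wrap long lines
    OutputFormat format = new OutputFormat(doc);
    format.setIndenting(true);
    format.setIndent(2);
    format.setOmitXMLDeclaration(omitXmlDeclaration);
    format.setLineWidth(Integer.MAX_VALUE);

    // Serialize the document back into a string
    Writer outxml = new StringWriter();
    XMLSerializer serializer = new XMLSerializer(outxml, format);
    serializer.serialize(doc);
    return outxml.toString();
}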

Backup File Utility in Python [Tkinter application]

Hi readers,

This is a graphical (Tkinter) application in Python that backs up a file on your computer. It creates a backup folder in the directory of the selected file and then, every X minutes, copies the file into it if its content has changed, deleting the oldest copies so the folder never holds more than the configured maximum number of files.

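import hashlib, shutil, os
from datetime import datetime
try:  # Python 3
    from tkinter import Label, Spinbox, Listbox, Scrollbar, Tk, E, W, S, N, Button, Frame
    from tkinter import filedialog as tkFileDialog
except ImportError:  # Python 2
    from Tkinter import Label, Spinbox, Listbox, Scrollbar, Tk, E, W, S, N, Button, Frame
    import tkFileDialog
from os.path import expanduser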
 
class App(Frame):
 
    def __init__(self, parent):
 
        Frame.__init__(self, parent)
        self.parent = parent
 
        # Instance state
        self.directory_selected = ''
        self.filename_selected = ''
        self.backup_folder_name = 'backup'
        self.num_max_files = None
        self.time_for_backup = None
        self.full_file_path = ''
        self.backup_running = False
        self.hash = None
 
        self.initUI()
 
    def initUI(self):
        self.parent.title("Application to Backup a File")
 
        self.label_minutes = Label( text='Time for Backup (Minutes): ', height=2)
        self.spinbox_minutes = Spinbox( from_=1, to=100, width=5)
        self.spinbox_minutes.delete(0, 'end')
        self.spinbox_minutes.insert(0,5)
 
        self.label_numfiles = Label( text='Maximum number files in backup directory: ')
        self.spinbox_numfiles = Spinbox( from_=1, to=100, width=5)
        self.spinbox_numfiles.delete(0, 'end')
        self.spinbox_numfiles.insert(0,10)
 
        self.scrollbar_loglist = Scrollbar()
        self.listbox_loglist = Listbox(yscrollcommand=self.scrollbar_loglist.set, width=70)
        self.scrollbar_loglist.config(command=self.listbox_loglist.yview)
 
        self.button_file_choose = Button(text='Choose a file for backup ...', command=self.click_button_start_backup)
        self.button_stop_backup = Button(text='Stop Backup', command=self.click_button_stop_backup)
        self.button_stop_backup['state'] = 'disabled'
        self.button_exit = Button(text='Exit', command=self.click_button_exit)
 
        # Here the UI components are designed using grid layout
        self.label_minutes.grid(row=0, sticky=E)
        self.spinbox_minutes.grid(row=0, column=1, sticky=W)
        self.label_numfiles.grid(row=1, sticky=E)
        self.spinbox_numfiles.grid(row=1, column=1, sticky=W)
        self.listbox_loglist.grid(row=2, column=0, sticky=W+E+N+S, columnspan=3)
        self.scrollbar_loglist.grid(row=2, column=3, sticky=N+S)
        self.button_file_choose.grid(row=3, column=0)
        self.button_stop_backup.grid(row=3, column=1)
        self.button_exit.grid(row=3, column=2)
 
        # define options for opening or saving a file
        self.file_opt = options = {}
        options['initialdir'] = expanduser("~")
        options['parent'] = self
        options['title'] = 'Select a file for backup ...'
 
    def get_ask_open_file(self):
        return tkFileDialog.askopenfilename(**self.file_opt)
 
    def get_current_datetime(self):
        return datetime.strftime(datetime.now(), "%d%m%Y_%H%M%S")
 
    def get_current_datetime_formatted(self):
        return datetime.strftime(datetime.now(), "[%d/%m/%Y %H:%M:%S] - ")
 
    def log_action(self, msg):
        self.listbox_loglist.insert(0, self.get_current_datetime_formatted() + msg)
 
    def do_file_backup(self):
        curr_datetime = self.get_current_datetime()
        # splitext also handles names with no extension or with several dots
        name, ext = os.path.splitext(self.filename_selected)
        nom_arquivo = name + '_' + curr_datetime + ext
        shutil.copyfile(self.full_file_path, os.path.join(self.directory_selected, self.backup_folder_name, nom_arquivo))
        self.log_action('Backup Done - ' + nom_arquivo)
 
    def get_file_hash_md5(self, file):
        md5 = hashlib.md5()
        with open(file, "rb") as f:
            # Read in blocks until read() returns an empty bytes object
            for block in iter(lambda: f.read(65536), b""):
                md5.update(block)
        return md5.hexdigest()
 
    def listdir_fullpath(self, d):
        return [os.path.join(d, f) for f in os.listdir(d)]
 
    def delete_oldest_files(self):
        num_total_files_in_directory = len(self.listdir_fullpath(self.directory_selected + os.sep + self.backup_folder_name))
        num_files_to_be_removed = num_total_files_in_directory - self.num_max_files
        if num_files_to_be_removed > 0:
            self.log_action('Removing ' + str(num_files_to_be_removed) + ' old files')
        for i in range(num_files_to_be_removed):
            file_to_remove = self.get_oldest_file_from_directory(self.directory_selected + os.sep + self.backup_folder_name)
            os.remove(file_to_remove)
            self.log_action('An old file was removed - ' + file_to_remove.split(os.sep)[-1])
 
    def get_oldest_file_from_directory(self, dir):
        return min(self.listdir_fullpath(dir), key=os.path.getctime)
 
    def start_loop_backup_job(self):
        if not self.backup_running:
            return
        # Hash the file once per cycle and copy it only if the content changed
        current_hash = self.get_file_hash_md5(self.full_file_path)
        if self.hash != current_hash:
            self.do_file_backup()
        else:
            self.log_action('The file was not changed since last check')
        self.hash = current_hash
        # Prune after copying, so the folder never holds more than num_max_files
        self.delete_oldest_files()
        # Schedule the next cycle and keep its id so Stop can cancel it
        self.after_id = self.parent.after(self.time_for_backup * 1000 * 60, self.start_loop_backup_job)

    def click_button_start_backup(self):
        file_chosen = self.get_ask_open_file()
        # The dialog returns an empty value when the user cancels it
        if not file_chosen or not os.path.isfile(file_chosen):
            return
        self.log_action('Backup has been started')
        self.button_file_choose['state'] = 'disabled'
        self.button_stop_backup['state'] = 'normal'
        self.directory_selected, self.filename_selected = os.path.split(os.path.abspath(file_chosen))
        self.full_file_path = os.path.join(self.directory_selected, self.filename_selected)

        backup_dir = os.path.join(self.directory_selected, self.backup_folder_name)
        if not os.path.exists(backup_dir):
            os.mkdir(backup_dir)

        self.time_for_backup = int(self.spinbox_minutes.get())
        self.num_max_files = int(self.spinbox_numfiles.get())

        self.after_id = None
        self.backup_running = True
        self.start_loop_backup_job()

    def click_button_stop_backup(self):
        self.button_file_choose['state'] = 'normal'
        self.button_stop_backup['state'] = 'disabled'
        self.log_action('Backup has been stopped')
        self.backup_running = False
        # Cancel the pending cycle so a quick restart does not run two loops
        if self.after_id is not None:
            self.parent.after_cancel(self.after_id)
            self.after_id = None
 
    def click_button_exit(self):
        self.parent.destroy()
 
 
def main():
    root = Tk()
    root.eval('tk::PlaceWindow %s center' % root.winfo_pathname(root.winfo_id()))
    app = App(root)
    root.mainloop()
 
if __name__ == '__main__':
    main()
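The core of each backup cycle (copy the file only when its MD5 changes, then prune the oldest copies) does not depend on Tkinter, so it can be tested on its own. The following Python 3 sketch is a hypothetical standalone version of that cycle; the function names file_md5 and backup_cycle are illustrative and not part of the app above. One deliberate difference: it orders copies by os.path.getmtime instead of os.path.getctime, because on Unix the ctime is the time of the last metadata change rather than the creation time.

```python
import hashlib
import os
import shutil
from datetime import datetime


def file_md5(path, block_size=65536):
    """Return the MD5 hex digest of a file, reading it in blocks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            md5.update(block)
    return md5.hexdigest()


def backup_cycle(path, last_hash, max_files, folder="backup"):
    """Run one backup cycle for `path` and return the file's current MD5.

    The file is copied into `folder` (created next to it) only when its
    MD5 differs from `last_hash`; afterwards the oldest copies are deleted
    so that at most `max_files` remain.
    """
    directory, filename = os.path.split(os.path.abspath(path))
    backup_dir = os.path.join(directory, folder)
    os.makedirs(backup_dir, exist_ok=True)

    current_hash = file_md5(path)
    if current_hash != last_hash:
        stem, ext = os.path.splitext(filename)
        stamp = datetime.now().strftime("%d%m%Y_%H%M%S")
        shutil.copyfile(path, os.path.join(backup_dir, stem + "_" + stamp + ext))

    # Sort copies oldest first by modification time and delete the surplus
    copies = sorted(
        (os.path.join(backup_dir, name) for name in os.listdir(backup_dir)),
        key=os.path.getmtime,
    )
    for old in copies[:max(0, len(copies) - max_files)]:
        os.remove(old)
    return current_hash
```

In the app, the returned hash would play the role of self.hash, and a timer such as Tkinter's after() would call backup_cycle every X minutes.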

Spring Batch Partitioner – Case Study with Source Code – Best Practices

I’m writing this post because I reported a bug in the Spring Community Jira; this is the link:

https://jira.spring.io/browse/BATCH-2309

I created a sample project that reproduces the problem to show the community what I was experiencing, but to my surprise it turned out that I was using the partitioner feature incorrectly. I am writing this post to share what I learned from this experience and to help anyone going through the same questions.

My Goal: I wanted to use the partitioner for parallel processing, but I was worried about partitioning on the table’s primary key (column ID). My table has gaps in the ID column (the values are not contiguous), so the partitioner would hand a different number of records to each thread, distributing the work inefficiently.

For example, consider the ColumnRangePartitioner from the Spring Batch samples, which is a good reference partitioner:

https://github.com/spring-projects/spring-batch/blob/master/spring-batch-samples/src/main/java/org/springframework/batch/sample/common/ColumnRangePartitioner.java

Suppose that my table has the following records: IDs 1, 8, 9, 10, 11, 12, 13, 14, 15.

min: 1

max: 15

gridSize = number of threads = 2 in this example

Target size calculation: int targetSize = (max - min) / gridSize + 1;

(15 - 1) / 2 + 1 = 8

In this example:

Thread 1 will process IDs 1 to 8

Thread 2 will process IDs 9 to 15 (the last range is capped at the max value)

The Problem: thread 1 receives only two records to process (IDs 1 and 8), while thread 2 receives seven. In this case the partitioner splits the records unevenly between the threads.
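The range arithmetic above can be checked with a short sketch. This Python function is a hypothetical re-implementation of the arithmetic in the ColumnRangePartitioner sample as I read it (equal-width ranges of targetSize values, with the last range capped at max); it is not Spring Batch code. Counting the example IDs in each range reproduces the imbalance:

```python
def column_ranges(min_value, max_value, grid_size):
    """Split [min_value, max_value] into consecutive ranges of equal width,
    following the arithmetic of the ColumnRangePartitioner sample."""
    target_size = (max_value - min_value) // grid_size + 1
    ranges = []
    start = min_value
    while start <= max_value:
        end = min(start + target_size - 1, max_value)  # last range capped at max
        ranges.append((start, end))
        start += target_size
    return ranges


ids = [1, 8, 9, 10, 11, 12, 13, 14, 15]
ranges = column_ranges(min(ids), max(ids), grid_size=2)
counts = [sum(lo <= i <= hi for i in ids) for lo, hi in ranges]
print(ranges)  # [(1, 8), (9, 15)]
print(counts)  # [2, 7]
```

The ranges have equal width, but because of the gaps in the IDs they do not hold equal numbers of records.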

My Goal: I want to split the records equally among all threads.

Where I was going wrong: to achieve this, I tried a query that uses Oracle’s ROWNUM and/or NTILE features. The idea was to split on a sequential number with no gaps instead of the ID column, so that the load would be uniform among the threads. However, the JdbcPagingItemReader class cannot be used in a multithreaded step with Oracle ROWNUM: the query is executed many times in the database (once per page), ROWNUM is recalculated on every execution, and the IDs get mixed up between threads, so there is no guarantee that all records are processed.
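To see how re-executing the query can skip records, here is a simplified, single-threaded Python simulation. It assumes, hypothetically, that the reader's query only selects rows with FLAG_PROCESSED = 'N' (like the NTILE query later in this post), that each page asks for rows whose ROWNUM is greater than the last one read, and that processed rows are flagged before the next page is fetched. The function name and row structure are illustrative only:

```python
def read_by_rownum_pages(rows, page_size):
    """Simulate a paging reader keyed on a ROWNUM-style column.

    Every page re-runs the query, so row numbers are recomputed over the
    rows that are still unprocessed (FLAG_PROCESSED = 'N') at that moment.
    """
    read_ids = []
    last_rn = 0
    while True:
        pending = [r for r in rows if not r["processed"]]
        numbered = list(enumerate(pending, start=1))  # fresh ROWNUM per execution
        page = [(rn, r) for rn, r in numbered if rn > last_rn][:page_size]
        if not page:
            return read_ids
        for rn, r in page:
            r["processed"] = True  # the writer flags the row before the next page
            read_ids.append(r["id"])
        last_rn = page[-1][0]


rows = [{"id": i, "processed": False} for i in range(1, 10)]
print(read_by_rownum_pages(rows, page_size=3))  # [1, 2, 3, 7, 8, 9]
```

Rows 4, 5 and 6 are never read: after the first page is flagged, the remaining rows are renumbered from 1, so the second page (numbers 4 to 6) lands on IDs 7 to 9.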

The correct way: use JdbcPagingItemReader with the primary key column(s) (single or composite), or use JdbcCursorItemReader, which can split on either the PK column or ROWNUM/NTILE.

Why doesn’t JdbcCursorItemReader cause mixed-up IDs or lost records?

This class executes the query only once in the database and then fetches the records from the open cursor as each chunk needs them. Using a ROWNUM column in this case does not cause data loss, because each query is executed only once in the database.
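When the row numbers come from a single query execution, they are stable and gap-free, so applying the same range arithmetic to them gives an even split. This Python sketch uses a hypothetical helper (not Spring Batch API) that numbers the example IDs once and reuses the (max - min) / gridSize + 1 formula on the row numbers instead of the IDs:

```python
def partition_by_rownum(ids, grid_size):
    """Number the rows once, like ROWNUM in a single query execution,
    and give each partition a contiguous range of row numbers."""
    if not ids:
        return []
    numbered = list(enumerate(sorted(ids), start=1))  # row numbers 1..n, no gaps
    min_rn, max_rn = 1, len(numbered)
    target_size = (max_rn - min_rn) // grid_size + 1  # same formula as the sample
    partitions = []
    for start in range(min_rn, max_rn + 1, target_size):
        end = min(start + target_size - 1, max_rn)
        partitions.append([i for rn, i in numbered if start <= rn <= end])
    return partitions


ids = [1, 8, 9, 10, 11, 12, 13, 14, 15]
print(partition_by_rownum(ids, grid_size=2))
# [[1, 8, 9, 10, 11], [12, 13, 14, 15]]
```

With the same nine records, the threads now receive five and four records instead of two and seven.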

To make this easier to understand, I created an example project covering the various possible configurations:

GitHub Example Project:

https://github.com/victorjabur/PartitionSpringBatch_DataLose_Poc_BATCH-2309

Here are the SQL scripts that create the database tables used in this PoC:

https://github.com/victorjabur/PartitionSpringBatch_DataLose_Poc_BATCH-2309/tree/master/src/main/resources/sql

  1. JdbcCursorItemReader-OracleNtile – It works
  2. JdbcCursorItemReader-OracleRownum – It works
  3. JdbcPagingItemReader-OracleNtile – It does not work; don’t use this. The paging reader does not work with NTILE
  4. JdbcPagingItemReader-OracleRownum – It does not work; don’t use this. The paging reader does not work with ROWNUM
  5. JdbcPagingItemReader-TablePrimaryKey – It works, but the records aren’t distributed uniformly (threads may receive different numbers of records)

What is Oracle NTILE?
NTILE is an Oracle analytic function that divides an ordered set of rows into a desired number of buckets, numbered from 1, so that each thread can consume one bucket. For example, to divide 1000 records in the database among 10 threads:

SELECT ID, DESCRIPTION, FLAG_PROCESSED, NTILE(10) OVER (ORDER BY ID)
AS CONTAINER_COLUMN FROM TABLE_SOURCE
WHERE FLAG_PROCESSED = 'N';

With this query, you can partition on the column “CONTAINER_COLUMN”: its values (1 to 10) already split the rows into buckets, ready to be divided among the threads.
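NTILE’s bucketing rule is easy to model: per the Oracle documentation, bucket sizes differ by at most one, and the remainder rows are distributed one per bucket starting with bucket 1. This Python function is an illustrative model of that rule, not Oracle code; it returns the bucket number assigned to each row of an ordered result:

```python
def ntile(n_rows, buckets):
    """Return the 1-based bucket number for each of n_rows ordered rows,
    following NTILE semantics: sizes differ by at most one and the
    remainder rows go one per bucket, starting with bucket 1."""
    base, remainder = divmod(n_rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= remainder else 0)
        result.extend([bucket] * size)
    return result


print(ntile(1000, 10).count(1))  # 100
print(ntile(9, 2))  # [1, 1, 1, 1, 1, 2, 2, 2, 2]
```

Applied to the nine example IDs with two buckets, NTILE gives five and four rows, the same balance as numbering the rows once with ROWNUM.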

The Oracle documentation explains NTILE in more detail:

https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions101.htm

That’s it.

Any questions or suggestions are very welcome.

Credits for this post:

http://alexandreesl.wordpress.com/2014/09/21/spring-batch-construindo-processamento-massivo-de-dados-em-java/
http://www.mkyong.com/spring-batch/spring-batch-partitioning-example/
https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples

Cheers,
Victor Jabur