Zeus Macro for deleting duplicate lines?

ptravers
Posts: 4
Joined: Tue Apr 29, 2014 2:34 pm

Zeus Macro for deleting duplicate lines?

Post by ptravers »

I've got a file of a few thousand email addresses and I'd like to delete all the exact duplicates but I can't figure out how to do it... anyone?
jussij
Site Admin
Posts: 2650
Joined: Fri Aug 13, 2004 5:10 pm

Re: Zeus Macro for deleting duplicate lines?

Post by jussij »

This remove_lines.lua file should do the trick:

Code: Select all

function key_macro()
    screen_update_disable()

    -- save current search settings
    search_options_save()

    -- save the current cursor details
    cursor_save()

    -- the different available scope options
    SCOPE_FORWARD = 0   -- forward from current cursor
    SCOPE_REVERSE = 1   -- reverse from current cursor
    SCOPE_MARKED  = 2   -- marked region only
    SCOPE_ENTIRE  = 3   -- entire contents of current document
    SCOPE_ALL     = 4   -- all currently open documents

    -- set the required search scope option
    set_search_option("Scope", SCOPE_ENTIRE)

    -- set the other search options 
    set_search_option("UseCase"   , 0)
    set_search_option("WholeWord" , 0)
    set_search_option("RegExpress", 1)

    -- 0 = search and replace the next match only
    -- 1 = search and replace all matches found
    replace_all = 1

    -- search for any empty lines and replace them with a single empty line
    replace("^(\\n$)(\\1$)+" , "\\n", replace_all)

    -- restore the cursor details
    cursor_restore()

    search_options_restore()
    screen_update_enable()
    screen_update()
end

key_macro() -- run the macro
The macro uses a regexp to search for multiple blank lines and replace them with a single blank line:

Code: Select all

replace("^(\\n$)(\\1$)+" , "\\n", replace_all)
Also, if you use this scope instead, the macro will run for all open files:

Code: Select all

SCOPE_ALL     = 4   -- all currently open documents
IMPORTANT: As always, before running this script make sure you have backup copies of the files (or better still, have the files under source control) so you can restore them should something go wrong!

Cheers Jussi
ptravers
Posts: 4
Joined: Tue Apr 29, 2014 2:34 pm

Re: Zeus Macro for deleting duplicate lines?

Post by ptravers »

If I understand your script correctly, this macro simply removes multiple blank lines from the file... not the duplicated text *on* the lines...?

Example:
fred@foo.com
mary@foo.com
mary@foo.com
mary@foo.com
tom@foo.com
tom2@foo.com
victor@foo.com
victor@foo.com
william@foo.com
william@msn.com

The script I'm looking for would remove two of the mary@foo.com and one victor@foo.com lines in the example above. The rest of the lines are unique.
jussij
Site Admin
Posts: 2650
Joined: Fri Aug 13, 2004 5:10 pm

Re: Zeus Macro for deleting duplicate lines?

Post by jussij »

Sorry, my mistake for not taking the time to read your original post correctly.

Yes, you are correct: that macro removes multiple empty lines, replacing them with a single empty line.

To remove adjacent duplicate lines, just change the regexp to this:

Code: Select all

    -- search for any adjacent duplicate lines and replace them with a single line of text
    replace("^(.*)(\\r?\\n\\1)+$" , "\\1", replace_all)
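For anyone who wants to try the pattern outside Zeus, here is a minimal sketch using Python's re module on the example list from the earlier post. It assumes Python's regexp syntax behaves like the Zeus one for this pattern, and the name collapse_adjacent_duplicates is made up for illustration:

```python
import re

# Same pattern as the macro, written as a Python raw string.
# MULTILINE makes ^ and $ match at the start and end of every line.
ADJACENT_DUPLICATES = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)


def collapse_adjacent_duplicates(text):
    """Replace each run of identical adjacent lines with a single copy."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)


sample = "\n".join([
    "fred@foo.com",
    "mary@foo.com",
    "mary@foo.com",
    "mary@foo.com",
    "tom@foo.com",
    "tom2@foo.com",
    "victor@foo.com",
    "victor@foo.com",
    "william@foo.com",
    "william@msn.com",
])

print(collapse_adjacent_duplicates(sample))
```

Note that the pattern only collapses duplicates that sit next to each other, so a repeated line further down the file is left alone.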
Cheers Jussi
ptravers
Posts: 4
Joined: Tue Apr 29, 2014 2:34 pm

Re: Zeus Macro for deleting duplicate lines?

Post by ptravers »

Thanks! I didn't realize that you could refer to a regexp pattern-match string within the <find> string.

Given your updated example, I ended up doing a simple REPLACE, finding "^(.*)\n\1$", replacing with "\1". :) Of course, I had to perform the search & replace several times to get rid of the multiple occurrences of a single string, but WOW - thanks!
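The repeated passes are needed because the pattern without the + only collapses one pair of lines per match, so a run of three identical lines still has two left after the first pass. A rough sketch of the same loop in Python (this is not Zeus macro code, and replace_until_stable is a hypothetical name) keeps replacing until nothing changes:

```python
import re

# The simpler pattern: one line followed by exactly one identical line.
PAIR = re.compile(r"^(.*)\n\1$", re.MULTILINE)


def replace_until_stable(text):
    """Apply the pair replace repeatedly; return the text and the number of passes that changed it."""
    passes = 0
    while True:
        text, count = PAIR.subn(r"\1", text)
        if count == 0:
            return text, passes
        passes += 1


text = "mary@foo.com\nmary@foo.com\nmary@foo.com\ntom@foo.com"
cleaned, passes = replace_until_stable(text)
print(cleaned)
print("passes needed:", passes)
```

Adding the + after the second group, as in the macro version, lets a single pass swallow the whole run.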
jussij
Site Admin
Posts: 2650
Joined: Fri Aug 13, 2004 5:10 pm

Re: Zeus Macro for deleting duplicate lines?

Post by jussij »

Yep, running the regexp directly from the Search/Replace dialog is naturally also an option, as the macro replace function calls the same underlying replace functionality ;)

Another option is the Editor, Sort menu, which also has an option to remove duplicate lines. I use this feature quite a bit :)

But one warning when using this Sort option.

As the name suggests, this dialog will sort the contents of the file (or the marked area) and it also has the option to remove any duplicates.

All of those edits are 100% undo-able, and this means that if you try to do this for a really big file you might run into issues.

Firstly, the bigger the file, the longer it will take to complete. Secondly, because the changes are undo-able, it consumes lots of memory. So if the file is big enough you might even crash the editor if it runs out of memory :(
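If the file is too big for the Sort dialog, or the original line order needs to be kept, the duplicates can also be removed outside the editor. Below is a minimal sketch in Python that removes exact duplicate lines wherever they occur and keeps the first copy of each. The function name and the file paths are only placeholders, and it assumes the set of unique lines fits in memory:

```python
def remove_duplicate_lines(input_path, output_path):
    """Copy input to output, keeping only the first copy of each line.

    Returns the number of lines written.
    """
    seen = set()
    kept = 0
    with open(input_path, "r") as source, open(output_path, "w") as target:
        for line in source:
            # Compare lines without their line ending so that a last
            # line with no trailing newline is still matched.
            key = line.rstrip("\r\n")
            if key not in seen:
                seen.add(key)
                target.write(key + "\n")
                kept += 1
    return kept


# Example call (placeholder paths, adjust as needed):
# remove_duplicate_lines("d:/temp/input.txt", "d:/temp/unique.txt")
```

Because the file is read one line at a time and nothing is sorted, the order of the remaining lines is unchanged.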

Cheers Jussi
kurasov1965K

Zeus Macro for deleting duplicate lines

Post by kurasov1965K »

I am a bit rusty on my Pascal. I have a need to randomize the lines in a text file. Does anyone have a sample script that might help me get started? I need to do this several times a week so it seems worth the effort to create a script. The files contain between 10 and 20 thousand lines.
jussij
Site Admin
Posts: 2650
Joined: Fri Aug 13, 2004 5:10 pm

Re: Zeus Macro for deleting duplicate lines?

Post by jussij »

"I have a need to randomize the lines in a text file"

I would use Python to do this and since Zeus comes with a version of Python, you will not need to install anything additional to have this work.

As an example, the shuffle.py code below reads in an input file, shuffles its contents and writes out the shuffled results to an output file:

Code: Select all

from random import shuffle

def ShuffleFile(inputFileName, outputFileName):
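    # read in the input file
    with open(inputFileName, 'r') as fileInput:
        lines = fileInput.readlines()

    # make sure the last line ends with a newline, otherwise it would
    # get joined to the line that follows it after the shuffle
    if lines and not lines[-1].endswith('\n'):
        lines[-1] += '\n'

    # shuffle the lines
    shuffle(lines)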

    # write the output file (change to overwrite same file)
    with open(outputFileName, 'w+') as fileOutput:
        for item in lines:
            fileOutput.write(item)

in_file="d:/temp/input.txt"
out_file="d:/temp/output.txt"

# call the shuffle function
ShuffleFile(in_file, out_file)
To run this in Zeus just define a tool that runs this command:

Code: Select all

python.exe shuffle.py
You can also test the file from inside Zeus as follows:

1. Make sure you have an input file called: d:/temp/input.txt

2. Open the shuffle.py file in Zeus

3. Use the Macros, Execute 'shuffle.py' Script menu

4. To see the results open the file called: d:/temp/output.txt

Cheers Jussi
jussij
Site Admin
Posts: 2650
Joined: Fri Aug 13, 2004 5:10 pm

Re: Zeus Macro for deleting duplicate lines?

Post by jussij »

This random.py script can be used to randomize a marked region of text from inside the editor. To randomize a region of text inside a file, first load the file into the editor, mark the lines of text to be randomized and then run the macro.

Code: Select all

#
#        Name: Random Marked Area
#
#      Author: Jussi Jumppanen
#
#    Language: Python
#
# Description: Using a marked region of text, randomize the marked lines
#              of text and replace them with a random result set.
#
import zeus
import random

def key_macro():
    if zeus.is_document():
        if zeus.is_read_only() == False:
            if (zeus.is_marked() == 1):
                lines = []

                zeus.cursor_save()

                # get the marked area details
                mode   = zeus.get_marked_mode()
                top    = zeus.get_marked_top()
                bottom = zeus.get_marked_bottom()
                left   = zeus.get_marked_left()
                right  = zeus.get_marked_right()
                delta  = bottom - top

                # copy and shuffle the lines
                for index in range(top, bottom + 1):
                    text = zeus.get_line_text(index)
                    lines.append(text)

                random.shuffle(lines)

                zeus.MarkDelete()

                # insert the new lines and restore the markings
                for line in lines:
                    zeus.line_insert(top, line)

                zeus.set_marked_area(mode, top, left, bottom, right)

                zeus.cursor_restore()
            else:
                zeus.message('This requires a marked region of text.')
                zeus.beep()
        else:
            zeus.message('This macro only works for a writable document.')
            zeus.beep()
    else:
        zeus.message('This macro only works for documents.')
        zeus.beep()

key_macro()  # run the macro
Cheers Jussi