python - Decompressing a text file
So I have compressed some text, and now I need to decompress it to be able to recreate the text.
The compression is:
    import zlib, base64

    text = raw_input("enter sentence: ")  # asks the user to input text
    text = text.split()  # splits the sentence into words

    uniquewords = []  # creates an empty array
    for word in text:  # loop over the words and do the following
        if word not in uniquewords:  # if the word is not in uniquewords
            uniquewords.append(word)  # it adds the word to the array

    positions = [uniquewords.index(word) for word in text]  # finds the position of each unique word
    positions2 = [x+1 for x in positions]  # adds 1 to each position
    print ("the uniquewords , positions of words are: ")  # prints the uniquewords and positions
    print uniquewords
    print positions2

    file = open('task3file.txt', 'w')
    file.write('\n'.join(uniquewords))  # adds the uniquewords to the file
    file.write('\n')
    file.write('\n'.join([str(p) for p in positions2]))
    file.close()

    file = open('compressedtext.txt', 'w')
    text = ', '.join(text)
    compression = base64.b64encode(zlib.compress(text,9))
    file.write('\n'.join(compression))
    print compression
    file.close()
My attempt at decompression is:
    import zlib, base64

    text = ('compressedtext.txt')
    file = open('compressedtext.txt', 'r')
    print ("in file is: \n") + file.read()
    text = ''.join(text)
    data = zlib.decompress(base64.b64decode(text))
    recreated = " ".join([uniquewords[word] for word in positions])  # recreates the sentence
    file.close()  # closes the file
    print ("the sentences recreated: \n") + recreated
But when I run the decompression and try to recreate the original text, an error message appears saying:
      File "c:\python27\lib\base64.py", line 77, in b64decode
        raise TypeError(msg)
    TypeError: Incorrect padding
Does anyone know how to fix this error?
There are a few things going on here. Let me start by giving you a working sample:
    import zlib, base64

    rawtext = raw_input("enter sentence: ")  # asks the user to input text
    text = rawtext.split()  # splits the sentence into words

    uniquewords = []  # creates an empty array
    for word in text:  # loop over the words and do the following
        if word not in uniquewords:  # if the word is not in uniquewords
            uniquewords.append(word)  # adds the word to the array

    positions = [uniquewords.index(word) for word in text]  # finds the position of each unique word
    positions2 = [x+1 for x in positions]  # adds 1 to each position
    print ("the uniquewords , positions of words are: ")  # prints the uniquewords and positions
    print uniquewords
    print positions2

    infile = open('task3file.txt', 'w')
    infile.write('\n'.join(uniquewords))  # adds the uniquewords to the file
    infile.write('\n')
    infile.write('\n'.join([str(p) for p in positions2]))
    infile.close()

    infile = open('compressedtext.b2', 'w')
    compression = base64.b64encode(zlib.compress(rawtext, 9))
    infile.write(compression)
    print compression
    infile.close()

    # read it back again
    infile = open('compressedtext.b2', 'r')
    text = infile.read()
    print("in file is: " + text)
    recreated = zlib.decompress(base64.b64decode(text))
    infile.close()
    print("the sentences recreated:\n" + recreated)
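I've tried to keep things pretty close to what you had, but note in particular a few changes:
I'm trying to keep better track of the raw text versus the processed text.
I've removed the redefinition of zlib.
I've removed the line breaks that '\n'.join(compression) put into the encoded output, since they get in the way of decompression.
I've done some general clean-up to better conform to normal Python conventions.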
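To make the first and third points concrete, here is a minimal sketch of the same round trip. It is written for Python 3, which is an assumption on my part since your traceback shows Python 2.7, and the helper names `save_compressed` and `load_compressed` are made up for illustration. It also shows that handing `b64decode` the file name instead of the file contents, which is what `text = ('compressedtext.txt')` followed by `b64decode(text)` does in your attempt, is rejected as malformed base64. Under Python 3 that surfaces as `binascii.Error` rather than `TypeError`.

```python
import base64
import binascii
import os
import tempfile
import zlib


def save_compressed(path, raw_text):
    """Compress raw_text, base64-encode it, and write the result to path unchanged."""
    encoded = base64.b64encode(zlib.compress(raw_text.encode("utf-8"), 9))
    with open(path, "wb") as outfile:
        outfile.write(encoded)  # written as-is, with no '\n'.join over the characters


def load_compressed(path):
    """Read the file contents (not the file name) and reverse both steps."""
    with open(path, "rb") as infile:
        encoded = infile.read()
    return zlib.decompress(base64.b64decode(encoded)).decode("utf-8")


# Hypothetical demo path inside a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "compressedtext.txt")
sentence = "the cat sat on the mat"
save_compressed(demo_path, sentence)
restored = load_compressed(demo_path)
print(restored == sentence)

# Decoding the file name itself fails, because it is not valid base64.
try:
    base64.b64decode("compressedtext.txt")
    filename_decoded = True
except binascii.Error:
    filename_decoded = False
print(filename_decoded)
```

The `with` blocks close the files even if something raises in between, which is the usual convention in place of explicit `close()` calls.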
Hope this helps.
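P.S. Your attempt also rebuilds the sentence from `uniquewords` and `positions`, a step the working sample leaves out. A minimal sketch of that step, assuming the 1-based positions your script writes to task3file.txt, so each one needs 1 subtracted before indexing. The function names are hypothetical, and this is again Python 3:

```python
def compress_words(sentence):
    """Return the unique words in first-seen order plus 1-based positions."""
    words = sentence.split()
    unique_words = []
    for word in words:
        if word not in unique_words:
            unique_words.append(word)
    positions = [unique_words.index(word) + 1 for word in words]
    return unique_words, positions


def recreate_sentence(unique_words, positions):
    """Rebuild the sentence; positions are 1-based, so subtract 1 to index."""
    return " ".join(unique_words[p - 1] for p in positions)


unique_words, positions = compress_words("the cat sat on the mat")
recreated = recreate_sentence(unique_words, positions)
print(recreated)
```

Because `split()` discards the original spacing, this recreates the words in order joined by single spaces, not necessarily the exact whitespace of the input.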