How to extract many .doc text + tabular elements into CSV by any Unix tool?

This thread covers part (1) of the thread How to split Excel table into CSV files in .doc by Bold text?
I have 777 .doc files, each of which contains a big Excel table.
The manual workflow below works correctly and lets me work with the data, since spreadsheet data can be converted into CSV files and databases.
However, I want to automate this step, both to evaluate the export from .doc file to data file more thoroughly and because there are too many files to process by hand.
I cannot study the contents of all those files in bulk, so I am considering a scripting approach that iterates through all the .doc files, but I am not sure whether any interface or scripting tool exists for such a task.



  1. Select the whole content of the .doc file with CTRL+A

  2. Copy-paste the content into any spreadsheet editor (I used WPS editor)
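The iteration over all files could be scripted roughly as follows. This is only a minimal sketch, assuming LibreOffice is installed with its `soffice` binary on the PATH, and assuming the tables are ordinary document tables that survive an HTML export (embedded spreadsheet objects may not). The function names and the `html_out` directory are hypothetical, chosen for illustration.

```python
import subprocess
from pathlib import Path


def find_doc_files(root):
    """Return the .doc files directly inside root, sorted for a stable order."""
    return sorted(Path(root).glob("*.doc"))


def build_convert_command(doc_path, out_dir):
    """Build the headless LibreOffice command that exports one file to HTML."""
    return [
        "soffice", "--headless",
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_all(root, out_dir="html_out"):
    """Loop over the .doc files one by one and export each of them."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for doc in find_doc_files(root):
        # check=True stops the loop at the first failed conversion
        subprocess.run(build_convert_command(doc, out_dir), check=True)
```

Calling `convert_all(".")` in the directory holding the files would leave one HTML file per .doc file in `html_out`, from which the tables still have to be extracted in a separate step.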

Source files: .doc files containing some text and Excel tables



Target file: an Excel file or anything similar, e.g. a WPS editor file, a LibreOffice file, ...



  • one Excel file can be sufficient for all .doc files, because each .doc file has a top line that can be used as a heading and separator when categorising the content later
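The single-file layout described above could be assembled as in the sketch below. It assumes each document has already been reduced to a list of text lines whose table cells are tab-separated; that intermediate format is an assumption, not something the .doc files guarantee. The function name `merge_documents` is hypothetical. The top line of each document is repeated as the first column of every row, so the merged data can be categorised by it later.

```python
import csv


def merge_documents(documents, out_file):
    """Write several documents into one CSV stream.

    documents: iterable of lists of text lines, one list per document.
    The first line of each document is used as its heading and is
    prepended to every data row of that document.
    """
    writer = csv.writer(out_file)
    for lines in documents:
        if not lines:
            continue  # nothing to write for an empty document
        heading = lines[0].strip()
        for line in lines[1:]:
            if line.strip():  # skip blank lines between tables
                cells = line.rstrip("\n").split("\t")
                writer.writerow([heading] + cells)
```

Passing a file opened with `open("all.csv", "w", newline="")` as `out_file` would produce one CSV file covering all documents.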

OS: Debian 9 (Stretch) Linux and others

Data: example .odt file here







  • @DopeGhoti Generally, I cannot do the iteration part: doing it for all at once OR looping one by one.
    – Léo Léopold Hertz 준영
    Oct 27 '17 at 18:10















asked Oct 27 '17 at 17:40









Léo Léopold Hertz 준영
