English Dictionary in csv format

I was working on a project on an English Dictionary for Scilab where I made use of a dictionary in a csv file.


I got the word meanings as text files from OPTED (The Online Plain Text English Dictionary), which is based on “The Project Gutenberg Etext of Webster’s Unabridged Dictionary”, which is in turn based on the 1913 US Webster’s Unabridged Dictionary (see Project Gutenberg).

I then converted the text files into csv. I am sharing those here if anyone needs them.

Just don’t forget to thank me in the comments section below.

[EDIT: Notmi Namae made a generous contribution by compiling all the words in a single CSV with the title, type and meaning in different cells. You can download it: dictionary.csv]

Dictionary in csv:

Z Y X W V U T S R Q P O N M L K J I H G F E D C B A

Words in Csv:

Aword Bword Cword Dword Eword Fword Gword Hword Iword Jword Kword Lword Mword Nword Oword Pword Qword Rword Sword Tword Uword Vword Wword Xword Yword Zword

Download all of them in a zip file:

Word lists in csv.zip

Dictionary in csv.zip

Hope you find it useful!

Acknowledgement:

OPTED(The Online Plain Text English Dictionary)

 

 

PhD researcher at Friedrich-Schiller University Jena, Germany. I'm a physicist specializing in theoretical, computational and experimental condensed matter physics. I like to develop physics-related apps and software from time to time. I can code in most of the popular languages. I like to share my knowledge of physics and its applications through this blog and a YouTube channel.

75 thoughts on “English Dictionary in csv format”

      1. Hi, I’m noticing that the D words are missing. Do you have a corrected list with all the D words in the same format as the rest of the dictionary?

  1. It’s not very useful if the word is in the same cell as the definition. And I think most systems could handle all 26 letters in one file.

    1. Well, you can easily separate the words from the meanings with a little programming. I mean, you could use code that finds the first occurrence of ‘)’ and then copies everything after it to a different file.
      I preferred to have the words and the meanings together because I wanted to display the whole thing.
      Again, a little programming can be used to concatenate (append) all the files together. However, I didn’t do that, to save processing time: I could just look at the first letter and know which file to search for the word. If you have all the words in the same file and the user enters a word starting with ‘z’, then you would have to go through almost all the words in the file unnecessarily, wasting a lot of time.
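The two ideas in the reply above (splitting at the first ‘)’ and picking the file from the first letter) can be sketched in Python. This is only an illustration: it assumes each line looks like `Word (pos.) definition`, possibly wrapped in quotation marks, and that the per-letter files are named `A.csv` to `Z.csv`. The sample line and the helper names `split_entry` and `letter_file` are made up for the example.

```python
def split_entry(line):
    """Split 'Word (pos.) definition' at the first ')'.

    Returns (head, definition), where head keeps the word and its
    part of speech. If the line has no ')', the whole line is the head.
    """
    text = line.strip().strip('"')
    head, sep, definition = text.partition(")")
    if not sep:
        return text, ""
    return head + ")", definition.strip()


def letter_file(word):
    """Assumed file naming: the first letter of the word plus '.csv'."""
    return word[0].upper() + ".csv"


# Hypothetical sample line, not copied from the real files.
sample = '"Zebra (n.) An African animal with black and white stripes."'
head, definition = split_entry(sample)
print(head)
print(definition)
print(letter_file("zebra"))
```

Whether the first-letter lookup actually saves noticeable time depends on the program; on a modern machine a single file of this size can also be loaded into a dictionary once and queried directly.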

      1. The dictionary file is completely corrupt; it is missing large patches of definitions in several letters. The individual letter csv files are complete, but they are not parsed properly into cells. I am correcting it all and putting it into one file, but I will need to get rid of all the spaces in the parts of speech ‘field’ to use a space delimiter, and then I will need to concatenate all of the cells from the definitions back together in individual files, and hopefully dump the double spacing. I will probably need to do it with 26 separate files and finally combine them; these systems don’t want to concatenate 30+ columns for hundreds of thousands of items at once, and it strains this 16gb ryzen 7 significantly. What I don’t understand is how serious computer scientists could miss these huge gaps. D is missing entirely in the dictionary file. It will take half a day to fix this. Thanks for including the individual letter files, else nothing would save this unless I wanted to parse the full text files from Gutenberg project, and those are probably all gone now. Even then, a real pain.

        1. James, thanks for looking so closely at it. I agree – there’s no excuse for them to post invalid information. If they’re gonna do it, do it right. And I notice the master dictionary.csv file is still bad. They haven’t fixed it.

          But you shouldn’t have to spend half-a-day cleaning up one file. A few lines of python will suffice to examine each letter source file, parse the lines into 3 parts (word, category, definition), and write all the data out to one big .csv.

          I’m about to do it, to make sure I get a clean list. Contact me if you want to discuss. Thanks
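A minimal sketch of the approach described in the comment above (read each letter file, parse every line into word, category and definition, write one big .csv). It is not the commenter’s actual script. It assumes the letter files are named `A.csv` to `Z.csv`, are readable as UTF-8, and hold one `Word (pos.) definition` entry per line; lines that do not match that shape are skipped. The names `parse_line` and `combine` are hypothetical.

```python
import csv
import re
import string
from pathlib import Path

# Assumed line shape: Word (pos.) definition, optionally wrapped in quotes.
ENTRY = re.compile(
    r'^\s*"?(?P<word>[^()]+?)\s*\((?P<pos>[^)]*)\)\s*(?P<definition>.*?)"?\s*$'
)


def parse_line(line):
    """Return (word, pos, definition), or None if the line does not match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    return m.group("word"), m.group("pos").strip(), m.group("definition")


def combine(src_dir, out_path):
    """Parse A.csv ... Z.csv in src_dir into one three-column CSV.

    Returns the number of entries written (excluding the header row).
    """
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(["word", "type", "definition"])
        for letter in string.ascii_uppercase:
            path = Path(src_dir) / f"{letter}.csv"
            if not path.exists():
                print(f"missing file: {path}")
                continue
            with open(path, encoding="utf-8", errors="replace") as src:
                for line in src:
                    row = parse_line(line)
                    if row is not None:
                        writer.writerow(row)
                        written += 1
    return written
```

Calling `combine("letters", "dictionary_full.csv")` (both paths are placeholders) would report any letter file that is absent, which is one way to catch a gap like the missing D entries before the combined file is published.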

  2. Hi dear Sharma
    I am Manoochehr Karimi , English Language Teacher, from Iran.
    I need audio files of each English word in mp3.
    Would you mind helping me find out how to get them?
    Thanks a million
    email: [email protected]

  3. You are awesome. Thanks for sharing. Can’t believe some are complaining, lol. Saved me a bunch of time! 🙂

  4. How is this more practical than the original OPTED dictionary?
    You just removed the structure and added random linebreaks and quotation marks so that it cannot be reused as easily. CSV is there to have multiple cells in different lines and not to take the whole line and put it into a single cell, so why didn’t you just utilize this?

    1. For one thing, the software that I was working with could only read a csv. So basically, yeah, it’s just a txt converted to a csv.
      I take it that you want the word to be in a different column and the meaning in another.
      Well, I didn’t need that as I already explained in a previous comment.

      1. Okay, I understand.
        I downloaded the dictionary from the original page myself now and formatted it into a single file (including a-z) with UTF-8 encoding and separate columns of the form:
        “word”,”type”,”description”

        If anyone needs it, I put it onto my google drive:
        https://drive.google.com/open?id=0ByLZAhdz2XrhdTMtZ0hRaExZUkE

        (PS: If you want to, you can also add it to your blog above as download)

        1. Just found this. Strong work. The file Notmi created, at least the version posted here, only has b words through badger. I didn’t look further — there may be other missing words.

        2. Many words are lacking in your dictionary. I don’t know if you have taken note of that.
          However, I appreciate the work. I think it will be better if you can complete the dictionary by including the missing words.

  5. This is a poet’s dream. I couldn’t find anything like this until now. It’s been years. The find function is a journey.

  6. THANK YOU SO MUCH! I was looking for this! I’m making a java project based on Scrabble board game. I’ll be sure to reference your name and website. You’re awesome man! 🙂

  7. Thank you for the wonderful work. It would be much more useful if it had pronunciations as well. Is there any way for you to include them?

  8. Thanx brother great hard work done by you to make our work easy. i was looking for the same.
    Thanx Once again Great Job.

  9. This is not a CSV file. It’s just a text dump of dictionary data containing non-standard text which kinda resembles a dictionary. CSV means “Comma Separated Values”.
    You should not use the acronym CSV. Did you even research what a CSV file was before pretending to make one? It should not be published with the CSV extension because it is not a CSV text data file.

    It hinders real work because it is useless as a CSV file. People should study the subject matter before committing a claimed general work. Yes, it can be converted to CSV, but that doesn’t make it a CSV file. Call it .dat or something other than CSV. Why? CSV is a standard file format with specific defined formatting rules.

    1. Each line of a .CSV file is considered as a data-entry/record.
      Which is what the files provide.

      Also, as mentioned in the post above, here is the link to the comma separated words and meanings: https://www.bragitoff.com/wp-content/uploads/2016/03/dictionary.csv

      Moreover, I am not sure how it cost you more time? It literally gives you the word meanings in each line, and not only that it gives the list of words separately as well. Just feed it into http://www.delim.co and generate an array from the .csv file.

      As I explained in another comment:

      “Well, you can easily separate the words from the meanings with a little programming. I mean, you could use code that finds the first occurrence of ‘)’ and then copies everything after it to a different file.
      I preferred to have the words and the meanings together because I wanted to display the whole thing.
      Again, a little programming can be used to concatenate (append) all the files together. However, I didn’t do that, to save processing time: I could just look at the first letter and know which file to search for the word. If you have all the words in the same file and the user enters a word starting with ‘z’, then you would have to go through almost all the words in the file unnecessarily, wasting a lot of time.”

  10. I wasted a lot of time trying to figure out this junk. I’m sorry, I’m not trying to be rude, but it’s very incomplete and full of errors. You should have kept it to yourself and the few people who needed this kind of incomplete data set. It cost me time. There is already too much incomplete, error-riddled work published on the internet.

    1. Well, as you can see at least 40 plus comments are expressing their gratitude for it. So, it wasn’t really useless.

      1. I think commanderklag and other users complaining about corrupted data are referring to the single .csv file you provided from another user. I spent a few hours manipulating it before I realised that more than two-thirds of the data is missing. The dictionary should have around 175000 entries – not the 50000 or so contained in the compiled csv. For example, the entry for ‘and’ is missing. I noticed that lots of people seem to have used that file in their work and haven’t realised it is even missing an entry for ‘and’.

        Fortunately, the A-Z files you also provided seem complete and I used these to build the full dictionary data source. Many of the entries under a headword are referential to the original headword. I spent a few hours manually going through the file trying to find all of these and providing a separate full headword for each of these – many of them reference noun plurals. In this process, I identified maybe around 5 entries that didn’t make sense and just discarded them, but the rest should be there. I then combined entries with the same headword into a single entry, so each headword is now unique (moving from around 175000 to around 110000). If someone needs to, they can split this back into original form using the square brackets ([]) denoting part-of-speech. Finally, I took the part-of-speech elements (captured in square brackets in this file) and cleaned them up. For example, an entry might have said “[adverb]” instead of “[adv.]”, or “[ a/]” instead of “[a.]” and I spent a few further hours cleaning up all I could find.

        Hope this helps – I recommend using Google Sheets, MS Excel or some other powerful spreadsheet to get the data into the form you need. To the author Manas, feel free to take this file if useful and use it without further attribution and then remove this comment. Thanks for the original A-Z files!

        https://docs.google.com/spreadsheets/d/1vgNJpEWVppQv1CYPE8O_Z72mugHiqbMCvWbQPsEATcY/edit?usp=sharing
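The merging and clean-up steps described in the comment above were done by hand in a spreadsheet; a rough Python equivalent might look like the sketch below. It assumes the data is already available as (word, part of speech, definition) rows. The `POS_FIXES` table only contains the two corrections quoted in the comment; the full list of variants is not given there, so a real table would need more entries. The function names are hypothetical.

```python
# Corrections quoted in the comment: "[adverb]" -> "[adv.]", "[ a/]" -> "[a.]".
# Any further variants would have to be added by inspecting the data.
POS_FIXES = {"adverb": "adv.", "a/": "a."}


def clean_pos(pos):
    """Normalise a part-of-speech label, e.g. ' a/' -> 'a.'."""
    pos = pos.strip()
    return POS_FIXES.get(pos, pos)


def merge_by_headword(rows):
    """Merge (word, pos, definition) rows so each headword appears once.

    Each sense is rendered as '[pos] definition' and the senses of one
    headword are joined with a space, keeping the original order.
    Headwords are compared exactly as written (case-sensitive).
    """
    merged = {}
    for word, pos, definition in rows:
        sense = f"[{clean_pos(pos)}] {definition.strip()}"
        merged.setdefault(word, []).append(sense)
    return {word: " ".join(senses) for word, senses in merged.items()}


# Made-up rows for illustration only.
rows = [
    ("Run", "v. i.", "To move swiftly."),
    ("Run", "n.", "The act of running."),
    ("Fast", " a/", "Quick."),
]
print(merge_by_headword(rows))
```

Because every sense keeps its bracketed part of speech, the merged text can later be split back into separate senses at the square brackets, as the comment suggests.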

    2. Dear commander-KEGEL –> A.K.A. commanderklag <–

      If you wasted so much time, then you might be an idiot… Or a Script kitty! Meooooooow…. Learn to actually code and who cares about whether it's a .csv, .dat, or .whogivesashit file extension, the data is there and if you truly knew what you were talking about and didn't run to the google bot to look up csv, you wouldn't be such a rude kegel-popping-commander…

      Yours truly,
      A-Tr0ll

  11. Thank you, Manas. This makes looking up meanings of words, very easy. Much appreciated.

  12. I too was a physicist getting a PhD in Radio Physics from The University of Adelaide 43 years ago. Now I am writing a kind of semi-autobiography, first in Urdu then in English. I cannot read Hindi script.
    Having lived overseas for almost 50 years, I had forgotten most of my Urdu/Hindi vocabulary but it is slowly returning. Having found it difficult with on-line dictionaries I have been considering creating my own English/Urdu/Hindi dictionary Thesaurus using a database approach then making it available on line free. I am going to download your English dictionary and experiment with that first.

  13. The combined csv is missing a lot of entries. For example, the word “example” is missing along with any E word after “eclipsed.” Not sure what happened, but it seems like about half of the entries are missing in that file. I’m going to have to redo the work of putting it all together, so email me if you would like the finished CSV to post here.
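Gaps like the ones reported in these comments (no D words, no entry for “and” or “example”) can be spotted with a quick check before relying on a combined file. The sketch below assumes a UTF-8 CSV with the headword in the first column; whether the file has a header row is not known, hence the `has_header` flag. All function names are hypothetical.

```python
import csv
import string
from collections import Counter


def letter_counts(path, has_header=False):
    """Count rows per initial letter (A-Z) of the first column."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        for row in reader:
            if row and row[0]:
                first = row[0].strip()[:1].upper()
                if len(first) == 1 and first in string.ascii_uppercase:
                    counts[first] += 1
    return counts


def missing_letters(counts):
    """Letters A-Z for which no row was found."""
    return [c for c in string.ascii_uppercase if counts[c] == 0]


def has_word(path, word):
    """True if `word` appears (case-insensitively) in the first column."""
    target = word.lower()
    with open(path, newline="", encoding="utf-8") as f:
        return any(
            row and row[0].strip().lower() == target for row in csv.reader(f)
        )
```

A letter with zero rows, or a count far below its neighbours, points to a truncated section; `has_word(path, "and")` is a cheap spot check for common words.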

  14. Bhai… Thank You… I intended to make an excel game for my daughter to learn and pronounce and understand meaning of it in excel… so required it… its really helpful….
    Thanks Bhai…
