How to Read a Data File Into a Numpy Array in Python

11. Reading and Writing Data Files: ndarrays

By Bernd Klein. Last modified: 01 Feb 2022.

There are lots of ways of reading from and writing to data files in Numpy. We will discuss the different ways and the corresponding functions in this chapter:

  • savetxt
  • loadtxt
  • tofile
  • fromfile
  • save
  • load
  • genfromtxt

Saving textfiles with savetxt


The first two functions we will cover are savetxt and loadtxt.

In the following simple example, we define an array x and save it as a text file with savetxt:

    import numpy as np

    x = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], np.int32)
    np.savetxt("test.txt", x)

The file "test.txt" is a text file and its content looks like this:

    ~/Dropbox/notebooks/numpy$ more test.txt
    1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
    4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
    7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00

Attention: The above output has been created on the Linux command prompt!

It's also possible to write the array in a special format, for example with three decimal places, or as integers which are padded with leading zeros if they have fewer than four digits. For this purpose, we assign a format string to the third parameter 'fmt'. We saw in our first example that the default delimiter is a blank. We can change this behaviour by assigning a string to the parameter "delimiter". In most cases, this string will consist of a single character, but it can also be a sequence of characters, like a smiley " :-) ":

    np.savetxt("test2.txt", x, fmt="%2.3f", delimiter=",")
    np.savetxt("test3.txt", x, fmt="%04d", delimiter=" :-) ")

The newly created files look like this:

    ~/Dropbox/notebooks/numpy$ more test2.txt
    1.000,2.000,3.000
    4.000,5.000,6.000
    7.000,8.000,9.000
    ~/Dropbox/notebooks/numpy$ more test3.txt
    0001 :-) 0002 :-) 0003
    0004 :-) 0005 :-) 0006
    0007 :-) 0008 :-) 0009

The complete syntax of savetxt looks like this:

savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')
Parameter   Meaning
fname       A filename or file handle. If the filename ends in .gz, the file is automatically saved in compressed gzip format.
X           array_like. Data to be saved to a text file.
fmt         str or sequence of strs, optional. A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d -- %10.5f', in which case 'delimiter' is ignored. For complex X, the legal options for fmt are:
              a) a single specifier, fmt='%.4e', resulting in numbers formatted like ' (%s+%sj)' % (fmt, fmt)
              b) a full string specifying every real and imaginary part, e.g. ' %.4e %+.4ej %.4e %+.4ej %.4e %+.4ej' for 3 columns
              c) a list of specifiers, one per column; in this case, the real and imaginary part must have separate specifiers, e.g. ['%.3e + %.3ej', '(%.15e%+.15ej)'] for 2 columns
delimiter   A string used for separating the columns.
newline     A string (e.g. "\n", "\r\n" or ",\n") which will end a line instead of the default line ending.
header      A string that will be written at the beginning of the file.
footer      A string that will be written at the end of the file.
comments    A string that will be prepended to the header and footer strings, to mark them as comments. The hash tag '#' is used as the default.
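The header, footer and comments parameters, as well as a sequence of formats for fmt, are easiest to understand from a small round trip. The following sketch uses a hypothetical file name "test5.txt"; it writes one format per column, adds a header and a footer, and reads the file back with loadtxt, which skips lines starting with '#' by default:

```python
import numpy as np

x = np.array([[1, 2.5],
              [3, 4.25]])

# one format per column; header and footer are prefixed with the comments string
np.savetxt("test5.txt", x, fmt=["%d", "%.2f"], delimiter=";",
           header="id;value", footer="end of data", comments="# ")

with open("test5.txt") as f:
    content = f.read()
print(content)

# loadtxt ignores the '#' lines (header and footer) by default
y = np.loadtxt("test5.txt", delimiter=";")
print(y)
```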

Loading Textfiles with loadtxt

We will now read in the file "test.txt", which we wrote in the previous subchapter:

    y = np.loadtxt("test.txt")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

    y = np.loadtxt("test2.txt", delimiter=",")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

Nothing new happens if we read in our text file in which we used a smiley as the separator:

    y = np.loadtxt("test3.txt", delimiter=" :-) ")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]
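A caveat on the smiley example: as far as we know from the NumPy documentation, loadtxt only accepts single-character delimiters since NumPy 1.23, so the call above may raise an error with newer versions. A possible workaround, sketched here under that assumption, is genfromtxt, which splits each line on the complete delimiter string:

```python
import numpy as np

x = np.arange(1, 10, dtype=np.int32).reshape(3, 3)
# savetxt still accepts a multi-character delimiter
np.savetxt("test3.txt", x, fmt="%04d", delimiter=" :-) ")

# genfromtxt splits every line on the whole string " :-) "
y = np.genfromtxt("test3.txt", delimiter=" :-) ")
print(y)
```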

It's also possible to select columns by index with the parameter usecols:

    y = np.loadtxt("test3.txt", delimiter=" :-) ", usecols=(0, 2))
    print(y)

OUTPUT:

    [[ 1.  3.]
     [ 4.  6.]
     [ 7.  9.]]

In our next example, we will read in the file "times_and_temperatures.txt", which we created in the chapter on generators of our Python tutorial. Every line contains a time in the format "hh:mm:ss" and a random temperature between 10.0 and 25.0 degrees. We have to convert the time strings into float numbers: the time will be given in minutes, with the seconds expressed as fractions of a minute. First, we define a function which converts "hh:mm:ss" into minutes:

    def time2float_minutes(time):
        # loadtxt may pass the column data as bytes, so decode it first
        if isinstance(time, bytes):
            time = time.decode()
        t = time.split(":")
        # multiplying by 0.05 / 3 is the same as dividing by 60:
        # the seconds become fractions of a minute
        minutes = float(t[0]) * 60 + float(t[1]) + float(t[2]) * 0.05 / 3
        return minutes

    for t in ["06:00:10", "06:27:45", "12:59:59"]:
        print(time2float_minutes(t))

OUTPUT:

    360.1666666666667
    387.75
    779.9833333333333

You might have noticed that we check whether time is a bytes object. The reason for this is that we use our function time2float_minutes with loadtxt in the following example. The keyword parameter converters takes a dictionary which maps a column index (the key) to a function that converts the string data of this column into a float. Depending on the NumPy version, this string data may be passed as a byte string. That is why our function first converts bytes into a Unicode string:

    y = np.loadtxt("times_and_temperatures.txt",
                   converters={0: time2float_minutes})
    print(y)

OUTPUT:

    [[  360.     20.1]
     [  361.5    16.1]
     [  363.     16.9]
     ...,
     [ 1375.5    22.5]
     [ 1377.     11.1]
     [ 1378.5    15.2]]

If the columns were separated by semicolons instead of whitespace, we would additionally pass delimiter=";" to loadtxt.
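Since "times_and_temperatures.txt" may not be at hand, here is a self-contained sketch of the same converters technique. It writes three made-up lines to a hypothetical file "times_sample.txt" and uses a slightly more compact variant of the conversion function that accepts both bytes and str:

```python
import numpy as np

def time2float_minutes(time):
    # converters may receive bytes (older NumPy) or str (newer NumPy)
    if isinstance(time, bytes):
        time = time.decode()
    h, m, s = time.split(":")
    # minutes, with the seconds as fractions of a minute
    return float(h) * 60 + float(m) + float(s) / 60

# made-up sample data in the format "hh:mm:ss temperature"
with open("times_sample.txt", "w") as f:
    f.write("06:00:00 20.1\n06:01:30 16.1\n06:03:00 16.9\n")

# column 0 goes through the converter, column 1 is parsed as float
y = np.loadtxt("times_sample.txt", converters={0: time2float_minutes})
print(y)
```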

tofile

tofile is an ndarray method which writes the content of an array to a file, either in binary format, which is the default, or in text format.

A.tofile(fid, sep="", format="%s")

The data of the ndarray A is always written in 'C' order, regardless of the order of A.

The data file written by this method can be reloaded with the function fromfile().

Parameter   Meaning
fid         An open file object or a string containing a filename.
sep         The separator between array items for text output. If it is empty (''), a binary file is written, equivalent to file.write(a.tobytes()).
format      Format string for text file output. Each entry in the array is formatted to text by first converting it to the closest Python type, and then using 'format' % item.
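The examples in this chapter use tofile in binary mode only. As a minimal sketch of the text mode described in the table, a non-empty sep produces a text file, and fromfile can parse it back when it is given the same separator (the file name "test7.txt" is just an example):

```python
import numpy as np

a = np.array([1.5, 2.25, 3.0])

# a non-empty sep switches tofile to text output; format is applied per item
a.tofile("test7.txt", sep=",", format="%.2f")
with open("test7.txt") as f:
    text = f.read()
print(text)

# fromfile needs the same separator to parse the text back
b = np.fromfile("test7.txt", dtype=float, sep=",")
print(b)
```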

Remark:

Information on endianness and precision is lost. Therefore, it may not be a good idea to use this method to archive data or to transport data between machines with different endianness. Some of these problems can be overcome by outputting the data as text files, at the expense of speed and file size.

    dt = np.dtype([('time', [('min', int), ('sec', int)]),
                   ('temp', float)])
    x = np.zeros((1,), dtype=dt)
    x['time']['min'] = 10
    x['temp'] = 98.25
    print(x)

    # open the file in binary write mode; "with" closes it afterwards
    with open("test6.txt", "wb") as fh:
        x.tofile(fh)

OUTPUT:

    [((10, 0), 98.25)]


fromfile

fromfile reads in data which has been written with the tofile method. It can read binary data if the data type is known, and it can also parse simply formatted text files. The data from the file is turned into an array.

The general syntax looks like this:

numpy.fromfile(file, dtype=float, count=-1, sep='')

Parameter   Meaning
file        A file object or the name of the file to read.
dtype       The data type of the array which will be constructed from the file data. For binary files, it is used to determine the size and byte order of the items in the file.
count       The number of items to read. -1 means that all items will be read.
sep         The separator between items if the file is a text file. If it is empty (''), the file will be treated as a binary file. A space (" ") in a separator matches zero or more whitespace characters. A separator consisting solely of spaces has to match at least one whitespace character.
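
The following call reads "test4.txt", the file of int32 values created in the next example, with the structured dtype dt from above. As the file contains no type information, fromfile simply reinterprets the raw bytes, which results in meaningless values:

    fh = open("test4.txt", "rb")
    np.fromfile(fh, dtype=dt)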

OUTPUT:

    array([((4294967296, 12884901890), 1.0609978957e-313),
           ((30064771078, 38654705672), 2.33419537056e-313),
           ((55834574860, 64424509454), 3.60739284543e-313),
           ((81604378642, 90194313236), 4.8805903203e-313),
           ((107374182424, 115964117018), 6.1537877952e-313),
           ((133143986206, 141733920800), 7.42698527006e-313),
           ((158913789988, 167503724582), 8.70018274493e-313),
           ((184683593770, 193273528364), 9.9733802198e-313)],
          dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])

    import numpy as np
    import os

    # the size of the default integer type is platform dependent
    # (e.g. Linux vs. Windows), so we use np.int32 explicitly
    data = np.arange(50, dtype=np.int32)
    data.tofile("test4.txt")

    with open("test4.txt", "rb") as fh:
        # skip the first 32 items: 32 items * 4 bytes = 128 bytes
        fh.seek(128, os.SEEK_SET)
        x = np.fromfile(fh, dtype=np.int32)
    print(x)

OUTPUT:

[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]            
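Instead of moving the file pointer with seek, newer NumPy versions (1.17 and later) let fromfile skip data directly through its offset parameter, which is counted in bytes, not items. A minimal sketch, combined with count to read only a few items:

```python
import numpy as np

data = np.arange(50, dtype=np.int32)
data.tofile("test4.txt")

# offset is given in bytes: 32 items * 4 bytes per int32 = 128 bytes
x = np.fromfile("test4.txt", dtype=np.int32, offset=32 * data.itemsize)
print(x)

# count limits how many items are read after the offset
first5 = np.fromfile("test4.txt", dtype=np.int32, count=5, offset=128)
print(first5)
```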

Attention:

Using tofile and fromfile for data storage can cause problems, because the binary files they generate are not platform independent: tofile saves no byte-order or data-type information. Data can be stored in the platform-independent .npy format by using save and load instead.

Best Practice to Load and Save Data

The recommended way to store and load data with Numpy in Python is to use save and load. In the following example, we also use a temporary file:

    import numpy as np
    from tempfile import TemporaryFile

    outfile = TemporaryFile()
    x = np.arange(10)
    np.save(outfile, x)
    outfile.seek(0)  # Only needed here to simulate closing & reopening file
    np.load(outfile)

OUTPUT:

    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
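To contrast this with the tofile example above, here is a minimal sketch that stores the structured array from the tofile section with save and reads it back with load. The dtype, including byte order and nested fields, survives the round trip because the .npy header stores it (the file name "test6.npy" is just an example):

```python
import numpy as np

dt = np.dtype([('time', [('min', int), ('sec', int)]),
               ('temp', float)])
x = np.zeros((1,), dtype=dt)
x['time']['min'] = 10
x['temp'] = 98.25

# the .npy header records dtype, byte order and shape along with the data
np.save("test6.npy", x)
y = np.load("test6.npy")
print(y.dtype == x.dtype, y)
```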

And yet another way: genfromtxt

There is still another way to read tabular input from a file to create arrays. As the name implies, the input file is supposed to be a text file. The text file can also be compressed: genfromtxt can process the archive formats gzip and bzip2. The type of the archive is determined by the extension of the file, i.e. '.gz' for gzip and '.bz2' for bzip2.
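A short sketch of the compressed case, assuming a hypothetical file name "test9.txt.gz". savetxt compresses its output automatically when the file name ends in .gz, and genfromtxt decompresses it based on the same extension:

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# the .gz extension makes savetxt write a gzip-compressed file
np.savetxt("test9.txt.gz", x)

# gzip files start with the magic bytes 1f 8b
with open("test9.txt.gz", "rb") as f:
    magic = f.read(2)

# genfromtxt chooses the decompressor from the .gz extension
y = np.genfromtxt("test9.txt.gz")
print(y)
```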

genfromtxt is slower than loadtxt, but it is capable of coping with missing data. It processes the file data in two passes: first, it converts the lines of the file into strings; then, it converts the strings into the requested data type. loadtxt, on the other hand, works in one go, which is why it is faster.
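The following sketch shows how genfromtxt copes with missing data: an empty field in a float column becomes nan by default, and the filling_values parameter replaces it with a value of our choice. The input is an in-memory string instead of a file; loadtxt would typically reject the empty field with a ValueError.

```python
import io
import numpy as np

# the second row has an empty field between the commas
data = "1,2,3\n4,,6\n7,8,9"

# by default, the missing float value becomes nan
y = np.genfromtxt(io.StringIO(data), delimiter=",")
print(y)

# filling_values replaces missing entries with -1
z = np.genfromtxt(io.StringIO(data), delimiter=",", filling_values=-1)
print(z)
```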

recfromcsv(fname, **kwargs)

This is not really another way to read in CSV data: recfromcsv is basically a shortcut for

np.genfromtxt(filename, delimiter=",", dtype=None)
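As far as we know, recfromcsv additionally sets names=True (the field names are taken from the first line) and returns a record array; in recent NumPy releases it may no longer be available in the main namespace. A sketch of the equivalent genfromtxt call on a small made-up CSV text:

```python
import io
import numpy as np

csv_text = io.StringIO("min,temp\n360,20.1\n361.5,16.1\n")

# names=True reads the field names from the first line,
# dtype=None lets genfromtxt guess the type of every column
data = np.genfromtxt(csv_text, delimiter=",", dtype=None,
                     names=True, encoding=None)
print(data["min"], data["temp"])
```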



Source: https://python-course.eu/numerical-programming/reading-and-writing-data-files-ndarrays.php
