Input/Output

We provide two simple functions: one to download data from Google buckets and one to read anndata objects from csv files. We stress that the recommended format for reading and writing anndata objects is .h5ad.

download_from_bucket(bucket_name, source_path, destination_path)

Helper function to download files from Google buckets.

Parameters
  • bucket_name (str) – the name of the Google bucket. For example: “ld-data-bucket”

  • source_path (str) – path to the file in the bucket. For example: “tissue-purifier/slideseq_testis_anndata_h5ad.tar.gz”

  • destination_path (str) – path in the local filesystem where the file is saved. For example: “my_dir/my_file_h5ad.tar.gz”
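
As an illustration of how the three parameters fit together, here is a minimal sketch of a wrapper that skips the download when the local file already exists. The name fetch_if_missing is hypothetical and is not part of this package. The downloader is passed in as an argument, so the sketch makes no assumption about where download_from_bucket is imported from.

```python
from pathlib import Path
from typing import Callable


def fetch_if_missing(
    downloader: Callable[[str, str, str], None],
    bucket_name: str,
    source_path: str,
    destination_path: str,
) -> bool:
    """Hypothetical helper: download only if destination_path does not exist yet.

    Returns True if the downloader was called, False if the local file was reused.
    """
    destination = Path(destination_path)
    if destination.exists():
        return False
    # Create the target directory so the downloader can write into it.
    destination.parent.mkdir(parents=True, exist_ok=True)
    downloader(bucket_name, source_path, destination_path)
    return True
```

With download_from_bucket already imported, a call could look like fetch_if_missing(download_from_bucket, "ld-data-bucket", "tissue-purifier/slideseq_testis_anndata_h5ad.tar.gz", "my_dir/my_file_h5ad.tar.gz").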

anndata_from_expression_csv(filename, key, transpose, top_n_rows=None)

Reads a csv file containing expression data (i.e. a count matrix) and returns an anndata object. Useful when your collaborators give you a .csv file instead of a .h5ad file.

If transpose == False: The csv is expected to have a header: ‘barcode’, ‘gene_name_1’, …, ‘gene_name_N’. Each subsequent row is expected to look something like: ACCDAT, 2, 0, …, 1

If transpose == True: The csv is expected to have a header: ‘gene’, ‘barcode_name_1’, …, ‘barcode_name_N’. Each subsequent row is expected to look something like: Arhgap18, 2, 0, …, 1

Parameters
  • filename (str) – the path to the csv file to read

  • key (str) – the column name associated with the observations. It defaults to ‘barcode’ if transpose == False and to ‘gene’ if transpose == True.

  • transpose (bool) – whether the matrix in the csv is gene_by_cell (True) or cell_by_gene (False)

  • top_n_rows (Optional[int]) – the number of top rows to read. Set to a small value (e.g. 20) for debugging.

Note

The output will always be cell_by_gene (i.e. cells=obs, genes=var) regardless of the value of transpose.

Returns

adata – An anndata object with (i) anndata.X, the counts in scipy Compressed Sparse Row format; (ii) anndata.obs, the observation names (often the cellular barcodes); (iii) anndata.var, the variable names (often the gene names).
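
The two csv layouts and the cell_by_gene guarantee described above can be illustrated with a small standard-library sketch. This is not the package's implementation: the function name read_expression_csv is hypothetical, it parses text rather than a file, and it returns plain Python lists instead of an anndata object with a sparse matrix.

```python
import csv
import io


def read_expression_csv(text, transpose, top_n_rows=None):
    """Hypothetical sketch: parse csv text into (obs_names, var_names, counts).

    The returned counts are always cell-by-gene, whatever the input layout.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    col_names = [name.strip() for name in header[1:]]  # header[0] is the key column
    row_names, rows = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        if top_n_rows is not None and len(rows) >= top_n_rows:
            break
        row_names.append(record[0].strip())
        rows.append([int(value) for value in record[1:]])
    if transpose:
        # Rows are genes and columns are barcodes: flip to cell-by-gene.
        counts = [list(column) for column in zip(*rows)]
        return col_names, row_names, counts
    # Rows are already barcodes and columns are genes.
    return row_names, col_names, rows


cell_by_gene = "barcode,GeneA,GeneB\nAAAC,2,0\nTTTG,1,3\n"
gene_by_cell = "gene,AAAC,TTTG\nGeneA,2,1\nGeneB,0,3\n"

# Both layouts describe the same data, so both calls give the same result.
result_a = read_expression_csv(cell_by_gene, transpose=False)
result_b = read_expression_csv(gene_by_cell, transpose=True)
```

In a real pipeline the count lists would typically be wrapped in a scipy.sparse.csr_matrix and stored, together with the observation and variable names, in an anndata object.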