Input/Output
We provide two simple functions: one to download data from Google buckets and one to read an AnnData object from a csv file. We stress that the recommended format for reading and writing AnnData objects is the .h5ad format.
- download_from_bucket(bucket_name, source_path, destination_path)
Helper function to download files from Google buckets.
- Parameters
bucket_name (str) – the name of the Google bucket. For example: “ld-data-bucket”.
source_path (str) – the path to the file in the bucket. For example: “tissue-purifier/slideseq_testis_anndata_h5ad.tar.gz”.
destination_path (str) – the path in the local filesystem where the file is saved. For example: “my_dir/my_file_h5ad.tar.gz”.
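The documentation does not say how the download is performed. One plausible minimal sketch, assuming the `google-cloud-storage` package, is shown here; `download_blob` is a hypothetical name and this is not necessarily how `download_from_bucket` is implemented. The storage client is passed in as an argument so the logic can be exercised without network access or credentials.

```python
import os


def download_blob(client, bucket_name: str, source_path: str, destination_path: str) -> None:
    """Hypothetical helper: copy one object from a Google bucket to a local file.

    `client` is assumed to behave like google.cloud.storage.Client, i.e. to offer
    client.bucket(name).blob(path).download_to_filename(local_path).
    """
    # Create the local directory (e.g. "my_dir/") if it does not exist yet.
    parent = os.path.dirname(destination_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Locate the object inside the bucket and write its contents to disk.
    blob = client.bucket(bucket_name).blob(source_path)
    blob.download_to_filename(destination_path)


# Example call with a real client (requires credentials and network access):
# from google.cloud import storage
# download_blob(storage.Client(), "ld-data-bucket",
#               "tissue-purifier/slideseq_testis_anndata_h5ad.tar.gz",
#               "my_dir/my_file_h5ad.tar.gz")
```

Injecting the client keeps the helper independent of how authentication is configured.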
- anndata_from_expression_csv(filename, key, transpose, top_n_rows=None)
Reads a csv file with the expression data (i.e. a count matrix) and returns an AnnData object. Use it when your collaborators give you a .csv file instead of a .h5ad file.
If transpose == False: the csv is expected to have the header ‘barcode’, ‘gene_name_1’, …, ‘gene_name_N’, and each row looks something like: ACCDAT, 2, 0, …, 1
If transpose == True: the csv is expected to have the header ‘gene’, ‘barcode_name_1’, …, ‘barcode_name_N’, and each row looks something like: Arhgap18, 2, 0, …, 1
- Parameters
filename (str) – the path to the csv file to read.
key (str) – the name of the first column, which holds the row labels (barcodes if transpose == False, genes if transpose == True). It defaults to ‘barcode’ if transpose == False and ‘gene’ if transpose == True.
transpose (bool) – whether the csv is gene_by_cell (True) or cell_by_gene (False).
top_n_rows (Optional[int]) – the number of rows to read from the top of the file. Set it to a small value (like 20) for debugging.
Note
The output is always cell_by_gene (i.e. cells=obs, genes=var), regardless of the value of transpose.
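To make the two csv layouts and the Note above concrete, here is a standard-library-only sketch that parses either layout and always returns the cells as rows. `read_cell_by_gene` is a hypothetical name, not part of the library, and the sketch assumes integer counts and no quoted fields in the csv.

```python
import csv
import io
from typing import List, Tuple


def read_cell_by_gene(text: str, transpose: bool) -> Tuple[List[str], List[str], List[List[int]]]:
    """Hypothetical parser returning (cell_names, gene_names, cell_by_gene_counts)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)            # e.g. ['barcode', gene_1, ...] or ['gene', barcode_1, ...]
    col_names = header[1:]
    row_names, rows = [], []
    for record in reader:
        row_names.append(record[0])  # first column holds the row labels (the `key` column)
        rows.append([int(v) for v in record[1:]])  # counts assumed to be integers
    if not transpose:
        # Rows are already cells (barcodes) and columns are genes.
        return row_names, col_names, rows
    # Rows are genes: swap axes so that every row becomes a cell.
    cells_by_genes = [list(column) for column in zip(*rows)]
    return col_names, row_names, cells_by_genes
```

Both layouts describing the same data yield identical output, which is the behaviour the Note describes.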
- Returns
adata – An AnnData object with (i) adata.X, the counts in scipy Compressed Sparse Row (CSR) format; (ii) adata.obs, the observation names (often the cell barcodes); (iii) adata.var, the variable names (often the gene names).