Hearem TTS

Frequently Asked Questions

Answers to common questions before and after downloading Hearem.

What is Hearem?

Hearem is an AI voice reader for iPhone and iPad. It turns content you would normally read on screen into natural-sounding audio, so you can keep listening while commuting, walking, doing chores, studying, working through long articles, or resting your eyes.

You can paste text, read from the clipboard, scan images, import PDF/TXT/EPUB files, paste web links, or send content through the iOS share sheet. To generate audio, choose an Apple system voice, a premium AI voice, or an authorized cloned voice. Generated audio then plays like a personal playlist and keeps going in the background or on the lock screen.

How do I start listening with Hearem?

Paste or type text, choose a language and voice, then press play. You can also scan images, import documents, paste web links, or use clipboard detection.

Can I import files from WeChat or other apps?

Yes. Open the file in WeChat or another app, then use “Open in…” or the iOS share sheet to send it to Hearem.

Common supported formats include images, PDF, TXT, MD, EPUB, DOC, and DOCX files. The exact processing flow depends on the file type.

Is Hearem free?

Yes. You can use Hearem for free with Apple system voices and basic allowances for text, OCR, web reading, and other core features. Upgrade to Standard for higher character limits, premium AI voices, the full catalog, and more advanced processing.

Does Standard still have character limits?

Yes. Standard currently includes 1,000,000 basic characters per month plus 20,000 premium voice characters, and those two balances are tracked separately.

Basic characters reset on your renewal cycle, while unused premium voice characters can carry over to the next renewal. If you still need more, you can also purchase extra premium voice character packs.

What if my monthly quota is not enough?

If your monthly quota runs out, you can buy additional voice packs. Hearem currently offers a small pack and a large pack.

The small pack includes 20,000 premium voice characters and 100,000 basic characters. The large pack includes 130,000 premium voice characters and 800,000 basic characters. Purchased premium voice characters do not expire and are added to your balance immediately.

In practice, paid users can top up as needed instead of being limited to the monthly allowance alone.

Why do different voices consume different amounts of quota?

Different providers, voice models, and even individual modes have different underlying costs, so Hearem weights quota usage according to those real cost differences to keep the service sustainable.

For example, Apple system voices consume no characters, while some premium voices use a multiplier: ElevenLabs voices are currently charged at 3x the character count, and Minimax enhanced mode at 2x.
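To make the multiplier arithmetic concrete, here is a minimal Python sketch of how a weighted charge could be computed. It is not Hearem's actual billing code: the provider keys are hypothetical, the 3x, 2x, and zero-cost values come from this answer, and treating every other voice as 1x and counting characters with `len()` are assumptions.

```python
from typing import Dict, Optional

# Hypothetical sketch of multiplier-weighted quota accounting.
# Multipliers are taken from the FAQ; the provider keys, the 1x default,
# and counting characters with len() are assumptions.
MULTIPLIERS: Dict[str, int] = {
    "apple": 0,             # Apple system voices consume no characters
    "elevenlabs": 3,        # currently charged at 3x
    "minimax_enhanced": 2,  # Minimax enhanced mode: 2x
}
DEFAULT_MULTIPLIER = 1  # assumption: voices without a listed multiplier cost 1x


def multiplier_for(provider: str) -> int:
    return MULTIPLIERS.get(provider, DEFAULT_MULTIPLIER)


def characters_charged(text: str, provider: str) -> int:
    """Characters deducted for generating `text` with `provider`."""
    return len(text) * multiplier_for(provider)


def max_text_length(balance: int, provider: str) -> Optional[int]:
    """Longest text a balance can cover; None means the voice is free."""
    multiplier = multiplier_for(provider)
    if multiplier == 0:
        return None
    return balance // multiplier


if __name__ == "__main__":
    print(characters_charged("x" * 1000, "elevenlabs"))  # 3000
    print(max_text_length(20_000, "elevenlabs"))          # 6666
```

Under those assumptions, a 20,000-character balance covers about 6,666 characters of text with an ElevenLabs voice, or 10,000 with Minimax enhanced mode.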

If efficiency matters most, choose lower-cost voices. If naturalness, expressiveness, or a specific effect matters more, higher-multiplier premium voices may still be the better fit.

Do I need to log in?

Yes. Hearem currently requires sign-in before use, and Sign in with Apple is supported. Your account is used to confirm your subscription, balance, and purchase history, and to help with access recovery or support requests.

What should I do if sign-in fails?

First, fully quit and reopen Hearem, then make sure your network connection is stable. If sign-in still fails, update to the latest version and, if needed, reinstall the app before trying again.

If the problem continues, contact support with your device model, iOS version, app version, and a screenshot of the error so the issue can be diagnosed faster.

Can I recover my generated audio after reinstalling the app?

That depends on whether the content was already saved, exported, or synced before reinstalling. If you are worried about losing existing work, export the audio you want to keep before reinstalling.

Content that was properly backed up or synced can usually be restored after you sign in again. Local-only data that was never saved to history, drafts, or backup may not be recoverable.

Which languages and voices are supported?

Hearem supports multilingual reading, including Chinese, English, Japanese, Korean, French, Spanish, German, Portuguese, Italian, Russian, Arabic, Dutch, and more. Exact availability depends on the voice provider and individual voice, which you can review and filter in the voice list.

Voice options are grouped by provider: Apple system voices are useful for free and basic reading; Microsoft Azure offers broad language coverage; Minimax, Doubao, TikTok, and Qwen are especially useful for Chinese or Chinese-English content; ElevenLabs, OpenAI, and Fish Audio provide more natural or expressive premium AI voices.

You can also create or use authorized cloned voices if you want a personal voice. Voice models differ in language coverage, quality in each language, and supported controls such as speed, pitch, emotion, volume, and audio tags, so it is best to choose a voice based on the language and listening scenario before generating audio.

Why do some Japanese texts with kanji get read as Chinese?

In multilingual text, some voice models may default to Chinese pronunciation for kanji, especially when the language context is not clear enough.

Language support varies a lot by voice model. Some are stronger in Chinese, some in English or Japanese, and some handle mixed-language text more reliably. If this happens, try switching to a voice that is stronger in your target language.

Does Hearem support Cantonese?

Yes. Hearem includes Cantonese-related voice options, and some Chinese voices also offer a setting that prioritizes Cantonese pronunciation.

Results still vary by provider, voice, and text content. If Cantonese is your main use case, choose voices explicitly labeled for Cantonese or enable Cantonese-priority options when available.

How can I improve dialect pronunciation if it sounds inaccurate?

When using dialect voices, it is usually better to write the text in a more spoken, dialect-native way instead of feeding the model formal standard written Chinese.

Dialects often have their own vocabulary, phrasing, and sentence patterns. More authentic dialect text usually helps the model reproduce the pronunciation and feel of the dialect more accurately.

For example, Cantonese works better with phrasing closer to daily speech rather than literal standard Chinese wording. The same idea applies to Minnan, Shanghainese, and other dialects: the more natural the dialect text is, the more natural the output tends to sound.

When can I use audio tags or scene/style prompts?

These controls are currently available only with supported ElevenLabs voices. Switch to an ElevenLabs voice first; if that voice supports them, you can use audio tags and scene/style prompts during generation. Apple voices and most other providers do not yet offer the same tag-based controls.

Can I listen to ebooks or long text?

Yes. After you import an ebook, long article, or long document, Hearem automatically splits the text according to the selected voice provider's per-request length limits and generates the audio in sequence, so you do not need to split the text or run TTS in batches manually.
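The splitting step can be pictured with a minimal Python sketch. It is a hypothetical illustration, not Hearem's implementation: `split_for_tts` and `max_chars` are made-up names, real per-request limits vary by provider and are not listed here, and the sentence detection is naive (for example, it also breaks after the period in a decimal number such as 3.14).

```python
import re
from typing import List

# Hypothetical sketch: split long text into request-sized chunks,
# preferring sentence boundaries. Not Hearem's actual implementation.
# A "sentence" is the shortest run ending in Latin or CJK end punctuation
# (plus trailing whitespace) or at the end of the text.
SENTENCE = re.compile(r".+?(?:[.!?。！？]+\s*|$)", re.S)


def split_for_tts(text: str, max_chars: int) -> List[str]:
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for sentence in SENTENCE.findall(text):
        # A sentence longer than the limit is hard-cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) <= max_chars:
            current += sentence  # still fits in the current request
        else:
            chunks.append(current)  # flush and start a new request
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `split_for_tts("One. Two. Three.", 10)` returns `["One. Two. ", "Three."]`. Preferring sentence boundaries keeps each request ending at a natural pause, and because whitespace stays attached to each sentence, joining the chunks reproduces the original text.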

Once processing is complete, you can listen to the full long-form piece continuously. You can also merge the generated segments and export them as a single audio file when needed.

Why does long-text generation sometimes fail or get stuck while downloading audio?

Sometimes long-text generation does not fail during speech synthesis itself, but later while downloading, merging, or saving the audio. If this happens, check the failed record in history first, then retry or export again if needed. If the same long text fails repeatedly, send support the source text, selected voice, and a screenshot of the failure so the exact stage can be diagnosed.

For users in mainland China, some overseas voice providers may also fail because of network conditions, so a more stable connection or a VPN may be needed.

Do PDFs or documents use up character balance?

OCR and speech generation are separate steps, and OCR itself does not consume character balance. Characters are counted only when the extracted text is sent to speech generation, and they are deducted from the balance that matches the voice you choose.

The amount used depends on whether your selected voice draws on basic characters or premium voice characters, and on any multiplier that voice carries, so it is worth checking your selected voice and remaining balance before starting a long document.

Why do imports from Photos or image files sometimes fail?

If the image is stored in iCloud and has not been fully downloaded to the device, importing it from Photos or from an attachment may fail.

In that case, first make sure the image is fully downloaded locally, then try importing it again. If the issue only affects specific formats or specific files, send a sample file to support for review.

Does playback continue in the background?

Yes. Generated audio can keep playing while the phone is locked, while you leave the app, or while you do other things.

Can I listen offline?

Speech generation usually requires an internet connection. Once audio has been generated or saved, you can listen to that audio offline.

Can I export generated audio?

Yes. Generated audio can be saved as a file and shared to other apps.

Does playback speed change the exported audio file?

No. Playback speed controls only affect how the audio is played back in the app and do not rewrite the original generated audio file.

If you want the audio to play at a different speed outside the app, use a player that supports speed adjustment, such as VLC or nPlayer, or change the speed in audio-editing software.

Can I export subtitle files?

Yes. If an audio record has subtitles attached, you can export the subtitle file from the history screen or the player menu.

For long-form content generated in segments, you can export subtitles per segment or merge them into one full subtitle export.

Can I keep multiple items and organize my listening history?

Yes. You can save drafts and come back later to edit or generate them, and completed audio is kept in your listening history.

As your library grows, you can also use albums to organize different articles, books, or listening themes.

Does Hearem support voice cloning?

Yes. You can upload authorized voice samples to create a personal voice. Only clone your own voice or a voice you have explicit permission to use.

What balance does a cloned voice use?

Cloned voices are treated as premium voice models, so generating speech with them uses premium voice characters rather than the basic character balance.

Your first Standard activation includes one free voice clone. If you plan to use cloned voices often, it is worth keeping an eye on your remaining premium voice character balance.

How can I make a cloned voice sound more like my emotion and speaking style?

The emotional quality and speaking style of a cloned voice depend a lot on the sample you record. If the sample itself sounds flat, the cloned result will often sound flat as well.

You do not have to read only the reference text while recording. You can read any content, and the sample can be up to 5 minutes long. If you want the clone to sound angry, excited, warm, calm, or otherwise expressive, those tones should already be present in the sample.

In general, a longer sample with richer content, more emotional variation, and more natural delivery gives the model more to learn from and usually improves the result.

Why does long-form audio sometimes start mid-way, jump, or repeat a segment?

This is usually caused by the structure of the source text, where segment boundaries fall, or an error in one step of the generation pipeline. If long-form audio starts in the middle, repeats a section, or loses continuity, keep the original text and send the affected record to support for review.

If the generated audio differs significantly from the original text, contains garbled speech, or includes strange noises, you can also report the issue from the menu at the bottom of the player and request a quota refund. If the request is approved, the quota consumed by that generation is refunded.

Can I control speed, pitch, or emotion?

Yes. Hearem supports playback speed controls, and some premium AI voices support finer controls such as emotion, pitch, and volume.

What should I do if the app crashes?

If the app crashes right after launch or repeatedly closes when you enter a specific page, fully quit and reopen it first, then make sure you are on the latest version. If it still crashes, try restarting the device and testing again.

If you plan to reinstall, first export anything you do not want to lose, or confirm that it has synced. If the app keeps crashing, send support your device model, iOS version, app version, what you were doing before the crash, and any screen recording or screenshot you can provide.

My subscription or balance did not refresh. What should I do?

Restart the app first, then refresh purchase status from the user center or subscription screen. If it still does not recover, contact support with purchase details, device model, and app version.

Why did web reading fail?

Some sites require login, use paywalls, or block extraction. In those cases, open the page in your browser, copy the article text, and paste it into Hearem.

Is my data private?

Hearem is designed to be local-first where possible. OCR is performed on device by default. Premium AI voices and cloning send only the necessary content to the selected provider for processing.

Is Android supported?

Hearem currently focuses on iOS. Android support may be considered later depending on demand and development capacity.