Hearem FAQ - AI Voice Reader Help

Question 1

What is Hearem?

Accepted Answer

Hearem is an AI voice reader for iPhone and iPad. It turns content you would normally read on screen into natural-sounding audio, so you can keep listening during commutes, walks, chores, and study sessions, work through long articles, or give your eyes a break.

You can paste text, read from the clipboard, scan images, import PDF/TXT/EPUB files, paste web links, or send content through the iOS share sheet. When generating audio, you can choose Apple system voices, premium AI voices, or authorized cloned voices. Generated audio keeps playing in the background or on the lock screen, like a personal playlist.

Question 2

How do I start listening with Hearem?

Accepted Answer

Paste or type text, choose a language and voice, then press play. You can also scan images, import documents, paste web links, or use clipboard detection.

Question 3

Can I import files from WeChat or other apps?

Accepted Answer

Yes. Open the file in WeChat or another app, then use “Open in…” or the iOS share sheet to send it to Hearem.

Common supported formats include images, PDF, TXT, MD, EPUB, DOC, and DOCX files. The exact processing flow depends on the file type.

Question 4

Is Hearem free?

Accepted Answer

Yes. You can use Hearem for free with Apple system voices and basic allowances for text, OCR, web reading, and other core features. Upgrade to Standard for higher character limits, premium AI voices, the full voice catalog, and more advanced processing.

Question 5

Does Standard still have character limits?

Accepted Answer

Yes. Standard currently includes 1,000,000 basic characters per month plus 20,000 premium voice characters, and those two balances are tracked separately.

Basic characters reset on your renewal cycle, while unused premium voice characters can carry over to the next renewal. If you still need more, you can also purchase extra premium voice character packs.

Question 6

What if my monthly quota is not enough?

Accepted Answer

If your monthly quota is not enough, you can purchase additional voice packs to expand it. Hearem currently offers both a small pack and a large pack.

The small pack includes 20,000 premium voice characters and 100,000 basic characters. The large pack includes 130,000 premium voice characters and 800,000 basic characters. Purchased premium voice characters do not expire and are added to your balance immediately.

In practice, that means paid users can keep topping up as needed instead of being blocked by the monthly allowance alone.
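
As a rough illustration of the top-up arithmetic, here is a minimal Python sketch. The pack sizes and the Standard allowance come from the answers above; the `Balance` class, the `add_pack` method, and the idea that a pack's basic characters land in the same basic balance are assumptions made for the example, not a description of how Hearem stores balances.

```python
from dataclasses import dataclass

# Pack contents as stated in this FAQ.
PACKS = {
    "small": {"premium": 20_000, "basic": 100_000},
    "large": {"premium": 130_000, "basic": 800_000},
}


@dataclass
class Balance:
    """Hypothetical model of the two separately tracked balances."""
    basic: int
    premium: int

    def add_pack(self, size: str) -> None:
        # Assumption: both parts of a pack are simply added to the matching balance.
        pack = PACKS[size]
        self.basic += pack["basic"]
        self.premium += pack["premium"]


# Start from the current Standard monthly allowance, then buy one small pack.
balance = Balance(basic=1_000_000, premium=20_000)
balance.add_pack("small")
print(balance)
```

With these numbers, one small pack doubles the premium voice balance from 20,000 to 40,000 characters.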

Question 7

Why do different voices consume different amounts of quota?

Accepted Answer

Different providers, voice models, and even specific modes have different underlying costs, so Hearem converts quota usage based on those real cost differences to keep the service sustainable.

For example, Apple built-in voices consume no characters, while some premium voices use a higher multiplier. At the moment, ElevenLabs voices cost 3x characters and MiniMax enhanced mode costs 2x.

If efficiency matters most, choose lower-cost voices. If naturalness, expressiveness, or a specific effect matters more, higher-multiplier premium voices may still be the better fit.
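
To estimate how much quota a piece of text might use, you can multiply its length by the voice's multiplier. The sketch below is only an approximation: the three multipliers come from this answer, while the voice keys, the default multiplier of 1 for other voices, and counting characters with `len()` are assumptions made for the example.

```python
# Multipliers stated in this FAQ. The keys are hypothetical labels.
MULTIPLIERS = {
    "apple": 0,             # Apple built-in voices consume no characters
    "elevenlabs": 3,        # ElevenLabs: 3x character cost
    "minimax_enhanced": 2,  # MiniMax enhanced mode: 2x character cost
}

# Assumption: voices not listed above cost 1 character per character.
DEFAULT_MULTIPLIER = 1


def quota_cost(text: str, voice_key: str) -> int:
    """Estimate the characters deducted for generating `text` with a voice."""
    multiplier = MULTIPLIERS.get(voice_key, DEFAULT_MULTIPLIER)
    return len(text) * multiplier


article = "x" * 1_000  # stand-in for a 1,000-character article
for key in ("apple", "minimax_enhanced", "elevenlabs"):
    print(key, quota_cost(article, key))
```

Under these assumptions, a 1,000-character article would use 0, 2,000, and 3,000 characters respectively.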

Question 8

Do I need to log in?

Accepted Answer

Yes. Hearem currently requires sign-in before use, and Sign in with Apple is supported. Your account is used to confirm your subscription, balance, and purchase history, and to help with access recovery or support requests.

Question 9

What should I do if sign-in fails?

Accepted Answer

First, fully quit and reopen Hearem, then make sure your network connection is stable. If sign-in still fails, update to the latest version and, if needed, reinstall the app before trying again.

If the problem continues, contact support with your device model, iOS version, app version, and a screenshot of the error so the issue can be diagnosed faster.

Question 10

Can I recover my generated audio after reinstalling the app?

Accepted Answer

That depends on whether the content was already saved, exported, or synced before reinstalling. If you are worried about losing existing work, export the audio you want to keep before reinstalling.

Content that was properly backed up or synced can usually be restored after you sign in again. Local-only data that was never saved to history, drafts, or backup may not be recoverable.

Question 11

Which languages and voices are supported?

Accepted Answer

Hearem supports multilingual reading, including Chinese, English, Japanese, Korean, French, Spanish, German, Portuguese, Italian, Russian, Arabic, Dutch, and more. Exact availability depends on the voice provider and individual voice, which you can review and filter in the voice list.

Voice options are grouped by provider: Apple system voices are useful for free and basic reading; Microsoft Azure offers broad language coverage; MiniMax, Doubao, TikTok, and Qwen are especially useful for Chinese or Chinese-English content; ElevenLabs, OpenAI, and Fish Audio provide more natural or expressive premium AI voices.

You can also create or use authorized cloned voices if you want a personal voice. Different voice models vary in language coverage, language quality, and which controls they support, including speed, pitch, emotion, volume, and audio tags, so it is best to choose a voice based on the language and listening scenario before generating audio.

Question 12

Why do some Japanese texts with kanji get read as Chinese?

Accepted Answer

In multilingual text, some voice models may default to Chinese pronunciation for kanji, especially when the language context is not clear enough.

Language support varies a lot by voice model. Some are stronger in Chinese, some in English or Japanese, and some handle mixed-language text more reliably. If this happens, try switching to a voice that is stronger in your target language.

Question 13

Does Hearem support Cantonese?

Accepted Answer

Yes. Hearem includes Cantonese-related voice options, and some Chinese voices also offer a setting that prioritizes Cantonese pronunciation.

Results still vary by provider, voice, and text content. If Cantonese is your main use case, choose voices explicitly labeled for Cantonese or enable Cantonese-priority options when available.

Question 14

How can I improve dialect pronunciation if it sounds inaccurate?

Accepted Answer

When using dialect voices, it is usually better to write the text in a more spoken, dialect-native way instead of feeding the model formal standard written Chinese.

Dialects often have their own vocabulary, phrasing, and sentence patterns. More authentic dialect text usually helps the model reproduce the pronunciation and feel of the dialect more accurately.

For example, Cantonese works better with phrasing closer to daily speech rather than literal standard Chinese wording. The same idea applies to Minnan, Shanghainese, and other dialects: the more natural the dialect text is, the more natural the output tends to sound.

Question 15

When can I use audio tags or scene/style prompts?

Accepted Answer

These controls are currently available only with supported ElevenLabs voices. Switch to an ElevenLabs voice first; if that voice supports them, you can use those tags during generation. Apple voices and most other providers do not offer the same tag-based controls yet.

Question 16

Can I listen to ebooks or long text?

Accepted Answer

Yes. After you import an ebook, long article, or long document, Hearem automatically splits the text according to the selected voice provider's per-request length limits and generates the audio in sequence, so you do not need to split the text or run TTS in batches manually.

Once processing is complete, you can listen to the full long-form piece continuously. You can also merge the generated segments and export them as a single audio file when needed.
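
For readers curious what this kind of automatic splitting can look like, here is a minimal Python sketch. It is not Hearem's actual algorithm: the function name `split_for_tts`, the `max_chars` limit, and the sentence-boundary rule are all assumptions chosen for illustration.

```python
import re


def split_for_tts(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Zero-width split after sentence-ending punctuation (Latin and CJK).
    # Whitespace stays attached to the following sentence, so nothing is lost.
    sentences = [s for s in re.split(r"(?<=[.!?。！？])", text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


parts = split_for_tts("One. Two. Three.", max_chars=10)
print(parts)
```

Joining the chunks back together reproduces the original text, which is the property that lets segments be generated in sequence and merged afterwards. A real splitter would also need to handle abbreviations and decimal numbers, which this simple rule treats as sentence ends.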

Question 17

Why do some voices generate in the background while others make you wait for the result?

Accepted Answer

Different voice providers handle long text with different speed and reliability. To keep generation stable, Hearem automatically chooses between two modes based on the text length and selected voice: for shorter text, the app usually waits for the result directly, while longer text is sent to background generation to avoid timeouts, interrupted output, or provider-side limits.

The current thresholds are: MiniMax, Azure, ElevenLabs, Fish Audio, Gemini, and Doubao switch to background generation when the text is over 500 characters; Qwen and TikTok switch when the text is over 300 characters. "Over" means strictly greater than the threshold: for the 500-character providers, text of up to 500 characters usually generates directly, while text of 501 characters or more uses background generation.

These thresholds may be adjusted over time as provider speed and reliability change. The goal is to keep short requests fast while making long-form generation more likely to complete successfully.
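
The rule above can be sketched in a few lines of Python. The thresholds are the ones listed in this answer; the provider keys, the function name `generation_mode`, and counting characters with `len()` are assumptions made for the example, and the real values may change as noted.

```python
# Thresholds as listed in this FAQ at the time of writing.
BACKGROUND_THRESHOLDS = {
    "minimax": 500,
    "azure": 500,
    "elevenlabs": 500,
    "fishaudio": 500,
    "gemini": 500,
    "doubao": 500,
    "qwen": 300,
    "tiktok": 300,
}


def generation_mode(text: str, provider: str) -> str:
    """Return "background" when the text is over the provider's threshold."""
    key = provider.lower()
    if key not in BACKGROUND_THRESHOLDS:
        raise ValueError(f"no threshold listed for provider: {provider}")
    # "Over" means strictly greater than the threshold.
    return "background" if len(text) > BACKGROUND_THRESHOLDS[key] else "direct"


print(generation_mode("x" * 500, "Azure"))
print(generation_mode("x" * 501, "Azure"))
print(generation_mode("x" * 301, "Qwen"))
```

The first call is at the threshold and stays direct; the other two are one character over their thresholds and go to background generation.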

Question 18

Why does long-text generation sometimes fail or get stuck while downloading audio?

Accepted Answer

Sometimes long-text generation fails not during speech synthesis itself but later, while downloading, merging, or saving the audio. If this happens, check the failed record in history first, then retry or export again if needed. If the same long text fails repeatedly, send support the source text, the selected voice, and a screenshot of the failure so the exact stage can be diagnosed.

For users in mainland China, some overseas voice providers may also fail because of network conditions, so a more stable connection or a VPN may be needed.

Question 19

Do PDFs or documents use up character balance?

Accepted Answer

OCR and speech generation are separate steps. OCR itself does not consume character balance. The extracted text counts against your character balance only when it is actually sent to speech generation, and which balance it uses depends on the voice you choose.

The exact amount used depends on whether you generate with a basic voice or a premium voice, so it is worth checking your selected voice and remaining balance before starting a long document.

Question 20

Why do imports from Photos or image files sometimes fail?

Accepted Answer

If the image is still stored in iCloud and has not fully downloaded to the device, importing it from Photos or an attachment may fail.

In that case, first make sure the image is fully downloaded locally, then try importing it again. If the issue only affects specific formats or specific files, send a sample file to support for review.

Question 21

Does playback continue in the background?

Accepted Answer

Yes. Generated audio keeps playing while the device is locked, after you leave the app, or while you do other things.

Question 22

Can I listen offline?

Accepted Answer

Speech generation usually requires an internet connection. Once audio has been generated or saved, you can listen to that audio offline.

Question 23

Can I export generated audio?

Accepted Answer

Yes. Generated audio can be saved as a file and shared to other apps.

Question 24

Does playback speed change the exported audio file?

Accepted Answer

No. Playback speed controls only affect how the audio is played back in the app and do not rewrite the original generated audio file.

If you need a specific speed outside the app, open the exported file in a player that supports speed adjustment or post-processing, such as VLC or nPlayer.

Question 25

Can I export subtitle files?

Accepted Answer

Yes. If an audio record has subtitles attached, you can export the subtitle file from the history screen or the player menu.

For long-form content generated in segments, you can export subtitles per segment or merge them into one full subtitle export.

Question 26

Can I keep multiple items and organize my listening history?

Accepted Answer

Yes. You can save drafts and come back later to edit or generate them, and completed audio is kept in your listening history.

As your library grows, you can also use albums to organize different articles, books, or listening themes.

Question 27

Does Hearem support voice cloning?

Accepted Answer

Yes. You can upload authorized voice samples to create a personal voice. Only clone your own voice or a voice you have explicit permission to use.

Question 28

What balance does a cloned voice use?

Accepted Answer

Cloned voices are treated as premium voice models, so generating speech with them uses premium voice characters rather than the basic character balance.

Your first Standard activation includes one free voice clone. If you plan to use cloned voices often, it is worth keeping an eye on your remaining premium voice character balance.

Question 29

How can I make a cloned voice sound more like my emotion and speaking style?

Accepted Answer

The emotional quality and speaking style of a cloned voice depend a lot on the sample you record. If the sample itself sounds flat, the cloned result will often sound flat as well.

You do not have to read only the reference text while recording. You can read any content, and the sample can be up to 5 minutes long. If you want the clone to sound angry, excited, warm, calm, or otherwise expressive, those tones should already be present in the sample.

In general, a longer sample with richer content, more emotional variation, and more natural delivery gives the model more to learn from and usually improves the result.

Question 30

Why does long-form audio sometimes start mid-way, jump, or repeat a segment?

Accepted Answer

This usually relates to source text structure, segment boundaries, or an abnormal step in the generation pipeline. If long-form audio starts in the middle, repeats a section, or loses continuity, keep the original text and send the affected record to support for review.

If the generated audio significantly mismatches the original text, contains garbled speech, or includes strange noises, you can also use the bottom menu option in the player to report the issue and request a quota refund. Approved requests refund the quota consumed by that generation.

Question 31

Can I control speed, pitch, or emotion?

Accepted Answer

Yes. Hearem supports playback speed controls, and some premium AI voices support finer controls such as emotion, pitch, and volume.

Question 32

What should I do if the app crashes?

Accepted Answer

If the app crashes right after launch or repeatedly closes when you enter a specific page, fully quit and reopen it first, then make sure you are on the latest version. If it still crashes, try restarting the device and testing again.

If you plan to reinstall, first export anything you do not want to lose or confirm that it has synced. If the app keeps crashing, send support your device model, iOS version, app version, what you were doing before the crash, and any screen recording or screenshot you can provide.

Question 33

My subscription or balance did not refresh. What should I do?

Accepted Answer

Restart the app first, then refresh your purchase status from the user center or subscription screen. If your subscription or balance still does not update, contact support with your purchase details, device model, and app version.

Question 34

Why did web reading fail?

Accepted Answer

Some sites require login, use paywalls, or block extraction. In those cases, open the page in your browser, copy the article text, and paste it into Hearem.

Question 35

Is my data private?

Accepted Answer

Hearem is designed to be local-first where possible. OCR is performed on device by default. Premium AI voices and cloning send only the necessary content to the selected provider for processing.

Question 36

Is Android supported?

Accepted Answer

Hearem currently focuses on iOS. Android support may be considered later depending on demand and development capacity.
